====================      INFERENCE - MEMORY - RESULT       ====================
Model Name             Batch Size     Seq Length      Memory in MB
bert-base                       8              8              1330
bert-base                       8             32              1330
bert-base                       8            128              1330
bert-base                       8            512              1770
bert-384-hid                    8              8              1330
bert-384-hid                    8             32              1330
bert-384-hid                    8            128              1330
bert-384-hid                    8            512              1540
bert-6-lay                      8              8              1330
bert-6-lay                      8             32              1330
bert-6-lay                      8            128              1330
bert-6-lay                      8            512              1540

====================        ENVIRONMENT INFORMATION         ====================

transformers_version: 2.11.0
framework: Tensorflow
use_xla: False
framework_version: 2.2.0
python_version: 3.6.10
system: Linux
cpu: x86_64
architecture: 64bit
date: 2020-06-29
time: 09:38:15.487125
fp16: False
use_multiprocessing: True
only_pretrain_model: False
cpu_ram_mb: 32088
use_gpu: True
num_gpus: 1
gpu: TITAN RTX
gpu_ram_mb: 24217
gpu_power_watts: 280.0
gpu_performance_state: 2
use_tpu: False

Again, inference time and required memory for inference are measured, but this time for customized configurations
of the BertModel class. This feature can be especially helpful when deciding which configuration the model
should be trained with.
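For reference, a run like the one above could be set up roughly as follows. This is a minimal sketch assuming the
TensorFlowBenchmark utilities of transformers 2.11 (matching the environment information above); the custom
configurations mirror the bert-384-hid and bert-6-lay rows of the result table.

```python
from transformers import BertConfig, TensorFlowBenchmark, TensorFlowBenchmarkArguments

# Three configurations matching the result table: the stock base model,
# one with a reduced hidden size, and one with fewer layers.
config_base = BertConfig()
config_384_hid = BertConfig(hidden_size=384)
config_6_lay = BertConfig(num_hidden_layers=6)

args = TensorFlowBenchmarkArguments(
    models=["bert-base", "bert-384-hid", "bert-6-lay"],
    batch_sizes=[8],
    sequence_lengths=[8, 32, 128, 512],
)

# When configs are passed explicitly, the model names above serve as labels
# for the configurations rather than as checkpoints to download.
benchmark = TensorFlowBenchmark(args, configs=[config_base, config_384_hid, config_6_lay])
results = benchmark.run()
```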
Benchmark best practices
This section lists a couple of best practices one should be aware of when benchmarking a model.

- Currently, only single-device benchmarking is supported. When benchmarking on GPU, it is recommended that the user
  specify on which device the code should be run by setting the CUDA_VISIBLE_DEVICES environment variable in the
  shell, e.g. export CUDA_VISIBLE_DEVICES=0 before running the code.
- The option no_multi_processing should only be set to True for testing and debugging. To ensure accurate
  memory measurement, each memory benchmark should run in its own process, which is the case as long as
  no_multi_processing is left at its default of False (see the sketch after this list).
- One should always state the environment information when sharing the results of a model benchmark. Results can vary
  heavily between different GPU devices, library versions, etc., so benchmark results on their own are not very
  useful to the community.
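Both GPU pinning and the multiprocessing behavior can be combined in one script. A minimal sketch, assuming the same
TensorFlowBenchmark API as above:

```python
import os

# Pin the benchmark to a single GPU; this must happen before TensorFlow
# initializes CUDA, i.e. before importing transformers.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

from transformers import TensorFlowBenchmark, TensorFlowBenchmarkArguments

# no_multi_processing is left at its default (False), so each memory
# benchmark runs in its own process; set it to True only for debugging,
# since in-process measurement is less accurate.
args = TensorFlowBenchmarkArguments(
    models=["bert-base-uncased"],
    batch_sizes=[8],
    sequence_lengths=[128],
)

benchmark = TensorFlowBenchmark(args)
results = benchmark.run()
```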

Sharing your benchmark
Previously, all available core models (10 at the time) were benchmarked for inference time, across many different
settings: using PyTorch, with and without TorchScript, using TensorFlow, with and without XLA. All of those tests were
done across CPUs (except for TensorFlow XLA) and GPUs.
The approach is detailed in the following blogpost and the results are
available here.
With the new benchmark tools, it is easier than ever to share your benchmark results with the community:

- PyTorch Benchmarking Results.
- TensorFlow Benchmarking Results.
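For sharing, results can also be written to CSV files together with the environment information. A minimal sketch;
note that save_to_csv and the *_csv_file argument names are assumptions based on the transformers 2.11 benchmark
arguments and may differ in other versions.

```python
from transformers import TensorFlowBenchmark, TensorFlowBenchmarkArguments

# NOTE: save_to_csv and the *_csv_file argument names are assumed from the
# transformers 2.11 benchmark API; check the arguments of your installed version.
args = TensorFlowBenchmarkArguments(
    models=["bert-base-uncased"],
    batch_sizes=[8],
    sequence_lengths=[8, 32, 128, 512],
    save_to_csv=True,
    inference_time_csv_file="inference_time.csv",
    inference_memory_csv_file="inference_memory.csv",
    env_info_csv_file="env_info.csv",  # environment info to share alongside the numbers
)

TensorFlowBenchmark(args).run()
```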