==================== INFERENCE - MEMORY - RESULT ====================
Model Name      Batch Size    Seq Length    Memory in MB
bert-base                8             8            1330
bert-base                8            32            1330
bert-base                8           128            1330
bert-base                8           512            1770
bert-384-hid             8             8            1330
bert-384-hid             8            32            1330
bert-384-hid             8           128            1330
bert-384-hid             8           512            1540
bert-6-lay               8             8            1330
bert-6-lay               8            32            1330
bert-6-lay               8           128            1330
bert-6-lay               8           512            1540
==================== ENVIRONMENT INFORMATION ====================
transformers_version: 2.11.0
framework: Tensorflow
use_xla: False
framework_version: 2.2.0
python_version: 3.6.10
system: Linux
cpu: x86_64
architecture: 64bit
date: 2020-06-29
time: 09:38:15.487125
fp16: False
use_multiprocessing: True
only_pretrain_model: False
cpu_ram_mb: 32088
use_gpu: True
num_gpus: 1
gpu: TITAN RTX
gpu_ram_mb: 24217
gpu_power_watts: 280.0
gpu_performance_state: 2
use_tpu: False
Again, inference time and required memory for inference are measured, but this time for customized configurations
of the BertModel class. This feature is especially helpful when deciding which configuration a model should be
trained with.
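To make the comparison between configurations concrete, the following minimal sketch parses the rows of the memory table above and reports how much memory each custom configuration saves relative to bert-base at a given batch size and sequence length. The helper names `parse_memory_results` and `savings_vs_baseline` are hypothetical and are not part of the benchmark tools; only the numbers are taken from the table.

```python
# Rows copied from the memory table above:
# model name, batch size, sequence length, memory in MB.
RESULTS = """\
bert-base 8 8 1330
bert-base 8 32 1330
bert-base 8 128 1330
bert-base 8 512 1770
bert-384-hid 8 8 1330
bert-384-hid 8 32 1330
bert-384-hid 8 128 1330
bert-384-hid 8 512 1540
bert-6-lay 8 8 1330
bert-6-lay 8 32 1330
bert-6-lay 8 128 1330
bert-6-lay 8 512 1540
"""


def parse_memory_results(text):
    """Map (model, batch size, sequence length) to memory in MB."""
    table = {}
    for line in text.strip().splitlines():
        model, batch, seq, mem = line.split()
        table[(model, int(batch), int(seq))] = int(mem)
    return table


def savings_vs_baseline(table, baseline, batch, seq):
    """Memory saved (in MB) by every other model compared to the baseline."""
    base = table[(baseline, batch, seq)]
    return {
        model: base - mem
        for (model, b, s), mem in table.items()
        if b == batch and s == seq and model != baseline
    }


table = parse_memory_results(RESULTS)
print(savings_vs_baseline(table, "bert-base", 8, 512))
# {'bert-384-hid': 230, 'bert-6-lay': 230}
```

In this run, the smaller configurations only differ from bert-base at sequence length 512; at shorter sequence lengths all three report the same 1330 MB.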
Benchmark best practices
This section lists a couple of best practices one should be aware of when benchmarking a model.
Currently, only single-device benchmarking is supported. When benchmarking on GPU, it is recommended that the user
specifies on which device the code should be run by setting the CUDA_VISIBLE_DEVICES environment variable in the
shell, e.g. export CUDA_VISIBLE_DEVICES=0, before running the code.
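If setting the variable in the shell is inconvenient, the same effect can usually be achieved from Python, as long as it happens before any CUDA-based library is imported in the process. The sketch below assumes this ordering; the helper name `select_gpu` is hypothetical.

```python
import os


def select_gpu(device_index: int) -> None:
    """Expose only one GPU to CUDA-based libraries imported later in this process."""
    os.environ["CUDA_VISIBLE_DEVICES"] = str(device_index)


# Equivalent to `export CUDA_VISIBLE_DEVICES=0` in the shell.
select_gpu(0)

# Import the deep learning framework (e.g. TensorFlow or PyTorch) only after
# this point, so that it sees the restricted device list.
```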
The option no_multi_processing should only be set to True for testing and debugging. To ensure accurate
memory measurement, it is recommended to run each memory benchmark in a separate process by making sure
no_multi_processing is set to False.
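The reason a separate process helps is that each measurement starts from a clean state, so memory allocated by an earlier run cannot leak into a later one. The sketch below illustrates the idea with the standard library only. It is not how the benchmark tools are implemented: the workload is a hypothetical stand-in (allocating a Python list), and `tracemalloc` tracks Python heap allocations rather than GPU memory.

```python
import subprocess
import sys

# Hypothetical workload run in the child interpreter: allocate a list of
# N items and print the peak traced memory in bytes.
CHILD_SNIPPET = """
import sys, tracemalloc
tracemalloc.start()
data = [0] * int(sys.argv[1])
_, peak = tracemalloc.get_traced_memory()
print(peak)
"""


def measure_in_fresh_process(n_items: int) -> int:
    """Run the workload in a new interpreter and return its peak traced memory in bytes."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SNIPPET, str(n_items)],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


# Each call starts from a clean interpreter, so the order of the
# measurements does not influence the reported numbers.
small = measure_in_fresh_process(1_000)
large = measure_in_fresh_process(1_000_000)
print(small, large)
```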
One should always state the environment information when sharing the results of a model benchmark. Results can vary
heavily between different GPU devices, library versions, etc., so benchmark results without this information are
not very useful for the community.
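When results are produced outside the benchmark tools, a few of the fields shown in the environment block above can be gathered with the standard library. The following is a minimal sketch, not the library's own implementation; the function name `collect_environment_info` is hypothetical, and GPU and framework details are left out because they require framework-specific calls.

```python
import platform
from datetime import datetime


def collect_environment_info() -> dict:
    """Gather basic system details to publish alongside benchmark results."""
    now = datetime.now()
    return {
        "python_version": platform.python_version(),
        "system": platform.system(),
        "cpu": platform.processor(),
        "architecture": platform.architecture()[0],
        "date": str(now.date()),
        "time": str(now.time()),
    }


# Print the fields in the same "key: value" layout as the block above.
for key, value in collect_environment_info().items():
    print(f"{key}: {value}")
```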
Sharing your benchmark
Previously, all available core models (10 at the time) were benchmarked for inference time across many different
settings: using PyTorch, with and without TorchScript, and using TensorFlow, with and without XLA. All of those
tests were run on CPUs (except for TensorFlow XLA) and GPUs.
The approach is detailed in the following blogpost and the results are available here.
With the new benchmark tools, it is easier than ever to share your benchmark results with the community:
PyTorch Benchmarking Results.
TensorFlow Benchmarking Results.