Spaces:
Running
Running
File size: 3,119 Bytes
939262b |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 |
==================== INFERENCE - MEMORY - RESULT ==================== Model Name Batch Size Seq Length Memory in MB bert-base 8 8 1330 bert-base 8 32 1330 bert-base 8 128 1330 bert-base 8 512 1770 bert-384-hid 8 8 1330 bert-384-hid 8 32 1330 bert-384-hid 8 128 1330 bert-384-hid 8 512 1540 bert-6-lay 8 8 1330 bert-6-lay 8 32 1330 bert-6-lay 8 128 1330 bert-6-lay 8 512 1540 ==================== ENVIRONMENT INFORMATION ==================== transformers_version: 2.11.0 framework: Tensorflow use_xla: False framework_version: 2.2.0 python_version: 3.6.10 system: Linux cpu: x86_64 architecture: 64bit date: 2020-06-29 time: 09:38:15.487125 fp16: False use_multiprocessing: True only_pretrain_model: False cpu_ram_mb: 32088 use_gpu: True num_gpus: 1 gpu: TITAN RTX gpu_ram_mb: 24217 gpu_power_watts: 280.0 gpu_performance_state: 2 use_tpu: False Again, inference time and required memory for inference are measured, but this time for customized configurations of the BertModel class. This feature can especially be helpful when deciding for which configuration the model should be trained. Benchmark best practices This section lists a couple of best practices one should be aware of when benchmarking a model. Currently, only single device benchmarking is supported. When benchmarking on GPU, it is recommended that the user specifies on which device the code should be run by setting the CUDA_VISIBLE_DEVICES environment variable in the shell, e.g. export CUDA_VISIBLE_DEVICES=0 before running the code. The option no_multi_processing should only be set to True for testing and debugging. To ensure accurate memory measurement it is recommended to run each memory benchmark in a separate process by making sure no_multi_processing is set to True. One should always state the environment information when sharing the results of a model benchmark. Results can vary heavily between different GPU devices, library versions, etc., so that benchmark results on their own are not very useful for the community. Sharing your benchmark Previously all available core models (10 at the time) have been benchmarked for inference time, across many different settings: using PyTorch, with and without TorchScript, using TensorFlow, with and without XLA. All of those tests were done across CPUs (except for TensorFlow XLA) and GPUs. The approach is detailed in the following blogpost and the results are available here. With the new benchmark tools, it is easier than ever to share your benchmark results with the community PyTorch Benchmarking Results. TensorFlow Benchmarking Results. |