	==================== INFERENCE - MEMORY - RESULT ====================
	Model Name      Batch Size    Seq Length    Memory in MB
	--------------------------------------------------------
	bert-base                8             8            1330
	bert-base                8            32            1330
	bert-base                8           128            1330
	bert-base                8           512            1770
	bert-384-hid             8             8            1330
	bert-384-hid             8            32            1330
	bert-384-hid             8           128            1330
	bert-384-hid             8           512            1540
	bert-6-lay               8             8            1330
	bert-6-lay               8            32            1330
	bert-6-lay               8           128            1330
	bert-6-lay               8           512            1540

	==================== ENVIRONMENT INFORMATION ====================

	transformers_version: 2.11.0
	framework: Tensorflow
	use_xla: False
	framework_version: 2.2.0
	python_version: 3.6.10
	system: Linux
	cpu: x86_64
	architecture: 64bit
	date: 2020-06-29
	time: 09:38:15.487125
	fp16: False
	use_multiprocessing: True
	only_pretrain_model: False
	cpu_ram_mb: 32088
	use_gpu: True
	num_gpus: 1
	gpu: TITAN RTX
	gpu_ram_mb: 24217
	gpu_power_watts: 280.0
	gpu_performance_state: 2
	use_tpu: False
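
To compare the configurations in the memory result above, the rows can be loaded into a small script. The following is a minimal sketch using only the Python standard library; the helper names memory_by_model and memory_growth are illustrative and not part of the library. It computes how much additional memory each configuration needs when the sequence length grows from 128 to 512.

```python
from collections import defaultdict

# Rows copied from the memory result above:
# (model name, batch size, sequence length, memory in MB).
ROWS = [
    ("bert-base", 8, 8, 1330),
    ("bert-base", 8, 32, 1330),
    ("bert-base", 8, 128, 1330),
    ("bert-base", 8, 512, 1770),
    ("bert-384-hid", 8, 8, 1330),
    ("bert-384-hid", 8, 32, 1330),
    ("bert-384-hid", 8, 128, 1330),
    ("bert-384-hid", 8, 512, 1540),
    ("bert-6-lay", 8, 8, 1330),
    ("bert-6-lay", 8, 32, 1330),
    ("bert-6-lay", 8, 128, 1330),
    ("bert-6-lay", 8, 512, 1540),
]


def memory_by_model(rows):
    """Group rows into {model: {sequence_length: memory_mb}}."""
    table = defaultdict(dict)
    for model, _batch_size, seq_len, memory_mb in rows:
        table[model][seq_len] = memory_mb
    return dict(table)


def memory_growth(rows, short=128, long=512):
    """Return {model: extra MB needed at `long` compared to `short`}."""
    return {
        model: by_len[long] - by_len[short]
        for model, by_len in memory_by_model(rows).items()
    }


print(memory_growth(ROWS))
# {'bert-base': 440, 'bert-384-hid': 210, 'bert-6-lay': 210}
```

In this run, the smaller hidden size and the reduced layer count save the same amount of memory at sequence length 512, while all three configurations report identical memory at shorter sequence lengths.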

	Again, inference time and required memory for inference are measured, but this time for customized configurations
	of the BertModel class. This feature is especially helpful when deciding which configuration of the model
	to train.
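
A run like the one above could be set up roughly as follows. This is a sketch, not a verified reproduction: it assumes the TensorFlowBenchmark and TensorFlowBenchmarkArguments classes as shipped with the library release contemporary with the environment information above (they may be absent from other releases), and it assumes from the labels alone that bert-384-hid means hidden_size=384 and bert-6-lay means num_hidden_layers=6. CUSTOM_CONFIGS and run_custom_config_benchmark are hypothetical names introduced for illustration.

```python
# Assumed mapping from benchmark label to BertConfig overrides.
# The overrides are inferred from the labels, not stated in the output.
CUSTOM_CONFIGS = {
    "bert-base": {},
    "bert-384-hid": {"hidden_size": 384},
    "bert-6-lay": {"num_hidden_layers": 6},
}


def run_custom_config_benchmark(batch_sizes=(8,),
                                sequence_lengths=(8, 32, 128, 512)):
    """Benchmark each customized BertConfig and return the benchmark output."""
    # Imported lazily so this module can be loaded without the
    # library (and TensorFlow) being installed.
    from transformers import (
        BertConfig,
        TensorFlowBenchmark,
        TensorFlowBenchmarkArguments,
    )

    names = list(CUSTOM_CONFIGS)
    args = TensorFlowBenchmarkArguments(
        models=names,  # labels only; the configs below define the models
        batch_sizes=list(batch_sizes),
        sequence_lengths=list(sequence_lengths),
    )
    configs = [BertConfig(**CUSTOM_CONFIGS[name]) for name in names]
    return TensorFlowBenchmark(args, configs=configs).run()
```

Calling run_custom_config_benchmark() would then produce the time and memory results for the three configurations, provided a compatible library version and TensorFlow are installed.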

	Benchmark best practices
	This section lists a couple of best practices one should be aware of when benchmarking a model.

	Currently, only single device benchmarking is supported. When benchmarking on GPU, it is recommended that the user
	specifies on which device the code should be run by setting the CUDA_VISIBLE_DEVICES environment variable in the
	shell, e.g. export CUDA_VISIBLE_DEVICES=0 before running the code.
	The option no_multi_processing should only be set to True for testing and debugging. To ensure accurate
	memory measurement, it is recommended to run each memory benchmark in a separate process by making sure
	no_multi_processing is set to False.
	One should always state the environment information when sharing the results of a model benchmark. Results can vary
	heavily between different GPU devices, library versions, etc., so benchmark results without this information are
	not very useful for the community.
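
The practices above can be sketched with the Python standard library alone. The example below is an illustration under stated assumptions, not the library's own mechanism: run_isolated and environment_info are hypothetical helpers. The first runs a snippet in a fresh Python process that only sees one GPU via CUDA_VISIBLE_DEVICES, and the second collects a few of the environment fields that should accompany shared results.

```python
import os
import platform
import subprocess
import sys


def run_isolated(code, device="0"):
    """Run `code` in a fresh Python process that only sees GPU `device`.

    A separate process starts with a clean memory state, so one
    measurement cannot be affected by allocations left over from another.
    """
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=device)
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,  # raise if the child process fails
    )
    return result.stdout.strip()


def environment_info():
    """Collect basic environment fields to publish next to the results."""
    return {
        "python_version": platform.python_version(),
        "system": platform.system(),
        "cpu": platform.machine(),
        "architecture": platform.architecture()[0],
    }


if __name__ == "__main__":
    # The child process reports which device it was restricted to.
    print(run_isolated("import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"))
    print(environment_info())
```

Setting the variable in the child's environment has the same effect for that process as export CUDA_VISIBLE_DEVICES=0 in the shell; it must be in place before the deep learning framework is imported.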

	Sharing your benchmark
	Previously, all available core models (10 at the time) were benchmarked for inference time across many different
	settings: using PyTorch, with and without TorchScript, and using TensorFlow, with and without XLA. All of those
	tests were run on CPUs (except for TensorFlow XLA) and GPUs.
	The approach is detailed in the following blogpost and the results are available here.
	With the new benchmark tools, it is easier than ever to share your benchmark results with the community:
	PyTorch Benchmarking Results.
	TensorFlow Benchmarking Results.