Learn About Config
OpenCompass uses the OpenMMLab new-style (pure Python) configuration files. If you are familiar with OpenMMLab-style configuration files, you can refer directly to A Pure Python style Configuration File (Beta) for the differences between the new-style and the original configuration files. If you have not encountered OpenMMLab-style configuration files before, the following explains their usage with a simple example. Make sure you have installed the latest version of MMEngine to support the new-style configuration files.
Basic Format
OpenCompass configuration files are in Python format, following basic Python syntax. Each configuration item is specified by defining variables. For example, when defining a model, we use the following configuration:
# model_cfg.py
from opencompass.models import HuggingFaceCausalLM
models = [
    dict(
        type=HuggingFaceCausalLM,
        path='huggyllama/llama-7b',
        model_kwargs=dict(device_map='auto'),
        tokenizer_path='huggyllama/llama-7b',
        tokenizer_kwargs=dict(padding_side='left', truncation_side='left'),
        max_seq_len=2048,
        max_out_len=50,
        run_cfg=dict(num_gpus=8, num_procs=1),
    )
]
When reading the configuration file, use Config.fromfile from MMEngine for parsing:
>>> from mmengine.config import Config
>>> cfg = Config.fromfile('./model_cfg.py')
>>> print(cfg.models[0])
{'type': HuggingFaceCausalLM, 'path': 'huggyllama/llama-7b', 'model_kwargs': {'device_map': 'auto'}, ...}
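The parsed object is an MMEngine Config, whose nested dictionaries also support attribute-style access. A minimal illustration based on the model_cfg.py above:
>>> print(cfg.models[0].path)
huggyllama/llama-7b
>>> print(cfg.models[0]['max_seq_len'])
2048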
Inheritance Mechanism
OpenCompass configuration files use Python's import mechanism for file inheritance. Note that when inheriting configuration files, we need to use the read_base context manager.
# inherit.py
from mmengine.config import read_base
with read_base():
    from .model_cfg import models  # Inherits the 'models' from model_cfg.py
Parse the configuration file using Config.fromfile:
>>> from mmengine.config import Config
>>> cfg = Config.fromfile('./inherit.py')
>>> print(cfg.models[0])
{'type': HuggingFaceCausalLM, 'path': 'huggyllama/llama-7b', 'model_kwargs': {'device_map': 'auto'}, ...}
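Because the inherited variables are ordinary Python objects after the import, the inheriting file can also override individual fields before the configuration is used. A small sketch (the file name and the overridden values below are purely illustrative):
# inherit_override.py (hypothetical file name)
from mmengine.config import read_base

with read_base():
    from .model_cfg import models  # Inherits the 'models' from model_cfg.py

# Adjust inherited fields in place; the values here are examples only
models[0]['max_out_len'] = 100
models[0]['run_cfg'] = dict(num_gpus=1, num_procs=1)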
Evaluation Configuration Example
# configs/llama7b.py
from mmengine.config import read_base
with read_base():
    # Read the required dataset configurations directly from the preset dataset configurations
    from .datasets.piqa.piqa_ppl import piqa_datasets
    from .datasets.siqa.siqa_gen import siqa_datasets

# Concatenate the datasets to be evaluated into the datasets field
datasets = [*piqa_datasets, *siqa_datasets]

# Evaluate models supported by HuggingFace's `AutoModelForCausalLM` using `HuggingFaceCausalLM`
from opencompass.models import HuggingFaceCausalLM
models = [
    dict(
        type=HuggingFaceCausalLM,
        # Initialization parameters for `HuggingFaceCausalLM`
        path='huggyllama/llama-7b',
        tokenizer_path='huggyllama/llama-7b',
        tokenizer_kwargs=dict(padding_side='left', truncation_side='left'),
        max_seq_len=2048,
        # Common parameters for all models, not specific to HuggingFaceCausalLM's initialization parameters
        abbr='llama-7b',             # Model abbreviation for result display
        max_out_len=100,             # Maximum number of generated tokens
        batch_size=16,
        run_cfg=dict(num_gpus=1),    # Run configuration for specifying resource requirements
    )
]
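Assuming this file is saved as configs/llama7b.py inside an OpenCompass working copy, the evaluation can typically be started with the run.py entry point (the work directory below is only an example):
python run.py configs/llama7b.py -w outputs/llama7b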
Dataset Configuration File Example
In the above example configuration file, we directly inherit the dataset-related configurations. Next, we will use the PIQA dataset configuration file as an example to demonstrate the meanings of each field in the dataset configuration file. If you do not intend to modify the prompt for model testing or add new datasets, you can skip this section.
The PIQA dataset configuration file is as follows. It is a configuration for evaluating based on perplexity (PPL) and does not use In-Context Learning.
from opencompass.openicl.icl_prompt_template import PromptTemplate
from opencompass.openicl.icl_retriever import ZeroRetriever
from opencompass.openicl.icl_inferencer import PPLInferencer
from opencompass.openicl.icl_evaluator import AccEvaluator
from opencompass.datasets import HFDataset
# Reading configurations
# The loaded dataset is usually organized as dictionaries, specifying the input fields used to form the prompt
# and the output field used as the answer in each sample
piqa_reader_cfg = dict(
    input_columns=['goal', 'sol1', 'sol2'],
    output_column='label',
    test_split='validation',
)
# Inference configurations
piqa_infer_cfg = dict(
    # Prompt generation configuration
    prompt_template=dict(
        type=PromptTemplate,
        # Prompt template; the template format matches the inferencer type specified later
        # Here, to calculate PPL, we need to specify the prompt template for each answer
        template={
            0: 'The following makes sense: \nQ: {goal}\nA: {sol1}\n',
            1: 'The following makes sense: \nQ: {goal}\nA: {sol2}\n'
        }),
    # In-Context example configuration; `ZeroRetriever` means no in-context examples are used
    retriever=dict(type=ZeroRetriever),
    # Inference method configuration
    #  - PPLInferencer uses perplexity (PPL) to obtain answers
    #  - GenInferencer uses the model's generated results to obtain answers
    inferencer=dict(type=PPLInferencer))
# Metric configuration, using Accuracy as the evaluation metric
piqa_eval_cfg = dict(evaluator=dict(type=AccEvaluator))
# Dataset configuration, where all the above variables are parameters for this configuration
# It is a list used to specify the configurations of different evaluation subsets of a dataset.
piqa_datasets = [
    dict(
        type=HFDataset,
        path='piqa',
        reader_cfg=piqa_reader_cfg,
        infer_cfg=piqa_infer_cfg,
        eval_cfg=piqa_eval_cfg)
]
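To make the PPL template concrete, the sketch below shows how the two candidate prompts would be filled in for a single, made-up PIQA-style sample; the PPLInferencer scores each filled-in prompt with the model and predicts the label whose prompt receives the lower perplexity:
# Hypothetical sample, for illustration only
sample = dict(goal='how do you open a jar?', sol1='twist the lid off.', sol2='pry the glass apart.')

prompt_0 = 'The following makes sense: \nQ: {goal}\nA: {sol1}\n'.format(**sample)
prompt_1 = 'The following makes sense: \nQ: {goal}\nA: {sol2}\n'.format(**sample)
# prompt_0 == 'The following makes sense: \nQ: how do you open a jar?\nA: twist the lid off.\n'
# The predicted label is the template key (0 or 1) whose prompt has the lower perplexity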
For details on the prompt generation configuration, refer to Prompt Template.
Advanced Evaluation Configuration
In OpenCompass, we support configuration options such as task partitioner and runner for more flexible and efficient utilization of computational resources.
By default, we use size-based partitioning for inference tasks. You can specify the sample number threshold for task partitioning with --max-partition-size when starting the task. Additionally, we use local resources for inference and evaluation tasks by default. If you want to use Slurm cluster resources, you can use the --slurm and --partition parameters to specify the Slurm runner backend when starting the task.
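For example, a Slurm-backed run could look like the following (the partition name and the threshold are placeholders):
python run.py configs/llama7b.py --slurm --partition my_partition --max-partition-size 2000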
Furthermore, if the above functionalities do not meet your requirements for task partitioning and runner backend configuration, you can provide more detailed configurations in the configuration file. Please refer to Efficient Evaluation for more information.
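As a rough sketch of what such a configuration can look like (class names follow the opencompass.partitioners, opencompass.runners, and opencompass.tasks modules; all values below are placeholder assumptions), the inference stage could be configured directly in configs/llama7b.py:
from opencompass.partitioners import SizePartitioner
from opencompass.runners import SlurmRunner
from opencompass.tasks import OpenICLInferTask

# Placeholder values; see Efficient Evaluation for the full set of options
infer = dict(
    partitioner=dict(type=SizePartitioner, max_task_size=2000),
    runner=dict(
        type=SlurmRunner,
        partition='my_partition',        # Slurm partition to submit to
        max_num_workers=32,              # Maximum number of concurrent tasks
        task=dict(type=OpenICLInferTask)),
)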