Learn About Config
OpenCompass uses the OpenMMLab new-style (pure Python) configuration files. If you are familiar with OpenMMLab-style configuration files, you can refer directly to A Pure Python style Configuration File (Beta) for the differences between the new-style and the original configuration files. If you have not encountered OpenMMLab-style configuration files before, the following explains their usage with a simple example. Make sure you have installed the latest version of MMEngine to support the new-style configuration files.
Basic Format
OpenCompass configuration files are in Python format, following basic Python syntax. Each configuration item is specified by defining variables. For example, when defining a model, we use the following configuration:
# model_cfg.py
from opencompass.models import HuggingFaceCausalLM
models = [
    dict(
        type=HuggingFaceCausalLM,
        path='huggyllama/llama-7b',
        model_kwargs=dict(device_map='auto'),
        tokenizer_path='huggyllama/llama-7b',
        tokenizer_kwargs=dict(padding_side='left', truncation_side='left'),
        max_seq_len=2048,
        max_out_len=50,
        run_cfg=dict(num_gpus=8, num_procs=1),
    )
]
When reading the configuration file, use Config.fromfile from MMEngine for parsing:
>>> from mmengine.config import Config
>>> cfg = Config.fromfile('./model_cfg.py')
>>> print(cfg.models[0])
{'type': HuggingFaceCausalLM, 'path': 'huggyllama/llama-7b', 'model_kwargs': {'device_map': 'auto'}, ...}
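The parsed object is an MMEngine Config, whose nested dictionaries also support attribute-style access. A minimal illustration based on the model_cfg.py above:
>>> print(cfg.models[0].path)
huggyllama/llama-7b
>>> print(cfg.models[0]['max_seq_len'])
2048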
Inheritance Mechanism
OpenCompass configuration files use Python's import mechanism for file inheritance. Note that when inheriting configuration files, we need to use the read_base context manager.
# inherit.py
from mmengine.config import read_base
with read_base():
    from .model_cfg import models  # Inherits the 'models' from model_cfg.py
Parse the configuration file using Config.fromfile:
>>> from mmengine.config import Config
>>> cfg = Config.fromfile('./inherit.py')
>>> print(cfg.models[0])
{'type': HuggingFaceCausalLM, 'path': 'huggyllama/llama-7b', 'model_kwargs': {'device_map': 'auto'}, ...}
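Because the inherited variables are ordinary Python objects after the import, the inheriting file can also override individual fields before the configuration is used. A small sketch (the file name and the overridden values below are purely illustrative):
# inherit_override.py (hypothetical file name)
from mmengine.config import read_base

with read_base():
    from .model_cfg import models  # Inherits the 'models' from model_cfg.py

# Adjust inherited fields in place; the values here are examples only
models[0]['max_out_len'] = 100
models[0]['run_cfg'] = dict(num_gpus=1, num_procs=1)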
Evaluation Configuration Example
# configs/llama7b.py
from mmengine.config import read_base
with read_base():
    # Read the required dataset configurations directly from the preset dataset configurations
    from .datasets.piqa.piqa_ppl import piqa_datasets
    from .datasets.siqa.siqa_gen import siqa_datasets

# Concatenate the datasets to be evaluated into the datasets field
datasets = [*piqa_datasets, *siqa_datasets]

# Evaluate models supported by HuggingFace's `AutoModelForCausalLM` using `HuggingFaceCausalLM`
from opencompass.models import HuggingFaceCausalLM
models = [
    dict(
        type=HuggingFaceCausalLM,
        # Initialization parameters for `HuggingFaceCausalLM`
        path='huggyllama/llama-7b',
        tokenizer_path='huggyllama/llama-7b',
        tokenizer_kwargs=dict(padding_side='left', truncation_side='left'),
        max_seq_len=2048,
        # Common parameters for all models, not specific to HuggingFaceCausalLM's initialization parameters
        abbr='llama-7b',             # Model abbreviation for result display
        max_out_len=100,             # Maximum number of generated tokens
        batch_size=16,
        run_cfg=dict(num_gpus=1),    # Run configuration for specifying resource requirements
    )
]
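Assuming this file is saved as configs/llama7b.py inside an OpenCompass working copy, the evaluation can typically be started with the run.py entry point (the work directory below is only an example):
python run.py configs/llama7b.py -w outputs/llama7b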
Dataset Configuration File Example
In the above example configuration file, we directly inherit the dataset-related configurations. Next, we will use the PIQA dataset configuration file as an example to demonstrate the meanings of each field in the dataset configuration file. If you do not intend to modify the prompt for model testing or add new datasets, you can skip this section.
The PIQA dataset configuration file is as follows. It is a configuration for evaluating based on perplexity (PPL) and does not use In-Context Learning.
from opencompass.openicl.icl_prompt_template import PromptTemplate
from opencompass.openicl.icl_retriever import ZeroRetriever
from opencompass.openicl.icl_inferencer import PPLInferencer
from opencompass.openicl.icl_evaluator import AccEvaluator
from opencompass.datasets import HFDataset
# Reading configurations
# The loaded dataset is usually organized as dictionaries, specifying the input fields used to form the prompt
# and the output field used as the answer in each sample
piqa_reader_cfg = dict(
    input_columns=['goal', 'sol1', 'sol2'],
    output_column='label',
    test_split='validation',
)
# Inference configurations
piqa_infer_cfg = dict(
    # Prompt generation configuration
    prompt_template=dict(
        type=PromptTemplate,
        # Prompt template; the template format matches the inferencer type specified later
        # Here, to calculate PPL, we need to specify the prompt template for each answer
        template={
            0: 'The following makes sense: \nQ: {goal}\nA: {sol1}\n',
            1: 'The following makes sense: \nQ: {goal}\nA: {sol2}\n'
        }),
    # In-Context example configuration; `ZeroRetriever` means no in-context examples are used
    retriever=dict(type=ZeroRetriever),
    # Inference method configuration
    #  - PPLInferencer uses perplexity (PPL) to obtain answers
    #  - GenInferencer uses the model's generated results to obtain answers
    inferencer=dict(type=PPLInferencer))
# Metric configuration, using Accuracy as the evaluation metric
piqa_eval_cfg = dict(evaluator=dict(type=AccEvaluator))
# Dataset configuration, where all the above variables are parameters for this configuration
# It is a list used to specify the configurations of different evaluation subsets of a dataset.
piqa_datasets = [
    dict(
        type=HFDataset,
        path='piqa',
        reader_cfg=piqa_reader_cfg,
        infer_cfg=piqa_infer_cfg,
        eval_cfg=piqa_eval_cfg)
]
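To make the PPL template concrete, the sketch below shows how the two candidate prompts would be filled in for a single, made-up PIQA-style sample; the PPLInferencer scores each filled-in prompt with the model and predicts the label whose prompt receives the lower perplexity:
# Hypothetical sample, for illustration only
sample = dict(goal='how do you open a jar?', sol1='twist the lid off.', sol2='pry the glass apart.')

prompt_0 = 'The following makes sense: \nQ: {goal}\nA: {sol1}\n'.format(**sample)
prompt_1 = 'The following makes sense: \nQ: {goal}\nA: {sol2}\n'.format(**sample)
# prompt_0 == 'The following makes sense: \nQ: how do you open a jar?\nA: twist the lid off.\n'
# The predicted label is the template key (0 or 1) whose prompt has the lower perplexity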
For details on the prompt generation configuration, refer to Prompt Template.
Advanced Evaluation Configuration
In OpenCompass, we support configuration options such as task partitioner and runner for more flexible and efficient utilization of computational resources.
By default, we use size-based partitioning for inference tasks. You can specify the sample number threshold for task partitioning with --max-partition-size when starting the task. Additionally, we use local resources for inference and evaluation tasks by default. If you want to use Slurm cluster resources, you can use the --slurm and --partition parameters to specify the Slurm runner backend when starting the task.
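For example, a Slurm-backed run could look like the following (the partition name and the threshold are placeholders):
python run.py configs/llama7b.py --slurm --partition my_partition --max-partition-size 2000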
Furthermore, if the above functionalities do not meet your requirements for task partitioning and runner backend configuration, you can provide more detailed configurations in the configuration file. Please refer to Efficient Evaluation for more information.
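As a rough sketch of what such a configuration can look like (class names follow the opencompass.partitioners, opencompass.runners, and opencompass.tasks modules; all values below are placeholder assumptions), the inference stage could be configured directly in configs/llama7b.py:
from opencompass.partitioners import SizePartitioner
from opencompass.runners import SlurmRunner
from opencompass.tasks import OpenICLInferTask

# Placeholder values; see Efficient Evaluation for the full set of options
infer = dict(
    partitioner=dict(type=SizePartitioner, max_task_size=2000),
    runner=dict(
        type=SlurmRunner,
        partition='my_partition',        # Slurm partition to submit to
        max_num_workers=32,              # Maximum number of concurrent tasks
        task=dict(type=OpenICLInferTask)),
)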