# Learn About Config

OpenCompass uses the modern OpenMMLab-style configuration files. If you are already familiar with OpenMMLab-style
configuration files, you can refer directly to
[A Pure Python style Configuration File (Beta)](https://mmengine.readthedocs.io/en/latest/advanced_tutorials/config.html#a-pure-python-style-configuration-file-beta)
to understand the differences between the new-style and the original configuration files. If you have not
encountered OpenMMLab-style configuration files before, this section explains their usage with a simple example.
Make sure you have the latest version of MMEngine installed, as it is required to support the new-style
configuration files.

## Basic Format

OpenCompass configuration files are in Python format, following basic Python syntax. Each configuration item
is specified by defining variables. For example, when defining a model, we use the following configuration:

```python
# model_cfg.py
from opencompass.models import HuggingFaceCausalLM

models = [
    dict(
        type=HuggingFaceCausalLM,
        path='huggyllama/llama-7b',
        model_kwargs=dict(device_map='auto'),
        tokenizer_path='huggyllama/llama-7b',
        tokenizer_kwargs=dict(padding_side='left', truncation_side='left'),
        max_seq_len=2048,
        max_out_len=50,
        run_cfg=dict(num_gpus=8, num_procs=1),
    )
]
```

When reading the configuration file, use `Config.fromfile` from MMEngine for parsing:

```python
>>> from mmengine.config import Config
>>> cfg = Config.fromfile('./model_cfg.py')
>>> print(cfg.models[0])
{'type': HuggingFaceCausalLM, 'path': 'huggyllama/llama-7b', 'model_kwargs': {'device_map': 'auto'}, ...}
```

## Inheritance Mechanism

OpenCompass configuration files use Python's import mechanism for file inheritance. Note that when inheriting
configuration files, we need to use the `read_base` context manager.

```python
# inherit.py
from mmengine.config import read_base

with read_base():
    from .model_cfg import models  # Inherits the 'models' from model_cfg.py
```

Parse the configuration file using `Config.fromfile`:

```python
>>> from mmengine.config import Config
>>> cfg = Config.fromfile('./inherit.py')
>>> print(cfg.models[0])
{'type': HuggingFaceCausalLM, 'path': 'huggyllama/llama-7b', 'model_kwargs': {'device_map': 'auto'}, ...}
```
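
Since the inherited variables are ordinary Python objects once imported, they can also be adjusted in the
child configuration. The following is a minimal sketch; the file name `modify.py` and the changed field value
are illustrative:

```python
# modify.py
from mmengine.config import read_base

with read_base():
    from .model_cfg import models  # Inherit 'models' from model_cfg.py

# Adjust an inherited field in place, e.g. shorten the generation
# length of every inherited model (illustrative value).
for model in models:
    model['max_out_len'] = 30
```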

## Evaluation Configuration Example

```python
# configs/llama7b.py
from mmengine.config import read_base

with read_base():
    # Read the required dataset configurations directly from the presets
    from .datasets.piqa.piqa_ppl import piqa_datasets
    from .datasets.siqa.siqa_gen import siqa_datasets

# Concatenate the datasets to be evaluated into the datasets field
datasets = [*piqa_datasets, *siqa_datasets]

# Evaluate models supported by HuggingFace's `AutoModelForCausalLM` using `HuggingFaceCausalLM`
from opencompass.models import HuggingFaceCausalLM

models = [
    dict(
        type=HuggingFaceCausalLM,
        # Initialization parameters for `HuggingFaceCausalLM`
        path='huggyllama/llama-7b',
        tokenizer_path='huggyllama/llama-7b',
        tokenizer_kwargs=dict(padding_side='left', truncation_side='left'),
        max_seq_len=2048,
        # Parameters common to all models; these are not initialization arguments of HuggingFaceCausalLM
        abbr='llama-7b',            # Model abbreviation for result display
        max_out_len=100,            # Maximum number of generated tokens
        batch_size=16,
        run_cfg=dict(num_gpus=1),   # Run configuration for specifying resource requirements
    )
]
```
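
After saving this file, the evaluation can typically be launched from the repository root with the `run.py`
entry point, e.g. `python run.py configs/llama7b.py`.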

## Dataset Configuration File Example

In the example configuration file above, we inherit the dataset-related configurations directly. Next, we will
use the PIQA dataset configuration file as an example to explain the meaning of each field in a dataset
configuration file. If you do not intend to modify the prompts used for model testing or to add new datasets,
you can skip this section.

The PIQA dataset [configuration file](https://github.com/open-compass/opencompass/blob/main/configs/datasets/piqa/piqa_ppl_1cf9f0.py) is shown below.
It is a configuration for evaluation based on perplexity (PPL) and does not use in-context learning.

```python
from opencompass.openicl.icl_prompt_template import PromptTemplate
from opencompass.openicl.icl_retriever import ZeroRetriever
from opencompass.openicl.icl_inferencer import PPLInferencer
from opencompass.openicl.icl_evaluator import AccEvaluator
from opencompass.datasets import HFDataset

# Reader configuration
# The loaded dataset is usually organized as dictionaries, specifying which input fields form
# the prompt and which output field serves as the answer for each sample
piqa_reader_cfg = dict(
    input_columns=['goal', 'sol1', 'sol2'],
    output_column='label',
    test_split='validation',
)

# Inference configurations
piqa_infer_cfg = dict(
    # Prompt generation configuration
    prompt_template=dict(
        type=PromptTemplate,
        # Prompt template; its format must match the inferencer type specified later
        # Here, to compute PPL, a prompt template is specified for each candidate answer
        template={
            0: 'The following makes sense: \nQ: {goal}\nA: {sol1}\n',
            1: 'The following makes sense: \nQ: {goal}\nA: {sol2}\n'
        }),
    # In-context example configuration. `ZeroRetriever` is specified here, which means no in-context examples are used.
    retriever=dict(type=ZeroRetriever),
    # Inference method configuration
    #   - PPLInferencer uses perplexity (PPL) to obtain answers
    #   - GenInferencer uses the model's generated results to obtain answers
    inferencer=dict(type=PPLInferencer))

# Metric configuration, using Accuracy as the evaluation metric
piqa_eval_cfg = dict(evaluator=dict(type=AccEvaluator))

# Dataset configuration; the variables above are all parameters of this configuration
# It is a list that specifies the configurations of a dataset's different evaluation subsets
piqa_datasets = [
    dict(
        type=HFDataset,
        path='piqa',
        reader_cfg=piqa_reader_cfg,
        infer_cfg=piqa_infer_cfg,
        eval_cfg=piqa_eval_cfg)
]
```

For details on the **Prompt generation configuration**, refer to the [Prompt Template](../prompt/prompt_template.md) documentation.
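
For comparison, a generation-based (GEN) evaluation such as the `siqa_gen` configuration imported earlier uses
a single prompt template together with `GenInferencer`, which scores the model's generated text rather than
per-option perplexities. The sketch below is illustrative and is not the actual `siqa_gen` configuration:

```python
from opencompass.openicl.icl_prompt_template import PromptTemplate
from opencompass.openicl.icl_retriever import ZeroRetriever
from opencompass.openicl.icl_inferencer import GenInferencer

# Illustrative generation-style inference configuration: the model
# generates a free-form answer from a single prompt instead of
# comparing the perplexity of one prompt per candidate answer.
example_gen_infer_cfg = dict(
    prompt_template=dict(
        type=PromptTemplate,
        template='Q: {goal}\nA:'),
    retriever=dict(type=ZeroRetriever),
    inferencer=dict(type=GenInferencer))
```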

## Advanced Evaluation Configuration

In OpenCompass, we support configuration options such as the task partitioner and runner for more flexible and
efficient use of computational resources.

By default, inference tasks are partitioned by size. You can set the threshold on the number of samples per
partition with `--max-partition-size` when starting the task. Also by default, inference and evaluation tasks
run on local resources. To use Slurm cluster resources instead, pass the `--slurm` flag together with the
`--partition` flag to select the Slurm runner backend when starting the task.

Furthermore, if the above options do not meet your needs for task partitioning or the runner backend, you can
provide more detailed settings in the configuration file, as sketched below. Please refer to
[Efficient Evaluation](./evaluation.md) for more information.
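
As a rough sketch of what such a configuration can look like, the snippet below partitions inference tasks by
size and submits them through the Slurm backend; the values are placeholders, and the options actually
supported are listed in the linked document:

```python
from opencompass.partitioners import SizePartitioner
from opencompass.runners import SlurmRunner
from opencompass.tasks import OpenICLInferTask

# Illustrative inference-stage configuration: split tasks by size and
# run them through the Slurm backend (all values are placeholders).
infer = dict(
    partitioner=dict(type=SizePartitioner, max_task_size=2000),
    runner=dict(
        type=SlurmRunner,
        max_num_workers=32,
        task=dict(type=OpenICLInferTask),
        retry=2),
)
```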