# InfiAgent-DABench

This example uses Data Interpreter (DI) to solve the InfiAgent-DABench benchmark. With gpt-4o, it reaches 94.93% accuracy.

## Dataset download

```
cd /examples/di/InfiAgent-DABench
git clone https://github.com/InfiAgent/InfiAgent.git
mv InfiAgent/examples/DA-Agent/data ./
```
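
After the move, a quick check that `./data` exists and is not empty can catch a failed clone or move early. The following is a minimal sketch: the helper name `summarize_data_dir` is hypothetical, and it makes no assumption about which files the dataset contains.

```python
from collections import Counter
from pathlib import Path


def summarize_data_dir(data_dir: str = "data") -> Counter:
    """Count files under data_dir by extension; raise if it is missing or empty."""
    root = Path(data_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"{root} not found; did the clone and move succeed?")
    # Walk the directory tree and tally regular files by their suffix.
    counts = Counter(
        p.suffix or "<no extension>" for p in root.rglob("*") if p.is_file()
    )
    if not counts:
        raise ValueError(f"{root} contains no files")
    return counts
```

Calling `summarize_data_dir()` from the example directory returns a per-extension file count, or raises an error if the directory is missing or empty.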

## Special note

When running DABench tests, change the `__init__` method of `ExecuteNbCode` to the following:

```
class ExecuteNbCode(Action):
    """execute notebook code block, return result to llm, and display it."""

    nb: NotebookNode
    nb_client: NotebookClient
    console: Console
    interaction: str
    timeout: int = 600

    def __init__(
        self,
        nb=nbformat.v4.new_notebook(),
        timeout=600,
    ):
        super().__init__(
            nb=nbformat.v4.new_notebook(),  # was: nb
            nb_client=NotebookClient(nb, timeout=timeout),
            timeout=timeout,
            console=Console(),
            interaction=("ipython" if self.is_ipython() else "terminal"),
        )
```

`ExecuteNbCode` is defined in the following module:

```
metagpt.actions.di.execute_nb_code
```

This change creates a new notebook for each `ExecuteNbCode` instance instead of relying on the original default `nb` initialization.
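
A likely reason the fresh notebook matters: Python evaluates default argument values once, when the function is defined, so a default such as `nb=nbformat.v4.new_notebook()` is the same object on every call that omits `nb`. The sketch below illustrates this general behavior with a plain dict standing in for the notebook. The names `new_notebook`, `SharedDefault`, and `FreshPerInstance` are hypothetical and not part of MetaGPT; whether the sharing shows up in MetaGPT itself also depends on how the base class stores the value.

```python
# Stand-in for nbformat.v4.new_notebook(): returns a new, empty "notebook".
def new_notebook():
    return {"cells": []}


class SharedDefault:
    """Hypothetical class mirroring the default-argument pattern."""

    # The default value is evaluated once, at function definition time.
    def __init__(self, nb=new_notebook()):
        self.nb = nb


class FreshPerInstance:
    """Hypothetical class mirroring the modified initialization."""

    def __init__(self):
        # A new notebook is created on every instantiation.
        self.nb = new_notebook()


a, b = SharedDefault(), SharedDefault()
a.nb["cells"].append("x = 1")
shared_leak = b.nb["cells"]  # cells added via `a` are visible through `b`

c, d = FreshPerInstance(), FreshPerInstance()
c.nb["cells"].append("x = 1")
fresh_leak = d.nb["cells"]  # `d` has its own notebook, so nothing leaks

print(shared_leak)  # ['x = 1']
print(fresh_leak)   # []
```

With a shared notebook, cells from one task could carry over into the next, which would matter when many benchmark questions are run in one process.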
## How to run

```
python run_InfiAgent-DABench_single.py --id x  # Run a single task; x is the ID of the question to test
python run_InfiAgent-DABench_all.py            # Run all tasks serially
python run_InfiAgent-DABench.py --k x          # Run all tasks in parallel; x is the number of tasks run at a time
```
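
Running tasks `k` at a time can be sketched with `asyncio.Semaphore`. This is an illustration of bounded concurrency only, not the implementation of `run_InfiAgent-DABench.py`; `solve_task` and `run_all` are hypothetical names, and the sleep stands in for a real DI run.

```python
import asyncio


async def solve_task(task_id: int) -> str:
    """Hypothetical placeholder for solving one DABench question."""
    await asyncio.sleep(0.01)  # stands in for the real work
    return f"task {task_id} done"


async def run_all(task_ids: list[int], k: int) -> list[str]:
    """Run all tasks, with at most k running concurrently."""
    semaphore = asyncio.Semaphore(k)

    async def bounded(task_id: int) -> str:
        # Each task waits here until one of the k slots is free.
        async with semaphore:
            return await solve_task(task_id)

    # gather returns results in the same order as task_ids.
    return await asyncio.gather(*(bounded(t) for t in task_ids))


results = asyncio.run(run_all(list(range(6)), k=2))
print(results[0])
```

A larger `k` shortens wall-clock time at the cost of more simultaneous LLM requests, so rate limits are a practical upper bound on the value.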