Chris Ellerson committed
Commit 68ed57f · 1 Parent(s): c0edb03

initial commit of agent with score of 60
.env.example ADDED
@@ -0,0 +1 @@
+ XAI_API_KEY=your-xai-api-key-here
.gitignore ADDED
@@ -0,0 +1,12 @@
+ .env
+ env.example
+ gaia-env/*
+ /gaia-env
+ project_planning.md
+ projectdescription.md
+ test_agent.py
+ test_groq_api.py
+ test_groq_api_with_dotenv.py
+ test_results.json
+ test_xai_api.py
+ update_groq_key.py
README.md CHANGED
@@ -1,13 +1,163 @@
  ---
- title: AgentCourseFinalProject
- emoji: 🌍
- colorFrom: purple
- colorTo: green
  sdk: gradio
- sdk_version: 5.26.0
  app_file: app.py
  pinned: false
- short_description: Agent Code for Final Project
  ---

  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
  ---
+ title: GAIA Agent for Hugging Face Agents Course
+ emoji: 🕵🏻‍♂️
+ colorFrom: indigo
+ colorTo: indigo
  sdk: gradio
+ sdk_version: 5.25.2
  app_file: app.py
  pinned: false
+ hf_oauth: true
+ # optional, default duration is 8 hours/480 minutes. Max duration is 30 days/43200 minutes.
+ hf_oauth_expiration_minutes: 480
  ---

  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+
+ # GAIA Agent for Hugging Face Agents Course
+
+ This project implements an intelligent agent, built with the SmolAgents framework, to tackle the GAIA benchmark questions for the Hugging Face Agents course final assessment.
+
+ ## Project Overview
+
+ The GAIA benchmark consists of challenging questions that require an agent to use various tools, including web search, file processing, and reasoning capabilities. This agent is designed to:
+
+ 1. Receive questions from the GAIA API
+ 2. Process and understand the questions
+ 3. Use appropriate tools to find answers
+ 4. Format and return precise answers
+
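+ A minimal sketch of this flow, using the `GAIAApiClient` and `GAIAAgent` classes added in this commit (it assumes your API keys are already set in the environment; the username and Space URL below are placeholders):
+
+ ```python
+ from api_integration import GAIAApiClient
+ from core_agent import GAIAAgent
+
+ client = GAIAApiClient()  # defaults to the course scoring API
+ agent = GAIAAgent(model_type="OpenAIServerModel", model_id="gpt-4o")
+
+ answers = []
+ for item in client.get_questions():
+     answer = agent.answer_question(item["question"])
+     answers.append({"task_id": item["task_id"], "submitted_answer": answer})
+
+ # "your-username" and the Space URL are placeholders, not real values.
+ result = client.submit_answers(
+     "your-username",
+     "https://huggingface.co/spaces/your-username/your-space/tree/main",
+     answers,
+ )
+ print(result.get("score"))
+ ```
+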
+ ## Features
+
+ - **SmolAgents Integration**: Uses CodeAgent for flexible problem-solving with Python code execution
+ - **Multi-Model Support**:
+   - Compatible with Hugging Face models
+   - OpenAI models (GPT-4o and others)
+   - X.AI's Grok models
+   - Anthropic, Cohere, and Mistral models via LiteLLM
+ - **Enhanced Tool Suite**:
+   - Web search via DuckDuckGo
+   - Python interpreter for code execution
+   - File handling (reading, saving, downloading)
+   - Data analysis for CSV and Excel files
+   - Image processing with OCR capabilities (when available)
+ - **Flexible Environment Configuration**:
+   - Easy setup via environment variables or .env file
+   - Fallback mechanisms for missing dependencies
+   - Support for both local and secure E2B code execution
+ - **Answer Processing** (see the sketch after this list):
+   - Special handling for reversed text questions
+   - Precise answer formatting for benchmark submission
+   - Automatic cleanup of model responses for exact matching
+ - **Interactive UI**: Gradio interface for running the agent and submitting answers
+
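+ The answer-processing behaviour is easiest to see in a trimmed sketch (the real logic lives in `core_agent.py`; this is an illustration, not the full implementation):
+
+ ```python
+ PREFIXES = ("The answer is ", "Answer: ", "Final answer: ")
+
+ def clean_answer(answer: str) -> str:
+     """Strip common prefixes and wrapping quotes so answers match the benchmark exactly."""
+     answer = answer.strip()
+     for prefix in PREFIXES:
+         if answer.startswith(prefix):
+             answer = answer[len(prefix):].strip()
+     if len(answer) >= 2 and answer[0] in {'"', "'"} and answer[0] == answer[-1]:
+         answer = answer[1:-1].strip()
+     return answer
+
+ def normalize_question(question: str) -> str:
+     """Reversed-text questions are flipped back before being sent to the model."""
+     if question.startswith(".") or ".rewsna eht sa" in question:
+         return question[::-1]
+     return question
+
+ print(clean_answer('Final answer: "right"'))  # -> right
+ ```
+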
+ ## Setup
+
+ ### Prerequisites
+
+ - Python 3.8+
+ - Hugging Face account
+ - API keys for your preferred models (HuggingFace, OpenAI, X.AI, etc.)
+
+ ### Installation
+
+ 1. Clone this repository
+ 2. Install the required dependencies:
+
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ 3. Copy the example environment file and add your API keys:
+
+ ```bash
+ cp .env.example .env
+ # Edit .env with your API keys and configuration
+ ```
+
+ ### Configuration
+
+ Configure the agent by setting these environment variables or editing the `.env` file:
+
+ #### API Keys
+ ```
+ HUGGINGFACEHUB_API_TOKEN=your_huggingface_token_here
+ OPENAI_API_KEY=your_openai_key_here
+ XAI_API_KEY=your_xai_api_key_here  # For X.AI/Grok models
+ ```
+
+ #### Agent Configuration
+ ```
+ AGENT_MODEL_TYPE=OpenAIServerModel  # HfApiModel, InferenceClientModel, LiteLLMModel, OpenAIServerModel
+ AGENT_MODEL_ID=gpt-4o  # Model ID depends on the model type
+ AGENT_TEMPERATURE=0.2
+ AGENT_EXECUTOR_TYPE=local  # local or e2b for secure execution
+ AGENT_VERBOSE=true  # Set to true for detailed logging
+ ```
+
+ #### Advanced Configuration
+ ```
+ AGENT_PROVIDER=hf-inference  # Provider for InferenceClientModel
+ AGENT_TIMEOUT=120  # Timeout in seconds for API calls
+ AGENT_API_BASE=https://api.x.ai/v1  # API base for X.AI when using OpenAIServerModel
+ ```
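+
+ As a rough sketch of how these variables drive agent construction (this mirrors what `app.py` and `main.py` do; `python-dotenv` is optional):
+
+ ```python
+ import os
+
+ from dotenv import load_dotenv
+ from core_agent import GAIAAgent
+
+ load_dotenv()  # pull values from a local .env file, if present
+
+ agent = GAIAAgent(
+     model_type=os.getenv("AGENT_MODEL_TYPE", "OpenAIServerModel"),
+     model_id=os.getenv("AGENT_MODEL_ID", "gpt-4o"),
+     api_base=os.getenv("AGENT_API_BASE"),
+     temperature=float(os.getenv("AGENT_TEMPERATURE", "0.2")),
+     executor_type=os.getenv("AGENT_EXECUTOR_TYPE", "local"),
+     verbose=os.getenv("AGENT_VERBOSE", "false").lower() == "true",
+ )
+ print(agent.answer_question("What is the capital of France?"))
+ ```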
+
+ ## Usage
+
+ ### Running the Agent
+
+ Launch the Gradio interface with:
+
+ ```bash
+ python app.py
+ ```
+
+ Then:
+ 1. Log in to your Hugging Face account using the button in the interface
+ 2. Click "Run Evaluation & Submit All Answers"
+
+ ### Testing
+
+ To test the agent with sample questions before running the full evaluation:
+
+ ```bash
+ python test_agent.py
+ ```
+
+ For more focused testing with specific APIs:
+
+ ```bash
+ python test_groq_api.py  # Test X.AI/Groq API integration
+ python test_xai_api.py   # Test X.AI API integration
+ ```
+
+ ## Project Structure
+
+ - `app.py`: Main application with Gradio interface
+ - `core_agent.py`: Agent implementation with the SmolAgents framework
+ - `api_integration.py`: Client for interacting with the GAIA API
+ - `test_agent.py`: Testing script with sample questions
+ - `test_groq_api.py` & `test_xai_api.py`: API-specific test scripts
+ - `update_groq_key.py`: Utility for updating API keys
+ - `project_planning.md`: Development roadmap and progress tracking
+ - `requirements.txt`: Project dependencies
+
+ ## Tools Implementation
+
+ The agent includes several custom tools (a hypothetical example of adding your own follows this list):
+
+ 1. **save_and_read_file**: Save content to a temporary file and return the path
+ 2. **download_file_from_url**: Download a file from a URL and save it locally
+ 3. **extract_text_from_image**: OCR for extracting text from images (requires pytesseract)
+ 4. **analyze_csv_file**: Load and analyze CSV files using pandas
+ 5. **analyze_excel_file**: Load and analyze Excel files using pandas
+
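+ Additional tools can be passed to the agent through the `additional_tools` argument. A minimal, hypothetical example using the same smolagents `@tool` decorator as the built-in tools (`word_count` is purely illustrative):
+
+ ```python
+ from smolagents import tool
+ from core_agent import GAIAAgent
+
+ @tool
+ def word_count(text: str) -> str:
+     """
+     Count the words in a piece of text.
+
+     Args:
+         text: The text to count words in
+
+     Returns:
+         The number of words, as a string
+     """
+     return str(len(text.split()))
+
+ # Assumes an API key for one of the supported providers is set in the environment.
+ agent = GAIAAgent(model_type="OpenAIServerModel", additional_tools=[word_count])
+ ```
+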
+ ## Resources
+
+ - [GAIA Benchmark Information](https://huggingface.co/spaces/gaia-benchmark/leaderboard)
+ - [SmolAgents Documentation](https://huggingface.co/docs/smolagents/en/index)
+ - [Hugging Face Agents Course](https://huggingface.co/agents-course)
+
+ ## License
+
+ This project is licensed under the MIT License.
__pycache__/api_integration.cpython-311.pyc ADDED
Binary file (3 kB).
 
__pycache__/core_agent.cpython-311.pyc ADDED
Binary file (19.8 kB).
 
api_integration.py ADDED
@@ -0,0 +1,39 @@
+ import requests
+ from typing import List, Dict, Any
+ from core_agent import GAIAAgent
+
+ class GAIAApiClient:
+     def __init__(self, api_url="https://agents-course-unit4-scoring.hf.space"):
+         self.api_url = api_url
+         self.questions_url = f"{api_url}/questions"
+         self.submit_url = f"{api_url}/submit"
+         self.files_url = f"{api_url}/files"
+
+     def get_questions(self) -> List[Dict[str, Any]]:
+         """Fetch all evaluation questions"""
+         response = requests.get(self.questions_url)
+         response.raise_for_status()
+         return response.json()
+
+     def get_random_question(self) -> Dict[str, Any]:
+         """Fetch a single random question"""
+         response = requests.get(f"{self.api_url}/random-question")
+         response.raise_for_status()
+         return response.json()
+
+     def get_file(self, task_id: str) -> bytes:
+         """Download a file for a specific task"""
+         response = requests.get(f"{self.files_url}/{task_id}")
+         response.raise_for_status()
+         return response.content
+
+     def submit_answers(self, username: str, agent_code: str, answers: List[Dict[str, Any]]) -> Dict[str, Any]:
+         """Submit agent answers and get score"""
+         data = {
+             "username": username,
+             "agent_code": agent_code,
+             "answers": answers
+         }
+         response = requests.post(self.submit_url, json=data)
+         response.raise_for_status()
+         return response.json()
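
A quick usage sketch for this client (assuming network access to the scoring API; tasks without an attachment make the files endpoint return an error):

```python
from api_integration import GAIAApiClient

client = GAIAApiClient()
question = client.get_random_question()
print(question["task_id"], question["question"])

# Attachments are optional; the request fails for tasks without one.
try:
    payload = client.get_file(question["task_id"])
    print(f"attachment size: {len(payload)} bytes")
except Exception as err:
    print(f"no attachment for this task: {err}")
```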
app.py ADDED
@@ -0,0 +1,279 @@
1
+ import os
2
+ import gradio as gr
3
+ import requests
4
+ import inspect
5
+ import pandas as pd
6
+ from core_agent import GAIAAgent
7
+
8
+ # (Keep Constants as is)
9
+ # --- Constants ---
10
+ DEFAULT_API_URL = "https://agents-course-unit4-scoring.hf.space"
11
+
12
+ # --- Basic Agent Definition ---
13
+ # ----- THIS IS WHERE YOU CAN BUILD WHAT YOU WANT ------
14
+ class BasicAgent:
15
+ def __init__(self):
16
+ print("BasicAgent initialized.")
17
+
18
+ # Initialize the GAIAAgent with local execution
19
+ try:
20
+ # Load environment variables if dotenv is available
21
+ try:
22
+ import dotenv
23
+ dotenv.load_dotenv()
24
+ print("Loaded environment variables from .env file")
25
+ except ImportError:
26
+ print("python-dotenv not installed, continuing with environment as is")
27
+
28
+ # Try to load API keys from environment
29
+ api_key = os.getenv("XAI_API_KEY") or os.getenv("OPENAI_API_KEY") or os.getenv("HUGGINGFACEHUB_API_TOKEN")
30
+
31
+ # If we have at least one API key, use a model-based approach
32
+ if api_key:
33
+ # Default model parameters
34
+ model_type = os.getenv("AGENT_MODEL_TYPE", "OpenAIServerModel")
35
+ model_id = os.getenv("AGENT_MODEL_ID", "gpt-4o")
36
+
37
+ if os.getenv("XAI_API_KEY"):
38
+ # Use X.AI API with OpenAIServerModel
39
+ self.gaia_agent = GAIAAgent(
40
+ model_type="OpenAIServerModel",
41
+ model_id="grok-3-latest", # X.AI's model
42
+ api_key=os.getenv("XAI_API_KEY"),
43
+ api_base="https://api.x.ai/v1", # X.AI's endpoint, not Groq
44
+ temperature=0.2,
45
+ executor_type="local",
46
+ verbose=False
47
+ )
48
+ print("Using OpenAIServerModel with X.AI API")
49
+ elif model_type == "HfApiModel" and os.getenv("HUGGINGFACEHUB_API_TOKEN"):
50
+ # Use Hugging Face API
51
+ self.gaia_agent = GAIAAgent(
52
+ model_type="HfApiModel",
53
+ model_id=model_id,
54
+ api_key=os.getenv("HUGGINGFACEHUB_API_TOKEN"),
55
+ temperature=0.2,
56
+ executor_type="local",
57
+ verbose=False
58
+ )
59
+ print(f"Using HfApiModel with model_id: {model_id}")
60
+ else:
61
+ # Default to OpenAI API
62
+ self.gaia_agent = GAIAAgent(
63
+ model_type="OpenAIServerModel",
64
+ model_id=model_id,
65
+ api_key=os.getenv("OPENAI_API_KEY"),
66
+ temperature=0.2,
67
+ executor_type="local",
68
+ verbose=False
69
+ )
70
+ print(f"Using OpenAIServerModel with model_id: {model_id}")
71
+ else:
72
+ # No API keys available, use a local model setup with minimal dependencies
73
+ self.gaia_agent = GAIAAgent(
74
+ model_type="HfApiModel",
75
+ model_id="gpt2", # Simple model for basic testing
76
+ temperature=0.2,
77
+ executor_type="local",
78
+ verbose=False
79
+ )
80
+ print("Warning: No API keys found. Using a basic local execution setup.")
81
+
82
+ except Exception as e:
83
+ print(f"Error initializing GAIAAgent: {e}")
84
+ self.gaia_agent = None
85
+ print("WARNING: Failed to initialize agent. Falling back to basic responses.")
86
+
87
+ def __call__(self, question: str) -> str:
88
+ print(f"Agent received question (first 50 chars): {question[:50]}...")
89
+
90
+ # Check if we have a functioning GAIA agent
91
+ if self.gaia_agent:
92
+ try:
93
+ # Process the question using the GAIA agent
94
+ answer = self.gaia_agent.answer_question(question)
95
+ print(f"Agent generated answer: {answer[:50]}..." if len(answer) > 50 else f"Agent generated answer: {answer}")
96
+ return answer
97
+ except Exception as e:
98
+ print(f"Error processing question: {e}")
99
+ # Fall back to a simple response on error
100
+ return "An error occurred while processing your question. Please check the agent logs for details."
101
+ else:
102
+ # We don't have a valid agent, provide a basic response
103
+ return "The agent is not properly initialized. Please check your API keys and configuration."
104
+
105
+ def run_and_submit_all( profile: gr.OAuthProfile | None):
106
+ """
107
+ Fetches all questions, runs the BasicAgent on them, submits all answers,
108
+ and displays the results.
109
+ """
110
+ # --- Determine HF Space Runtime URL and Repo URL ---
111
+ space_id = os.getenv("SPACE_ID") # Get the SPACE_ID for sending link to the code
112
+
113
+ if profile:
114
+ username= f"{profile.username}"
115
+ print(f"User logged in: {username}")
116
+ else:
117
+ print("User not logged in.")
118
+ return "Please Login to Hugging Face with the button.", None
119
+
120
+ api_url = DEFAULT_API_URL
121
+ questions_url = f"{api_url}/questions"
122
+ submit_url = f"{api_url}/submit"
123
+
124
+ # 1. Instantiate Agent ( modify this part to create your agent)
125
+ try:
126
+ agent = BasicAgent()
127
+ except Exception as e:
128
+ print(f"Error instantiating agent: {e}")
129
+ return f"Error initializing agent: {e}", None
130
+ # In the case of an app running as a Hugging Face space, this link points toward your codebase (useful for others, so please keep it public)
131
+ agent_code = f"https://huggingface.co/spaces/{space_id}/tree/main"
132
+ print(agent_code)
133
+
134
+ # 2. Fetch Questions
135
+ print(f"Fetching questions from: {questions_url}")
136
+ try:
137
+ response = requests.get(questions_url, timeout=15)
138
+ response.raise_for_status()
139
+ questions_data = response.json()
140
+ if not questions_data:
141
+ print("Fetched questions list is empty.")
142
+ return "Fetched questions list is empty or invalid format.", None
143
+ print(f"Fetched {len(questions_data)} questions.")
144
+ except requests.exceptions.RequestException as e:
145
+ print(f"Error fetching questions: {e}")
146
+ return f"Error fetching questions: {e}", None
147
+ except requests.exceptions.JSONDecodeError as e:
148
+ print(f"Error decoding JSON response from questions endpoint: {e}")
149
+ print(f"Response text: {response.text[:500]}")
150
+ return f"Error decoding server response for questions: {e}", None
151
+ except Exception as e:
152
+ print(f"An unexpected error occurred fetching questions: {e}")
153
+ return f"An unexpected error occurred fetching questions: {e}", None
154
+
155
+ # 3. Run your Agent
156
+ results_log = []
157
+ answers_payload = []
158
+ print(f"Running agent on {len(questions_data)} questions...")
159
+ for item in questions_data:
160
+ task_id = item.get("task_id")
161
+ question_text = item.get("question")
162
+ if not task_id or question_text is None:
163
+ print(f"Skipping item with missing task_id or question: {item}")
164
+ continue
165
+ try:
166
+ submitted_answer = agent(question_text)
167
+ answers_payload.append({"task_id": task_id, "submitted_answer": submitted_answer})
168
+ results_log.append({"Task ID": task_id, "Question": question_text, "Submitted Answer": submitted_answer})
169
+ except Exception as e:
170
+ print(f"Error running agent on task {task_id}: {e}")
171
+ results_log.append({"Task ID": task_id, "Question": question_text, "Submitted Answer": f"AGENT ERROR: {e}"})
172
+
173
+ if not answers_payload:
174
+ print("Agent did not produce any answers to submit.")
175
+ return "Agent did not produce any answers to submit.", pd.DataFrame(results_log)
176
+
177
+ # 4. Prepare Submission
178
+ submission_data = {"username": username.strip(), "agent_code": agent_code, "answers": answers_payload}
179
+ status_update = f"Agent finished. Submitting {len(answers_payload)} answers for user '{username}'..."
180
+ print(status_update)
181
+
182
+ # 5. Submit
183
+ print(f"Submitting {len(answers_payload)} answers to: {submit_url}")
184
+ try:
185
+ response = requests.post(submit_url, json=submission_data, timeout=60)
186
+ response.raise_for_status()
187
+ result_data = response.json()
188
+ final_status = (
189
+ f"Submission Successful!\n"
190
+ f"User: {result_data.get('username')}\n"
191
+ f"Overall Score: {result_data.get('score', 'N/A')}% "
192
+ f"({result_data.get('correct_count', '?')}/{result_data.get('total_attempted', '?')} correct)\n"
193
+ f"Message: {result_data.get('message', 'No message received.')}"
194
+ )
195
+ print("Submission successful.")
196
+ results_df = pd.DataFrame(results_log)
197
+ return final_status, results_df
198
+ except requests.exceptions.HTTPError as e:
199
+ error_detail = f"Server responded with status {e.response.status_code}."
200
+ try:
201
+ error_json = e.response.json()
202
+ error_detail += f" Detail: {error_json.get('detail', e.response.text)}"
203
+ except requests.exceptions.JSONDecodeError:
204
+ error_detail += f" Response: {e.response.text[:500]}"
205
+ status_message = f"Submission Failed: {error_detail}"
206
+ print(status_message)
207
+ results_df = pd.DataFrame(results_log)
208
+ return status_message, results_df
209
+ except requests.exceptions.Timeout:
210
+ status_message = "Submission Failed: The request timed out."
211
+ print(status_message)
212
+ results_df = pd.DataFrame(results_log)
213
+ return status_message, results_df
214
+ except requests.exceptions.RequestException as e:
215
+ status_message = f"Submission Failed: Network error - {e}"
216
+ print(status_message)
217
+ results_df = pd.DataFrame(results_log)
218
+ return status_message, results_df
219
+ except Exception as e:
220
+ status_message = f"An unexpected error occurred during submission: {e}"
221
+ print(status_message)
222
+ results_df = pd.DataFrame(results_log)
223
+ return status_message, results_df
224
+
225
+
226
+ # --- Build Gradio Interface using Blocks ---
227
+ with gr.Blocks() as demo:
228
+ gr.Markdown("# Basic Agent Evaluation Runner")
229
+ gr.Markdown(
230
+ """
231
+ **Instructions:**
232
+
233
+ 1. Please clone this space, then modify the code to define your agent's logic, the tools, the necessary packages, etc ...
234
+ 2. Log in to your Hugging Face account using the button below. This uses your HF username for submission.
235
+ 3. Click 'Run Evaluation & Submit All Answers' to fetch questions, run your agent, submit answers, and see the score.
236
+
237
+ ---
238
+ **Disclaimers:**
239
+ Once you click the submit button, it can take quite some time (this is the time the agent needs to go through all the questions).
240
+ This space provides a basic setup and is intentionally sub-optimal to encourage you to develop your own, more robust solution. For instance, to handle the slow submit step, you could cache the answers and submit them in a separate action, or even answer the questions asynchronously.
241
+ """
242
+ )
243
+
244
+ gr.LoginButton()
245
+
246
+ run_button = gr.Button("Run Evaluation & Submit All Answers")
247
+
248
+ status_output = gr.Textbox(label="Run Status / Submission Result", lines=5, interactive=False)
249
+ # Removed max_rows=10 from DataFrame constructor
250
+ results_table = gr.DataFrame(label="Questions and Agent Answers", wrap=True)
251
+
252
+ run_button.click(
253
+ fn=run_and_submit_all,
254
+ outputs=[status_output, results_table]
255
+ )
256
+
257
+ if __name__ == "__main__":
258
+ print("\n" + "-"*30 + " App Starting " + "-"*30)
259
+ # Check for SPACE_HOST and SPACE_ID at startup for information
260
+ space_host_startup = os.getenv("SPACE_HOST")
261
+ space_id_startup = os.getenv("SPACE_ID") # Get SPACE_ID at startup
262
+
263
+ if space_host_startup:
264
+ print(f"✅ SPACE_HOST found: {space_host_startup}")
265
+ print(f" Runtime URL should be: https://{space_host_startup}.hf.space")
266
+ else:
267
+ print("ℹ️ SPACE_HOST environment variable not found (running locally?).")
268
+
269
+ if space_id_startup: # Print repo URLs if SPACE_ID is found
270
+ print(f"✅ SPACE_ID found: {space_id_startup}")
271
+ print(f" Repo URL: https://huggingface.co/spaces/{space_id_startup}")
272
+ print(f" Repo Tree URL: https://huggingface.co/spaces/{space_id_startup}/tree/main")
273
+ else:
274
+ print("ℹ️ SPACE_ID environment variable not found (running locally?). Repo URL cannot be determined.")
275
+
276
+ print("-"*(60 + len(" App Starting ")) + "\n")
277
+
278
+ print("Launching Gradio Interface for Basic Agent Evaluation...")
279
+ demo.launch(debug=True, share=False)
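
A quick way to exercise the `BasicAgent` wrapper above without launching the Gradio interface (assuming your API keys are set in the environment):

```python
from app import BasicAgent

agent = BasicAgent()
print(agent("What is the capital of France?"))
```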
core_agent.py ADDED
@@ -0,0 +1,492 @@
1
+ from smolagents import (
2
+ CodeAgent,
3
+ DuckDuckGoSearchTool,
4
+ HfApiModel,
5
+ LiteLLMModel,
6
+ OpenAIServerModel,
7
+ PythonInterpreterTool,
8
+ tool,
9
+ InferenceClientModel
10
+ )
11
+ from typing import List, Dict, Any, Optional
12
+ import os
13
+ import tempfile
14
+ import re
15
+ import json
16
+ import requests
17
+ from urllib.parse import urlparse
18
+
19
+ @tool
20
+ def save_and_read_file(content: str, filename: Optional[str] = None) -> str:
21
+ """
22
+ Save content to a temporary file and return the path.
23
+ Useful for processing files from the GAIA API.
24
+
25
+ Args:
26
+ content: The content to save to the file
27
+ filename: Optional filename, will generate a random name if not provided
28
+
29
+ Returns:
30
+ Path to the saved file
31
+ """
32
+ temp_dir = tempfile.gettempdir()
33
+ if filename is None:
34
+ temp_file = tempfile.NamedTemporaryFile(delete=False)
35
+ filepath = temp_file.name
36
+ else:
37
+ filepath = os.path.join(temp_dir, filename)
38
+
39
+ # Write content to the file
40
+ with open(filepath, 'w') as f:
41
+ f.write(content)
42
+
43
+ return f"File saved to {filepath}. You can read this file to process its contents."
44
+
45
+ @tool
46
+ def download_file_from_url(url: str, filename: Optional[str] = None) -> str:
47
+ """
48
+ Download a file from a URL and save it to a temporary location.
49
+
50
+ Args:
51
+ url: The URL to download from
52
+ filename: Optional filename, will generate one based on URL if not provided
53
+
54
+ Returns:
55
+ Path to the downloaded file
56
+ """
57
+ try:
58
+ # Parse URL to get filename if not provided
59
+ if not filename:
60
+ path = urlparse(url).path
61
+ filename = os.path.basename(path)
62
+ if not filename:
63
+ # Generate a random name if we couldn't extract one
64
+ import uuid
65
+ filename = f"downloaded_{uuid.uuid4().hex[:8]}"
66
+
67
+ # Create temporary file
68
+ temp_dir = tempfile.gettempdir()
69
+ filepath = os.path.join(temp_dir, filename)
70
+
71
+ # Download the file
72
+ response = requests.get(url, stream=True)
73
+ response.raise_for_status()
74
+
75
+ # Save the file
76
+ with open(filepath, 'wb') as f:
77
+ for chunk in response.iter_content(chunk_size=8192):
78
+ f.write(chunk)
79
+
80
+ return f"File downloaded to {filepath}. You can now process this file."
81
+ except Exception as e:
82
+ return f"Error downloading file: {str(e)}"
83
+
84
+ @tool
85
+ def extract_text_from_image(image_path: str) -> str:
86
+ """
87
+ Extract text from an image using pytesseract (if available).
88
+
89
+ Args:
90
+ image_path: Path to the image file
91
+
92
+ Returns:
93
+ Extracted text or error message
94
+ """
95
+ try:
96
+ # Try to import pytesseract
97
+ import pytesseract
98
+ from PIL import Image
99
+
100
+ # Open the image
101
+ image = Image.open(image_path)
102
+
103
+ # Extract text
104
+ text = pytesseract.image_to_string(image)
105
+
106
+ return f"Extracted text from image:\n\n{text}"
107
+ except ImportError:
108
+ return "Error: pytesseract is not installed. Please install it with 'pip install pytesseract' and ensure Tesseract OCR is installed on your system."
109
+ except Exception as e:
110
+ return f"Error extracting text from image: {str(e)}"
111
+
112
+ @tool
113
+ def analyze_csv_file(file_path: str, query: str) -> str:
114
+ """
115
+ Analyze a CSV file using pandas and answer a question about it.
116
+
117
+ Args:
118
+ file_path: Path to the CSV file
119
+ query: Question about the data
120
+
121
+ Returns:
122
+ Analysis result or error message
123
+ """
124
+ try:
125
+ import pandas as pd
126
+
127
+ # Read the CSV file
128
+ df = pd.read_csv(file_path)
129
+
130
+ # Run various analyses based on the query
131
+ result = f"CSV file loaded with {len(df)} rows and {len(df.columns)} columns.\n"
132
+ result += f"Columns: {', '.join(df.columns)}\n\n"
133
+
134
+ # Add summary statistics
135
+ result += "Summary statistics:\n"
136
+ result += str(df.describe())
137
+
138
+ return result
139
+ except ImportError:
140
+ return "Error: pandas is not installed. Please install it with 'pip install pandas'."
141
+ except Exception as e:
142
+ return f"Error analyzing CSV file: {str(e)}"
143
+
144
+ @tool
145
+ def analyze_excel_file(file_path: str, query: str) -> str:
146
+ """
147
+ Analyze an Excel file using pandas and answer a question about it.
148
+
149
+ Args:
150
+ file_path: Path to the Excel file
151
+ query: Question about the data
152
+
153
+ Returns:
154
+ Analysis result or error message
155
+ """
156
+ try:
157
+ import pandas as pd
158
+
159
+ # Read the Excel file
160
+ df = pd.read_excel(file_path)
161
+
162
+ # Run various analyses based on the query
163
+ result = f"Excel file loaded with {len(df)} rows and {len(df.columns)} columns.\n"
164
+ result += f"Columns: {', '.join(df.columns)}\n\n"
165
+
166
+ # Add summary statistics
167
+ result += "Summary statistics:\n"
168
+ result += str(df.describe())
169
+
170
+ return result
171
+ except ImportError:
172
+ return "Error: pandas and openpyxl are not installed. Please install them with 'pip install pandas openpyxl'."
173
+ except Exception as e:
174
+ return f"Error analyzing Excel file: {str(e)}"
175
+
176
+ class GAIAAgent:
177
+ def __init__(
178
+ self,
179
+ model_type: str = "HfApiModel",
180
+ model_id: Optional[str] = None,
181
+ api_key: Optional[str] = None,
182
+ api_base: Optional[str] = None,
183
+ temperature: float = 0.2,
184
+ executor_type: str = "local", # Changed from use_e2b to executor_type
185
+ additional_imports: List[str] = None,
186
+ additional_tools: List[Any] = None,
187
+ system_prompt: Optional[str] = None, # We'll still accept this parameter but not use it directly
188
+ verbose: bool = False,
189
+ provider: Optional[str] = None, # Add provider for InferenceClientModel
190
+ timeout: Optional[int] = None # Add timeout for InferenceClientModel
191
+ ):
192
+ """
193
+ Initialize a GAIAAgent with specified configuration
194
+
195
+ Args:
196
+ model_type: Type of model to use (HfApiModel, LiteLLMModel, OpenAIServerModel, InferenceClientModel)
197
+ model_id: ID of the model to use
198
+ api_key: API key for the model provider
199
+ api_base: Base URL for API calls
200
+ temperature: Temperature for text generation
201
+ executor_type: Type of executor for code execution ('local' or 'e2b')
202
+ additional_imports: Additional Python modules to allow importing
203
+ additional_tools: Additional tools to provide to the agent
204
+ system_prompt: Custom system prompt to use (not directly used, kept for backward compatibility)
205
+ verbose: Enable verbose logging
206
+ provider: Provider for InferenceClientModel (e.g., "hf-inference")
207
+ timeout: Timeout in seconds for API calls
208
+ """
209
+ # Set verbosity
210
+ self.verbose = verbose
211
+ self.system_prompt = system_prompt # Store for potential future use
212
+
213
+ # Initialize model based on configuration
214
+ if model_type == "HfApiModel":
215
+ if api_key is None:
216
+ api_key = os.getenv("HUGGINGFACEHUB_API_TOKEN")
217
+ if not api_key:
218
+ raise ValueError("No Hugging Face token provided. Please set HUGGINGFACEHUB_API_TOKEN environment variable or pass api_key parameter.")
219
+
220
+ if self.verbose:
221
+ print(f"Using Hugging Face token: {api_key[:5]}...")
222
+
223
+ self.model = HfApiModel(
224
+ model_id=model_id or "meta-llama/Llama-3-70B-Instruct",
225
+ token=api_key,
226
+ temperature=temperature
227
+ )
228
+ elif model_type == "InferenceClientModel":
229
+ if api_key is None:
230
+ api_key = os.getenv("HUGGINGFACEHUB_API_TOKEN")
231
+ if not api_key:
232
+ raise ValueError("No Hugging Face token provided. Please set HUGGINGFACEHUB_API_TOKEN environment variable or pass api_key parameter.")
233
+
234
+ if self.verbose:
235
+ print(f"Using Hugging Face token: {api_key[:5]}...")
236
+
237
+ self.model = InferenceClientModel(
238
+ model_id=model_id or "meta-llama/Llama-3-70B-Instruct",
239
+ provider=provider or "hf-inference",
240
+ token=api_key,
241
+ timeout=timeout or 120,
242
+ temperature=temperature
243
+ )
244
+ elif model_type == "LiteLLMModel":
245
+ from smolagents import LiteLLMModel
246
+ self.model = LiteLLMModel(
247
+ model_id=model_id or "gpt-4o",
248
+ api_key=api_key or os.getenv("OPENAI_API_KEY"),
249
+ temperature=temperature
250
+ )
251
+ elif model_type == "OpenAIServerModel":
252
+ # Check for xAI API key and base URL first
253
+ xai_api_key = os.getenv("XAI_API_KEY")
254
+ xai_api_base = os.getenv("XAI_API_BASE")
255
+
256
+ # If xAI credentials are available, use them
257
+ if xai_api_key and api_key is None:
258
+ api_key = xai_api_key
259
+ if self.verbose:
260
+ print(f"Using xAI API key: {api_key[:5]}...")
261
+
262
+ # If no API key specified, fall back to OPENAI_API_KEY
263
+ if api_key is None:
264
+ api_key = os.getenv("OPENAI_API_KEY")
265
+ if not api_key:
266
+ raise ValueError("No OpenAI API key provided. Please set OPENAI_API_KEY or XAI_API_KEY environment variable or pass api_key parameter.")
267
+
268
+ # If xAI API base is available and no api_base is provided, use it
269
+ if xai_api_base and api_base is None:
270
+ api_base = xai_api_base
271
+ if self.verbose:
272
+ print(f"Using xAI API base URL: {api_base}")
273
+
274
+ # If no API base specified but environment variable available, use it
275
+ if api_base is None:
276
+ api_base = os.getenv("AGENT_API_BASE")
277
+ if api_base and self.verbose:
278
+ print(f"Using API base from AGENT_API_BASE: {api_base}")
279
+
280
+ self.model = OpenAIServerModel(
281
+ model_id=model_id or "gpt-4o",
282
+ api_key=api_key,
283
+ api_base=api_base,
284
+ temperature=temperature
285
+ )
286
+ else:
287
+ raise ValueError(f"Unknown model type: {model_type}")
288
+
289
+ if self.verbose:
290
+ print(f"Initialized model: {model_type} - {model_id}")
291
+
292
+ # Initialize default tools
293
+ self.tools = [
294
+ DuckDuckGoSearchTool(),
295
+ PythonInterpreterTool(),
296
+ save_and_read_file,
297
+ download_file_from_url,
298
+ analyze_csv_file,
299
+ analyze_excel_file
300
+ ]
301
+
302
+ # Add extract_text_from_image if PIL and pytesseract are available
303
+ try:
304
+ import pytesseract
305
+ from PIL import Image
306
+ self.tools.append(extract_text_from_image)
307
+ if self.verbose:
308
+ print("Added image processing tool")
309
+ except ImportError:
310
+ if self.verbose:
311
+ print("Image processing libraries not available")
312
+
313
+ # Add any additional tools
314
+ if additional_tools:
315
+ self.tools.extend(additional_tools)
316
+
317
+ if self.verbose:
318
+ print(f"Initialized with {len(self.tools)} tools")
319
+
320
+ # Setup imports allowed
321
+ self.imports = ["pandas", "numpy", "datetime", "json", "re", "math", "os", "requests", "csv", "urllib"]
322
+ if additional_imports:
323
+ self.imports.extend(additional_imports)
324
+
325
+ # Initialize the CodeAgent
326
+ executor_kwargs = {}
327
+ if executor_type == "e2b":
328
+ try:
329
+ # Try to import e2b dependencies to check if they're available
330
+ from e2b_code_interpreter import Sandbox
331
+ if self.verbose:
332
+ print("Using e2b executor")
333
+ except ImportError:
334
+ if self.verbose:
335
+ print("e2b dependencies not found, falling back to local executor")
336
+ executor_type = "local" # Fallback to local if e2b is not available
337
+
338
+ self.agent = CodeAgent(
339
+ tools=self.tools,
340
+ model=self.model,
341
+ additional_authorized_imports=self.imports,
342
+ executor_type=executor_type,
343
+ executor_kwargs=executor_kwargs,
344
+ verbosity_level=2 if self.verbose else 0
345
+ )
346
+
347
+ if self.verbose:
348
+ print("Agent initialized and ready")
349
+
350
+ def answer_question(self, question: str, task_file_path: Optional[str] = None) -> str:
351
+ """
352
+ Process a GAIA benchmark question and return the answer
353
+
354
+ Args:
355
+ question: The question to answer
356
+ task_file_path: Optional path to a file associated with the question
357
+
358
+ Returns:
359
+ The answer to the question
360
+ """
361
+ try:
362
+ if self.verbose:
363
+ print(f"Processing question: {question}")
364
+ if task_file_path:
365
+ print(f"With associated file: {task_file_path}")
366
+
367
+ # Create a context with file information if available
368
+ context = question
369
+ file_content = None
370
+
371
+ # If there's a file, read it and include its content in the context
372
+ if task_file_path:
373
+ try:
374
+ with open(task_file_path, 'r') as f:
375
+ file_content = f.read()
376
+
377
+ # Determine file type from extension
378
+ import os
379
+ file_ext = os.path.splitext(task_file_path)[1].lower()
380
+
381
+ context = f"""
382
+ Question: {question}
383
+
384
+ This question has an associated file. Here is the file content:
385
+
386
+ ```{file_ext}
387
+ {file_content}
388
+ ```
389
+
390
+ Analyze the file content above to answer the question.
391
+ """
392
+ except Exception as file_e:
393
+ context = f"""
394
+ Question: {question}
395
+
396
+ This question has an associated file at path: {task_file_path}
397
+ However, there was an error reading the file: {file_e}
398
+ You can still try to answer the question based on the information provided.
399
+ """
400
+
401
+ # Check for special cases that need specific formatting
402
+ # Reversed text questions
403
+ if question.startswith(".") or ".rewsna eht sa" in question:
404
+ context = f"""
405
+ This question appears to be in reversed text. Here's the reversed version:
406
+ {question[::-1]}
407
+
408
+ Now answer the question above. Remember to format your answer exactly as requested.
409
+ """
410
+
411
+ # Add a prompt to ensure precise answers
412
+ full_prompt = f"""{context}
413
+
414
+ When answering, provide ONLY the precise answer requested.
415
+ Do not include explanations, steps, reasoning, or additional text.
416
+ Be direct and specific. GAIA benchmark requires exact matching answers.
417
+ For example, if asked "What is the capital of France?", respond simply with "Paris".
418
+ """
419
+
420
+ # Run the agent with the question
421
+ answer = self.agent.run(full_prompt)
422
+
423
+ # Clean up the answer to ensure it's in the expected format
424
+ # Remove common prefixes that models often add
425
+ answer = self._clean_answer(answer)
426
+
427
+ if self.verbose:
428
+ print(f"Generated answer: {answer}")
429
+
430
+ return answer
431
+ except Exception as e:
432
+ error_msg = f"Error answering question: {e}"
433
+ if self.verbose:
434
+ print(error_msg)
435
+ return error_msg
436
+
437
+ def _clean_answer(self, answer: Any) -> str:
438
+ """
439
+ Clean up the answer to remove common prefixes and formatting
440
+ that models often add but that can cause exact match failures.
441
+
442
+ Args:
443
+ answer: The raw answer from the model
444
+
445
+ Returns:
446
+ The cleaned answer as a string
447
+ """
448
+ # Convert non-string types to strings
449
+ if not isinstance(answer, str):
450
+ # Handle numeric types (float, int)
451
+ if isinstance(answer, float):
452
+ # Format floating point numbers properly
453
+ # Check if it's an integer value in float form (e.g., 12.0)
454
+ if answer.is_integer():
455
+ formatted_answer = str(int(answer))
456
+ else:
457
+ # For currency values that might need formatting
458
+ if abs(answer) >= 1000:
459
+ formatted_answer = f"${answer:,.2f}"
460
+ else:
461
+ formatted_answer = str(answer)
462
+ return formatted_answer
463
+ elif isinstance(answer, int):
464
+ return str(answer)
465
+ else:
466
+ # For any other type
467
+ return str(answer)
468
+
469
+ # Now we know answer is a string, so we can safely use string methods
470
+ # Normalize whitespace
471
+ answer = answer.strip()
472
+
473
+ # Remove common prefixes and formatting that models add
474
+ prefixes_to_remove = [
475
+ "The answer is ",
476
+ "Answer: ",
477
+ "Final answer: ",
478
+ "The result is ",
479
+ "To answer this question: ",
480
+ "Based on the information provided, ",
481
+ "According to the information: ",
482
+ ]
483
+
484
+ for prefix in prefixes_to_remove:
485
+ if answer.startswith(prefix):
486
+ answer = answer[len(prefix):].strip()
487
+
488
+ # Remove quotes if they wrap the entire answer
489
+ if (answer.startswith('"') and answer.endswith('"')) or (answer.startswith("'") and answer.endswith("'")):
490
+ answer = answer[1:-1].strip()
491
+
492
+ return answer
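
A small usage sketch for `GAIAAgent` as defined above (assuming an OpenAI-compatible API key is set in the environment; the CSV path below is a hypothetical downloaded task file):

```python
from core_agent import GAIAAgent

agent = GAIAAgent(model_type="OpenAIServerModel", model_id="gpt-4o", verbose=True)

# Plain question
print(agent.answer_question("What is the capital of France?"))

# Question with an associated file; its content is inlined into the prompt
print(agent.answer_question(
    "What is the total of the 'Total' column?",
    task_file_path="/tmp/gaia_task_example.csv",
))
```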
local_test.py ADDED
@@ -0,0 +1,249 @@
1
+ #!/usr/bin/env python3
2
+ """
3
+ Test script for the GAIA agent using real API keys.
4
+ This script simulates GAIA benchmark questions and helps debug/improve the agent.
5
+ """
6
+
7
+ import os
8
+ import sys
9
+ import json
10
+ import tempfile
11
+ from typing import List, Dict, Any, Optional
12
+ import traceback
13
+ import dotenv
14
+
15
+ # Load environment variables from .env file
16
+ dotenv.load_dotenv()
17
+
18
+ # Import our agent
19
+ from core_agent import GAIAAgent
20
+
21
+ # Simulation of GAIA benchmark questions
22
+ SAMPLE_QUESTIONS = [
23
+ {
24
+ "task_id": "task_001",
25
+ "question": "What is the capital of France?",
26
+ "expected_answer": "Paris",
27
+ "has_file": False,
28
+ "file_content": None
29
+ },
30
+ {
31
+ "task_id": "task_002",
32
+ "question": "What is the square root of 144?",
33
+ "expected_answer": "12",
34
+ "has_file": False,
35
+ "file_content": None
36
+ },
37
+ {
38
+ "task_id": "task_003",
39
+ "question": "If a train travels at 60 miles per hour, how far will it travel in 2.5 hours?",
40
+ "expected_answer": "150 miles",
41
+ "has_file": False,
42
+ "file_content": None
43
+ },
44
+ {
45
+ "task_id": "task_004",
46
+ "question": ".rewsna eht sa 'thgir' drow eht etirw ,tfel fo etisoppo eht si tahW",
47
+ "expected_answer": "right",
48
+ "has_file": False,
49
+ "file_content": None
50
+ },
51
+ {
52
+ "task_id": "task_005",
53
+ "question": "Analyze the data in the attached CSV file and tell me the total sales for the month of January.",
54
+ "expected_answer": "$10,250.75",
55
+ "has_file": True,
56
+ "file_content": """Date,Product,Quantity,Price,Total
57
+ 2023-01-05,Widget A,10,25.99,259.90
58
+ 2023-01-12,Widget B,5,45.50,227.50
59
+ 2023-01-15,Widget C,20,50.25,1005.00
60
+ 2023-01-20,Widget A,15,25.99,389.85
61
+ 2023-01-25,Widget B,8,45.50,364.00
62
+ 2023-01-28,Widget D,100,80.04,8004.50"""
63
+ },
64
+ {
65
+ "task_id": "task_006",
66
+ "question": "I'm making a grocery list for my mom, but she's a picky eater. She only eats foods that don't contain the letter 'e'. List 5 common fruits and vegetables she can eat.",
67
+ "expected_answer": "Banana, Kiwi, Corn, Fig, Taro",
68
+ "has_file": False,
69
+ "file_content": None
70
+ },
71
+ {
72
+ "task_id": "task_007",
73
+ "question": "How many studio albums were published by Mercedes Sosa between 1972 and 1985?",
74
+ "expected_answer": "12",
75
+ "has_file": False,
76
+ "file_content": None
77
+ },
78
+ {
79
+ "task_id": "task_008",
80
+ "question": "In the video https://www.youtube.com/watch?v=L1vXC1KMRd0, what color is primarily associated with the main character?",
81
+ "expected_answer": "Blue",
82
+ "has_file": False,
83
+ "file_content": None
84
+ }
85
+ ]
86
+
87
+ def initialize_agent():
88
+ """Initialize the GAIAAgent with appropriate API keys."""
89
+ print("Initializing GAIAAgent with API keys...")
90
+
91
+ # Try X.AI first (xAI) with the correct API endpoint
92
+ if os.getenv("XAI_API_KEY"):
93
+ print("Using X.AI API key")
94
+ try:
95
+ agent = GAIAAgent(
96
+ model_type="OpenAIServerModel",
97
+ model_id="grok-3-latest", # Use the X.AI model
98
+ api_key=os.getenv("XAI_API_KEY"),
99
+ api_base="https://api.x.ai/v1", # Correct X.AI endpoint
100
+ temperature=0.2,
101
+ executor_type="local",
102
+ verbose=True
104
+ )
105
+ print("Using OpenAIServerModel with X.AI API")
106
+ return agent
107
+ except Exception as e:
108
+ print(f"Error initializing with X.AI API: {e}")
109
+ traceback.print_exc()
110
+
111
+ # Then try OpenAI
112
+ if os.getenv("OPENAI_API_KEY"):
113
+ print("Using OpenAI API key")
114
+ try:
115
+ model_id = os.getenv("AGENT_MODEL_ID", "gpt-4o")
116
+ agent = GAIAAgent(
117
+ model_type="OpenAIServerModel",
118
+ model_id=model_id,
119
+ api_key=os.getenv("OPENAI_API_KEY"),
120
+ temperature=0.2,
121
+ executor_type="local",
122
+ verbose=True
123
+ )
124
+ print(f"Using OpenAIServerModel with model_id: {model_id}")
125
+ return agent
126
+ except Exception as e:
127
+ print(f"Error initializing with OpenAI API: {e}")
128
+ traceback.print_exc()
129
+
130
+ # Last resort, try Hugging Face
131
+ if os.getenv("HUGGINGFACEHUB_API_TOKEN"):
132
+ print("Using Hugging Face API token")
133
+ try:
134
+ # Use a smaller model that might work within free tier
135
+ model_id = "tiiuae/falcon-7b-instruct" # Try a smaller model that might be within free tier
136
+ agent = GAIAAgent(
137
+ model_type="HfApiModel",
138
+ model_id=model_id,
139
+ api_key=os.getenv("HUGGINGFACEHUB_API_TOKEN"),
140
+ temperature=0.2,
141
+ executor_type="local",
142
+ verbose=True
143
+ )
144
+ print(f"Using HfApiModel with model_id: {model_id}")
145
+ return agent
146
+ except Exception as e:
147
+ print(f"Error initializing with Hugging Face API: {e}")
148
+ traceback.print_exc()
149
+
150
+ print("ERROR: No valid API keys found in environment. Please set one of the following:")
151
+ print("- XAI_API_KEY (for X.AI)")
152
+ print("- OPENAI_API_KEY")
153
+ print("- HUGGINGFACEHUB_API_TOKEN")
154
+ return None
155
+
156
+ def save_test_file(task_id: str, content: str) -> str:
157
+ """Save a test file to a temporary location."""
158
+ temp_dir = tempfile.gettempdir()
159
+ file_path = os.path.join(temp_dir, f"test_file_{task_id}.csv")
160
+
161
+ with open(file_path, 'w') as f:
162
+ f.write(content)
163
+
164
+ return file_path
165
+
166
+ def run_tests():
167
+ """Run tests using the GAIAAgent with API keys."""
168
+ agent = initialize_agent()
169
+
170
+ if not agent:
171
+ print("Failed to initialize agent. Exiting.")
172
+ return
173
+
174
+ results = []
175
+ correct_count = 0
176
+ total_count = len(SAMPLE_QUESTIONS)
177
+
178
+ for idx, question_data in enumerate(SAMPLE_QUESTIONS):
179
+ task_id = question_data["task_id"]
180
+ question = question_data["question"]
181
+ expected = question_data["expected_answer"]
182
+
183
+ print(f"\n{'='*80}")
184
+ print(f"Question {idx+1}/{total_count}: {question}")
185
+ print(f"Expected: {expected}")
186
+
187
+ # Process any attached file
188
+ file_path = None
189
+ if question_data["has_file"] and question_data["file_content"]:
190
+ file_path = save_test_file(task_id, question_data["file_content"])
191
+ print(f"Created test file: {file_path}")
192
+
193
+ # Get answer from agent
194
+ try:
195
+ answer = agent.answer_question(question, file_path)
196
+ print(f"Agent answer: {answer}")
197
+
198
+ # Check if answer matches expected
199
+ is_correct = answer.lower() == expected.lower()
200
+ if is_correct:
201
+ correct_count += 1
202
+ print(f"✅ CORRECT")
203
+ else:
204
+ print(f"❌ INCORRECT - Expected: {expected}")
205
+
206
+ results.append({
207
+ "task_id": task_id,
208
+ "question": question,
209
+ "expected": expected,
210
+ "answer": answer,
211
+ "is_correct": is_correct
212
+ })
213
+ except Exception as e:
214
+ error_details = traceback.format_exc()
215
+ print(f"Error processing question: {e}\n{error_details}")
216
+ results.append({
217
+ "task_id": task_id,
218
+ "question": question,
219
+ "expected": expected,
220
+ "answer": f"ERROR: {str(e)}",
221
+ "is_correct": False
222
+ })
223
+
224
+ # Print summary
225
+ accuracy = (correct_count / total_count) * 100
226
+ print(f"\n{'='*80}")
227
+ print(f"Test Results: {correct_count}/{total_count} correct ({accuracy:.1f}%)")
228
+
229
+ return results
230
+
231
+
232
+ if __name__ == "__main__":
233
+ print("Running tests for GAIA agent with API keys...")
234
+
235
+ # Print environment information
236
+ print("\nEnvironment information:")
237
+ print(f"XAI_API_KEY set: {'Yes' if os.getenv('XAI_API_KEY') else 'No'}")
238
+ print(f"OPENAI_API_KEY set: {'Yes' if os.getenv('OPENAI_API_KEY') else 'No'}")
239
+ print(f"HUGGINGFACEHUB_API_TOKEN set: {'Yes' if os.getenv('HUGGINGFACEHUB_API_TOKEN') else 'No'}")
240
+ print(f"AGENT_MODEL_TYPE: {os.getenv('AGENT_MODEL_TYPE', 'OpenAIServerModel')} (default: OpenAIServerModel)")
241
+ print(f"AGENT_MODEL_ID: {os.getenv('AGENT_MODEL_ID', 'gpt-4o')} (default: gpt-4o)")
242
+
243
+ results = run_tests()
244
+
245
+ # Save results to a file
246
+ with open("test_results.json", "w") as f:
247
+ json.dump(results, f, indent=2)
248
+
249
+ print("\nResults saved to test_results.json")
main.py ADDED
@@ -0,0 +1,277 @@
1
+ import os
2
+ import tempfile
3
+ import gradio as gr
4
+ import pandas as pd
5
+ import traceback
6
+ from core_agent import GAIAAgent
7
+ from api_integration import GAIAApiClient
8
+
9
+ # Constants
10
+ DEFAULT_API_URL = "https://agents-course-unit4-scoring.hf.space"
11
+
12
+ def save_task_file(file_content, task_id):
13
+ """
14
+ Save a task file to a temporary location
15
+ """
16
+ if not file_content:
17
+ return None
18
+
19
+ # Create a temporary file
20
+ temp_dir = tempfile.gettempdir()
21
+ file_path = os.path.join(temp_dir, f"gaia_task_{task_id}.txt")
22
+
23
+ # Write content to the file
24
+ with open(file_path, 'wb') as f:
25
+ f.write(file_content)
26
+
27
+ print(f"File saved to {file_path}")
28
+ return file_path
29
+
30
+ def get_agent_configuration():
31
+ """
32
+ Get the agent configuration based on environment variables
33
+ """
34
+ # Default configuration
35
+ config = {
36
+ "model_type": "OpenAIServerModel", # Default to OpenAIServerModel
37
+ "model_id": "gpt-4o", # Default model for OpenAI
38
+ "temperature": 0.2,
39
+ "executor_type": "local",
40
+ "verbose": False,
41
+ "provider": "hf-inference", # For InferenceClientModel
42
+ "timeout": 120 # For InferenceClientModel
43
+ }
44
+
45
+ # Check for xAI API key and base URL
46
+ xai_api_key = os.getenv("XAI_API_KEY")
47
+ xai_api_base = os.getenv("XAI_API_BASE")
48
+
49
+ # If we have xAI credentials, use them
50
+ if xai_api_key:
51
+ config["api_key"] = xai_api_key
52
+ if xai_api_base:
53
+ config["api_base"] = xai_api_base
54
+ # Use a model that works well with xAI
55
+ config["model_id"] = "mixtral-8x7b-32768"
56
+
57
+ # Override with environment variables if present
58
+ if os.getenv("AGENT_MODEL_TYPE"):
59
+ config["model_type"] = os.getenv("AGENT_MODEL_TYPE")
60
+
61
+ if os.getenv("AGENT_MODEL_ID"):
62
+ config["model_id"] = os.getenv("AGENT_MODEL_ID")
63
+
64
+ if os.getenv("AGENT_TEMPERATURE"):
65
+ config["temperature"] = float(os.getenv("AGENT_TEMPERATURE"))
66
+
67
+ if os.getenv("AGENT_EXECUTOR_TYPE"):
68
+ config["executor_type"] = os.getenv("AGENT_EXECUTOR_TYPE")
69
+
70
+ if os.getenv("AGENT_VERBOSE") is not None:
71
+ config["verbose"] = os.getenv("AGENT_VERBOSE").lower() == "true"
72
+
73
+ if os.getenv("AGENT_API_BASE"):
74
+ config["api_base"] = os.getenv("AGENT_API_BASE")
75
+
76
+ # InferenceClientModel specific settings
77
+ if os.getenv("AGENT_PROVIDER"):
78
+ config["provider"] = os.getenv("AGENT_PROVIDER")
79
+
80
+ if os.getenv("AGENT_TIMEOUT"):
81
+ config["timeout"] = int(os.getenv("AGENT_TIMEOUT"))
82
+
83
+ return config
84
+
85
+ def run_and_submit_all(profile: gr.OAuthProfile | None):
86
+ """
87
+ Fetches all questions, runs the GAIAAgent on them, submits all answers,
88
+ and displays the results.
89
+ """
90
+ # Check for user login
91
+ if not profile:
92
+ return "Please Login to Hugging Face with the button.", None
93
+
94
+ username = profile.username
95
+ print(f"User logged in: {username}")
96
+
97
+ # Get SPACE_ID for code link
98
+ space_id = os.getenv("SPACE_ID")
99
+ agent_code = f"https://huggingface.co/spaces/{space_id}/tree/main"
100
+
101
+ # Initialize API client
102
+ api_client = GAIAApiClient(DEFAULT_API_URL)
103
+
104
+ # Initialize Agent with configuration
105
+ try:
106
+ agent_config = get_agent_configuration()
107
+ print(f"Using agent configuration: {agent_config}")
108
+
109
+ agent = GAIAAgent(**agent_config)
110
+ print("Agent initialized successfully")
111
+ except Exception as e:
112
+ error_details = traceback.format_exc()
113
+ print(f"Error initializing agent: {e}\n{error_details}")
114
+ return f"Error initializing agent: {e}", None
115
+
116
+ # Fetch questions
117
+ try:
118
+ questions_data = api_client.get_questions()
119
+ if not questions_data:
120
+ return "Fetched questions list is empty or invalid format.", None
121
+ print(f"Fetched {len(questions_data)} questions.")
122
+ except Exception as e:
123
+ error_details = traceback.format_exc()
124
+ print(f"Error fetching questions: {e}\n{error_details}")
125
+ return f"Error fetching questions: {e}", None
126
+
127
+ # Run agent on questions
128
+ results_log = []
129
+ answers_payload = []
130
+ print(f"Running agent on {len(questions_data)} questions...")
131
+
132
+ # Progress tracking
133
+ total_questions = len(questions_data)
134
+ completed = 0
135
+ failed = 0
136
+
137
+ for item in questions_data:
138
+ task_id = item.get("task_id")
139
+ question_text = item.get("question")
140
+ if not task_id or question_text is None:
141
+ print(f"Skipping item with missing task_id or question: {item}")
142
+ continue
143
+
144
+ try:
145
+ # Update progress
146
+ completed += 1
147
+ print(f"Processing question {completed}/{total_questions}: Task ID {task_id}")
148
+
149
+ # Check if the question has an associated file
150
+ file_path = None
151
+ try:
152
+ file_content = api_client.get_file(task_id)
153
+ print(f"Downloaded file for task {task_id}")
154
+ file_path = save_task_file(file_content, task_id)
155
+ except Exception as file_e:
156
+ print(f"No file found for task {task_id} or error: {file_e}")
157
+
158
+ # Run the agent to get the answer
159
+ submitted_answer = agent.answer_question(question_text, file_path)
160
+
161
+ # Add to results
162
+ answers_payload.append({"task_id": task_id, "submitted_answer": submitted_answer})
163
+ results_log.append({
164
+ "Task ID": task_id,
165
+ "Question": question_text,
166
+ "Submitted Answer": submitted_answer
167
+ })
168
+ except Exception as e:
169
+ # Update error count
170
+ failed += 1
171
+ error_details = traceback.format_exc()
172
+ print(f"Error running agent on task {task_id}: {e}\n{error_details}")
173
+
174
+ # Add error to results
175
+ error_msg = f"AGENT ERROR: {e}"
176
+ answers_payload.append({"task_id": task_id, "submitted_answer": error_msg})
177
+ results_log.append({
178
+ "Task ID": task_id,
179
+ "Question": question_text,
180
+ "Submitted Answer": error_msg
181
+ })
182
+
183
+ # Print summary
184
+ print(f"\nProcessing complete: {completed} questions processed, {failed} failures")
185
+
186
+ if not answers_payload:
187
+ return "Agent did not produce any answers to submit.", pd.DataFrame(results_log)
188
+
189
+ # Submit answers
190
+ submission_data = {
191
+ "username": username.strip(),
192
+ "agent_code": agent_code,
193
+ "answers": answers_payload
194
+ }
195
+
196
+ print(f"Submitting {len(answers_payload)} answers for username '{username}'...")
197
+
198
+ try:
199
+ result_data = api_client.submit_answers(
200
+ username.strip(),
201
+ agent_code,
202
+ answers_payload
203
+ )
204
+
205
+ # Calculate success rate
206
+ correct_count = result_data.get('correct_count', 0)
207
+ total_attempted = result_data.get('total_attempted', len(answers_payload))
208
+ success_rate = (correct_count / total_attempted) * 100 if total_attempted > 0 else 0
209
+
210
+ final_status = (
211
+ f"Submission Successful!\n"
212
+ f"User: {result_data.get('username')}\n"
213
+ f"Overall Score: {result_data.get('score', 'N/A')}% "
214
+ f"({correct_count}/{total_attempted} correct, {success_rate:.1f}% success rate)\n"
215
+ f"Message: {result_data.get('message', 'No message received.')}"
216
+ )
217
+
218
+ print("Submission successful.")
219
+ return final_status, pd.DataFrame(results_log)
220
+ except Exception as e:
221
+ error_details = traceback.format_exc()
222
+ status_message = f"Submission Failed: {e}\n{error_details}"
223
+ print(status_message)
224
+ return status_message, pd.DataFrame(results_log)
225
+
226
+ # Build Gradio Interface
227
+ with gr.Blocks() as demo:
228
+ gr.Markdown("# GAIA Agent Evaluation Runner")
229
+ gr.Markdown(
230
+ """
231
+ **Instructions:**
232
+
233
+ 1. Log in to your Hugging Face account using the button below.
234
+ 2. Click 'Run Evaluation & Submit All Answers' to fetch questions, run your agent, submit answers, and see the score.
235
+
236
+ **Configuration:**
237
+
238
+ You can configure the agent by setting these environment variables:
239
+ - `AGENT_MODEL_TYPE`: Model type (HfApiModel, InferenceClientModel, LiteLLMModel, OpenAIServerModel)
240
+ - `AGENT_MODEL_ID`: Model ID
241
+ - `AGENT_TEMPERATURE`: Temperature for generation (0.0-1.0)
242
+ - `AGENT_EXECUTOR_TYPE`: Type of executor ('local' or 'e2b')
243
+ - `AGENT_VERBOSE`: Enable verbose logging (true/false)
244
+ - `AGENT_API_BASE`: Base URL for API calls (for OpenAIServerModel)
245
+
246
+ **xAI Support:**
247
+ - `XAI_API_KEY`: Your xAI API key
248
+ - `XAI_API_BASE`: Base URL for xAI API (default: https://api.groq.com/openai/v1)
249
+ - When using xAI, set AGENT_MODEL_TYPE=OpenAIServerModel and AGENT_MODEL_ID=mixtral-8x7b-32768
250
+
251
+ **InferenceClientModel specific settings:**
252
+ - `AGENT_PROVIDER`: Provider for InferenceClientModel (e.g., "hf-inference")
253
+ - `AGENT_TIMEOUT`: Timeout in seconds for API calls
254
+ """
255
+ )
256
+
257
+ gr.LoginButton()
258
+
259
+ run_button = gr.Button("Run Evaluation & Submit All Answers")
260
+
261
+ status_output = gr.Textbox(label="Run Status / Submission Result", lines=5, interactive=False)
262
+ results_table = gr.DataFrame(label="Questions and Agent Answers", wrap=True)
263
+
264
+ run_button.click(
265
+ fn=run_and_submit_all,
266
+ outputs=[status_output, results_table]
267
+ )
268
+
269
+ if __name__ == "__main__":
270
+ print("\n" + "-"*30 + " App Starting " + "-"*30)
271
+
272
+ # Check for environment variables
273
+ config = get_agent_configuration()
274
+ print(f"Agent configuration: {config}")
275
+
276
+ # Run the Gradio app
277
+ demo.launch(debug=True, share=False)
requirements.txt ADDED
@@ -0,0 +1,6 @@
+ gradio
+ requests
+ smolagents
+ python-dotenv
+ pandas
+ numpy