Dacho688 committed
Commit fdd0e93 · 1 Parent(s): 9fbad84

Upgrade to llama 3.3 70B from 3.1

README.md CHANGED
@@ -1,5 +1,5 @@
  ---
- title: Agent Data Analyst
+ title: Data Analyst AI Agent
  emoji: 🤔📊
  colorFrom: yellow
  colorTo: red
@@ -8,7 +8,11 @@ sdk_version: 4.38.1
  app_file: app.py
  pinned: false
  license: apache-2.0
- short_description: Need to analyze data? Let a Llama-3.1 agent do it for you!
+ short_description: Need to analyze data? Let a Llama-3.3 AI agent do it for you!
  ---
+ ## Agent Data Analyst
  
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ I'm your personal Data Analyst AI Agent built on top of the Llama-3.3-70B-Instruct model and the ReAct (Reasoning and Acting) framework.
+ I break down the task step-by-step until I reach an answer/solution.
+ Along the way I share my thoughts, actions (Python code blobs), and observations.
+ I come packed with pandas, numpy, sklearn, matplotlib, seaborn, and more!

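For context, the agent this README describes is wired up in app.py (see the app.py diff further down). The snippet below is a minimal sketch only, not the app's exact code: the model id, imports, and agent settings mirror this commit, and the `agent.run(..., data_file=...)` pattern follows the deleted app_original.py and streaming.py; the CSV path and question are just the bundled Titanic example.

```python
# Minimal sketch of the agent wiring described above (transformers == 4.43.3 as pinned
# in requirements.txt). Illustrative only; the Space streams the run through Gradio instead.
import pandas as pd
from transformers import ReactCodeAgent, HfEngine

llm_engine = HfEngine("meta-llama/Llama-3.3-70B-Instruct")
agent = ReactCodeAgent(
    tools=[],
    llm_engine=llm_engine,
    additional_authorized_imports=["numpy", "pandas", "matplotlib.pyplot", "seaborn", "scipy.stats"],
    max_iterations=10,
)

# The uploaded CSV is pre-loaded and handed to the agent as the variable `data_file`;
# the agent then alternates thought -> Python code action -> observation until it answers.
data_file = pd.read_csv("./example/titanic.csv")
answer = agent.run("What is the survival rate by class?", data_file=data_file)
print(answer)
```
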
__pycache__/streaming.cpython-311.pyc DELETED
Binary file (4.04 kB)
 
__pycache__/streaming.cpython-312.pyc DELETED
Binary file (3.43 kB)
 
__pycache__/test_streaming.cpython-311.pyc DELETED
Binary file (3.98 kB)
 
__pycache__/test_streaming.cpython-312.pyc DELETED
Binary file (3.43 kB)
 
__pycache__/test_streaming.cpython-39.pyc DELETED
Binary file (2.1 kB)
 
_config.yml DELETED
@@ -1,13 +0,0 @@
- title: Data Analyst # your name (or website title) here
- logo: "/images/logo.jpg?raw=true" # your photo (or logo) here
- description: > # your text below (remove <br> elements if you don't need line breaks)
-   <br>
-   I'm your personal Data Analyst built on top of Llama-3.1-70B and the ReAct agent framework. I break down your task step-by-step until I reach an answer/solution. Along the way I share my thoughts, actions (Python code blobs), and observations. I come packed with pandas, numpy, sklearn, matplotlib, seaborn, and more!
-   <br><br>
-   <a href="https://huggingface.co/spaces/dkondic/data-analyst">Try me on Hugging Face!</a>
-
- theme: jekyll-theme-minimal
- #google_analytics: UA-000000-0 # your Google Analytics tracking ID here
-
- colors:
-   crimson: '#900C3F'

app.py CHANGED
@@ -11,7 +11,7 @@ from gradio.data_classes import FileData
  
  login(os.getenv("HUGGINGFACEHUB_API_TOKEN"))
  
- llm_engine = HfEngine("meta-llama/Meta-Llama-3.1-70B-Instruct")
+ llm_engine = HfEngine("meta-llama/Llama-3.3-70B-Instruct")
  
  agent = ReactCodeAgent(
      tools=[],
@@ -26,15 +26,7 @@ The data file is passed to you as the variable data_file, it is a pandas datafra
  DO NOT try to load data_file, it is already a dataframe pre-loaded in your python interpreter!
  When plotting using matplotlib/seaborn save the figures to the (already existing) folder'./figures/': take care to clear
  each figure with plt.clf() before doing another plot.
- When plotting make the plots as pretty as possible given your tools. Same with tables, charts, or anything else.
-
- In your final answer: summarize your findings and steps taken.
- After each number derive real worlds insights, for instance: "Correlation between is_december and boredness is 1.3453, which suggest people are more bored in winter".
- Your final answer should be a long string with at least 4 numbered and detailed parts:
- 1. Summary of Question/Problem
- 2. Summary of Actions
- 3. Summary of Findings
- 3. Potential Next Steps
+ When plotting make the plots as visually appealing as possible. Same with tables, charts, or anything else.
  
  Use the data file to answer the question or perform a task below.
  
@@ -44,22 +36,7 @@ Structure of the data:
  Question/Problem:
  """
  
- example_notes="""This data is about the Titanic wreck in 1912.
- The target variable is the survival of passengers, noted by 'Survived'
- pclass: A proxy for socio-economic status (SES)
- 1st = Upper
- 2nd = Middle
- 3rd = Lower
- age: Age is fractional if less than 1. If the age is estimated, is it in the form of xx.5
- sibsp: The dataset defines family relations in this way...
- Sibling = brother, sister, stepbrother, stepsister
- Spouse = husband, wife (mistresses and fiancés were ignored)
- parch: The dataset defines family relations in this way...
- Parent = mother, father
- Child = daughter, son, stepdaughter, stepson
- Some children travelled only with a nanny, therefore parch=0 for them.
-
- Run a logistic regression."""
+ example_notes="""What is the survival rate by class?"""
  
  def get_images_in_directory(directory):
      image_extensions = {'.png', '.jpg', '.jpeg', '.gif', '.bmp', '.tiff'}
@@ -117,24 +94,17 @@ with gr.Blocks(
      gr.Markdown("""# Data Analyst (ReAct Code Agent) 📊🤔
  
  **Who am I?**
- I'm your personal Data Analyst built on top of Llama-3.1-70B and the ReAct agent framework.
- I break down your task step-by-step until I reach an answer/solution.
+ I'm your personal Data Analyst built on top of the Llama-3.3-70B-Instruct model and the ReAct (Reasoning and Acting) framework.
+ I break down the task step-by-step until I reach an answer/solution.
  Along the way I share my thoughts, actions (Python code blobs), and observations.
  I come packed with pandas, numpy, sklearn, matplotlib, seaborn, and more!
  
  **Instructions**
  1. Drop or upload a `.csv` file below.
  2. Ask a question or give it a task.
- 3. **Watch Llama-3.1-70B think, act, and observe until final answer.
+ 3. **Watch the AI Agent think, act, and observe until the final answer.
  \n**For an example, click on the example at the bottom of page to auto populate.**""")
-     chatbot = gr.Chatbot(
-         label="Data Analyst Agent",
-         type="messages",
-         avatar_images=(
-             None,
-             "https://em-content.zobj.net/source/twitter/53/robot-face_1f916.png",
-         ),
-     )
+ 
      file_input = gr.File(label="Drop/upload a .csv file to analyze")
      text_input = gr.Textbox(
          label="Ask a question or give it a task."
@@ -144,7 +114,16 @@ I come packed with pandas, numpy, sklearn, matplotlib, seaborn, and more!
          examples=[["./example/titanic.csv", example_notes]],
          inputs=[file_input, text_input],
          cache_examples=False,
-         label='Click anywhere below to try this example.'
+         label='Click on an example below.'
+     )
+     chatbot = gr.Chatbot(
+         label="Data Analyst Agent",
+         type="messages",
+         avatar_images=(
+             None,
+             "https://em-content.zobj.net/source/twitter/53/robot-face_1f916.png",
+         ),
+         height = 1000
      )
  
      submit.click(interact_with_agent, [file_input, text_input], [chatbot])

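For illustration, under the base_prompt kept in this diff, a single agent code action answering the new example question ("What is the survival rate by class?") might look roughly like the hypothetical blob below. It assumes the Titanic columns `Pclass` and `Survived` and would produce the figures/survival_rate_by_class.png added further down in this commit; it is not output captured from a real run.

```python
# Hypothetical agent code action: `data_file` is already in the interpreter,
# as the base_prompt above states, so it is used directly without loading anything.
import matplotlib.pyplot as plt
import seaborn as sns

survival_by_class = data_file.groupby("Pclass")["Survived"].mean()
print(survival_by_class)  # printed output becomes the agent's observation

sns.barplot(x=survival_by_class.index, y=survival_by_class.values)
plt.xlabel("Passenger class")
plt.ylabel("Survival rate")
plt.savefig("./figures/survival_rate_by_class.png")
plt.clf()  # clear the figure before the next plot, as the prompt instructs
```
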
app_original.py DELETED
@@ -1,133 +0,0 @@
- import os
- import shutil
- import gradio as gr
- from transformers import ReactCodeAgent, HfEngine, Tool
- import pandas as pd
-
- from gradio import Chatbot
- from streaming import stream_to_gradio
- from huggingface_hub import login
- from gradio.data_classes import FileData
-
- login(os.getenv("HUGGINGFACEHUB_API_TOKEN"))
-
- llm_engine = HfEngine("meta-llama/Meta-Llama-3.1-70B-Instruct")
-
- agent = ReactCodeAgent(
-     tools=[],
-     llm_engine=llm_engine,
-     additional_authorized_imports=["numpy", "pandas", "matplotlib.pyplot", "seaborn", "scipy.stats"],
-     max_iterations=10,
- )
-
- base_prompt = """You are an expert data analyst.
- According to the features you have and the data structure given below, determine which feature should be the target.
- Then list 3 interesting questions that could be asked on this data, for instance about specific correlations with target variable.
- Then answer these questions one by one, by finding the relevant numbers.
- Meanwhile, plot some figures using matplotlib/seaborn and save them to the (already existing) folder './figures/': take care to clear each figure with plt.clf() before doing another plot.
-
- In your final answer: summarize these correlations and trends
- After each number derive real worlds insights, for instance: "Correlation between is_december and boredness is 1.3453, which suggest people are more bored in winter".
- Your final answer should be a long string with at least 3 numbered and detailed parts.
-
- Structure of the data:
- {structure_notes}
-
- The data file is passed to you as the variable data_file, it is a pandas dataframe, you can use it directly.
- DO NOT try to load data_file, it is already a dataframe pre-loaded in your python interpreter!
- """
-
- example_notes="""This data is about the Titanic wreck in 1912.
- The target figure is the survival of passengers, notes by 'Survived'
- pclass: A proxy for socio-economic status (SES)
- 1st = Upper
- 2nd = Middle
- 3rd = Lower
- age: Age is fractional if less than 1. If the age is estimated, is it in the form of xx.5
- sibsp: The dataset defines family relations in this way...
- Sibling = brother, sister, stepbrother, stepsister
- Spouse = husband, wife (mistresses and fiancés were ignored)
- parch: The dataset defines family relations in this way...
- Parent = mother, father
- Child = daughter, son, stepdaughter, stepson
- Some children travelled only with a nanny, therefore parch=0 for them."""
-
- def get_images_in_directory(directory):
-     image_extensions = {'.png', '.jpg', '.jpeg', '.gif', '.bmp', '.tiff'}
-
-     image_files = []
-     for root, dirs, files in os.walk(directory):
-         for file in files:
-             if os.path.splitext(file)[1].lower() in image_extensions:
-                 image_files.append(os.path.join(root, file))
-     return image_files
-
- def interact_with_agent(file_input, additional_notes):
-     shutil.rmtree("./figures")
-     os.makedirs("./figures")
-
-     data_file = pd.read_csv(file_input)
-     data_structure_notes = f"""- Description (output of .describe()):
- {data_file.describe()}
- - Columns with dtypes:
- {data_file.dtypes}"""
-
-     prompt = base_prompt.format(structure_notes=data_structure_notes)
-
-     if additional_notes and len(additional_notes) > 0:
-         prompt += "\nAdditional notes on the data:\n" + additional_notes
-
-     messages = [gr.ChatMessage(role="user", content=prompt)]
-     yield messages + [
-         gr.ChatMessage(role="assistant", content="⏳ _Starting task..._")
-     ]
-
-     plot_image_paths = {}
-     for msg in stream_to_gradio(agent, prompt, data_file=data_file):
-         messages.append(msg)
-         for image_path in get_images_in_directory("./figures"):
-             if image_path not in plot_image_paths:
-                 image_message = gr.ChatMessage(
-                     role="assistant",
-                     content=FileData(path=image_path, mime_type="image/png"),
-                 )
-                 plot_image_paths[image_path] = True
-                 messages.append(image_message)
-         yield messages + [
-             gr.ChatMessage(role="assistant", content="⏳ _Still processing..._")
-         ]
-     yield messages
-
-
- with gr.Blocks(
-     theme=gr.themes.Soft(
-         primary_hue=gr.themes.colors.yellow,
-         secondary_hue=gr.themes.colors.blue,
-     )
- ) as demo:
-     gr.Markdown("""# Llama-3.1 Data analyst 📊🤔
-
- Drop a `.csv` file below, add notes to describe this data if needed, and **Llama-3.1-70B will analyze the file content and draw figures for you!**""")
-     file_input = gr.File(label="Your file to analyze")
-     text_input = gr.Textbox(
-         label="Additional notes to support the analysis"
-     )
-     submit = gr.Button("Run analysis!", variant="primary")
-     chatbot = gr.Chatbot(
-         label="Data Analyst Agent",
-         type="messages",
-         avatar_images=(
-             None,
-             "https://em-content.zobj.net/source/twitter/53/robot-face_1f916.png",
-         ),
-     )
-     gr.Examples(
-         examples=[["./example/titanic.csv", example_notes]],
-         inputs=[file_input, text_input],
-         cache_examples=False
-     )
-
-     submit.click(interact_with_agent, [file_input, text_input], [chatbot])
-
- if __name__ == "__main__":
-     demo.launch()

figures/survival_rate_by_age.png DELETED
Binary file (16.9 kB)
 
figures/survival_rate_by_class.png ADDED
figures/survival_rate_by_pclass.png DELETED
Binary file (16.4 kB)
 
figures/survival_rate_by_sex.png DELETED
Binary file (16.1 kB)
 
figures/survived_distribution.png DELETED
Binary file (12.6 kB)
 
images/logo.jpg DELETED
Binary file (7.19 kB)
 
index.md DELETED
@@ -1,3 +0,0 @@
- ## Agent Data Analyst
-
- I'm your personal Data Analyst built on top of Llama-3.1-70B and the ReAct agent framework. I break down your task step-by-step until I reach an answer/solution. Along the way I share my thoughts, actions (Python code blobs), and observations. I come packed with pandas, numpy, sklearn, matplotlib, seaborn, and more!

requirements.txt CHANGED
@@ -1,4 +1,6 @@
  transformers == 4.43.3
+ pandas
+ numpy
  matplotlib
  seaborn
  scikit-learn

streaming.py CHANGED
@@ -61,4 +61,4 @@ def stream_to_gradio(agent: ReactAgent, task: str, **kwargs):
  content={"path": Output.output.to_string(), "mime_type": "audio/wav"},
  )
  else:
- yield ChatMessage(role="assistant", content=Output.output)
+ yield ChatMessage(role="assistant", content=str(Output.output))
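
The added str() cast is a small robustness fix: the agent's final answer is not guaranteed to be a string (it may be a number, dict, or DataFrame), and a plain string is the simplest content for a chat message to render. A minimal sketch of the intent, with a made-up answer value:

```python
import gradio as gr

# Hypothetical non-string final answer returned by the agent.
final_answer = {"survival_rate_by_class": {1: 0.63, 2: 0.47, 3: 0.24}}

# Casting to str keeps ChatMessage rendering simple no matter what type the agent returns.
msg = gr.ChatMessage(role="assistant", content=str(final_answer))
```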