Spaces: dkondic/data-analyst

Dacho688 committed 10 days ago

Commit fdd0e93 (parent: 9fbad84)

Upgrade to llama 3.3 70B from 3.1

Files changed (18)

README.md +7 -3
__pycache__/streaming.cpython-311.pyc +0 -0
__pycache__/streaming.cpython-312.pyc +0 -0
__pycache__/test_streaming.cpython-311.pyc +0 -0
__pycache__/test_streaming.cpython-312.pyc +0 -0
__pycache__/test_streaming.cpython-39.pyc +0 -0
_config.yml +0 -13
app.py +17 -38
app_original.py +0 -133
figures/survival_rate_by_age.png +0 -0
figures/survival_rate_by_class.png +0 -0
figures/survival_rate_by_pclass.png +0 -0
figures/survival_rate_by_sex.png +0 -0
figures/survived_distribution.png +0 -0
images/logo.jpg +0 -0
index.md +0 -3
requirements.txt +2 -0
streaming.py +1 -1

README.md CHANGED

@@ -1,5 +1,5 @@
 ---
-title: Agent Data Analyst
+title: Data Analyst AI Agent
 emoji: 🤔📊
 colorFrom: yellow
 colorTo: red
@@ -8,7 +8,11 @@ sdk_version: 4.38.1
 app_file: app.py
 pinned: false
 license: apache-2.0
-short_description: Need to analyze data? Let a Llama-3.1 agent do it for you!
+short_description: Need to analyze data? Let a Llama-3.3 AI agent do it for you!
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+## Agent Data Analyst
+I'm your personal Data Analyst AI Agent built on top of Llama-3.3-70B-Instruct model and the ReAct (Reasoning and Acting) framework.
+I break down the task step-by-step until I reach an answer/solution.
+Along the way I share my thoughts, actions (Python code blobs), and observations.
+I come packed with pandas, numpy, sklearn, matplotlib, seaborn, and more!
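The README describes the agent as following the ReAct (Reasoning and Acting) pattern: at each step it states a thought, runs a Python code blob as its action, and reads the result back as an observation. The loop below is a minimal, dependency-free sketch of that cycle under stated assumptions: `scripted_llm` is a hypothetical stand-in for the Llama-3.3-70B-Instruct engine that simply returns canned steps, and `final_answer` is an illustrative callback name. It is not the implementation of `ReactCodeAgent` in transformers.

```python
import contextlib
import io


def scripted_llm(history):
    """Hypothetical stand-in for the LLM: returns the next (thought, code) step."""
    steps = [
        ("I need the mean of the numbers.", "result = sum(data) / len(data)\nprint(result)"),
        ("I have the mean, so I can answer.", "final_answer(result)"),
    ]
    return steps[len(history)]


def run_react_loop(llm, namespace, max_iterations=10):
    """Run thought -> action -> observation cycles until final_answer is called."""
    history = []  # list of (thought, code, observation) tuples
    answer = {}

    def final_answer(value):
        answer["value"] = value

    namespace["final_answer"] = final_answer
    for _ in range(max_iterations):
        thought, code = llm(history)              # Reason: pick the next action
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):  # Act: execute the code blob
            exec(code, namespace)
        observation = buffer.getvalue().strip()   # Observe: capture printed output
        history.append((thought, code, observation))
        if "value" in answer:
            return answer["value"], history
    raise RuntimeError("No final answer within max_iterations")


answer, steps = run_react_loop(scripted_llm, {"data": [2, 4, 6]})
print(answer)
```

Here the loop finishes after two iterations with an answer of `4.0`. The `max_iterations` cap plays a role similar to the `max_iterations=10` argument passed to `ReactCodeAgent` in app_original.py. Unlike this sketch, which calls `exec` with no restrictions, the real agent is configured with an `additional_authorized_imports` list.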

__pycache__/streaming.cpython-311.pyc DELETED

Binary file (4.04 kB)

__pycache__/streaming.cpython-312.pyc DELETED

Binary file (3.43 kB)

__pycache__/test_streaming.cpython-311.pyc DELETED

Binary file (3.98 kB)

__pycache__/test_streaming.cpython-312.pyc DELETED

Binary file (3.43 kB)

__pycache__/test_streaming.cpython-39.pyc DELETED

Binary file (2.1 kB)

_config.yml DELETED

@@ -1,13 +0,0 @@
-title: Data Analyst # your name (or website title) here
-logo: "/images/logo.jpg?raw=true" # your photo (or logo) here
-description: > # your text below (remove <br> elements if you don't need line breaks)
-  <br>
-  I’m your personal Data Analyst built on top of Llama-3.1-70B and the ReAct agent framework. I break down your task step-by-step until I reach an answer/solution. Along the way I share my thoughts, actions (Python code blobs), and observations. I come packed with pandas, numpy, sklearn, matplotlib, seaborn, and more!
-  <br><br>
-  <a href="https://huggingface.co/spaces/dkondic/data-analyst">Try me on Hugging Face!</a>
-theme: jekyll-theme-minimal
-#google_analytics: UA-000000-0 # your Google Analytics tracking ID here
-colors:
-  crimson: '#900C3F'

app.py CHANGED

@@ -11,7 +11,7 @@ from gradio.data_classes import FileData
 login(os.getenv("HUGGINGFACEHUB_API_TOKEN"))
-llm_engine = HfEngine("meta-llama/Meta-Llama-3.1-70B-Instruct")
+llm_engine = HfEngine("meta-llama/Llama-3.3-70B-Instruct")
 agent = ReactCodeAgent(
     tools=[],
@@ -26,15 +26,7 @@ The data file is passed to you as the variable data_file, it is a pandas datafra
 DO NOT try to load data_file, it is already a dataframe pre-loaded in your python interpreter!
 When plotting using matplotlib/seaborn save the figures to the (already existing) folder'./figures/': take care to clear
 each figure with plt.clf() before doing another plot.
-When plotting make the plots as pretty as possible given your tools. Same with tables, charts, or anything else.
-In your final answer: summarize your findings and steps taken.
-After each number derive real worlds insights, for instance: "Correlation between is_december and boredness is 1.3453, which suggest people are more bored in winter".
-Your final answer should be a long string with at least 4 numbered and detailed parts:
-    1. Summary of Question/Problem
-    2. Summary of Actions
-    3. Summary of Findings
-    3. Potential Next Steps
+When plotting make the plots as visually appealing as possible. Same with tables, charts, or anything else.
 Use the data file to answer the question or perform a task below.
@@ -44,22 +36,7 @@ Structure of the data:
 Question/Problem:
 """
-example_notes="""This data is about the Titanic wreck in 1912.
-The target variable is the survival of passengers, noted by 'Survived'
-pclass: A proxy for socio-economic status (SES)
-1st = Upper
-2nd = Middle
-3rd = Lower
-age: Age is fractional if less than 1. If the age is estimated, is it in the form of xx.5
-sibsp: The dataset defines family relations in this way...
-Sibling = brother, sister, stepbrother, stepsister
-Spouse = husband, wife (mistresses and fiancés were ignored)
-parch: The dataset defines family relations in this way...
-Parent = mother, father
-Child = daughter, son, stepdaughter, stepson
-Some children travelled only with a nanny, therefore parch=0 for them.
-Run a logistic regression."""
+example_notes="""What is the survival rate by class?"""
 def get_images_in_directory(directory):
     image_extensions = {'.png', '.jpg', '.jpeg', '.gif', '.bmp', '.tiff'}
@@ -117,24 +94,17 @@ with gr.Blocks(
     gr.Markdown("""# Data Analyst (ReAct Code Agent) 📊🤔
 **Who am I?**
-I'm your personal Data Analyst built on top of Llama-3.1-70B and the ReAct agent framework.
-I break down your task step-by-step until I reach an answer/solution.
+I'm your personal Data Analyst built on top of Llama-3.3-70B-Instruct model and the ReAct (Reasoning and Acting) framework.
+I break down the task step-by-step until I reach an answer/solution.
 Along the way I share my thoughts, actions (Python code blobs), and observations.
 I come packed with pandas, numpy, sklearn, matplotlib, seaborn, and more!
 **Instructions**
 1. Drop or upload a `.csv` file below.
 2. Ask a question or give it a task.
-3. **Watch Llama-3.1-70B think, act, and observe until final answer.
+3. **Watch the AI Agent think, act, and observe until final answer.
 \n**For an example, click on the example at the bottom of page to auto populate.**""")
-    chatbot = gr.Chatbot(
-        label="Data Analyst Agent",
-        type="messages",
-        avatar_images=(
-            None,
-            "https://em-content.zobj.net/source/twitter/53/robot-face_1f916.png",
-        ),
-    )
     file_input = gr.File(label="Drop/upload a .csv file to analyze")
     text_input = gr.Textbox(
         label="Ask a question or give it a task."
@@ -144,7 +114,16 @@ I come packed with pandas, numpy, sklearn, matplotlib, seaborn, and more!
         examples=[["./example/titanic.csv", example_notes]],
         inputs=[file_input, text_input],
         cache_examples=False,
-        label='Click anywhere below to try this example.'
+        label='Click on an example below.'
+    )
+    chatbot = gr.Chatbot(
+        label="Data Analyst Agent",
+        type="messages",
+        avatar_images=(
+            None,
+            "https://em-content.zobj.net/source/twitter/53/robot-face_1f916.png",
+        ),
+        height = 1000
     )
     submit.click(interact_with_agent, [file_input, text_input], [chatbot])
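This commit replaces the long Titanic `example_notes` with a single question, "What is the survival rate by class?", and adds `figures/survival_rate_by_class.png`. The core computation the agent is expected to produce can be sketched as a pandas group-by. The column names `Pclass` and `Survived` are assumptions based on the common Titanic CSV layout (the old notes refer to 'Survived' and 'pclass'), and the eight-row frame is made-up sample data, not the Space's `./example/titanic.csv`.

```python
import pandas as pd


def survival_rate_by_class(df: pd.DataFrame) -> pd.Series:
    """Fraction of passengers with Survived == 1 within each passenger class."""
    # The mean of a 0/1 column is the share of 1s, i.e. the survival rate.
    return df.groupby("Pclass")["Survived"].mean().sort_index()


# Hypothetical sample data, for illustration only.
sample = pd.DataFrame({
    "Pclass":   [1, 1, 2, 2, 3, 3, 3, 3],
    "Survived": [1, 1, 1, 0, 0, 0, 1, 0],
})
rates = survival_rate_by_class(sample)
print(rates)
```

On this sample the rates are 1.0, 0.5 and 0.25 for classes 1, 2 and 3. Expressing the rate as a group mean keeps the code short and works unchanged on the full dataset.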

app_original.py DELETED

@@ -1,133 +0,0 @@
-import os
-import shutil
-import gradio as gr
-from transformers import ReactCodeAgent, HfEngine, Tool
-import pandas as pd
-from gradio import Chatbot
-from streaming import stream_to_gradio
-from huggingface_hub import login
-from gradio.data_classes import FileData
-login(os.getenv("HUGGINGFACEHUB_API_TOKEN"))
-llm_engine = HfEngine("meta-llama/Meta-Llama-3.1-70B-Instruct")
-agent = ReactCodeAgent(
-    tools=[],
-    llm_engine=llm_engine,
-    additional_authorized_imports=["numpy", "pandas", "matplotlib.pyplot", "seaborn", "scipy.stats"],
-    max_iterations=10,
-)
-base_prompt = """You are an expert data analyst.
-According to the features you have and the data structure given below, determine which feature should be the target.
-Then list 3 interesting questions that could be asked on this data, for instance about specific correlations with target variable.
-Then answer these questions one by one, by finding the relevant numbers.
-Meanwhile, plot some figures using matplotlib/seaborn and save them to the (already existing) folder './figures/': take care to clear each figure with plt.clf() before doing another plot.
-In your final answer: summarize these correlations and trends
-After each number derive real worlds insights, for instance: "Correlation between is_december and boredness is 1.3453, which suggest people are more bored in winter".
-Your final answer should be a long string with at least 3 numbered and detailed parts.
-Structure of the data:
-{structure_notes}
-The data file is passed to you as the variable data_file, it is a pandas dataframe, you can use it directly.
-DO NOT try to load data_file, it is already a dataframe pre-loaded in your python interpreter!
-"""
-example_notes="""This data is about the Titanic wreck in 1912.
-The target figure is the survival of passengers, notes by 'Survived'
-pclass: A proxy for socio-economic status (SES)
-1st = Upper
-2nd = Middle
-3rd = Lower
-age: Age is fractional if less than 1. If the age is estimated, is it in the form of xx.5
-sibsp: The dataset defines family relations in this way...
-Sibling = brother, sister, stepbrother, stepsister
-Spouse = husband, wife (mistresses and fiancés were ignored)
-parch: The dataset defines family relations in this way...
-Parent = mother, father
-Child = daughter, son, stepdaughter, stepson
-Some children travelled only with a nanny, therefore parch=0 for them."""
-def get_images_in_directory(directory):
-    image_extensions = {'.png', '.jpg', '.jpeg', '.gif', '.bmp', '.tiff'}
-    image_files = []
-    for root, dirs, files in os.walk(directory):
-        for file in files:
-            if os.path.splitext(file)[1].lower() in image_extensions:
-                image_files.append(os.path.join(root, file))
-    return image_files
-def interact_with_agent(file_input, additional_notes):
-    shutil.rmtree("./figures")
-    os.makedirs("./figures")
-    data_file = pd.read_csv(file_input)
-    data_structure_notes = f"""- Description (output of .describe()):
-    {data_file.describe()}
-    - Columns with dtypes:
-    {data_file.dtypes}"""
-    prompt = base_prompt.format(structure_notes=data_structure_notes)
-    if additional_notes and len(additional_notes) > 0:
-        prompt += "\nAdditional notes on the data:\n" + additional_notes
-    messages = [gr.ChatMessage(role="user", content=prompt)]
-    yield messages + [
-        gr.ChatMessage(role="assistant", content="⏳ _Starting task..._")
-    ]
-    plot_image_paths = {}
-    for msg in stream_to_gradio(agent, prompt, data_file=data_file):
-        messages.append(msg)
-        for image_path in get_images_in_directory("./figures"):
-            if image_path not in plot_image_paths:
-                image_message = gr.ChatMessage(
-                    role="assistant",
-                    content=FileData(path=image_path, mime_type="image/png"),
-                )
-                plot_image_paths[image_path] = True
-                messages.append(image_message)
-        yield messages + [
-            gr.ChatMessage(role="assistant", content="⏳ _Still processing..._")
-        ]
-    yield messages
-with gr.Blocks(
-    theme=gr.themes.Soft(
-        primary_hue=gr.themes.colors.yellow,
-        secondary_hue=gr.themes.colors.blue,
-    )
-) as demo:
-    gr.Markdown("""# Llama-3.1 Data analyst 📊🤔
-Drop a `.csv` file below, add notes to describe this data if needed, and **Llama-3.1-70B will analyze the file content and draw figures for you!**""")
-    file_input = gr.File(label="Your file to analyze")
-    text_input = gr.Textbox(
-        label="Additional notes to support the analysis"
-    )
-    submit = gr.Button("Run analysis!", variant="primary")
-    chatbot = gr.Chatbot(
-        label="Data Analyst Agent",
-        type="messages",
-        avatar_images=(
-            None,
-            "https://em-content.zobj.net/source/twitter/53/robot-face_1f916.png",
-        ),
-    )
-    gr.Examples(
-        examples=[["./example/titanic.csv", example_notes]],
-        inputs=[file_input, text_input],
-        cache_examples=False
-    )
-    submit.click(interact_with_agent, [file_input, text_input], [chatbot])
-if __name__ == "__main__":
-    demo.launch()
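The deleted `interact_with_agent` clears `./figures` with `shutil.rmtree` before each run and, after every streamed message, scans the folder and posts any image it has not posted yet, tracking seen paths in the `plot_image_paths` dict. The sketch below restates that pattern with two small, hypothetical helpers (`reset_figures_dir`, `new_images`) and a set instead of a dict. One difference is deliberate: a bare `shutil.rmtree("./figures")` raises `FileNotFoundError` when the folder does not exist, so this version passes `ignore_errors=True` and recreates the folder with `exist_ok=True`.

```python
import os
import shutil
import tempfile

# Same extension set as get_images_in_directory in app_original.py
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".bmp", ".tiff"}


def reset_figures_dir(directory: str) -> None:
    """Empty `directory`, creating it if needed (hypothetical helper)."""
    shutil.rmtree(directory, ignore_errors=True)  # no error if it is missing
    os.makedirs(directory, exist_ok=True)


def new_images(directory: str, seen: set) -> list:
    """Return image paths under `directory` not in `seen`, and add them to `seen`."""
    found = []
    for root, _dirs, files in os.walk(directory):
        for name in sorted(files):
            path = os.path.join(root, name)
            is_image = os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS
            if is_image and path not in seen:
                seen.add(path)
                found.append(path)
    return found


# Usage: simulate one agent step that saves a plot plus a non-image file.
figures = os.path.join(tempfile.mkdtemp(), "figures")
reset_figures_dir(figures)
seen = set()
for filename in ("survival_rate_by_class.png", "notes.txt"):
    open(os.path.join(figures, filename), "wb").close()
first = new_images(figures, seen)   # only the .png is reported
second = new_images(figures, seen)  # nothing new since the last scan
print(first, second)
```

Scanning the folder after each step, rather than asking the agent which files it wrote, means any figure saved under `./figures/` is picked up regardless of its name.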

figures/survival_rate_by_age.png DELETED

Binary file (16.9 kB)

figures/survival_rate_by_class.png ADDED

figures/survival_rate_by_pclass.png DELETED

Binary file (16.4 kB)

figures/survival_rate_by_sex.png DELETED

Binary file (16.1 kB)

figures/survived_distribution.png DELETED

Binary file (12.6 kB)

images/logo.jpg DELETED

Binary file (7.19 kB)

index.md DELETED

@@ -1,3 +0,0 @@
-## Agent Data Analyst
-I'm your personal Data Analyst built on top of Llama-3.1-70B and the ReAct agent framework. I break down your task step-by-step until I reach an answer/solution. Along the way I share my thoughts, actions (Python code blobs), and observations. I come packed with pandas, numpy, sklearn, matplotlib, seaborn, and more!

requirements.txt CHANGED

@@ -1,4 +1,6 @@
 transformers == 4.43.3
+pandas
+numpy
 matplotlib
 seaborn
 scikit-learn
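The updated requirements.txt now lists pandas and numpy explicitly; before this commit neither was declared directly. A small check along these lines could compare the modules an agent is allowed to import against the declared requirements. The `DIST_TO_MODULE` mapping and both helper functions are hypothetical, and the authorized-imports list is copied from the deleted app_original.py because the current app.py's list is not shown in this diff.

```python
import re

# Hypothetical mapping from PyPI distribution names to top-level import names;
# only entries whose names differ need to be listed.
DIST_TO_MODULE = {"scikit-learn": "sklearn"}


def declared_modules(requirements_text: str) -> set:
    """Top-level module names implied by a requirements.txt body."""
    modules = set()
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        # Distribution name = text before any version specifier, extras or marker
        name = re.split(r"[\s<>=!~\[;]", line, maxsplit=1)[0].lower()
        modules.add(DIST_TO_MODULE.get(name, name))
    return modules


def undeclared_imports(authorized: list, requirements_text: str) -> list:
    """Authorized imports whose top-level package is not declared directly."""
    declared = declared_modules(requirements_text)
    return sorted({imp.split(".")[0] for imp in authorized} - declared)


REQUIREMENTS = """transformers == 4.43.3
pandas
numpy
matplotlib
seaborn
scikit-learn
"""
AUTHORIZED = ["numpy", "pandas", "matplotlib.pyplot", "seaborn", "scipy.stats"]
print(undeclared_imports(AUTHORIZED, REQUIREMENTS))
```

With the requirements above this prints `['scipy']`. That does not mean the Space is broken: scikit-learn depends on SciPy, so SciPy is installed transitively. The check only flags imports that are not declared directly.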

streaming.py CHANGED

@@ -61,4 +61,4 @@ def stream_to_gradio(agent: ReactAgent, task: str, **kwargs):
             content={"path": Output.output.to_string(), "mime_type": "audio/wav"},
         )
     else:
-        yield ChatMessage(role="assistant", content=Output.output)
+        yield ChatMessage(role="assistant", content=str(Output.output))
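The only change to streaming.py wraps the agent's final output in `str()` before it becomes the content of the last assistant `ChatMessage`. A code agent's final answer can be any Python object (a float, a list, a DataFrame), whereas the chat message is meant to carry text; the commit does not state its motivation, so this reading is an assumption. The sketch uses a hypothetical `Message` dataclass instead of `gr.ChatMessage` so that it runs without Gradio installed.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """Hypothetical stand-in for gr.ChatMessage, kept dependency-free."""
    role: str
    content: str


def final_answer_message(output) -> Message:
    # str() turns numbers, lists, dicts, DataFrames, etc. into printable text,
    # so the chat content is always a string whatever the agent returned.
    return Message(role="assistant", content=str(output))


for value in (0.3838, [1, 2, 3], {"Pclass": 1}):
    msg = final_answer_message(value)
    print(type(value).__name__, "->", repr(msg.content))
```

Strings pass through `str()` unchanged, so answers that were already text render exactly as before.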