update

- .gitattributes +1 -2
- .gitignore +2 -2
- LICENSE +21 -0
- data/Artificial_Intelligence_Wikipedia.txt +0 -46
- data/Harry_Potter_Chapter1.pdf +0 -3
- data/Tulsi_Gabbard_News.html +0 -0
- examples/config/BookExtraction.yaml +15 -0
- examples/config/EE.yaml +14 -0
- examples/config/NER.yaml +13 -0
- examples/config/NewsExtraction.yaml +15 -0
- examples/config/RE.yaml +15 -0
- examples/config/Triple2KG.yaml +21 -0
- examples/example.py +17 -0
- examples/results/BookExtraction.json +48 -0
- examples/results/EE.json +13 -0
- examples/results/NER.json +16 -0
- examples/results/NewsExtraction.json +51 -0
- examples/results/RE.json +9 -0
- examples/results/TripleExtraction.json +156 -0
- experiments/dataset_def.py +181 -0
- experiments/run_ner.py +15 -0
- experiments/run_re.py +10 -0
- figs/logo.png +0 -0
- figs/main.png +0 -0
- requirements.txt +2 -1
- src/config.yaml +2 -1
- src/construct/__init__.py +1 -0
- src/construct/convert.py +201 -0
- src/models/llm_def.py +15 -16
- src/models/prompt_example.py +12 -12
- src/models/prompt_template.py +14 -14
- src/models/vllm_serve.py +2 -3
- src/modules/extraction_agent.py +28 -0
- src/modules/knowledge_base/case_repository.py +22 -22
- src/modules/knowledge_base/schema_repository.py +13 -0
- src/modules/reflection_agent.py +4 -5
- src/modules/schema_agent.py +12 -0
- src/pipeline.py +18 -2
- src/run.py +11 -2
- src/utils/__init__.py +0 -1
- src/utils/process.py +66 -29
- src/webui.py +33 -12
.gitattributes
CHANGED
@@ -32,5 +32,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.xz filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
-*tfevents* filter=lfs diff=lfs merge=lfs -text
-data/Harry_Potter_Chapter1.pdf filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
.gitignore
CHANGED
@@ -1,3 +1,3 @@
-local
 **/__pycache__
-*.pyc
+*.pyc
+dev
LICENSE
ADDED
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2025 ZJUNLP
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
data/Artificial_Intelligence_Wikipedia.txt
DELETED
@@ -1,46 +0,0 @@
-In the 22nd century, rising sea levels from global warming
-have wiped out coastal cities and altered the world's climate.
-With the human population in decline, nations have created
-humanoid robots called mechas to fulfill various roles.
-
-In Madison, New Jersey, David, an 11-year-old prototype mecha
-child capable of love, is given to Henry Swinton and his wife
-Monica, whose son Martin is in suspended animation. Monica
-initially feels uncomfortable but warms to David after he is
-activated and imprinted. David befriends Teddy, Martin's robotic
-teddy bear.
-
-After Martin is cured and brought home, he goads David into
-cutting off a piece of Monica's hair. Later, David accidentally
-pokes Monica's eye with scissors. During a pool party, David
-reacts to being poked with a knife and both he and Martin fall
-into the pool. Martin is saved, but David is blamed.
-
-Henry convinces Monica to return David to his creators for
-destruction, but instead, she abandons him in the woods with
-Teddy. David, believing that becoming human will regain Monica's
-love, decides to find the Blue Fairy.
-
-David and Teddy are captured by the "Flesh Fair", where obsolete
-mechas are destroyed. David pleads for his life, and the audience
-allows him to escape with Gigolo Joe, a mecha framed for murder.
-They travel to Rouge City, where "Dr. Know", a holographic answer
-engine, directs them to the ruins of New York City and suggests
-that the Blue Fairy may help.
-
-David meets Professor Hobby, who shows him copies of himself,
-including female variants. Disheartened, David attempts suicide,
-but Joe rescues him. They find the Blue Fairy, which turns out
-to be a statue. David repeatedly asks the statue to turn him
-into a real boy until his power source is depleted.
-
-Two thousand years later, humanity is extinct and Manhattan is
-buried under ice. Mechas have evolved, and a group called the
-Specialists resurrect David and Teddy. They reconstruct the Swinton
-home from David's memories and explain that he cannot become human.
-However, they recreate Monica using genetic material from the
-strand of hair Teddy kept. Monica can live for only one day.
-
-David spends his happiest day with Monica, and as she falls asleep,
-she tells him she has always loved him. David lies down next to her
-and closes his eyes.
data/Harry_Potter_Chapter1.pdf
DELETED
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:b9eb5104658f4d6ef8ff9b457f28f188b6aa1b201443719c501e462072eacf57
-size 163709
data/Tulsi_Gabbard_News.html
DELETED
(The diff for this file is too large to render.)
examples/config/BookExtraction.yaml
ADDED
@@ -0,0 +1,15 @@
+model:
+  # Recommended: use the ChatGPT or DeepSeek APIs for complex IE tasks.
+  category: ChatGPT # model category, chosen from ChatGPT, DeepSeek, LLaMA, Qwen, ChatGLM, MiniCPM, OneKE.
+  model_name_or_path: gpt-4o-mini # model name, chosen from the model list of the selected category.
+  api_key: your_api_key # your API key for the model with API service. Not needed for open-source models.
+  base_url: https://api.openai.com/v1 # base URL for the API service. Not needed for open-source models.
+
+extraction:
+  task: Base # task type, chosen from Base, NER, RE, EE.
+  instruction: Extract main characters and background setting from this chapter. # description of the task. Not needed for the NER, RE, EE tasks.
+  use_file: true # whether to use a file for the input text. Defaults to false.
+  file_path: ./data/input_files/Harry_Potter_Chapter1.pdf # path to the input file. Not needed if use_file is set to false.
+  mode: quick # extraction mode, chosen from quick, detailed, customized. Defaults to quick. See src/config.yaml for more details.
+  update_case: false # whether to update the case repository. Defaults to false.
+  show_trajectory: false # whether to display the extracted intermediate steps.
examples/config/EE.yaml
ADDED
@@ -0,0 +1,14 @@
+model:
+  category: DeepSeek # model category, chosen from ChatGPT, DeepSeek, LLaMA, Qwen, ChatGLM, MiniCPM, OneKE.
+  model_name_or_path: deepseek-chat # model name, chosen from the model list of the selected category.
+  api_key: your_api_key # your API key for the model with API service. Not needed for open-source models.
+  base_url: https://api.deepseek.com # base URL for the API service. Not needed for open-source models.
+
+extraction:
+  task: EE # task type, chosen from Base, NER, RE, EE.
+  text: UConn Health , an academic medical center , says in a media statement that it identified approximately 326,000 potentially impacted individuals whose personal information was contained in the compromised email accounts. # input text for the extraction task. Not needed if use_file is set to true.
+  constraint: {"phishing": ["damage amount", "attack pattern", "tool", "victim", "place", "attacker", "purpose", "trusted entity", "time"], "data breach": ["damage amount", "attack pattern", "number of data", "number of victim", "tool", "compromised data", "victim", "place", "attacker", "purpose", "time"], "ransom": ["damage amount", "attack pattern", "payment method", "tool", "victim", "place", "attacker", "price", "time"], "discover vulnerability": ["vulnerable system", "vulnerability", "vulnerable system owner", "vulnerable system version", "supported platform", "common vulnerabilities and exposures", "capabilities", "time", "discoverer"], "patch vulnerability": ["vulnerable system", "vulnerability", "issues addressed", "vulnerable system version", "releaser", "supported platform", "common vulnerabilities and exposures", "patch number", "time", "patch"]} # Specified event types and their corresponding arguments for the event extraction task. Structured as a dictionary with the event type as the key and the list of arguments as the value. Defaults to empty.
+  use_file: false # whether to use a file for the input text.
+  mode: standard # extraction mode, chosen from quick, detailed, customized. Defaults to quick. See src/config.yaml for more details.
+  update_case: false # whether to update the case repository. Defaults to false.
+  show_trajectory: false # whether to display the extracted intermediate steps.
examples/config/NER.yaml
ADDED
@@ -0,0 +1,13 @@
+model:
+  category: LLaMA # model category, chosen from ChatGPT, DeepSeek, LLaMA, Qwen, ChatGLM, MiniCPM, OneKE.
+  model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct # model name to download from Hugging Face, or a local model path.
+  vllm_serve: false # whether to use vllm serving. Defaults to false.
+
+extraction:
+  task: NER # task type, chosen from Base, NER, RE, EE.
+  text: Finally , every other year , ELRA organizes a major conference LREC , the International Language Resources and Evaluation Conference . # input text for the extraction task. Not needed if use_file is set to true.
+  constraint: ["algorithm", "conference", "else", "product", "task", "field", "metrics", "organization", "researcher", "program language", "country", "location", "person", "university"] # Specified entity types for the named entity recognition task. Defaults to empty.
+  use_file: false # whether to use a file for the input text.
+  mode: quick # extraction mode, chosen from quick, detailed, customized. Defaults to quick. See src/config.yaml for more details.
+  update_case: false # whether to update the case repository. Defaults to false.
+  show_trajectory: false # whether to display the extracted intermediate steps.
examples/config/NewsExtraction.yaml
ADDED
@@ -0,0 +1,15 @@
+model:
+  category: DeepSeek # model category, chosen from ChatGPT, DeepSeek, LLaMA, Qwen, ChatGLM, MiniCPM, OneKE.
+  model_name_or_path: deepseek-chat # model name, chosen from the model list of the selected category.
+  api_key: your_api_key # your API key for the model with API service. Not needed for open-source models.
+  base_url: https://api.deepseek.com # base URL for the API service. Not needed for open-source models.
+
+extraction:
+  task: Base # task type, chosen from Base, NER, RE, EE.
+  instruction: Extract key information from the given text. # description of the task. Not needed for the NER, RE, EE tasks.
+  use_file: true # whether to use a file for the input text. Defaults to false.
+  file_path: ./data/input_files/Tulsi_Gabbard_News.html # path to the input file. Not needed if use_file is set to false.
+  output_schema: NewsReport # output schema for the extraction task, selected from the schema repository.
+  mode: customized # extraction mode, chosen from quick, detailed, customized. Defaults to quick. See src/config.yaml for more details.
+  update_case: false # whether to update the case repository. Defaults to false.
+  show_trajectory: false # whether to display the extracted intermediate steps.
examples/config/RE.yaml
ADDED
@@ -0,0 +1,15 @@
+model:
+  category: ChatGPT # model category, chosen from ChatGPT, DeepSeek, LLaMA, Qwen, ChatGLM, MiniCPM, OneKE.
+  model_name_or_path: gpt-4o-mini # model name, chosen from the model list of the selected category.
+  api_key: your_api_key # your API key for the model with API service. Not needed for open-source models.
+  base_url: https://api.openai.com/v1 # base URL for the API service. Not needed for open-source models.
+
+extraction:
+  task: RE # task type, chosen from Base, NER, RE, EE.
+  text: The aid group Doctors Without Borders said that since Saturday , more than 275 wounded people had been admitted and treated at Donka Hospital in the capital of Guinea , Conakry . # input text for the extraction task. Not needed if use_file is set to true.
+  constraint: ["nationality", "country capital", "place of death", "children", "location contains", "place of birth", "place lived", "administrative division of country", "country of administrative divisions", "company", "neighborhood of", "company founders"] # Specified relation types for the relation extraction task. Defaults to empty.
+  truth: {"relation_list": [{"head": "Guinea", "tail": "Conakry", "relation": "country capital"}]} # Ground-truth data for the relation extraction task, structured as a dictionary with a list of relation triples as the value. Required if update_case is set to true.
+  use_file: false # whether to use a file for the input text.
+  mode: quick # extraction mode, chosen from quick, detailed, customized. Defaults to quick. See src/config.yaml for more details.
+  update_case: true # whether to update the case repository. Defaults to false.
+  show_trajectory: false # whether to display the extracted intermediate steps.
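
The truth/update_case pair in this config can also be driven from Python. A minimal sketch, assuming the get_extract_result signature shown in examples/example.py and experiments/dataset_def.py below; the model name and API key are placeholders, and the case-repository behavior is inferred from the config comments:

```python
import sys
sys.path.append("./src")
from models import ChatGPT
from pipeline import Pipeline

model = ChatGPT(model_name_or_path="gpt-4o-mini", api_key="your_api_key")
pipeline = Pipeline(model)

text = ("The aid group Doctors Without Borders said that since Saturday , "
        "more than 275 wounded people had been admitted and treated at "
        "Donka Hospital in the capital of Guinea , Conakry .")
truth = {"relation_list": [{"head": "Guinea", "tail": "Conakry", "relation": "country capital"}]}

# With update_case=True, the reflection step can record this labeled example
# in the case repository for later retrieval.
result, _, _, _ = pipeline.get_extract_result(
    task="RE", text=text,
    constraint=["nationality", "country capital", "place of death"],
    truth=truth, update_case=True,
)
```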
examples/config/Triple2KG.yaml
ADDED
@@ -0,0 +1,21 @@
+model:
+  # Recommended: use the ChatGPT or DeepSeek APIs for the complex Triple task.
+  category: ChatGPT # model category, chosen from ChatGPT, DeepSeek, LLaMA, Qwen, ChatGLM, MiniCPM, OneKE.
+  model_name_or_path: gpt-4o-mini # model name, chosen from the model list of the selected category.
+  api_key: your_api_key # your API key for the model with API service. Not needed for open-source models.
+  base_url: https://api.openai.com/v1 # base URL for the API service. Not needed for open-source models.
+
+extraction:
+  mode: quick # extraction mode, chosen from quick, detailed, customized. Defaults to quick. See src/config.yaml for more details.
+  task: Triple # task type, chosen from Base, NER, RE, EE, and the newly added Triple.
+  use_file: true # whether to use a file for the input text. Defaults to false.
+  file_path: ./data/input_files/Artificial_Intelligence_Wikipedia.txt # path to the input file. Not needed if use_file is set to false.
+  constraint: [["Person", "Place", "Event", "Property"], ["Interpersonal", "Located", "Ownership", "Action"]] # Specified entity or relation types for the Triple extraction task. You can give three lists (subject, relation, and object types), two lists (entity and relation types), or one list (entity types only).
+  update_case: false # whether to update the case repository. Defaults to false.
+  show_trajectory: false # whether to display the extracted intermediate steps.
+
+# construct: # (Optional) Set the construct field to build a Knowledge Graph; otherwise delete this field.
+#   database: Neo4j # database type; currently only Neo4j is supported.
+#   url: neo4j://localhost:7687 # your database URL; Neo4j's default port is 7687.
+#   username: your_username # your database username.
+#   password: "your_password" # your database password.
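
The same Triple-to-KG flow can be scripted directly instead of via the config file. A minimal sketch, assuming the repository root as working directory and a reachable Neo4j instance; it reuses get_extract_result (see examples/example.py below) and the converter functions from src/construct/convert.py, with placeholder credentials:

```python
import sys, json
sys.path.append("./src")
from models import ChatGPT
from pipeline import Pipeline
from construct import generate_cypher_statements, execute_cypher_statements

model = ChatGPT(model_name_or_path="gpt-4o-mini", api_key="your_api_key")
pipeline = Pipeline(model)

# Two lists: entity types and relation types, as in the YAML above.
constraint = [["Person", "Place", "Event", "Property"],
              ["Interpersonal", "Located", "Ownership", "Action"]]
result, _, _, _ = pipeline.get_extract_result(
    task="Triple",
    text="David spends his happiest day with Monica.",
    constraint=constraint,
)

# generate_cypher_statements expects a JSON string, so serialize the result.
statements = generate_cypher_statements(json.dumps(result))
execute_cypher_statements(
    uri="neo4j://localhost:7687",  # placeholder connection details
    user="your_username",
    password="your_password",
    cypher_statements=statements,
)
```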
examples/example.py
ADDED
@@ -0,0 +1,17 @@
+import sys
+sys.path.append("./src")
+from models import *
+from pipeline import *
+import json
+
+# model configuration
+model = ChatGPT(model_name_or_path="your_model_name_or_path", api_key="your_api_key")
+pipeline = Pipeline(model)
+
+# extraction configuration
+Task = "NER"
+Text = "Finally , every other year , ELRA organizes a major conference LREC , the International Language Resources and Evaluation Conference."
+Constraint = ["algorithm", "conference", "else", "product", "task", "field", "metrics", "organization", "researcher", "program language", "country", "location", "person", "university"]  # entity types for the NER task (see examples/config/NER.yaml)
+
+# get extraction result
+result, trajectory, frontend_schema, frontend_res = pipeline.get_extract_result(task=Task, text=Text, constraint=Constraint, show_trajectory=True)
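
As a usage note (a sketch; it assumes the script is run from the repository root): result is a plain dictionary, so it can be serialized with the json module already imported above, which is presumably how the files under examples/results/ below were produced.

```python
# Persist the extraction result; examples/results/NER.json has this shape.
with open("examples/results/NER.json", "w") as f:
    json.dump(result, f, indent=4)
```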
examples/results/BookExtraction.json
ADDED
@@ -0,0 +1,48 @@
+{
+    "main_characters": [
+        {
+            "name": "Mr. Dursley",
+            "description": "The director of a firm called Grunnings, a big, beefy man with hardly any neck and a large mustache."
+        },
+        {
+            "name": "Mrs. Dursley",
+            "description": "Thin and blonde, with nearly twice the usual amount of neck, spends time spying on neighbors."
+        },
+        {
+            "name": "Dudley Dursley",
+            "description": "The small son of Mr. and Mrs. Dursley, considered by them to be the finest boy anywhere."
+        },
+        {
+            "name": "Albus Dumbledore",
+            "description": "A tall, thin, and very old man with long silver hair and a purple cloak, who arrives mysteriously."
+        },
+        {
+            "name": "Professor McGonagall",
+            "description": "A severe-looking woman who can transform into a cat, wearing an emerald cloak."
+        },
+        {
+            "name": "Voldemort",
+            "description": "The dark wizard who has caused fear and chaos, but has mysteriously disappeared."
+        },
+        {
+            "name": "Harry Potter",
+            "description": "The young boy who survived Voldemort's attack, becoming a significant figure in the wizarding world."
+        },
+        {
+            "name": "Lily Potter",
+            "description": "Harry's mother, who is mentioned as having been killed by Voldemort."
+        },
+        {
+            "name": "James Potter",
+            "description": "Harry's father, who is mentioned as having been killed by Voldemort."
+        },
+        {
+            "name": "Hagrid",
+            "description": "A giant man who is caring and emotional about Harry's situation."
+        }
+    ],
+    "background_setting": {
+        "location": "Number four, Privet Drive, Suburban",
+        "time_period": "A dull, gray Tuesday morning, Late 20th Century"
+    }
+}
examples/results/EE.json
ADDED
@@ -0,0 +1,13 @@
+{
+    "event_list": [
+        {
+            "event_type": "data breach",
+            "event_trigger": "compromised",
+            "event_argument": {
+                "number of victim": 326000,
+                "compromised data": "personal information contained in email accounts",
+                "victim": "individuals whose personal information was compromised"
+            }
+        }
+    ]
+}
examples/results/NER.json
ADDED
@@ -0,0 +1,16 @@
+{
+    "entity_list": [
+        {
+            "name": "ELRA",
+            "type": "organization"
+        },
+        {
+            "name": "LREC",
+            "type": "conference"
+        },
+        {
+            "name": "International Language Resources and Evaluation Conference",
+            "type": "conference"
+        }
+    ]
+}
examples/results/NewsExtraction.json
ADDED
@@ -0,0 +1,51 @@
+{
+    "title": "Who is Tulsi Gabbard? Meet Trump's pick for director of national intelligence",
+    "summary": "Tulsi Gabbard, President-elect Donald Trump\u2019s choice for director of national intelligence, could face a challenging Senate confirmation battle due to her lack of intelligence experience and controversial views.",
+    "publication_date": "December 4, 2024",
+    "keywords": [
+        "Tulsi Gabbard",
+        "Donald Trump",
+        "director of national intelligence",
+        "confirmation battle",
+        "intelligence agencies",
+        "Russia",
+        "Syria",
+        "Bashar al-Assad"
+    ],
+    "events": [
+        {
+            "name": "Tulsi Gabbard's nomination for director of national intelligence",
+            "people_involved": [
+                {
+                    "name": "Tulsi Gabbard",
+                    "identity": "Former U.S. Representative",
+                    "role": "Nominee for director of national intelligence"
+                },
+                {
+                    "name": "Donald Trump",
+                    "identity": "President-elect",
+                    "role": "Nominator"
+                },
+                {
+                    "name": "Tammy Duckworth",
+                    "identity": "Democratic Senator",
+                    "role": "Critic of Gabbard's nomination"
+                },
+                {
+                    "name": "Olivia Troye",
+                    "identity": "Former national security official",
+                    "role": "Commentator on Gabbard's potential impact"
+                }
+            ],
+            "process": "Gabbard's nomination is expected to lead to a Senate confirmation battle."
+        }
+    ],
+    "quotes": {
+        "Tammy Duckworth": "The U.S. intelligence community has identified her as having troubling relationships with America\u2019s foes, and so my worry is that she couldn\u2019t pass a background check.",
+        "Olivia Troye": "If Gabbard is confirmed, America\u2019s allies may not share as much information with the U.S."
+    },
+    "viewpoints": [
+        "Gabbard's lack of intelligence experience raises concerns about her ability to oversee 18 intelligence agencies.",
+        "Her past comments and meetings with foreign adversaries have led to accusations of being a national security risk."
+    ]
+}
examples/results/RE.json
ADDED
@@ -0,0 +1,9 @@
+{
+    "relation_list": [
+        {
+            "head": "Guinea",
+            "tail": "Conakry",
+            "relation": "country capital"
+        }
+    ]
+}
examples/results/TripleExtraction.json
ADDED
@@ -0,0 +1,156 @@
+{
+    "triple_list": [
+        {
+            "head": "sea levels",
+            "head_type": "Property",
+            "relation": "wiped out",
+            "relation_type": "Action",
+            "tail": "coastal cities",
+            "tail_type": "Place"
+        },
+        {
+            "head": "nations",
+            "head_type": "Person",
+            "relation": "created",
+            "relation_type": "Action",
+            "tail": "mechas",
+            "tail_type": "Property"
+        },
+        {
+            "head": "David",
+            "head_type": "Person",
+            "relation": "given to",
+            "relation_type": "Ownership",
+            "tail": "Henry and Monica",
+            "tail_type": "Person"
+        },
+        {
+            "head": "Monica",
+            "head_type": "Person",
+            "relation": "feels uncomfortable",
+            "relation_type": "Interpersonal",
+            "tail": "David",
+            "tail_type": "Person"
+        },
+        {
+            "head": "David",
+            "head_type": "Person",
+            "relation": "befriends",
+            "relation_type": "Interpersonal",
+            "tail": "Teddy",
+            "tail_type": "Person"
+        },
+        {
+            "head": "Martin",
+            "head_type": "Person",
+            "relation": "goads",
+            "relation_type": "Action",
+            "tail": "David",
+            "tail_type": "Person"
+        },
+        {
+            "head": "David",
+            "head_type": "Person",
+            "relation": "blamed for",
+            "relation_type": "Action",
+            "tail": "incident",
+            "tail_type": "Event"
+        },
+        {
+            "head": "Monica",
+            "head_type": "Person",
+            "relation": "returns David to",
+            "relation_type": "Ownership",
+            "tail": "creators",
+            "tail_type": "Person"
+        },
+        {
+            "head": "David",
+            "head_type": "Person",
+            "relation": "decides to find",
+            "relation_type": "Action",
+            "tail": "Blue Fairy",
+            "tail_type": "Property"
+        },
+        {
+            "head": "David",
+            "head_type": "Person",
+            "relation": "pleads for",
+            "relation_type": "Action",
+            "tail": "his life",
+            "tail_type": "Event"
+        },
+        {
+            "head": "David",
+            "head_type": "Person",
+            "relation": "meets",
+            "relation_type": "Interpersonal",
+            "tail": "Professor Hobby",
+            "tail_type": "Person"
+        },
+        {
+            "head": "David",
+            "head_type": "Person",
+            "relation": "attempts",
+            "relation_type": "Action",
+            "tail": "suicide",
+            "tail_type": "Event"
+        },
+        {
+            "head": "Joe",
+            "head_type": "Person",
+            "relation": "rescues",
+            "relation_type": "Action",
+            "tail": "David",
+            "tail_type": "Person"
+        },
+        {
+            "head": "David",
+            "head_type": "Person",
+            "relation": "asks statue to turn him into",
+            "relation_type": "Action",
+            "tail": "real boy",
+            "tail_type": "Property"
+        },
+        {
+            "head": "humanity",
+            "head_type": "Person",
+            "relation": "is extinct",
+            "relation_type": "Action",
+            "tail": "future",
+            "tail_type": "Event"
+        },
+        {
+            "head": "Specialists",
+            "head_type": "Person",
+            "relation": "resurrect",
+            "relation_type": "Action",
+            "tail": "David and Teddy",
+            "tail_type": "Person"
+        },
+        {
+            "head": "Monica",
+            "head_type": "Person",
+            "relation": "can live for",
+            "relation_type": "Property",
+            "tail": "one day",
+            "tail_type": "Property"
+        },
+        {
+            "head": "David",
+            "head_type": "Person",
+            "relation": "spends",
+            "relation_type": "Action",
+            "tail": "happiest day with Monica",
+            "tail_type": "Event"
+        },
+        {
+            "head": "Monica",
+            "head_type": "Person",
+            "relation": "tells",
+            "relation_type": "Interpersonal",
+            "tail": "David",
+            "tail_type": "Person"
+        }
+    ]
+}
experiments/dataset_def.py
ADDED
@@ -0,0 +1,181 @@
+import os
+import json
+import random
+from utils import *
+from pipeline import *
+current_dir = os.path.dirname(os.path.abspath(__file__))
+DATA_DIR = os.path.join(current_dir, "../data/datasets")
+OUTPUT_DIR = os.path.join(current_dir, "results")
+
+class BaseDataset:
+    def __init__(self):
+        pass
+
+    def __getitem__(self, idx):
+        return None
+
+    def __len__(self):
+        return None
+
+    def evaluate(self, idx, answer):
+        return None
+
+class NERDataset(BaseDataset):
+    def __init__(self, name=None, task="NER", data_dir=f"{DATA_DIR}/CrossNER", output_dir=f"{OUTPUT_DIR}", train=False):
+        self.name = name
+        self.task = task
+        self.data_dir = data_dir
+        self.output_dir = output_dir
+        self.test_file = json.load(open(f"{data_dir}/train.json")) if train else json.load(open(f"{data_dir}/test.json"))
+        self.schema = str(json.load(open(f"{data_dir}/class.json")))
+        self.retry = 2
+
+    def evaluate(self, llm: BaseEngine, mode="", sample=None, random_sample=False, update_case=False):
+        # initialize
+        sample = len(self.test_file) if sample is None else sample
+        if random_sample:
+            test_file = random.sample(self.test_file, sample)
+        else:
+            test_file = self.test_file[:sample]
+        total_precision, total_recall, total_f1 = 0, 0, 0
+        num_items = 0
+        output_path = f"{self.output_dir}/{self.name}_{self.task}_{mode}_{llm.name}_sample{sample}.jsonl"
+        print("Results will be saved to: ", output_path)
+
+        # predict and evaluate
+        pipeline = Pipeline(llm=llm)
+        for item in test_file:
+            try:
+                # get prediction
+                num_items += 1
+                truth = list(item.items())[1]
+                truth = {truth[0]: truth[1]}
+                pred_set = set()
+                for attempt in range(self.retry):
+                    pred_result, pred_detailed, _, _ = pipeline.get_extract_result(task=self.task, text=item['sentence'], constraint=self.schema, mode=mode, truth=truth, update_case=update_case)
+                    try:
+                        pred_result = pred_result['entity_list']
+                        pred_set = dict_list_to_set(pred_result)
+                        break
+                    except Exception as e:
+                        print(f"Failed to parse result: {pred_result}, retrying... Exception: {e}")
+
+                # evaluate
+                truth_result = item["entity_list"]
+                truth_set = dict_list_to_set(truth_result)
+                print(truth_set)
+                print(pred_set)
+
+                precision, recall, f1_score = calculate_metrics(truth_set, pred_set)
+                total_precision += precision
+                total_recall += recall
+                total_f1 += f1_score
+
+                pred_detailed["pred"] = pred_result
+                pred_detailed["truth"] = truth_result
+                pred_detailed["metrics"] = {"precision": precision, "recall": recall, "f1_score": f1_score}
+                res_detailed = {"id": num_items}
+                res_detailed.update(pred_detailed)
+                with open(output_path, 'a') as file:
+                    file.write(json.dumps(res_detailed) + '\n')
+            except Exception as e:
+                print(f"Exception occurred: {e}")
+                print(f"idx: {num_items}")
+
+        # calculate overall metrics
+        if num_items > 0:
+            avg_precision = total_precision / num_items
+            avg_recall = total_recall / num_items
+            avg_f1 = total_f1 / num_items
+            overall_metrics = {
+                "total_items": num_items,
+                "average_precision": avg_precision,
+                "average_recall": avg_recall,
+                "average_f1_score": avg_f1
+            }
+            with open(output_path, 'a') as file:
+                file.write(json.dumps(overall_metrics) + '\n\n')
+            print(f"Overall Metrics:\nTotal Items: {num_items}\nAverage Precision: {avg_precision:.4f}\nAverage Recall: {avg_recall:.4f}\nAverage F1 Score: {avg_f1:.4f}")
+            return avg_f1  # return the average F1 so callers can report it
+        else:
+            print("No items processed.")
+
+class REDataset(BaseDataset):
+    def __init__(self, name=None, task="RE", data_dir=f"{DATA_DIR}/NYT11", output_dir=f"{OUTPUT_DIR}", train=False):
+        self.name = name
+        self.task = task
+        self.data_dir = data_dir
+        self.output_dir = output_dir
+        self.test_file = json.load(open(f"{data_dir}/train.json")) if train else json.load(open(f"{data_dir}/test.json"))
+        self.schema = str(json.load(open(f"{data_dir}/class.json")))
+        self.retry = 2
+
+    def evaluate(self, llm: BaseEngine, mode="", sample=None, random_sample=False, update_case=False):
+        # initialize
+        sample = len(self.test_file) if sample is None else sample
+        if random_sample:
+            test_file = random.sample(self.test_file, sample)
+        else:
+            test_file = self.test_file[:sample]
+        total_precision, total_recall, total_f1 = 0, 0, 0
+        num_items = 0
+        output_path = f"{self.output_dir}/{self.name}_{self.task}_{mode}_{llm.name}_sample{sample}.jsonl"
+        print("Results will be saved to: ", output_path)
+
+        # predict and evaluate
+        pipeline = Pipeline(llm=llm)
+        for item in test_file:
+            try:
+                # get prediction
+                num_items += 1
+                truth = list(item.items())[1]
+                truth = {truth[0]: truth[1]}
+                pred_set = set()
+                for attempt in range(self.retry):
+                    pred_result, pred_detailed, _, _ = pipeline.get_extract_result(task=self.task, text=item['sentence'], constraint=self.schema, mode=mode, truth=truth, update_case=update_case)
+                    try:
+                        pred_result = pred_result['relation_list']
+                        pred_set = dict_list_to_set(pred_result)
+                        break
+                    except Exception as e:
+                        print(f"Failed to parse result: {pred_result}, retrying... Exception: {e}")
+
+                # evaluate
+                truth_result = item["relation_list"]
+                truth_set = dict_list_to_set(truth_result)
+                print(truth_set)
+                print(pred_set)
+
+                precision, recall, f1_score = calculate_metrics(truth_set, pred_set)
+                total_precision += precision
+                total_recall += recall
+                total_f1 += f1_score
+
+                pred_detailed["pred"] = pred_result
+                pred_detailed["truth"] = truth_result
+                pred_detailed["metrics"] = {"precision": precision, "recall": recall, "f1_score": f1_score}
+                res_detailed = {"id": num_items}
+                res_detailed.update(pred_detailed)
+                with open(output_path, 'a') as file:
+                    file.write(json.dumps(res_detailed) + '\n')
+            except Exception as e:
+                print(f"Exception occurred: {e}")
+                print(f"idx: {num_items}")
+
+        # calculate overall metrics
+        if num_items > 0:
+            avg_precision = total_precision / num_items
+            avg_recall = total_recall / num_items
+            avg_f1 = total_f1 / num_items
+            overall_metrics = {
+                "total_items": num_items,
+                "average_precision": avg_precision,
+                "average_recall": avg_recall,
+                "average_f1_score": avg_f1
+            }
+            with open(output_path, 'a') as file:
+                file.write(json.dumps(overall_metrics) + '\n\n')
+            print(f"Overall Metrics:\nTotal Items: {num_items}\nAverage Precision: {avg_precision:.4f}\nAverage Recall: {avg_recall:.4f}\nAverage F1 Score: {avg_f1:.4f}")
+            return avg_f1  # return the average F1 so callers can report it
+        else:
+            print("No items processed.")
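
The evaluate methods lean on dict_list_to_set and calculate_metrics from src/utils, which this diff does not show. A minimal sketch of the semantics they are assumed to have, namely set-based precision/recall/F1 over hashable tuples; the real helpers may differ:

```python
def dict_list_to_set(dict_list):
    # Turn a list of dicts into a set of key-sorted value tuples so that
    # predictions and ground truth can be compared with set operations.
    return {tuple(str(d[k]) for k in sorted(d)) for d in dict_list}

def calculate_metrics(truth_set, pred_set):
    # Standard precision/recall/F1 between two sets of tuples.
    tp = len(truth_set & pred_set)
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(truth_set) if truth_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1
```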
experiments/run_ner.py
ADDED
@@ -0,0 +1,15 @@
+import sys
+sys.path.append("./src")
+from models import *
+from dataset_def import *
+name = "crossner-"
+data_dir = "./data/datasets/CrossNER/"
+model = ChatGPT(model_name_or_path="gpt-4o-mini", api_key="your_api_key", base_url="https://api.openai.com/v1")
+tasklist = ["ai", "literature", "music", "politics", "science"]
+for task in tasklist:
+    task_name = name + task
+    task_data_dir = data_dir + task
+    dataset = NERDataset(name=task_name, data_dir=task_data_dir)
+    mode = "quick"
+    f1_score = dataset.evaluate(llm=model, mode=mode)
+    print(f"Task: {task_name}, f1_score: {f1_score}")
experiments/run_re.py
ADDED
@@ -0,0 +1,10 @@
+import sys
+sys.path.append("./src")
+from models import *
+from dataset_def import *
+data_dir = "./data/datasets/NYT11/"
+model = LLaMA("meta-llama/Meta-Llama-3-8B-Instruct")
+dataset = REDataset(name="NYT11", data_dir=data_dir)
+f1_score = dataset.evaluate(llm=model, mode="quick")
+print("f1_score: ", f1_score)
+
figs/logo.png
ADDED
figs/main.png
ADDED
requirements.txt
CHANGED
@@ -15,4 +15,5 @@ sentencepiece==0.2.0
 protobuf==5.29.3
 bitsandbytes==0.45.0
 vllm==0.6.0
-gradio==4.44.0
+gradio==4.44.0
+neo4j==5.28.1
src/config.yaml
CHANGED
@@ -6,6 +6,7 @@ agent:
   default_ner: Extract the Named Entities in the given text.
   default_re: Extract Relationships between Named Entities in the given text.
   default_ee: Extract the Events in the given text.
+  default_triple: Extract the Triples (subject, relation, object) from the given text, aiming to extract all the relationships for each entity.
   chunk_token_limit: 1024
   mode:
     quick:
@@ -16,5 +17,5 @@ agent:
       extraction_agent: extract_information_with_case
       reflection_agent: reflect_with_case
     customized:
-      schema_agent:
+      schema_agent: get_retrieved_schema
       extraction_agent: extract_information_direct
src/construct/__init__.py
ADDED
@@ -0,0 +1 @@
+from .convert import *
src/construct/convert.py
ADDED
@@ -0,0 +1,201 @@
+import json
+import re
+from neo4j import GraphDatabase
+
+
+def sanitize_string(input_str, max_length=255):
+    """
+    Process the input string to ensure it meets the database requirements.
+    """
+    # step 1: Replace invalid characters
+    input_str = re.sub(r'[^a-zA-Z0-9_]', '_', input_str)
+
+    # step 2: Add a prefix if it starts with a digit
+    if input_str[0].isdigit():
+        input_str = 'num' + input_str
+
+    # step 3: Limit length
+    if len(input_str) > max_length:
+        input_str = input_str[:max_length]
+
+    return input_str
+
+
+def generate_cypher_statements(data):
+    """
+    Generates Cypher query statements based on the provided JSON data.
+    """
+    cypher_statements = []
+    parsed_data = json.loads(data)
+
+    def create_statement(triple):
+        head = triple.get("head")
+        head_type = triple.get("head_type")
+        relation = triple.get("relation")
+        relation_type = triple.get("relation_type")
+        tail = triple.get("tail")
+        tail_type = triple.get("tail_type")
+
+        # head_safe = sanitize_string(head) if head else None
+        head_type_safe = sanitize_string(head_type) if head_type else None
+        # relation_safe = sanitize_string(relation) if relation else None
+        relation_type_safe = sanitize_string(relation_type) if relation_type else None
+        # tail_safe = sanitize_string(tail) if tail else None
+        tail_type_safe = sanitize_string(tail_type) if tail_type else None
+
+        statement = ""
+        if head:
+            if head_type_safe:
+                statement += f'MERGE (a:{head_type_safe} {{name: "{head}"}}) '
+            else:
+                statement += f'MERGE (a:UNTYPED {{name: "{head}"}}) '
+        if tail:
+            if tail_type_safe:
+                statement += f'MERGE (b:{tail_type_safe} {{name: "{tail}"}}) '
+            else:
+                statement += f'MERGE (b:UNTYPED {{name: "{tail}"}}) '
+        if relation:
+            if head and tail:  # Only create the relation if both head and tail exist.
+                if relation_type_safe:
+                    statement += f'MERGE (a)-[:{relation_type_safe} {{name: "{relation}"}}]->(b);'
+                else:
+                    statement += f'MERGE (a)-[:UNTYPED {{name: "{relation}"}}]->(b);'
+            else:
+                statement += ';' if statement != "" else ''
+        else:
+            if relation_type_safe:  # If no relation is provided, create the relation from `relation_type`.
+                statement += f'MERGE (a)-[:{relation_type_safe} {{name: "{relation_type_safe}"}}]->(b);'
+            else:
+                statement += ';' if statement != "" else ''
+        return statement
+
+    if "triple_list" in parsed_data:
+        for triple in parsed_data["triple_list"]:
+            cypher_statements.append(create_statement(triple))
+    else:
+        cypher_statements.append(create_statement(parsed_data))
+
+    return cypher_statements
+
+
+def execute_cypher_statements(uri, user, password, cypher_statements):
+    """
+    Executes the generated Cypher query statements.
+    """
+    driver = GraphDatabase.driver(uri, auth=(user, password))
+
+    with driver.session() as session:
+        for statement in cypher_statements:
+            session.run(statement)
+            print(f"Executed: {statement}")
+
+    # Write the executed Cypher statements to a text file if you want.
+    # with open("executed_statements.txt", 'a') as f:
+    #     for statement in cypher_statements:
+    #         f.write(statement + '\n')
+    #     f.write('\n')
+
+    driver.close()
+
+
+# Here is a test of your database connection:
+if __name__ == "__main__":
+    # test_data 1: contains a list of triples
+    test_data = '''
+    {
+        "triple_list": [
+            {
+                "head": "J.K. Rowling",
+                "head_type": "Person",
+                "relation": "wrote",
+                "relation_type": "Actions",
+                "tail": "Fantastic Beasts and Where to Find Them",
+                "tail_type": "Book"
+            },
+            {
+                "head": "Fantastic Beasts and Where to Find Them",
+                "head_type": "Book",
+                "relation": "extra section of",
+                "relation_type": "Affiliation",
+                "tail": "Harry Potter Series",
+                "tail_type": "Book"
+            },
+            {
+                "head": "J.K. Rowling",
+                "head_type": "Person",
+                "relation": "wrote",
+                "relation_type": "Actions",
+                "tail": "Harry Potter Series",
+                "tail_type": "Book"
+            },
+            {
+                "head": "Harry Potter Series",
+                "head_type": "Book",
+                "relation": "create",
+                "relation_type": "Actions",
+                "tail": "Dumbledore",
+                "tail_type": "Person"
+            },
+            {
+                "head": "Fantastic Beasts and Where to Find Them",
+                "head_type": "Book",
+                "relation": "mention",
+                "relation_type": "Actions",
+                "tail": "Dumbledore",
+                "tail_type": "Person"
+            },
+            {
+                "head": "Voldemort",
+                "head_type": "Person",
+                "relation": "afraid",
+                "relation_type": "Emotion",
+                "tail": "Dumbledore",
+                "tail_type": "Person"
+            },
+            {
+                "head": "Voldemort",
+                "head_type": "Person",
+                "relation": "robs",
+                "relation_type": "Actions",
+                "tail": "the Elder Wand",
+                "tail_type": "Weapon"
+            },
+            {
+                "head": "the Elder Wand",
+                "head_type": "Weapon",
+                "relation": "belong to",
+                "relation_type": "Affiliation",
+                "tail": "Dumbledore",
+                "tail_type": "Person"
+            }
+        ]
+    }
+    '''
+
+    # test_data 2: contains a single triple
+    # test_data = '''
+    # {
+    #     "head": "Christopher Nolan",
+    #     "head_type": "Person",
+    #     "relation": "directed",
+    #     "relation_type": "Action",
+    #     "tail": "Inception",
+    #     "tail_type": "Movie"
+    # }
+    # '''
+
+    # Generate Cypher query statements
+    cypher_statements = generate_cypher_statements(test_data)
+
+    # Print the generated Cypher query statements
+    for statement in cypher_statements:
+        print(statement)
+    print("\n")
+
+    # Execute the generated Cypher query statements
+    execute_cypher_statements(
+        uri="neo4j://localhost:7687",  # your URI
+        user="your_username",          # your username
+        password="your_password",      # your password
+        cypher_statements=cypher_statements,
+    )
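
One design caveat: create_statement interpolates entity names straight into the Cypher text, so a name containing a double quote would break the statement. A hedged alternative sketch that passes the name values as driver parameters instead; Cypher cannot parameterize labels or relationship types, so sanitize_string from convert.py above is still needed for those, and the triple fields are assumed complete:

```python
from neo4j import GraphDatabase
from construct.convert import sanitize_string  # assumes ./src is on sys.path

def write_triple(session, triple):
    # Labels and relationship types must be sanitized identifiers;
    # the name values travel safely as query parameters.
    head_label = sanitize_string(triple["head_type"])
    tail_label = sanitize_string(triple["tail_type"])
    rel_type = sanitize_string(triple["relation_type"])
    query = (
        f'MERGE (a:{head_label} {{name: $head}}) '
        f'MERGE (b:{tail_label} {{name: $tail}}) '
        f'MERGE (a)-[:{rel_type} {{name: $relation}}]->(b)'
    )
    session.run(query, head=triple["head"], tail=triple["tail"],
                relation=triple["relation"])

# Usage (placeholder credentials):
# driver = GraphDatabase.driver("neo4j://localhost:7687", auth=("user", "pass"))
# with driver.session() as session:
#     write_triple(session, {"head": "David", "head_type": "Person",
#                            "relation": "befriends", "relation_type": "Interpersonal",
#                            "tail": "Teddy", "tail_type": "Person"})
```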
src/models/llm_def.py
CHANGED
@@ -22,7 +22,7 @@ class BaseEngine:
|
|
22 |
self.top_p = 0.9
|
23 |
self.max_tokens = 1024
|
24 |
self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
|
25 |
-
|
26 |
def get_chat_response(self, prompt):
|
27 |
raise NotImplementedError
|
28 |
|
@@ -30,7 +30,7 @@ class BaseEngine:
|
|
30 |
self.temperature = temperature
|
31 |
self.top_p = top_p
|
32 |
self.max_tokens = max_tokens
|
33 |
-
|
34 |
class LLaMA(BaseEngine):
|
35 |
def __init__(self, model_name_or_path: str):
|
36 |
super().__init__(model_name_or_path)
|
@@ -61,7 +61,7 @@ class LLaMA(BaseEngine):
|
|
61 |
top_p=self.top_p,
|
62 |
)
|
63 |
return outputs[0]["generated_text"][-1]['content'].strip()
|
64 |
-
|
65 |
class Qwen(BaseEngine):
|
66 |
def __init__(self, model_name_or_path: str):
|
67 |
super().__init__(model_name_or_path)
|
@@ -72,7 +72,7 @@ class Qwen(BaseEngine):
|
|
72 |
torch_dtype="auto",
|
73 |
device_map="auto"
|
74 |
)
|
75 |
-
|
76 |
def get_chat_response(self, prompt):
|
77 |
messages = [
|
78 |
{"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
|
@@ -94,7 +94,7 @@ class Qwen(BaseEngine):
|
|
94 |
output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
|
95 |
]
|
96 |
response = self.tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
|
97 |
-
|
98 |
return response
|
99 |
|
100 |
class MiniCPM(BaseEngine):
|
@@ -125,7 +125,7 @@ class MiniCPM(BaseEngine):
|
|
125 |
model_outputs[i][len(model_inputs[i]):] for i in range(len(model_inputs))
|
126 |
]
|
127 |
response = self.tokenizer.batch_decode(output_token_ids, skip_special_tokens=True)[0].strip()
|
128 |
-
|
129 |
return response
|
130 |
|
131 |
class ChatGLM(BaseEngine):
|
@@ -155,7 +155,7 @@ class ChatGLM(BaseEngine):
|
|
155 |
)
|
156 |
model_outputs = model_outputs[:, model_inputs['input_ids'].shape[1]:]
|
157 |
response = self.tokenizer.batch_decode(model_outputs, skip_special_tokens=True)[0].strip()
|
158 |
-
|
159 |
return response
|
160 |
|
161 |
class OneKE(BaseEngine):
|
@@ -164,7 +164,7 @@ class OneKE(BaseEngine):
|
|
164 |
self.name = "OneKE"
|
165 |
self.model_id = model_name_or_path
|
166 |
config = AutoConfig.from_pretrained(self.model_id, trust_remote_code=True)
|
167 |
-
quantization_config=BitsAndBytesConfig(
|
168 |
load_in_4bit=True,
|
169 |
llm_int8_threshold=6.0,
|
170 |
llm_int8_has_fp16_weight=False,
|
@@ -175,12 +175,12 @@ class OneKE(BaseEngine):
|
|
175 |
self.model = AutoModelForCausalLM.from_pretrained(
|
176 |
self.model_id,
|
177 |
config=config,
|
178 |
-
device_map="auto",
|
179 |
quantization_config=quantization_config,
|
180 |
torch_dtype=torch.bfloat16,
|
181 |
trust_remote_code=True,
|
182 |
)
|
183 |
-
|
184 |
def get_chat_response(self, prompt):
|
185 |
system_prompt = '<<SYS>>\nYou are a helpful assistant. δ½ ζ―δΈδΈͺδΉδΊε©δΊΊηε©ζγ\n<</SYS>>\n\n'
|
186 |
sintruct = '[INST] ' + system_prompt + prompt + '[/INST]'
|
@@ -191,9 +191,9 @@ class OneKE(BaseEngine):
|
|
191 |
generation_output = generation_output.sequences[0]
|
192 |
generation_output = generation_output[input_length:]
|
193 |
response = self.tokenizer.decode(generation_output, skip_special_tokens=True)
|
194 |
-
|
195 |
return response
|
196 |
-
|
197 |
class ChatGPT(BaseEngine):
|
198 |
def __init__(self, model_name_or_path: str, api_key: str, base_url=openai.base_url):
|
199 |
self.name = "ChatGPT"
|
@@ -207,7 +207,7 @@ class ChatGPT(BaseEngine):
|
|
207 |
else:
|
208 |
self.api_key = os.environ["OPENAI_API_KEY"]
|
209 |
self.client = OpenAI(api_key=self.api_key, base_url=self.base_url)
|
210 |
-
|
211 |
def get_chat_response(self, input):
|
212 |
response = self.client.chat.completions.create(
|
213 |
model=self.model,
|
@@ -234,7 +234,7 @@ class DeepSeek(BaseEngine):
|
|
234 |
else:
|
235 |
self.api_key = os.environ["DEEPSEEK_API_KEY"]
|
236 |
self.client = OpenAI(api_key=self.api_key, base_url=self.base_url)
|
237 |
-
|
238 |
def get_chat_response(self, input):
|
239 |
response = self.client.chat.completions.create(
|
240 |
model=self.model,
|
@@ -258,7 +258,7 @@ class LocalServer(BaseEngine):
|
|
258 |
self.max_tokens = 1024
|
259 |
self.api_key = "EMPTY_API_KEY"
|
260 |
self.client = OpenAI(api_key=self.api_key, base_url=self.base_url)
|
261 |
-
|
262 |
def get_chat_response(self, input):
|
263 |
try:
|
264 |
response = self.client.chat.completions.create(
|
@@ -276,4 +276,3 @@ class LocalServer(BaseEngine):
|
|
276 |
print("Error: Unable to connect to the server. Please check if the vllm service is running and the port is 8080.")
|
277 |
except Exception as e:
|
278 |
print(f"Error: {e}")
|
279 |
-
|
|
|
22 |
self.top_p = 0.9
|
23 |
self.max_tokens = 1024
|
24 |
self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
|
25 |
+
|
26 |
    def get_chat_response(self, prompt):
        raise NotImplementedError

@@ ... @@
        self.temperature = temperature
        self.top_p = top_p
        self.max_tokens = max_tokens
+
class LLaMA(BaseEngine):
    def __init__(self, model_name_or_path: str):
        super().__init__(model_name_or_path)
@@ ... @@
            top_p=self.top_p,
        )
        return outputs[0]["generated_text"][-1]['content'].strip()
+
class Qwen(BaseEngine):
    def __init__(self, model_name_or_path: str):
        super().__init__(model_name_or_path)
@@ ... @@
            torch_dtype="auto",
            device_map="auto"
        )
+
    def get_chat_response(self, prompt):
        messages = [
            {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
@@ ... @@
            output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
        ]
        response = self.tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
+
        return response

class MiniCPM(BaseEngine):
@@ ... @@
            model_outputs[i][len(model_inputs[i]):] for i in range(len(model_inputs))
        ]
        response = self.tokenizer.batch_decode(output_token_ids, skip_special_tokens=True)[0].strip()
+
        return response

class ChatGLM(BaseEngine):
@@ ... @@
        )
        model_outputs = model_outputs[:, model_inputs['input_ids'].shape[1]:]
        response = self.tokenizer.batch_decode(model_outputs, skip_special_tokens=True)[0].strip()
+
        return response

class OneKE(BaseEngine):
@@ ... @@
        self.name = "OneKE"
        self.model_id = model_name_or_path
        config = AutoConfig.from_pretrained(self.model_id, trust_remote_code=True)
+        quantization_config=BitsAndBytesConfig(
            load_in_4bit=True,
            llm_int8_threshold=6.0,
            llm_int8_has_fp16_weight=False,
@@ ... @@
        self.model = AutoModelForCausalLM.from_pretrained(
            self.model_id,
            config=config,
+            device_map="auto",
            quantization_config=quantization_config,
            torch_dtype=torch.bfloat16,
            trust_remote_code=True,
        )
+
    def get_chat_response(self, prompt):
        system_prompt = '<<SYS>>\nYou are a helpful assistant. 你是一个乐于助人的助手。\n<</SYS>>\n\n'
        sintruct = '[INST] ' + system_prompt + prompt + '[/INST]'
@@ ... @@
        generation_output = generation_output.sequences[0]
        generation_output = generation_output[input_length:]
        response = self.tokenizer.decode(generation_output, skip_special_tokens=True)
+
        return response
+
class ChatGPT(BaseEngine):
    def __init__(self, model_name_or_path: str, api_key: str, base_url=openai.base_url):
        self.name = "ChatGPT"
@@ ... @@
        else:
            self.api_key = os.environ["OPENAI_API_KEY"]
        self.client = OpenAI(api_key=self.api_key, base_url=self.base_url)
+
    def get_chat_response(self, input):
        response = self.client.chat.completions.create(
            model=self.model,
@@ ... @@
        else:
            self.api_key = os.environ["DEEPSEEK_API_KEY"]
        self.client = OpenAI(api_key=self.api_key, base_url=self.base_url)
+
    def get_chat_response(self, input):
        response = self.client.chat.completions.create(
            model=self.model,
@@ ... @@
        self.max_tokens = 1024
        self.api_key = "EMPTY_API_KEY"
        self.client = OpenAI(api_key=self.api_key, base_url=self.base_url)
+
    def get_chat_response(self, input):
        try:
            response = self.client.chat.completions.create(
@@ ... @@
            print("Error: Unable to connect to the server. Please check if the vllm service is running and the port is 8080.")
        except Exception as e:
            print(f"Error: {e}")
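Aside (illustration only, not part of the diff): every engine class above exposes the same `get_chat_response` interface, which is what lets the pipeline swap backends freely. A minimal sketch, assuming the constructors shown in this file and the `src/` import layout:

```python
# Sketch only -- assumes the engine classes defined in src/models/llm_def.py.
from models.llm_def import ChatGPT, Qwen

# API-backed engine; model name and key are placeholders, not real credentials.
engine = ChatGPT(model_name_or_path="gpt-4o-mini", api_key="sk-...")
print(engine.get_chat_response("Extract all person names: Alan Turing met Alonzo Church."))

# Local HuggingFace engine; weights are downloaded on first use.
local_engine = Qwen(model_name_or_path="Qwen/Qwen2.5-7B-Instruct")
print(local_engine.get_chat_response("Hello!"))
```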
src/models/prompt_example.py
CHANGED
@@ -2,7 +2,7 @@ json_schema_examples = """
 **Task**: Please extract all economic policies affecting the stock market between 2015 and 2023 and the exact dates of their implementation.
 **Text**: This text is from the field of Economics and represents the genre of Article.
 ...(example text)...
-**Output Schema**:
+**Output Schema**:
 {
     "economic_policies": [
         {
@@ -31,9 +31,9 @@ Example3:
 **Text**: This text is from the field of Political and represents the genre of News Report.
 ...(example text)...
 **Output Schema**:
-Answer:
+Answer:
 {
-    "news_report":
+    "news_report":
     {
         "title": null,
         "summary": null,
@@ -56,9 +56,9 @@ Answer:
 """

 code_schema_examples = """
-Example1:
+Example1:
 **Task**: Extract all the entities in the given text.
-**Text**:
+**Text**:
 ...(example text)...
 **Output Schema**:
 ```python
@@ -68,12 +68,12 @@ from pydantic import BaseModel, Field
 class Entity(BaseModel):
     label : str = Field(description="The type or category of the entity, such as 'Process', 'Technique', 'Data Structure', 'Methodology', 'Person', etc. ")
     name : str = Field(description="The specific name of the entity. It should represent a single, distinct concept and must not be an empty string. For example, if the entity is a 'Technique', the name could be 'Neural Networks'.")
-
+
 class ExtractionTarget(BaseModel):
     entity_list : List[Entity] = Field(description="All the entities presented in the context. The entities should encode ONE concept.")
 ```

-Example2:
+Example2:
 **Task**: Extract all the information in the given text.
 **Text**: This text is from the field of Political and represents the genre of News Article.
 ...(example text)...
@@ -95,7 +95,7 @@ class Event(BaseModel):
     process: Optional[str] = Field(description="Details of the event process")
     result: Optional[str] = Field(default=None, description="Result or outcome of the event")

-class NewsReport(BaseModel):
+class NewsReport(BaseModel):
     title: str = Field(description="The title or headline of the news report")
     summary: str = Field(description="A brief summary of the news report")
     publication_date: Optional[str] = Field(description="The publication date of the report")
@@ -116,16 +116,16 @@ from pydantic import BaseModel, Field
 class MetaData(BaseModel):
     title : str = Field(description="The title of the article")
     authors : List[str] = Field(description="The list of the article's authors")
-    abstract: str = Field(description="The article's abstract")
+    abstract: str = Field(description="The article's abstract")
     key_words: List[str] = Field(description="The key words associated with the article")
-
+
 class Baseline(BaseModel):
     method_name : str = Field(description="The name of the baseline method")
     proposed_solution : str = Field(description="the proposed solution in details")
     performance_metrics : str = Field(description="The performance metrics of the method and comparative analysis")
-
+
 class ExtractionTarget(BaseModel):
-
+
     key_contributions: List[str] = Field(description="The key contributions of the article")
     limitation_of_sota : str=Field(description="the summary limitation of the existing work")
     proposed_solution : str = Field(description="the proposed solution in details")
src/models/prompt_template.py
CHANGED
@@ -1,9 +1,9 @@
 from langchain.prompts import PromptTemplate
 from .prompt_example import *

-# ==================================================================== #
-#                            SCHEMA AGENT                              #
-# ==================================================================== #
+# ==================================================================== #
+#                            SCHEMA AGENT                              #
+# ==================================================================== #

 # Get Text Analysis
 TEXT_ANALYSIS_INSTRUCTION = """
@@ -22,9 +22,9 @@ text_analysis_instruction = PromptTemplate(
 # Get Deduced Schema Json
 DEDUCE_SCHEMA_JSON_INSTRUCTION = """
 **Instruction**: Generate an output format that meets the requirements as described in the task. Pay attention to the following requirements:
-- Format: Return your responses in dictionary format as a JSON object.
+- Format: Return your responses in dictionary format as a JSON object.
 - Content: Do not include any actual data; all attributes values should be set to None.
-- Note: Attributes not mentioned in the task description should be ignored.
+- Note: Attributes not mentioned in the task description should be ignored.
 {examples}
 **Task**: {instruction}

@@ -57,9 +57,9 @@ deduced_schema_code_instruction = PromptTemplate(
 )


-# ==================================================================== #
-#                          EXTRACTION AGENT                            #
-# ==================================================================== #
+# ==================================================================== #
+#                          EXTRACTION AGENT                            #
+# ==================================================================== #

 EXTRACT_INSTRUCTION = """
 **Instruction**: You are an agent skilled in information extarction. {instruction}
@@ -113,9 +113,9 @@ summarize_instruction = PromptTemplate(



-# ==================================================================== #
-#                          REFLECION AGENT                             #
-# ==================================================================== #
+# ==================================================================== #
+#                          REFLECION AGENT                             #
+# ==================================================================== #
REFLECT_INSTRUCTION = """**Instruction**: You are an agent skilled in reflection and optimization based on the original result. Refer to **Reflection Reference** to identify potential issues in the current extraction results.

 **Reflection Reference**: {examples}
@@ -153,9 +153,9 @@ summarize_instruction = PromptTemplate(



-# ==================================================================== #
-#                          CASE REPOSITORY                             #
-# ==================================================================== #
+# ==================================================================== #
+#                          CASE REPOSITORY                             #
+# ==================================================================== #

 GOOD_CASE_ANALYSIS_INSTRUCTION = """
 **Instruction**: Below is an information extraction task and its corresponding correct answer. Provide the reasoning steps that led to the correct answer, along with brief explanation of the answer. Your response should be brief and organized.
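Aside (illustration only, not part of the diff): these instruction strings are wrapped in langchain `PromptTemplate` objects, so filling them is a plain `.format()` call. A minimal sketch using the `{examples}` and `{instruction}` placeholders visible above; the variable name and the abridged template are stand-ins, not the file's actual definitions:

```python
# Sketch only -- mirrors how the templates above are consumed.
from langchain.prompts import PromptTemplate

DEDUCE_SCHEMA_JSON_INSTRUCTION = "{examples}\n**Task**: {instruction}\n"  # abridged stand-in

deduce_schema_json_instruction = PromptTemplate(
    input_variables=["examples", "instruction"],
    template=DEDUCE_SCHEMA_JSON_INSTRUCTION,
)
prompt = deduce_schema_json_instruction.format(
    examples="(few-shot examples here)",
    instruction="Extract all economic policies and their dates.",
)
print(prompt)
```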
src/models/vllm_serve.py
CHANGED
@@ -9,13 +9,13 @@ from utils import *
 def main():
     # Create command-line argument parser
     parser = argparse.ArgumentParser(description='Run the extraction model.')
-    parser.add_argument('--config', type=str, required=True,
+    parser.add_argument('--config', type=str, required=True,
                         help='Path to the YAML configuration file.')
     parser.add_argument('--tensor-parallel-size', type=int, default=2,
                         help='Tensor parallel size for the VLLM server.')
     parser.add_argument('--max-model-len', type=int, default=32768,
                         help='Maximum model length for the VLLM server.')
-
+
     # Parse command-line arguments
     args = parser.parse_args()

@@ -31,4 +31,3 @@ def main():

 if __name__ == "__main__":
     main()
-
src/modules/extraction_agent.py
CHANGED
@@ -65,6 +65,34 @@ class ExtractionAgent:
                 data.constraint = json.dumps(result)
             except:
                 print("Invalid Constraint: Event Extraction constraint must be a dictionary with event types as keys and lists of arguments as values.", data.constraint)
+        elif data.task == "Triple":
+            constraint = json.dumps(data.constraint)
+            if "**Triple Extraction Constraint**" in constraint:
+                return data
+            if self.llm.name != "OneKE":
+                if len(data.constraint) == 1: # 1 list means entity
+                    data.constraint = f"\n**Triple Extraction Constraint**: Entities type must chosen from following list:\n{constraint}\n"
+                elif len(data.constraint) == 2: # 2 list means entity and relation
+                    if data.constraint[0] == []:
+                        data.constraint = f"\n**Triple Extraction Constraint**: Relation type must chosen from following list:\n{data.constraint[1]}\n"
+                    elif data.constraint[1] == []:
+                        data.constraint = f"\n**Triple Extraction Constraint**: Entities type must chosen from following list:\n{data.constraint[0]}\n"
+                    else:
+                        data.constraint = f"\n**Triple Extraction Constraint**: Entities type must chosen from following list:\n{data.constraint[0]}\nRelation type must chosen from following list:\n{data.constraint[1]}\n"
+                elif len(data.constraint) == 3: # 3 list means entity, relation and object
+                    if data.constraint[0] == []:
+                        data.constraint = f"\n**Triple Extraction Constraint**: Relation type must chosen from following list:\n{data.constraint[1]}\nObject Entities must chosen from following list:\n{data.constraint[2]}\n"
+                    elif data.constraint[1] == []:
+                        data.constraint = f"\n**Triple Extraction Constraint**: Subject Entities must chosen from following list:\n{data.constraint[0]}\nObject Entities must chosen from following list:\n{data.constraint[2]}\n"
+                    elif data.constraint[2] == []:
+                        data.constraint = f"\n**Triple Extraction Constraint**: Subject Entities must chosen from following list:\n{data.constraint[0]}\nRelation type must chosen from following list:\n{data.constraint[1]}\n"
+                    else:
+                        data.constraint = f"\n**Triple Extraction Constraint**: Subject Entities must chosen from following list:\n{data.constraint[0]}\nRelation type must chosen from following list:\n{data.constraint[1]}\nObject Entities must chosen from following list:\n{data.constraint[2]}\n"
+                else:
+                    data.constraint = f"\n**Triple Extraction Constraint**: The type of entities must be chosen from the following list:\n{constraint}\n"
+            else:
+                print("OneKE does not support Triple Extraction task now, please wait for the next version.")
+            # print("data.constraint", data.constraint)
         return data

     def extract_information_direct(self, data: DataPoint):
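To make the branching above concrete (illustration only, not part of the diff): a two-list constraint where neither sub-list is empty is rewritten into a prompt fragment naming both the allowed entity types and relation types. A sketch:

```python
# Sketch only -- reproduces the 2-list branch of the Triple constraint logic above.
constraint = [["Person", "Place"], ["Located", "Ownership"]]

# Neither sub-list is empty, so both entity and relation types are constrained:
prompt_fragment = (
    "\n**Triple Extraction Constraint**: Entities type must chosen from following list:\n"
    f"{constraint[0]}\n"
    f"Relation type must chosen from following list:\n{constraint[1]}\n"
)
print(prompt_fragment)
# **Triple Extraction Constraint**: Entities type must chosen from following list:
# ['Person', 'Place']
# Relation type must chosen from following list:
# ['Located', 'Ownership']
```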
src/modules/knowledge_base/case_repository.py
CHANGED
@@ -19,7 +19,7 @@ class CaseRepository:
             self.embedder = SentenceTransformer(docker_model_path)
         except:
             self.embedder = SentenceTransformer(config['model']['embedding_model'])
-        self.embedder.to(device)
+        self.embedder.to(device)
         self.corpus = self.load_corpus()
         self.embedded_corpus = self.embed_corpus()

@@ -27,14 +27,14 @@ class CaseRepository:
         with open(os.path.join(os.path.dirname(__file__), "case_repository.json")) as file:
             corpus = json.load(file)
         return corpus
-
+
     def update_corpus(self):
         try:
             with open(os.path.join(os.path.dirname(__file__), "case_repository.json"), "w") as file:
                 json.dump(self.corpus, file, indent=2)
         except Exception as e:
             print(f"Error when updating corpus: {e}")
-
+
     def embed_corpus(self):
         embedded_corpus = {}
         for key, content in self.corpus.items():
@@ -43,8 +43,8 @@ class CaseRepository:
             bad_index = [item['index']['embed_index'] for item in content['bad']]
             encoded_bad_index = self.embedder.encode(bad_index, convert_to_tensor=True).to(device)
             embedded_corpus[key] = {"good": encoded_good_index, "bad": encoded_bad_index}
-        return embedded_corpus
-
+        return embedded_corpus
+
     def get_similarity_scores(self, task: TaskType, embed_index="", str_index="", case_type="", top_k=2):
         device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
         # Embedding similarity match
@@ -58,7 +58,7 @@ class CaseRepository:
         scores_dict = {match[0]: match[1] for match in str_similarity_results}
         scores_in_order = [scores_dict[candidate] for candidate in str_match_corpus]
         str_similarity_scores = torch.tensor(scores_in_order, dtype=torch.float32).to(device)
-
+
         # Normalize scores
         embedding_score_range = embedding_similarity_scores.max() - embedding_similarity_scores.min()
         str_score_range = str_similarity_scores.max() - str_similarity_scores.min()
@@ -74,16 +74,16 @@ class CaseRepository:
         # Combine the scores with weights
         combined_scores = 0.5 * embed_norm_scores + 0.5 * str_norm_scores
         original_combined_scores = 0.5 * embedding_similarity_scores + 0.5 * str_similarity_scores / 100
-
+
         scores, indices = torch.topk(combined_scores, k=min(top_k, combined_scores.size(0)))
         original_scores, original_indices = torch.topk(original_combined_scores, k=min(top_k, original_combined_scores.size(0)))
         return scores, indices, original_scores, original_indices
-
+
     def query_case(self, task: TaskType, embed_index="", str_index="", case_type="", top_k=2) -> list:
         _, indices, _, _ = self.get_similarity_scores(task, embed_index, str_index, case_type, top_k)
         top_matches = [self.corpus[task][case_type][idx]["content"] for idx in indices]
         return top_matches
-
+
     def update_case(self, task: TaskType, embed_index="", str_index="", content="" ,case_type=""):
         self.corpus[task][case_type].append({"index": {"embed_index": embed_index, "str_index": str_index}, "content": content})
         self.embedded_corpus[task][case_type] = torch.cat([self.embedded_corpus[task][case_type], self.embedder.encode([embed_index], convert_to_tensor=True).to(device)], dim=0)
@@ -102,9 +102,9 @@ class CaseRepositoryHandler:
         response = self.llm.get_chat_response(prompt)
         response = extract_json_dict(response)
         if not isinstance(response, dict):
-            return response
+            return response
         return None
-
+
     def __get_bad_case_reflection(self, instruction="", text="", original_answer="", correct_answer="", additional_info=""):
         prompt = bad_case_reflection_instruction.format(
             instruction=instruction, text=text, original_answer=original_answer, correct_answer=correct_answer, additional_info=additional_info
@@ -115,34 +115,34 @@ class CaseRepositoryHandler:
         if not isinstance(response, dict):
             return response
         return None
-
+
     def __get_index(self, data: DataPoint, case_type: str):
         # set embed_index
         embed_index = f"**Text**: {data.distilled_text}\n{data.chunk_text_list[0]}"
-
+
         # set str_index
         if data.task == "Base":
             str_index = f"**Task**: {data.instruction}"
         else:
             str_index = f"{data.constraint}"
-
+
         if case_type == "bad":
             str_index += f"\n\n**Original Result**: {json.dumps(data.pred)}"
-
+
         return embed_index, str_index
-
+
     def query_good_case(self, data: DataPoint):
         embed_index, str_index = self.__get_index(data, "good")
         return self.repository.query_case(task=data.task, embed_index=embed_index, str_index=str_index, case_type="good")
-
+
     def query_bad_case(self, data: DataPoint):
         embed_index, str_index = self.__get_index(data, "bad")
         return self.repository.query_case(task=data.task, embed_index=embed_index, str_index=str_index, case_type="bad")
-
+
     def update_good_case(self, data: DataPoint):
         if data.truth == "" :
             print("No truth value provided.")
-            return
+            return
         embed_index, str_index = self.__get_index(data, "good")
         _, _, original_scores, _ = self.repository.get_similarity_scores(data.task, embed_index, str_index, "good", 1)
         original_scores = original_scores.tolist()
@@ -159,11 +159,11 @@ class CaseRepositoryHandler:
         else:
             content = f"{wrapped_text}\n\n{data.constraint}\n\n{wrapped_good_case_analysis}\n\n{wrapped_answer}"
         self.repository.update_case(data.task, embed_index, str_index, content, "good")
-
+
     def update_bad_case(self, data: DataPoint):
         if data.truth == "" :
             print("No truth value provided.")
-            return
+            return
         if normalize_obj(data.pred) == normalize_obj(data.truth):
             return
         embed_index, str_index = self.__get_index(data, "bad")
@@ -183,7 +183,7 @@ class CaseRepositoryHandler:
         else:
             content = f"{wrapped_text}\n\n{data.constraint}\n\n{wrapper_original_answer}\n\n{wrapped_bad_case_reflection}\n\n{wrapper_correct_answer}"
         self.repository.update_case(data.task, embed_index, str_index, content, "bad")
-
+
     def update_case(self, data: DataPoint):
         self.update_good_case(data)
         self.update_bad_case(data)
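A quick illustration of the score blending above (not part of the diff): embedding similarity is already roughly in [0, 1] while fuzzy string-match scores run 0-100, hence the `/ 100` in the unnormalized combination. A sketch:

```python
# Sketch only -- mirrors the weighting in get_similarity_scores above.
import torch

embedding_similarity_scores = torch.tensor([0.82, 0.40, 0.65])  # cosine-style, 0..1
str_similarity_scores = torch.tensor([90.0, 30.0, 60.0])        # fuzzy match, 0..100

# Min-max normalize each signal so both live in 0..1.
def minmax(t):
    r = t.max() - t.min()
    return (t - t.min()) / r if r > 0 else torch.zeros_like(t)

combined = 0.5 * minmax(embedding_similarity_scores) + 0.5 * minmax(str_similarity_scores)
original_combined = 0.5 * embedding_similarity_scores + 0.5 * str_similarity_scores / 100

print(torch.topk(combined, k=2).indices)  # tensor([0, 2])
```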
src/modules/knowledge_base/schema_repository.py
CHANGED
@@ -33,6 +33,19 @@ class Event(BaseModel):
 class EventList(BaseModel):
     event_list : List[Event] = Field(description="The events presented in the text.")

+# ==================================================================== #
+#                            Triple TASK                               #
+# ==================================================================== #
+class Triple(BaseModel):
+    head: str = Field(description="The subject or head of the triple.")
+    head_type: str = Field(description="The type of the subject entity.")
+    relation: str = Field(description="The predicate or relation between the entities.")
+    relation_type: str = Field(description="The type of the relation.")
+    tail: str = Field(description="The object or tail of the triple.")
+    tail_type: str = Field(description="The type of the object entity.")
+class TripleList(BaseModel):
+    triple_list: List[Triple] = Field(description="The collection of triples and their types presented in the text.")
+
 # ==================================================================== #
 #                          TEXT DESCRIPTION                            #
 # ==================================================================== #
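For reference (illustration only, not part of the diff): the new pydantic models validate extraction output shaped like the following. A minimal sketch; the sample values echo the Triple example constraint used in webui.py:

```python
# Sketch only -- instantiates the Triple/TripleList models added above.
from typing import List
from pydantic import BaseModel, Field

class Triple(BaseModel):
    head: str = Field(description="The subject or head of the triple.")
    head_type: str = Field(description="The type of the subject entity.")
    relation: str = Field(description="The predicate or relation between the entities.")
    relation_type: str = Field(description="The type of the relation.")
    tail: str = Field(description="The object or tail of the triple.")
    tail_type: str = Field(description="The type of the object entity.")

class TripleList(BaseModel):
    triple_list: List[Triple] = Field(description="The collection of triples and their types presented in the text.")

result = TripleList(triple_list=[Triple(
    head="Alan Turing", head_type="Person",
    relation="worked at", relation_type="Action",
    tail="Bletchley Park", tail_type="Place",
)])
print(result.model_dump())  # pydantic v2; use .dict() on v1
```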
src/modules/reflection_agent.py
CHANGED
@@ -5,7 +5,7 @@ from .knowledge_base.case_repository import CaseRepositoryHandler
 class ReflectionGenerator:
     def __init__(self, llm: BaseEngine):
         self.llm = llm
-
+
     def get_reflection(self, instruction="", examples="", text="",schema="", result=""):
         result = json.dumps(result)
         examples = bad_case_wrapper(examples)
@@ -13,7 +13,7 @@ class ReflectionGenerator:
         response = self.llm.get_chat_response(prompt)
         response = extract_json_dict(response)
         return response
-
+
 class ReflectionAgent:
     def __init__(self, llm: BaseEngine, case_repo: CaseRepositoryHandler):
         self.llm = llm
@@ -29,7 +29,7 @@ class ReflectionAgent:
         else:
             selected_obj = max(result_list, key=lambda o: len(json.dumps(o)))
         return selected_obj
-
+
     def __self_consistance_check(self, data: DataPoint):
         extract_func = list(data.result_trajectory.keys())[-1]
         if hasattr(self.extractor, extract_func):
@@ -55,7 +55,7 @@ class ReflectionAgent:
             consistant_result.append(selected_element)
         data.set_result_list(consistant_result)
         return reflect_index
-
+
     def reflect_with_case(self, data: DataPoint):
         if data.result_list == []:
             return data
@@ -71,4 +71,3 @@ class ReflectionAgent:
         function_name = current_function_name()
         data.update_trajectory(function_name, data.result_list)
         return data
-
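A small aside (illustration only, not part of the diff): the `max(result_list, key=lambda o: len(json.dumps(o)))` line above breaks ties between candidate extractions by preferring the most detailed, i.e. longest-serialized, object. For example:

```python
# Sketch only -- the tie-break rule used in ReflectionAgent above.
import json

result_list = [
    {"entity_list": [{"name": "Alan Turing"}]},
    {"entity_list": [{"name": "Alan Turing"}, {"name": "Bletchley Park"}]},
]
selected_obj = max(result_list, key=lambda o: len(json.dumps(o)))
print(selected_obj)  # the richer, two-entity candidate wins
```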
src/modules/schema_agent.py
CHANGED
@@ -106,6 +106,18 @@ class Event(BaseModel):
 class EventList(BaseModel):
     event_list : List[Event] = Field(description="The events presented in the text.")
 """
+        elif data.task == "Triple":
+            data.print_schema = """
+class Triple(BaseModel):
+    head: str = Field(description="The subject or head of the triple.")
+    head_type: str = Field(description="The type of the subject entity.")
+    relation: str = Field(description="The predicate or relation between the entities.")
+    relation_type: str = Field(description="The type of the relation.")
+    tail: str = Field(description="The object or tail of the triple.")
+    tail_type: str = Field(description="The type of the object entity.")
+class TripleList(BaseModel):
+    triple_list: List[Triple] = Field(description="The collection of triples and their types presented in the text.")
+"""
         return data

     def get_default_schema(self, data: DataPoint):
src/pipeline.py
CHANGED
@@ -2,6 +2,7 @@ from typing import Literal
 from models import *
 from utils import *
 from modules import *
+from construct import *


 class Pipeline:
@@ -14,7 +15,7 @@ class Pipeline:

     def __check_consistancy(self, llm, task, mode, update_case):
         if llm.name == "OneKE":
-            if task == "Base":
+            if task == "Base" or task == "Triple":
                 raise ValueError("The finetuned OneKE only supports quick extraction mode for NER, RE and EE Task.")
             else:
                 mode = "quick"
@@ -44,12 +45,16 @@ class Pipeline:
         elif data.task == "EE":
             data.instruction = config['agent']['default_ee']
             data.output_schema = "EventList"
+        elif data.task == "Triple":
+            data.instruction = config['agent']['default_triple']
+            data.output_schema = "TripleList"
         return data

     # main entry
     def get_extract_result(self,
                            task: TaskType,
                            three_agents = {},
+                           construct = {},
                            instruction: str = "",
                            text: str = "",
                            output_schema: str = "",
@@ -61,6 +66,7 @@ class Pipeline:
                            update_case: bool = False,
                            show_trajectory: bool = False,
                            isgui: bool = False,
+                           iskg: bool = False,
                            ):
         # for key, value in locals().items():
         #     print(f"{key}: {value}")
@@ -105,7 +111,17 @@ class Pipeline:
         # show result
         if show_trajectory:
             print("Extraction Trajectory: \n", json.dumps(data.get_result_trajectory(), indent=2))
-
+        extraction_result = json.dumps(data.pred, indent=2)
+        print("Extraction Result: \n", extraction_result)
+
+        # construct KG
+        if iskg:
+            myurl = construct['url']
+            myusername = construct['username']
+            mypassword = construct['password']
+            print(f"Construct KG in your {construct['database']} now...")
+            cypher_statements = generate_cypher_statements(extraction_result)
+            execute_cypher_statements(uri=myurl, user=myusername, password=mypassword, cypher_statements=cypher_statements)

         frontend_res = data.pred #
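For intuition (illustration only: `generate_cypher_statements` and `execute_cypher_statements` live in the new src/construct/convert.py, whose body is not shown in this section, so the exact statements it emits are an assumption): an extracted triple typically maps to Cypher MERGE statements along these lines:

```python
# Sketch only -- the kind of Cypher a triple-to-KG converter would emit;
# the real logic is in src/construct/convert.py (not shown in this diff).
triple = {"head": "Alan Turing", "relation": "worked at", "tail": "Bletchley Park"}

cypher = (
    f"MERGE (h:Entity {{name: '{triple['head']}'}}) "
    f"MERGE (t:Entity {{name: '{triple['tail']}'}}) "
    f"MERGE (h)-[:`{triple['relation']}`]->(t)"
)
print(cypher)

# Executing such statements against Neo4j would use the official driver, e.g.:
# from neo4j import GraphDatabase
# with GraphDatabase.driver(uri, auth=(user, password)) as driver:
#     driver.execute_query(cypher)
```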
src/run.py
CHANGED
@@ -11,9 +11,9 @@ from modules import *
 def main():
     # Create command-line argument parser
     parser = argparse.ArgumentParser(description='Run the extraction framefork.')
-    parser.add_argument('--config', type=str, required=True,
+    parser.add_argument('--config', type=str, required=True,
                         help='Path to the YAML configuration file.')
-
+
     # Parse command-line arguments
     args = parser.parse_args()

@@ -35,6 +35,15 @@ def main():
     pipeline = Pipeline(model)
     # Extraction config
     extraction_config = config['extraction']
+    # constuct config
+    if 'construct' in config:
+        construct_config = config['construct']
+        result, trajectory, _, _ = pipeline.get_extract_result(task=extraction_config['task'], instruction=extraction_config['instruction'], text=extraction_config['text'], output_schema=extraction_config['output_schema'], constraint=extraction_config['constraint'], use_file=extraction_config['use_file'], file_path=extraction_config['file_path'], truth=extraction_config['truth'], mode=extraction_config['mode'], update_case=extraction_config['update_case'], show_trajectory=extraction_config['show_trajectory'],
+                                                               construct=construct_config, iskg=True) # When 'construct' is provided, 'iskg' should be True to construct the knowledge graph.
+        return
+    else:
+        print("please provide construct config in the yaml file.")
+
     result, trajectory, _, _ = pipeline.get_extract_result(task=extraction_config['task'], instruction=extraction_config['instruction'], text=extraction_config['text'], output_schema=extraction_config['output_schema'], constraint=extraction_config['constraint'], use_file=extraction_config['use_file'], file_path=extraction_config['file_path'], truth=extraction_config['truth'], mode=extraction_config['mode'], update_case=extraction_config['update_case'], show_trajectory=extraction_config['show_trajectory'])
     return
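Aside (illustration only; the keys follow the `construct` handling added above and the new examples/config/Triple2KG.yaml in this update, whose exact contents are not shown here): a config enabling KG construction adds a `construct` section next to `model` and `extraction`. Sketched as the parsed Python dict that run.py receives after `yaml.safe_load`:

```python
# Sketch only -- the parsed shape src/run.py expects; values are placeholders,
# not working credentials.
config = {
    "model": {
        "category": "ChatGPT",                # engine class to instantiate
        "model_name_or_path": "gpt-4o-mini",
        "api_key": "sk-...",                  # placeholder
    },
    "extraction": {
        "task": "Triple",
        "use_file": True,
        "file_path": "data/input_files/Artificial_Intelligence_Wikipedia.txt",
        "constraint": [["Person", "Place", "Event", "property"],
                       ["Interpersonal", "Located", "Ownership", "Action"]],
        "mode": "quick",
    },
    # Presence of this section makes run.py call get_extract_result(..., iskg=True).
    "construct": {
        "database": "Neo4j",
        "url": "neo4j://localhost:7687",      # assumption: default Neo4j bolt port
        "username": "neo4j",
        "password": "your-password",
    },
}
```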
src/utils/__init__.py
CHANGED
@@ -1,3 +1,2 @@
 from .process import *
 from .data_def import DataPoint, TaskType
-
src/utils/process.py
CHANGED
@@ -17,28 +17,28 @@ import inspect
 import ast
 with open(os.path.join(os.path.dirname(__file__), "..", "config.yaml")) as file:
     config = yaml.safe_load(file)
-
-# Load configuration
+
+# Load configuration
 def load_extraction_config(yaml_path):
     # Read YAML content from the file path
     if not os.path.exists(yaml_path):
         print(f"Error: The config file '{yaml_path}' does not exist.")
         return {}
-
+
     with open(yaml_path, 'r') as file:
         config = yaml.safe_load(file)

     # Extract the 'extraction' configuration dictionary
     model_config = config.get('model', {})
     extraction_config = config.get('extraction', {})
-
+
     # Model config
     model_name_or_path = model_config.get('model_name_or_path', "")
     model_category = model_config.get('category', "")
     api_key = model_config.get('api_key', "")
     base_url = model_config.get('base_url', "")
     vllm_serve = model_config.get('vllm_serve', False)
-
+
     # Extraction config
     task = extraction_config.get('task', "")
     instruction = extraction_config.get('instruction', "")
@@ -52,6 +52,43 @@ def load_extraction_config(yaml_path):
     update_case = extraction_config.get('update_case', False)
     show_trajectory = extraction_config.get('show_trajectory', False)

+    # Construct config (optional: for constructing your knowledge graph)
+    if 'construct' in config:
+        construct_config = config.get('construct', {})
+        database = construct_config.get('database', "")
+        url = construct_config.get('url', "")
+        username = construct_config.get('username', "")
+        password = construct_config.get('password', "")
+        # Return a dictionary containing these variables
+        return {
+            "model": {
+                "model_name_or_path": model_name_or_path,
+                "category": model_category,
+                "api_key": api_key,
+                "base_url": base_url,
+                "vllm_serve": vllm_serve
+            },
+            "extraction": {
+                "task": task,
+                "instruction": instruction,
+                "text": text,
+                "output_schema": output_schema,
+                "constraint": constraint,
+                "truth": truth,
+                "use_file": use_file,
+                "file_path": file_path,
+                "mode": mode,
+                "update_case": update_case,
+                "show_trajectory": show_trajectory
+            },
+            "construct": {
+                "database": database,
+                "url": url,
+                "username": username,
+                "password": password
+            }
+        }
+
     # Return a dictionary containing these variables
     return {
         "model": {
@@ -75,7 +112,7 @@ def load_extraction_config(yaml_path):
             "show_trajectory": show_trajectory
         }
     }
-
+
 # Split the string text into chunks
 def chunk_str(text):
     sentences = sent_tokenize(text)
@@ -102,24 +139,24 @@ def chunk_file(file_path):
     pages = []

     if file_path.endswith(".pdf"):
-        loader = PyPDFLoader(file_path)
+        loader = PyPDFLoader(file_path)
     elif file_path.endswith(".txt"):
-        loader = TextLoader(file_path)
+        loader = TextLoader(file_path)
     elif file_path.endswith(".docx"):
-        loader = Docx2txtLoader(file_path)
+        loader = Docx2txtLoader(file_path)
     elif file_path.endswith(".html"):
-        loader = BSHTMLLoader(file_path)
+        loader = BSHTMLLoader(file_path)
     elif file_path.endswith(".json"):
-        loader = JSONLoader(file_path)
+        loader = JSONLoader(file_path)
     else:
         raise ValueError("Unsupported file format") # Inform that the format is unsupported
-
-    pages = loader.load_and_split()
+
+    pages = loader.load_and_split()
     docs = ""
     for item in pages:
         docs += item.page_content
     pages = chunk_str(docs)
-
+
     return pages

 def process_single_quotes(text):
@@ -147,11 +184,11 @@ def remove_empty_values(data):
 def extract_json_dict(text):
     if isinstance(text, dict):
         return text
-    pattern = r'\{(?:[^{}]|(?:\{(?:[^{}]|(?:\{[^{}]*\})*)*\})*)*\}'
-    matches = re.findall(pattern, text)
+    pattern = r'\{(?:[^{}]|(?:\{(?:[^{}]|(?:\{[^{}]*\})*)*\})*)*\}'
+    matches = re.findall(pattern, text)
     if matches:
-        json_string = matches[-1]
-        json_string = process_single_quotes(json_string)
+        json_string = matches[-1]
+        json_string = process_single_quotes(json_string)
         try:
             json_dict = json.loads(json_string)
             json_dict = remove_empty_values(json_dict)
@@ -159,9 +196,9 @@ def extract_json_dict(text):
                 return "No valid information found."
             return json_dict
         except json.JSONDecodeError:
-            return json_string
+            return json_string
     else:
-        return text
+        return text

 def good_case_wrapper(example: str):
     if example is None or example == "":
@@ -182,10 +219,10 @@ def example_wrapper(example: str):
     return example

 def remove_redundant_space(s):
-    s = ' '.join(s.split())
-    s = re.sub(r"\s*(,|:|\(|\)|\.|_|;|'|-)\s*", r'\1', s)
+    s = ' '.join(s.split())
+    s = re.sub(r"\s*(,|:|\(|\)|\.|_|;|'|-)\s*", r'\1', s)
     return s
-
+
 def format_string(s):
     s = remove_redundant_space(s)
     s = s.lower()
@@ -197,9 +234,9 @@ def format_string(s):
     return s

 def calculate_metrics(y_truth: set, y_pred: set):
-    TP = len(y_truth & y_pred)
-    FN = len(y_truth - y_pred)
-    FP = len(y_pred - y_truth)
+    TP = len(y_truth & y_pred)
+    FN = len(y_truth - y_pred)
+    FP = len(y_pred - y_truth)
     precision = TP / (TP + FP) if (TP + FP) > 0 else 0
     recall = TP / (TP + FN) if (TP + FN) > 0 else 0
     f1_score = 2 * (precision * recall) / (precision + recall) if (precision + recall) > 0 else 0
@@ -214,11 +251,11 @@ def current_function_name():
         else:
             print("No caller function found")
             return None
-
+
     except Exception as e:
         print(f"An error occurred: {e}")
-        pass
-
+        pass
+
 def normalize_obj(value):
     if isinstance(value, dict):
         return frozenset((k, normalize_obj(v)) for k, v in value.items())
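A worked example of `calculate_metrics` above (illustration only): with 2 true positives, 1 missed item, and 1 spurious item, precision and recall are both 2/3, so F1 is also 2/3.

```python
# Sketch only -- exercises the set arithmetic in calculate_metrics above.
y_truth = {"Alan Turing", "Bletchley Park", "Enigma"}
y_pred = {"Alan Turing", "Bletchley Park", "London"}

TP = len(y_truth & y_pred)   # 2
FN = len(y_truth - y_pred)   # 1 ("Enigma" missed)
FP = len(y_pred - y_truth)   # 1 ("London" spurious)
precision = TP / (TP + FP)   # 0.667
recall = TP / (TP + FN)      # 0.667
f1 = 2 * precision * recall / (precision + recall)  # 0.667
```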
src/webui.py
CHANGED
@@ -1,9 +1,15 @@
-
-
+"""
+....../OneKE$ python src/webui.py
+"""
+
+
 import gradio as gr
+import json
+import random

-from pipeline import Pipeline
 from models import *
+from pipeline import Pipeline
+

 examples = [
     {
@@ -43,7 +49,7 @@ examples = [
         "task": "Base",
         "mode": "quick",
         "use_file": True,
-        "file_path": "data/Harry_Potter_Chapter1.pdf",
+        "file_path": "data/input_files/Harry_Potter_Chapter1.pdf",
         "instruction": "Extract main characters and the background setting from this chapter.",
         "constraint": "",
         "text": "",
@@ -54,13 +60,24 @@ examples = [
         "task": "Base",
         "mode": "quick",
         "use_file": True,
-        "file_path": "data/Tulsi_Gabbard_News.html",
+        "file_path": "data/input_files/Tulsi_Gabbard_News.html",
         "instruction": "Extract key information from the given text.",
         "constraint": "",
         "text": "",
        "update_case": False,
        "truth": "",
    },
+    {
+        "task": "Triple",
+        "mode": "quick",
+        "use_file": True,
+        "file_path": "data/input_files/Artificial_Intelligence_Wikipedia.txt",
+        "instruction": "",
+        "constraint": """[["Person", "Place", "Event", "property"], ["Interpersonal", "Located", "Ownership", "Action"]]""",
+        "text": "",
+        "update_case": False,
+        "truth": "",
+    }
 ]


@@ -75,16 +92,16 @@ def create_interface():
         </p>
         <h1>OneKE: A Dockerized Schema-Guided LLM Agent-based Knowledge Extraction System</h1>
         <p>
-            🏠[<a href="https://oneke.openkg.cn/" target="_blank">
-            ⌨️[<a href="https://github.com/zjunlp/OneKE" target="_blank">Code</a>]
+            🏠[<a href="https://oneke.openkg.cn/" target="_blank">Home</a>]
             📹[<a href="http://oneke.openkg.cn/demo.mp4" target="_blank">Video</a>]
+            📄[<a href="https://arxiv.org/abs/2209.10707" target="_blank">Paper</a>]
+            💻[<a href="https://github.com/zjunlp/OneKE" target="_blank">Code</a>]
         </p>
     </div>
     """)

     example_button_gr = gr.Button("🎲 Quick Start with an Example 🎲")

-
     with gr.Row():
         with gr.Column():
             model_gr = gr.Dropdown(
@@ -103,7 +120,7 @@ def create_interface():
         with gr.Column():
             task_gr = gr.Dropdown(
                 label="🎯 Select your Task",
-                choices=["Base", "NER", "RE", "EE"],
+                choices=["Base", "NER", "RE", "EE", "Triple"],
                 value="Base",
             )
             mode_gr = gr.Dropdown(
@@ -139,6 +156,8 @@ def create_interface():
             return gr.update(visible=False), gr.update(visible=True, label="🕹️ Constraint", placeholder="Enter your RE Constraint")
         elif task == "EE":
             return gr.update(visible=False), gr.update(visible=True, label="🕹️ Constraint", placeholder="Enter your EE Constraint")
+        elif task == "Triple":
+            return gr.update(visible=False), gr.update(visible=True, label="🕹️ Constraint", placeholder="Enter your Triple Constraint")

     def update_input_fields(use_file):
         if use_file:
@@ -162,7 +181,7 @@ def create_interface():
             gr.update(value=example["file_path"], visible=example["use_file"]),
             gr.update(value=example["text"], visible=not example["use_file"]),
             gr.update(value=example["instruction"], visible=example["task"] == "Base"),
-            gr.update(value=example["constraint"], visible=example["task"] in ["NER", "RE", "EE"]),
+            gr.update(value=example["constraint"], visible=example["task"] in ["NER", "RE", "EE", "Triple"]),
             gr.update(value=example["update_case"]),
             gr.update(value=example["truth"]),
             gr.update(value="NOT REQUIRED", visible=False),
@@ -207,7 +226,7 @@ def create_interface():
         if reflection_agent not in ["", "NOT REQUIRED"]:
             agent3["reflection_agent"] = reflection_agent

-        #
+        # use 'Pipeline'
         _, _, ger_frontend_schema, ger_frontend_res = pipeline.get_extract_result(
             task=task,
             text=text,
@@ -336,6 +355,8 @@ def create_interface():

     return demo

+
+# Launch the front-end interface
 if __name__ == "__main__":
     interface = create_interface()
-    interface.launch()
+    interface.launch() # the Gradio defalut URL usually is: 127.0.0.1:7860