chengyingmo commited on
Commit
ce2b87c
·
verified ·
1 Parent(s): 3a056a5

Upload 7 files

Files changed (7)
  1. README.md +110 -12
  2. __init__.py +0 -0
  3. app.py +354 -64
  4. graph_demo_ui.py +87 -0
  5. requirements.txt +10 -1
  6. webui-test-graph.py +283 -0
  7. webui-test.py +354 -0
README.md CHANGED
@@ -1,12 +1,110 @@
- ---
- title: Ragdoing
- emoji: 💬
- colorFrom: yellow
- colorTo: purple
- sdk: gradio
- sdk_version: 5.0.1
- app_file: app.py
- pinned: false
- ---
-
- An example chatbot using [Gradio](https://gradio.app), [`huggingface_hub`](https://huggingface.co/docs/huggingface_hub/v0.22.2/en/index), and the [Hugging Face Inference API](https://huggingface.co/docs/api-inference/index).
+ # Easy-RAG
+ A RAG (Retrieval-Augmented Generation) system built for learning, everyday use, and easy extension. It can also go online for AI-powered search!
+
+
+ ![img](https://github.com/yuntianhe2014/Easy-RAG/blob/main/img/zhu.png)
+
+ Changelog
+
+ 2024/9/04 Added AI web search; queries can now go online
+ 2024/9/04 Optimized async calls in the web UI for faster responses
+ 2024/8/21 Added Elasticsearch support, configurable in config
+ 2024/7/23 Added a real-time knowledge-graph extraction tool based on the meet-libai project; extraction only for now, no storage (graph_demo_ui.py)
+ 2024/7/11 Added the FAISS vector store; Chroma and FAISS are now both supported
+ 2024/7/10 Updated the rerank retrieval mode
+ 2024/7/09 First release
+ ![img](https://github.com/yuntianhe2014/Easy-RAG/blob/main/img/zhuye.png)
+
+ 1. Current features
+
+ Knowledge base (currently supports txt/csv/pdf/md/doc/docx/mp3/mp4/wav/excel files):
+
+ 1. Create a knowledge base (currently Chroma/Faiss/Elasticsearch only)
+ 2. Update a knowledge base
+ 3. Delete a single file from a knowledge base
+ 4. Delete a knowledge base
+ 5. Vectorize a knowledge base
+ 6. Speech-to-text for audio and video files, followed by vectorization
+ Speech-to-text uses funasr; on first launch the model is downloaded from ModelScope, which may be slow, after which it loads automatically
+
+ chat
+
+ 1. Plain multi-turn chat with the LLM
+ 2. Knowledge-base Q&A with three recall modes: ["复杂召回方式", "简单召回方式", "rerank"] (complex recall, simple recall, rerank)
+
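The complex recall mode ("复杂召回方式") decomposes a question into sub-questions before retrieval and answers them in sequence. A minimal sketch of that flow; the `toy_llm` and `toy_retrieve` stand-ins below are illustrative only and not the project's actual `rag.decomposition_chain` / `rag_chain` API:

```python
def decompose(question, llm):
    # Ask the LLM to break the question into simpler sub-questions,
    # one per line.
    raw = llm(f"Break into sub-questions: {question}")
    return [q.strip() for q in raw.split("\n") if q.strip()]

def complex_recall(question, llm, retrieve):
    # Answer each sub-question against the knowledge base, then
    # answer the original question given the accumulated Q/A pairs.
    qa_pairs = []
    for sub_q in decompose(question, llm):
        docs = retrieve(sub_q)
        qa_pairs.append((sub_q, llm(f"Context: {docs}\nQ: {sub_q}")))
    return llm(f"Given {qa_pairs}, answer: {question}")

# Toy stand-ins so the flow can be exercised without a model:
def toy_llm(prompt):
    if prompt.startswith("Break"):
        return "what is RAG\nwhy rerank"
    return "answer"

def toy_retrieve(query):
    return ["doc about " + query]

print(complex_recall("explain Easy-RAG", toy_llm, toy_retrieve))
```

Decomposition trades extra LLM calls for better recall on multi-part questions, which is why it is the "complex" mode.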
+ AI web search
+
+ Supports web search; you can tune the prompt to get summaries at different levels of detail
+ The LLM runs on Ollama, so different models can be selected
+ Note: web search relies on searxng, so start that project locally or on a server first (I run it with Docker)
+ See https://github.com/searxng/searxng-docker
+
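The web-search flow queries a SearxNG instance for JSON results, concatenates the title and content of each hit, and hands that text to the Ollama model as context (this is what app.py's `ai_web_search` does). A sketch of the result-merging step; the `sample` dict is illustrative, not real SearxNG output:

```python
def merge_results(search_response):
    # SearxNG's JSON API returns hits under the "results" key;
    # join each hit's title and content into one context blob.
    texts = [r["title"] + "\n" + r["content"] for r in search_response["results"]]
    return "\n\n".join(texts)

sample = {"results": [
    {"title": "Easy-RAG", "content": "A RAG system."},
    {"title": "SearxNG", "content": "A metasearch engine."},
]}
print(merge_results(sample))
```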
+ ![img](https://github.com/yuntianhe2014/Easy-RAG/blob/main/img/复杂方式.png)
+ 3. Improved retrieval via rerank re-ordering
+
+ Reranking uses the bge-reranker-large model; download it locally, then configure the path in rag/rerank.py
+ Model: https://hf-mirror.com/BAAI/bge-reranker-large
+
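Reranking re-scores every retrieved candidate against the query with a cross-encoder (here bge-reranker-large) and keeps only the best hits. A sketch of the reorder step with a word-overlap stand-in instead of the real model, so the shape of the operation is clear:

```python
def rerank(query, candidates, score, top_k=3):
    # Score each (query, candidate) pair and keep the top_k
    # highest-scoring candidates, best first.
    ranked = sorted(candidates, key=lambda doc: score(query, doc), reverse=True)
    return ranked[:top_k]

# Stand-in scorer: overlap of query words with the candidate.
def overlap(query, doc):
    return len(set(query.split()) & set(doc.split()))

docs = ["rag systems use retrieval", "cats sleep a lot", "retrieval augments generation"]
print(rerank("retrieval for rag", docs, overlap, top_k=2))
```

In the real system the scorer is the cross-encoder's relevance score, which is far better than lexical overlap but follows the same sort-and-truncate pattern.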
+
+ 2. Roadmap
+
+ Knowledge base:
+
+ 0. Support Elasticsearch, Milvus, MongoDB and other vector stores
+
+
+ chat:
+
+ 1. Add spoken (voice) answer output
+ 2. Add routing of questions to the right knowledge base
+
+
+ Installation and usage
+
+ Install Ollama: pick the installer for your machine at the address below; installation is one-click
+
+ https://ollama.com/download
+ Install the two models we need via Ollama (run in cmd):
+
+ ollama run qwen2:7b
+ ollama run mofanke/acge_text_embedding:latest
+
+ Download the bge-reranker-large model, then configure its path in rag/rerank.py
+
+ https://hf-mirror.com/BAAI/bge-reranker-large
+
+ Choose the vector store you want to use (currently only Chroma and Faiss)
+
+ Configure your vector store in Config/config.py
+ If you choose Elasticsearch, start Elasticsearch first (I use Docker):
+ docker run -p 9200:9200 -e "discovery.type=single-node" -e "xpack.security.enabled=false" -e "xpack.security.http.ssl.enabled=false" docker.elastic.co/elasticsearch/elasticsearch:8.12.1
+ Remember to update es_url
+
+ Set up the Python environment
+
+ conda create -n Easy-RAG python=3.10.9
+ conda activate Easy-RAG
+
+ The project was developed on Python 3.10.9; in testing, Python 3.8+ works
+
+ git clone https://github.com/yuntianhe2014/Easy-RAG.git
+ Install dependencies
+
+ pip3 install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple
+
+ Deploy the searxng dependency for web search
+ See https://github.com/searxng/searxng-docker
+ Start the project
+
+ python webui.py
+
+ Real-time knowledge-graph extraction tool
+ python graph_demo_ui.py
+ ![img](https://github.com/yuntianhe2014/Easy-RAG/blob/main/img/graph-tool.png)
+
+ For more details, see the WeChat official account: 世界大模型
+ ![img](https://github.com/yuntianhe2014/Easy-RAG/blob/main/img/%E5%BE%AE%E4%BF%A1%E5%9B%BE%E7%89%87_20240524180648.jpg)
+
+ References:
+ https://github.com/BinNong/meet-libai
+ https://github.com/searxng/searxng-docker
__init__.py ADDED
File without changes
app.py CHANGED
@@ -1,64 +1,354 @@
- import gradio as gr
- from huggingface_hub import InferenceClient
-
- """
- For more information on `huggingface_hub` Inference API support, please check the docs: https://huggingface.co/docs/huggingface_hub/v0.22.2/en/guides/inference
- """
- client = InferenceClient("HuggingFaceH4/zephyr-7b-beta")
-
-
- def respond(
-     message,
-     history: list[tuple[str, str]],
-     system_message,
-     max_tokens,
-     temperature,
-     top_p,
- ):
-     messages = [{"role": "system", "content": system_message}]
-
-     for val in history:
-         if val[0]:
-             messages.append({"role": "user", "content": val[0]})
-         if val[1]:
-             messages.append({"role": "assistant", "content": val[1]})
-
-     messages.append({"role": "user", "content": message})
-
-     response = ""
-
-     for message in client.chat_completion(
-         messages,
-         max_tokens=max_tokens,
-         stream=True,
-         temperature=temperature,
-         top_p=top_p,
-     ):
-         token = message.choices[0].delta.content
-
-         response += token
-         yield response
-
-
- """
- For information on how to customize the ChatInterface, peruse the gradio docs: https://www.gradio.app/docs/chatinterface
- """
- demo = gr.ChatInterface(
-     respond,
-     additional_inputs=[
-         gr.Textbox(value="You are a friendly Chatbot.", label="System message"),
-         gr.Slider(minimum=1, maximum=2048, value=512, step=1, label="Max new tokens"),
-         gr.Slider(minimum=0.1, maximum=4.0, value=0.7, step=0.1, label="Temperature"),
-         gr.Slider(
-             minimum=0.1,
-             maximum=1.0,
-             value=0.95,
-             step=0.05,
-             label="Top-p (nucleus sampling)",
-         ),
-     ],
- )
-
-
- if __name__ == "__main__":
-     demo.launch()
+ import gradio as gr
+ import threading
+ import asyncio
+ import logging
+ from concurrent.futures import ThreadPoolExecutor
+ from functools import lru_cache
+ import requests
+ import json
+
+ # These are the project's own modules; adjust to your setup as needed
+ from Config.config import VECTOR_DB, DB_directory
+ from Ollama_api.ollama_api import *
+ from rag.rag_class import *
+
+ # Set up logging
+ logging.basicConfig(level=logging.INFO)
+ logger = logging.getLogger(__name__)
+
+ # Pick the vector store according to VECTOR_DB
+ if VECTOR_DB == 1:
+     from embeding.chromadb import ChromaDB as vectorDB
+     vectordb = vectorDB(persist_directory=DB_directory)
+ elif VECTOR_DB == 2:
+     from embeding.faissdb import FaissDB as vectorDB
+     vectordb = vectorDB(persist_directory=DB_directory)
+ elif VECTOR_DB == 3:
+     from embeding.elasticsearchStore import ElsStore as vectorDB
+     vectordb = vectorDB()
+
+ # Uploaded files
+ uploaded_files = []
+
+ @lru_cache(maxsize=100)
+ def get_knowledge_base_files():
+     cl_dict = {}
+     cols = vectordb.get_all_collections_name()
+     for c_name in cols:
+         cl_dict[c_name] = vectordb.get_collcetion_content_files(c_name)
+     return cl_dict
+
+ knowledge_base_files = get_knowledge_base_files()
+
+ def upload_files(files):
+     if files:
+         new_files = [file.name for file in files]
+         uploaded_files.extend(new_files)
+         update_knowledge_base_files()
+         logger.info(f"Uploaded files: {new_files}")
+         return update_file_list(), new_files, "<div style='color: green; padding: 10px; border: 2px solid green; border-radius: 5px;'>Upload successful!</div>"
+     update_knowledge_base_files()
+     return update_file_list(), [], "<div style='color: red; padding: 10px; border: 2px solid red; border-radius: 5px;'>Upload failed!</div>"
+
+ def delete_files(selected_files):
+     global uploaded_files
+     uploaded_files = [f for f in uploaded_files if f not in selected_files]
+     if selected_files:
+         update_knowledge_base_files()
+         logger.info(f"Deleted files: {selected_files}")
+         return update_file_list(), "<div style='color: green; padding: 10px; border: 2px solid green; border-radius: 5px;'>Delete successful!</div>"
+     update_knowledge_base_files()
+     return update_file_list(), "<div style='color: red; padding: 10px; border: 2px solid red; border-radius: 5px;'>Delete failed!</div>"
+
+ def delete_collection(selected_knowledge_base):
+     if selected_knowledge_base and selected_knowledge_base != "创建知识库":
+         vectordb.delete_collection(selected_knowledge_base)
+         update_knowledge_base_files()
+         logger.info(f"Deleted collection: {selected_knowledge_base}")
+         return update_knowledge_base_dropdown(), "<div style='color: green; padding: 10px; border: 2px solid green; border-radius: 5px;'>Collection deleted successfully!</div>"
+     return update_knowledge_base_dropdown(), "<div style='color: red; padding: 10px; border: 2px solid red; border-radius: 5px;'>Delete collection failed!</div>"
+
+ async def async_vectorize_files(selected_files, selected_knowledge_base, new_kb_name, chunk_size, chunk_overlap):
+     if selected_files:
+         if selected_knowledge_base == "创建知识库":
+             knowledge_base = new_kb_name
+             vectordb.create_collection(selected_files, knowledge_base, chunk_size=chunk_size, chunk_overlap=chunk_overlap)
+         else:
+             knowledge_base = selected_knowledge_base
+             vectordb.add_chroma(selected_files, knowledge_base, chunk_size=chunk_size, chunk_overlap=chunk_overlap)
+
+         if knowledge_base not in knowledge_base_files:
+             knowledge_base_files[knowledge_base] = []
+         knowledge_base_files[knowledge_base].extend(selected_files)
+
+         logger.info(f"Vectorized files: {selected_files} for knowledge base: {knowledge_base}")
+         await asyncio.sleep(0)  # yield control so other tasks can run
+         return f"Vectorized files: {', '.join(selected_files)}\nKnowledge Base: {knowledge_base}\nUploaded Files: {', '.join(uploaded_files)}", "<div style='color: green; padding: 10px; border: 2px solid green; border-radius: 5px;'>Vectorization successful!</div>"
+     return "", "<div style='color: red; padding: 10px; border: 2px solid red; border-radius: 5px;'>Vectorization failed!</div>"
+
+ def update_file_list():
+     return gr.update(choices=uploaded_files, value=[])
+
+ def search_knowledge_base(selected_knowledge_base):
+     if selected_knowledge_base in knowledge_base_files:
+         kb_files = knowledge_base_files[selected_knowledge_base]
+         return gr.update(choices=kb_files, value=[])
+     return gr.update(choices=[], value=[])
+
+ def update_knowledge_base_files():
+     global knowledge_base_files
+     # Drop the lru_cache entry first, otherwise the stale cached dict is returned
+     get_knowledge_base_files.cache_clear()
+     knowledge_base_files = get_knowledge_base_files()
+
+ # Chat message handling
+ chat_history = []
+
+ def safe_chat_response(model_dropdown, vector_dropdown, chat_knowledge_base_dropdown, chain_dropdown, message):
+     try:
+         return chat_response(model_dropdown, vector_dropdown, chat_knowledge_base_dropdown, chain_dropdown, message)
+     except Exception as e:
+         logger.error(f"Error in chat response: {str(e)}")
+         return f"<div style='color: red;'>Error: {str(e)}</div>", ""
+
+ def chat_response(model_dropdown, vector_dropdown, chat_knowledge_base_dropdown, chain_dropdown, message):
+     global chat_history
+     if message:
+         chat_history.append(("User", message))
+         if chat_knowledge_base_dropdown == "仅使用模型":
+             rag = RAG_class(model=model_dropdown, persist_directory=DB_directory)
+             answer = rag.mult_chat(chat_history)
+         if chat_knowledge_base_dropdown and chat_knowledge_base_dropdown != "仅使用模型":
+             rag = RAG_class(model=model_dropdown, embed=vector_dropdown, c_name=chat_knowledge_base_dropdown, persist_directory=DB_directory)
+             if chain_dropdown == "复杂召回方式":
+                 questions = rag.decomposition_chain(message)
+                 answer = rag.rag_chain(questions)
+             elif chain_dropdown == "简单召回方式":
+                 answer = rag.simple_chain(message)
+             else:
+                 answer = rag.rerank_chain(message)
+
+         response = f" {answer}"
+         chat_history.append(("Bot", response))
+     # Always return a value so Gradio outputs stay consistent on empty input
+     return format_chat_history(chat_history), ""
+
+ def clear_chat():
+     global chat_history
+     chat_history = []
+     return format_chat_history(chat_history)
+
+ def format_chat_history(history):
+     formatted_history = ""
+     for user, msg in history:
+         if user == "User":
+             formatted_history += f'''
+             <div style="text-align: right; margin: 10px;">
+                 <div style="display: inline-block; background-color: #DCF8C6; padding: 10px; border-radius: 10px; max-width: 60%;">
+                     {msg}
+                 </div>
+                 <b>:User</b>
+             </div>
+             '''
+         else:
+             if "```" in msg:  # does the reply contain a code block?
+                 code_content = msg.split("```")[1]
+                 formatted_history += f'''
+                 <div style="text-align: left; margin: 10px;">
+                     <b>Bot:</b>
+                     <div style="display: inline-block; background-color: #F1F0F0; padding: 10px; border-radius: 10px; max-width: 60%;">
+                         <pre><code>{code_content}</code></pre>
+                     </div>
+                 </div>
+                 '''
+             else:
+                 formatted_history += f'''
+                 <div style="text-align: left; margin: 10px;">
+                     <b>Bot:</b>
+                     <div style="display: inline-block; background-color: #F1F0F0; padding: 10px; border-radius: 10px; max-width: 60%;">
+                         {msg}
+                     </div>
+                 </div>
+                 '''
+     return formatted_history
+
+ def clear_status():
+     upload_status.update("")
+     delete_status.update("")
+     vectorize_status.update("")
+     delete_collection_status.update("")
+
+ def handle_knowledge_base_selection(selected_knowledge_base):
+     if selected_knowledge_base == "创建知识库":
+         return gr.update(visible=True, interactive=True), gr.update(choices=[], value=[]), gr.update(visible=False)
+     elif selected_knowledge_base == "仅使用模型":
+         return gr.update(visible=False, interactive=False), gr.update(choices=[], value=[]), gr.update(visible=False)
+     else:
+         return gr.update(visible=False, interactive=False), search_knowledge_base(selected_knowledge_base), gr.update(visible=True)
+
+ def update_knowledge_base_dropdown():
+     global knowledge_base_files
+     choices = ["创建知识库"] + list(knowledge_base_files.keys())
+     return gr.update(choices=choices)
+
+ def update_chat_knowledge_base_dropdown():
+     global knowledge_base_files
+     choices = ["仅使用模型"] + list(knowledge_base_files.keys())
+     return gr.update(choices=choices)
+
+
+ # SearxNG搜索函数
198
+ def search_searxng(query):
199
+ searxng_url = 'http://localhost:8080/search' # 替换为你的SearxNG实例URL
200
+ params = {
201
+ 'q': query,
202
+ 'format': 'json'
203
+ }
204
+ response = requests.get(searxng_url, params=params)
205
+ response.raise_for_status()
206
+ return response.json()
207
+
208
+
209
+ # Ollama总结函数
210
+ def summarize_with_ollama(model_dropdown,text, question):
211
+ prompt = """
212
+ 根据下边的内容,回答用户问题,
213
+ 内容为:‘{0}‘\n
214
+ 问题为:{1}
215
+ """.format(text, question)
216
+ ollama_url = 'http://localhost:11434/api/generate' # 替换为你的Ollama实例URL
217
+ data = {
218
+ 'model': model_dropdown,
219
+ "prompt": prompt,
220
+ "stream": False
221
+ }
222
+ response = requests.post(ollama_url, json=data)
223
+ response.raise_for_status()
224
+ return response.json()
225
+
226
+
227
+ # 处理函数
228
+ def ai_web_search(model_dropdown,user_query):
229
+ # 使用SearxNG进行搜索
230
+ search_results = search_searxng(user_query)
231
+ search_texts = [result['title'] + "\n" + result['content'] for result in search_results['results']]
232
+ combined_text = "\n\n".join(search_texts)
233
+
234
+ # 使用Ollama进行总结
235
+ summary = summarize_with_ollama(model_dropdown,combined_text, user_query)
236
+ # print(summary)
237
+ # 返回结果
238
+ return summary['response']
239
+ # 添加新的函数来处理AI网络搜索
240
+ # def ai_web_search(model_dropdown, query):
241
+ # try:
242
+ # # 这里添加实际的网络搜索和AI处理逻辑
243
+ # # 这只是一个示例,您需要根据实际情况实现
244
+ # search_result = f"搜索结果: {query}"
245
+ # ai_response = f"AI回答: 基于搜索结果,对于'{query}'的回答是..."
246
+ # return f"{search_result}\n\n{ai_response}"
247
+ # except Exception as e:
248
+ # logger.error(f"Error in AI web search: {str(e)}")
249
+ # return f"<div style='color: red;'>Error: {str(e)}</div>"
250
+
251
+ # 创建 Gradio 界面
252
+ with gr.Blocks() as demo:
253
+ with gr.Column():
254
+ # 添加标题
255
+ title = gr.HTML("<h1 style='text-align: center; font-size: 32px; font-weight: bold;'>RAG精致系统</h1>")
256
+ # 添加公告栏
257
+ announcement = gr.HTML("<div style='text-align: center; font-size: 18px; color: red;'>公告栏: RAG精致系统,【检索增强生成】系统!<br/>莫大大</div>")
258
+
259
+ with gr.Tabs():
260
+ with gr.TabItem("知识库"):
261
+ knowledge_base_dropdown = gr.Dropdown(choices=["创建知识库"] + list(knowledge_base_files.keys()),
262
+ label="选择知识库")
263
+ new_kb_input = gr.Textbox(label="输入新的知识库名称", visible=False, interactive=True)
264
+ file_input = gr.Files(label="Upload files")
265
+ upload_btn = gr.Button("Upload")
266
+ file_list = gr.CheckboxGroup(label="Uploaded Files")
267
+ delete_btn = gr.Button("Delete Selected Files")
268
+ with gr.Row():
269
+ chunk_size_dropdown = gr.Dropdown(choices=[50, 100, 200, 300, 500, 700], label="chunk_size", value=200)
270
+ chunk_overlap_dropdown = gr.Dropdown(choices=[20, 50, 100, 200], label="chunk_overlap", value=50)
271
+ vectorize_btn = gr.Button("Vectorize Selected Files")
272
+ delete_collection_btn = gr.Button("Delete Collection")
273
+ upload_status = gr.HTML()
274
+ delete_status = gr.HTML()
275
+ vectorize_status = gr.HTML()
276
+ delete_collection_status = gr.HTML()
277
+
278
+ with gr.TabItem("Chat"):
279
+ with gr.Row():
280
+ model_dropdown = gr.Dropdown(choices=get_llm(), label="模型")
281
+ vector_dropdown = gr.Dropdown(choices=get_embeding_model(), label="向量")
282
+ chat_knowledge_base_dropdown = gr.Dropdown(choices=["仅使用模型"] + vectordb.get_all_collections_name(), label="知识库")
283
+ chain_dropdown = gr.Dropdown(choices=["复杂召回方式", "简单召回方式","rerank"], label="chain方式", visible=False)
284
+ chat_display = gr.HTML(label="Chat History")
285
+ chat_input = gr.Textbox(label="Type a message")
286
+ chat_btn = gr.Button("Send")
287
+ clear_btn = gr.Button("Clear Chat History")
288
+
289
+ with gr.TabItem("AI网络搜索"):
290
+ with gr.Row():
291
+ web_search_model_dropdown = gr.Dropdown(choices=get_llm(), label="模型")
292
+ web_search_output = gr.Textbox(label="搜索结果和AI回答", lines=10)
293
+ web_search_input = gr.Textbox(label="输入搜索查询")
294
+
295
+ web_search_btn = gr.Button("搜索")
296
+
297
+ def handle_upload(files):
298
+ upload_result, new_files, status = upload_files(files)
299
+ threading.Thread(target=clear_status).start()
300
+ return upload_result, new_files, status, update_chat_knowledge_base_dropdown()
301
+
302
+ def handle_delete(selected_knowledge_base, selected_files):
303
+ tmp = []
304
+ cols_files_tmp = vectordb.get_collcetion_content_files(c_name=selected_knowledge_base)
305
+ for i in selected_files:
306
+ if i in cols_files_tmp:
307
+ tmp.append(i)
308
+ del cols_files_tmp
309
+ if tmp:
310
+ vectordb.del_files(tmp, c_name=selected_knowledge_base)
311
+ del tmp
312
+ delete_result, status = delete_files(selected_files)
313
+ threading.Thread(target=clear_status).start()
314
+ return delete_result, status, update_chat_knowledge_base_dropdown()
315
+
316
+ def handle_vectorize(selected_files, selected_knowledge_base, new_kb_name, chunk_size, chunk_overlap):
317
+ vectorize_result, status = asyncio.run(async_vectorize_files(selected_files, selected_knowledge_base, new_kb_name, chunk_size, chunk_overlap))
318
+ threading.Thread(target=clear_status).start()
319
+ return vectorize_result, status, update_knowledge_base_dropdown(), update_chat_knowledge_base_dropdown()
320
+
321
+ def handle_delete_collection(selected_knowledge_base):
322
+ result, status = delete_collection(selected_knowledge_base)
323
+ threading.Thread(target=clear_status).start()
324
+ return result, status, update_chat_knowledge_base_dropdown()
325
+
326
+ knowledge_base_dropdown.change(
327
+ handle_knowledge_base_selection,
328
+ inputs=knowledge_base_dropdown,
329
+ outputs=[new_kb_input, file_list, chain_dropdown]
330
+ )
331
+ upload_btn.click(handle_upload, inputs=file_input, outputs=[file_list, file_list, upload_status, chat_knowledge_base_dropdown])
332
+ delete_btn.click(handle_delete, inputs=[knowledge_base_dropdown, file_list], outputs=[file_list, delete_status, chat_knowledge_base_dropdown])
333
+ vectorize_btn.click(handle_vectorize, inputs=[file_list, knowledge_base_dropdown, new_kb_input, chunk_size_dropdown, chunk_overlap_dropdown],
334
+ outputs=[gr.Textbox(visible=False), vectorize_status, knowledge_base_dropdown, chat_knowledge_base_dropdown])
335
+ delete_collection_btn.click(handle_delete_collection, inputs=knowledge_base_dropdown,
336
+ outputs=[knowledge_base_dropdown, delete_collection_status, chat_knowledge_base_dropdown])
337
+
338
+ chat_btn.click(chat_response, inputs=[model_dropdown, vector_dropdown, chat_knowledge_base_dropdown, chain_dropdown, chat_input], outputs=[chat_display, chat_input])
339
+ clear_btn.click(clear_chat, outputs=chat_display)
340
+
341
+ chat_knowledge_base_dropdown.change(
342
+ fn=lambda selected: gr.update(visible=selected != "仅使用模型"),
343
+ inputs=chat_knowledge_base_dropdown,
344
+ outputs=chain_dropdown
345
+ )
346
+
347
+ # 添加新的点击事件处理
348
+ web_search_btn.click(
349
+ ai_web_search,
350
+ inputs=[web_search_model_dropdown, web_search_input],
351
+ outputs=web_search_output
352
+ )
353
+
354
+ demo.launch(debug=True,share=True)
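The chunk_size / chunk_overlap dropdowns above feed the vector store's splitter when files are vectorized. A minimal character-based sketch of overlapping chunking; the project itself delegates splitting to its vector-store classes, so this is illustrative only:

```python
def chunk_text(text, chunk_size=200, chunk_overlap=50):
    # Slide a window of chunk_size characters, stepping by
    # chunk_size - chunk_overlap so consecutive chunks share context.
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2))
```

The overlap keeps sentences that straddle a chunk boundary retrievable from both sides, at the cost of some duplicated storage.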
graph_demo_ui.py ADDED
@@ -0,0 +1,87 @@
+ # -*- coding: utf-8 -*-
+ from flask import Flask, render_template, request, jsonify
+ import json
+ from dotenv import load_dotenv
+ from langchain_community.llms import Ollama
+
+
+ load_dotenv()
+
+ app = Flask(__name__)
+
+ # Tested llama3:8b, gemma2:9b, qwen2:7b, glm4:9b, arcee-ai/arcee-agent:latest; so far qwen2:7b works best
+ llm = Ollama(model="qwen2:7b")
+
+
+ json_example = {'edges': [{'data': {'color': '#FFA07A',
+                                     'label': 'label 1',
+                                     'source': 'source 1',
+                                     'target': 'target 1'}},
+                           {'data': {'color': '#FFA07A',
+                                     'label': 'label 2',
+                                     'source': 'source 2',
+                                     'target': 'target 2'}}
+                           ],
+                 'nodes': [{'data': {'color': '#FFC0CB', 'id': 'id 1', 'label': 'label 1'}},
+                           {'data': {'color': '#90EE90', 'id': 'id 2', 'label': 'label 2'}},
+                           {'data': {'color': '#87CEEB', 'id': 'id 3', 'label': 'label 3'}}]}
+
+
+ # Extraction prompt, kept in Chinese since the model is prompted in Chinese
+ __retriever_prompt = f"""
+ 您是一名专门从事知识图谱创建的人工智能专家,目标是根据给定的输入或请求捕获关系。
+ 基于各种形式的用户输入,如段落、电子邮件、文本文件等。
+ 你的任务是根据输入创建一个知识图谱。
+ nodes必须具有label参数,并且label是来自输入的词语或短语,nodes必须具有id参数,id的格式是"id_数字",不能重复。
+ edges还必须有一个label参数,其中label是输入中的直接词语或短语,edges中的source和target取自nodes中的id。
+ 仅使用JSON进行响应,其格式可以在python中进行jsonify,并直接输入cy.add(data),包括“color”属性,以在前端显示图形。
+ 您可以参考给定的示例:{json_example}。存储node和edge的数组中,最后一个元素后边不要有逗号,
+ 确保边的目标和源与现有节点匹配。
+ 不要在JSON的上方和下方包含markdown三引号,直接用花括号括起来。
+ """
+
+
+ def generate_graph_info(raw_text: str) -> str | None:
+     """
+     generate graph info from raw text
+     :param raw_text:
+     :return:
+     """
+     messages = [
+         {"role": "system", "content": "你现在扮演信息抽取的角色,要求根据用户输入和AI的回答,正确提取出信息,记得不多对实体进行翻译。"},
+         {"role": "user", "content": raw_text},
+         {"role": "user", "content": __retriever_prompt}
+     ]
+     print("Parsing....")
+     # Retry up to 3 times if the model returns a near-empty result
+     for i in range(3):
+         graph_info_result = llm.invoke(messages)
+         if len(graph_info_result) < 10:
+             print("-------", i, "-------------------")
+             continue
+         else:
+             break
+     print(graph_info_result)
+     return graph_info_result
+
+
+ @app.route('/')
+ def index():
+     return render_template('index.html')
+
+
+ @app.route('/update_graph', methods=['POST'])
+ def update_graph():
+     raw_text = request.json.get('text', '')
+     try:
+         result = generate_graph_info(raw_text)
+         if '```' in result:
+             graph_data = json.loads(result.split('```', 2)[1].replace("json", ''))
+         else:
+             graph_data = json.loads(result)
+         return graph_data
+     except Exception as e:
+         return {'error': f"Error parsing graph data: {str(e)}"}
+
+
+ if __name__ == '__main__':
+     app.run(host='0.0.0.0', port=7860)
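update_graph above has to cope with models that wrap their JSON in markdown fences despite the prompt telling them not to. A sketch of that defensive parse, isolated from Flask so it can be tested on its own:

```python
import json

def parse_graph_json(raw):
    # Strip a ```json ... ``` fence if the model added one,
    # then parse whatever is left as JSON.
    if "```" in raw:
        raw = raw.split("```", 2)[1].replace("json", "")
    return json.loads(raw)

fenced = "```json\n{\"nodes\": [], \"edges\": []}\n```"
print(parse_graph_json(fenced))
```

Note the `replace("json", "")` would also strip the word "json" from inside the payload; it is safe here only because the graph schema's keys are `nodes` and `edges`.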
requirements.txt CHANGED
@@ -1 +1,10 @@
- huggingface_hub==0.25.2
+ gradio==4.29.0
+ langchain-community==0.2.6
+ langchain==0.2.6
+ langchain-core==0.2.11
+ requests
+ transformers==4.41.1
+ unstructured==0.7.12
+ funasr==1.0.24
+ modelscope
+ chromadb
webui-test-graph.py ADDED
@@ -0,0 +1,283 @@
1
+ import gradio as gr
2
+ import threading
3
+ from Config.config import VECTOR_DB,DB_directory
4
+
5
+ if VECTOR_DB==1:
6
+ from embeding.chromadb import ChromaDB as vectorDB
7
+ vectordb = vectorDB(persist_directory=DB_directory)
8
+ elif VECTOR_DB==2:
9
+ from embeding.faissdb import FaissDB as vectorDB
10
+ vectordb = vectorDB(persist_directory=DB_directory)
11
+ from Ollama_api.ollama_api import *
12
+ from rag.rag_class import *
13
+
14
+ # 存储上传的文件
15
+ uploaded_files = []
16
+
17
+ # 模拟获取最新的知识库文件
18
+ def get_knowledge_base_files():
19
+ cl_dict = {}
20
+ cols = vectordb.get_all_collections_name()
21
+ for c_name in cols:
22
+ cl_dict[c_name] = vectordb.get_collcetion_content_files(c_name)
23
+ return cl_dict
24
+
25
+ knowledge_base_files = get_knowledge_base_files()
26
+
27
+ def upload_files(files):
28
+ if files:
29
+ new_files = [file.name for file in files]
30
+ uploaded_files.extend(new_files)
31
+ update_knowledge_base_files()
32
+ return update_file_list(), new_files, "<div style='color: green; padding: 10px; border: 2px solid green; border-radius: 5px;'>Upload successful!</div>"
33
+ update_knowledge_base_files()
34
+ return update_file_list(), [], "<div style='color: red; padding: 10px; border: 2px solid red; border-radius: 5px;'>Upload failed!</div>"
35
+
36
+ def delete_files(selected_files):
37
+ global uploaded_files
38
+ uploaded_files = [f for f in uploaded_files if f not in selected_files]
39
+ if selected_files:
40
+ update_knowledge_base_files()
41
+ return update_file_list(), "<div style='color: green; padding: 10px; border: 2px solid green; border-radius: 5px;'>Delete successful!</div>"
42
+ update_knowledge_base_files()
43
+ return update_file_list(), "<div style='color: red; padding: 10px; border: 2px solid red; border-radius: 5px;'>Delete failed!</div>"
44
+
45
+ def delete_collection(selected_knowledge_base):
46
+ if selected_knowledge_base and selected_knowledge_base != "创建知识库":
47
+ vectordb.delete_collection(selected_knowledge_base)
48
+ update_knowledge_base_files()
49
+ return update_knowledge_base_dropdown(), "<div style='color: green; padding: 10px; border: 2px solid green; border-radius: 5px;'>Collection deleted successfully!</div>"
50
+ return update_knowledge_base_dropdown(), "<div style='color: red; padding: 10px; border: 2px solid red; border-radius: 5px;'>Delete collection failed!</div>"
51
+
52
+ def create_graph(selected_files):
53
+ from Neo4j.neo4j_op import KnowledgeGraph
54
+ from Neo4j.graph_extract import update_graph
55
+ from Config.config import neo4j_host, neo4j_name, neo4j_pwd
56
+ import tqdm
57
+
58
+ kg = KnowledgeGraph(neo4j_host,neo4j_name,neo4j_pwd)
59
+ data = kg.split_files(selected_files)
60
+ for doc in tqdm.tqdm(data):
61
+ text = doc.page_content
62
+ try:
63
+ res = update_graph(text)
64
+ # 批量创建节点
65
+ nodes = kg.create_nodes("node", res["nodes"])
66
+
67
+ # 批量创建关系
68
+ relationships = kg.create_relationships([
69
+ ("node", {"name": edge["source"]}, "node", {"name": edge["target"]}, edge["label"]) for edge in res["edges"]
70
+ ])
71
+ except:
72
+ print("错误----------------------------------")
73
+
74
+
75
+ def vectorize_files(selected_files, selected_knowledge_base, new_kb_name,choice_graph, chunk_size, chunk_overlap):
76
+ if selected_files:
77
+ if selected_knowledge_base == "创建知识库":
78
+ knowledge_base = new_kb_name
79
+ vectordb.create_collection(selected_files, knowledge_base, chunk_size=chunk_size, chunk_overlap=chunk_overlap)
80
+ if choice_graph=='是':
81
+ create_graph(selected_files)
82
+ else:
83
+ knowledge_base = selected_knowledge_base
84
+ vectordb.add_chroma(selected_files, knowledge_base, chunk_size=chunk_size, chunk_overlap=chunk_overlap)
85
+ if choice_graph == '是':
86
+ create_graph(selected_files)
87
+ if knowledge_base not in knowledge_base_files:
88
+ knowledge_base_files[knowledge_base] = []
89
+ knowledge_base_files[knowledge_base].extend(selected_files)
90
+
91
+ return f"Vectorized files: {', '.join(selected_files)}\nKnowledge Base: {knowledge_base}\nUploaded Files: {', '.join(uploaded_files)}", "<div style='color: green; padding: 10px; border: 2px solid green; border-radius: 5px;'>Vectorization successful!</div>"
92
+ return "", "<div style='color: red; padding: 10px; border: 2px solid red; border-radius: 5px;'>Vectorization failed!</div>"
93
+
94
+ def update_file_list():
95
+ return gr.update(choices=uploaded_files, value=[])
96
+
97
+ def search_knowledge_base(selected_knowledge_base):
98
+ if selected_knowledge_base in knowledge_base_files:
99
+ kb_files = knowledge_base_files[selected_knowledge_base]
100
+ return gr.update(choices=kb_files, value=[])
101
+ return gr.update(choices=[], value=[])
102
+
103
+ def update_knowledge_base_files():
104
+ global knowledge_base_files
105
+ knowledge_base_files = get_knowledge_base_files()
106
+
107
+ # 处理聊天消息的函数
108
+ chat_history = []
109
+
110
+ def chat_response(model_dropdown, vector_dropdown, chat_knowledge_base_dropdown, chain_dropdown, message):
111
+ global chat_history
112
+ if message:
113
+ chat_history.append(("User", message))
114
+ if chat_knowledge_base_dropdown == "仅使用模型":
115
+ rag = RAG_class(model=model_dropdown,persist_directory=DB_directory)
116
+ answer = rag.mult_chat(chat_history)
117
+ if chat_knowledge_base_dropdown and chat_knowledge_base_dropdown != "仅使用模型":
118
+ rag = RAG_class(model=model_dropdown, embed=vector_dropdown, c_name=chat_knowledge_base_dropdown, persist_directory=DB_directory)
119
+ if chain_dropdown == "复杂召回方式":
120
+ questions = rag.decomposition_chain(message)
121
+ answer = rag.rag_chain(questions)
122
+ elif chain_dropdown == "简单召回方式":
123
+ answer = rag.simple_chain(message)
124
+ else:
125
+ answer = rag.rerank_chain(message)
126
+
127
+ response = f" {answer}"
128
+ chat_history.append(("Bot", response))
129
+ return format_chat_history(chat_history), ""
130
+
131
+ def clear_chat():
132
+ global chat_history
133
+ chat_history = []
134
+ return format_chat_history(chat_history)
135
+
+def format_chat_history(history):
+    formatted_history = ""
+    for user, msg in history:
+        if user == "User":
+            formatted_history += f'''
+            <div style="text-align: right; margin: 10px;">
+                <div style="display: inline-block; background-color: #DCF8C6; padding: 10px; border-radius: 10px; max-width: 60%;">
+                    {msg}
+                </div>
+                <b>User</b>
+            </div>
+            '''
+        else:
+            if "```" in msg:  # message contains a code snippet
+                code_content = msg.split("```")[1]
+                formatted_history += f'''
+                <div style="text-align: left; margin: 10px;">
+                    <b>Bot:</b>
+                    <div style="display: inline-block; background-color: #F1F0F0; padding: 10px; border-radius: 10px; max-width: 60%;">
+                        <pre><code>{code_content}</code></pre>
+                    </div>
+                </div>
+                '''
+            else:
+                formatted_history += f'''
+                <div style="text-align: left; margin: 10px;">
+                    <b>Bot:</b>
+                    <div style="display: inline-block; background-color: #F1F0F0; padding: 10px; border-radius: 10px; max-width: 60%;">
+                        {msg}
+                    </div>
+                </div>
+                '''
+    return formatted_history
+
+def clear_status():
+    # Note: calling .update() on a component outside an event handler is a no-op
+    # in current Gradio; the status banners simply persist until the next event.
+    upload_status.update("")
+    delete_status.update("")
+    vectorize_status.update("")
+    delete_collection_status.update("")
+
+def handle_knowledge_base_selection(selected_knowledge_base):
+    if selected_knowledge_base == "创建知识库":
+        return gr.update(visible=True, interactive=True), gr.update(choices=[], value=[]), gr.update(visible=False)
+    elif selected_knowledge_base == "仅使用模型":
+        return gr.update(visible=False, interactive=False), gr.update(choices=[], value=[]), gr.update(visible=False)
+    else:
+        return gr.update(visible=False, interactive=False), search_knowledge_base(selected_knowledge_base), gr.update(visible=True)
+
+def update_knowledge_base_dropdown():
+    global knowledge_base_files
+    choices = ["创建知识库"] + list(knowledge_base_files.keys())
+    return gr.update(choices=choices)
+
+def update_chat_knowledge_base_dropdown():
+    global knowledge_base_files
+    choices = ["仅使用模型"] + list(knowledge_base_files.keys())
+    return gr.update(choices=choices)
+
+# Build the Gradio UI
+with gr.Blocks() as demo:
+    with gr.Column():
+        # Page title
+        title = gr.HTML("<h1 style='text-align: center; font-size: 32px; font-weight: bold;'>RAG精致系统</h1>")
+        # Announcement banner
+        announcement = gr.HTML("<div style='text-align: center; font-size: 18px; color: red;'>公告栏: 欢迎使用RAG精致系统</div>")
+
+    with gr.Tabs():
+        with gr.TabItem("知识库"):
+            knowledge_base_dropdown = gr.Dropdown(choices=["创建知识库"] + list(knowledge_base_files.keys()),
+                                                  label="选择知识库")
+            new_kb_input = gr.Textbox(label="输入新的知识库名称", visible=False, interactive=True)
+            choice_graph = gr.Radio(choices=["否", "是"], value="否", label="是否同时提取知识图谱(会比较慢)")
+            file_input = gr.Files(label="Upload files")
+            upload_btn = gr.Button("Upload")
+            file_list = gr.CheckboxGroup(label="Uploaded Files")
+            delete_btn = gr.Button("Delete Selected Files")
+            with gr.Row():
+                chunk_size_dropdown = gr.Dropdown(choices=[50, 100, 200, 300, 500, 700], label="chunk_size", value=200)
+                chunk_overlap_dropdown = gr.Dropdown(choices=[20, 50, 100, 200], label="chunk_overlap", value=50)
+            vectorize_btn = gr.Button("Vectorize Selected Files")
+            delete_collection_btn = gr.Button("Delete Collection")
+            upload_status = gr.HTML()
+            delete_status = gr.HTML()
+            vectorize_status = gr.HTML()
+            delete_collection_status = gr.HTML()
+
+        with gr.TabItem("Chat"):
+            with gr.Row():
+                model_dropdown = gr.Dropdown(choices=get_llm(), label="模型")
+                vector_dropdown = gr.Dropdown(choices=get_embeding_model(), label="向量")
+                chat_knowledge_base_dropdown = gr.Dropdown(choices=["仅使用模型"] + vectordb.get_all_collections_name(), label="知识库")
+                chain_dropdown = gr.Dropdown(choices=["复杂召回方式", "简单召回方式", "rerank"], label="chain方式", visible=False)
+            chat_display = gr.HTML(label="Chat History")
+            chat_input = gr.Textbox(label="Type a message")
+            chat_btn = gr.Button("Send")
+            clear_btn = gr.Button("Clear Chat History")
+
+    def handle_upload(files):
+        upload_result, new_files, status = upload_files(files)
+        threading.Thread(target=clear_status).start()
+        return upload_result, new_files, status, update_chat_knowledge_base_dropdown()
+
+    def handle_delete(selected_knowledge_base, selected_files):
+        tmp = []
+        cols_files_tmp = vectordb.get_collcetion_content_files(c_name=selected_knowledge_base)
+        for i in selected_files:
+            if i in cols_files_tmp:
+                tmp.append(i)
+        del cols_files_tmp
+        if tmp:
+            vectordb.del_files(tmp, c_name=selected_knowledge_base)
+        del tmp
+        delete_result, status = delete_files(selected_files)
+        threading.Thread(target=clear_status).start()
+        return delete_result, status, update_chat_knowledge_base_dropdown()
+
+    def handle_vectorize(selected_files, selected_knowledge_base, new_kb_name, choice_graph, chunk_size, chunk_overlap):
+        vectorize_result, status = vectorize_files(selected_files, selected_knowledge_base, new_kb_name, choice_graph, chunk_size, chunk_overlap)
+        threading.Thread(target=clear_status).start()
+        return vectorize_result, status, update_knowledge_base_dropdown(), update_chat_knowledge_base_dropdown()
+
+    def handle_delete_collection(selected_knowledge_base):
+        result, status = delete_collection(selected_knowledge_base)
+        threading.Thread(target=clear_status).start()
+        return result, status, update_chat_knowledge_base_dropdown()
+
+    knowledge_base_dropdown.change(
+        handle_knowledge_base_selection,
+        inputs=knowledge_base_dropdown,
+        outputs=[new_kb_input, file_list, chain_dropdown]
+    )
+    upload_btn.click(handle_upload, inputs=file_input, outputs=[file_list, file_list, upload_status, chat_knowledge_base_dropdown])
+    delete_btn.click(handle_delete, inputs=[knowledge_base_dropdown, file_list], outputs=[file_list, delete_status, chat_knowledge_base_dropdown])
+    vectorize_btn.click(handle_vectorize, inputs=[file_list, knowledge_base_dropdown, new_kb_input, choice_graph, chunk_size_dropdown, chunk_overlap_dropdown],
+                        outputs=[gr.Textbox(visible=False), vectorize_status, knowledge_base_dropdown, chat_knowledge_base_dropdown])
+    delete_collection_btn.click(handle_delete_collection, inputs=knowledge_base_dropdown,
+                                outputs=[knowledge_base_dropdown, delete_collection_status, chat_knowledge_base_dropdown])
+
+    chat_btn.click(chat_response, inputs=[model_dropdown, vector_dropdown, chat_knowledge_base_dropdown, chain_dropdown, chat_input], outputs=[chat_display, chat_input])
+    clear_btn.click(clear_chat, outputs=chat_display)
+
+    chat_knowledge_base_dropdown.change(
+        fn=lambda selected: gr.update(visible=selected != "仅使用模型"),
+        inputs=chat_knowledge_base_dropdown,
+        outputs=chain_dropdown
+    )
+
+demo.launch(debug=True, share=True)
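The `format_chat_history` helper above renders a Bot message differently when it contains a triple-backtick fence, taking `msg.split("```")[1]` as the code body. A minimal standalone sketch of just that extraction step (the function name is illustrative, not part of the app):

```python
def extract_code_block(msg: str):
    """Return the first ```-fenced segment of a chat message, or None.

    Mirrors the splitting logic used by format_chat_history above:
    splitting on ``` leaves the first fenced content at index 1.
    """
    parts = msg.split("```")
    # require both an opening and a closing fence around the snippet
    return parts[1] if len(parts) >= 3 else None
```

As in the app, only the first snippet is kept and any prose around it is discarded; a fuller renderer would preserve both.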
webui-test.py ADDED
@@ -0,0 +1,354 @@
+import gradio as gr
+import threading
+import asyncio
+import logging
+from functools import lru_cache
+import requests
+
+# Project-local modules; adjust the import paths to your own layout
+from Config.config import VECTOR_DB, DB_directory
+from Ollama_api.ollama_api import *
+from rag.rag_class import *
+
+# Logging setup
+logging.basicConfig(level=logging.INFO)
+logger = logging.getLogger(__name__)
+
+# Pick the vector store according to VECTOR_DB
+if VECTOR_DB == 1:
+    from embeding.chromadb import ChromaDB as vectorDB
+    vectordb = vectorDB(persist_directory=DB_directory)
+elif VECTOR_DB == 2:
+    from embeding.faissdb import FaissDB as vectorDB
+    vectordb = vectorDB(persist_directory=DB_directory)
+elif VECTOR_DB == 3:
+    from embeding.elasticsearchStore import ElsStore as vectorDB
+    vectordb = vectorDB()
+
+# Uploaded files (session state)
+uploaded_files = []
+
+@lru_cache(maxsize=100)
+def get_knowledge_base_files():
+    cl_dict = {}
+    cols = vectordb.get_all_collections_name()
+    for c_name in cols:
+        cl_dict[c_name] = vectordb.get_collcetion_content_files(c_name)
+    return cl_dict
+
+knowledge_base_files = get_knowledge_base_files()
+
+def upload_files(files):
+    if files:
+        new_files = [file.name for file in files]
+        uploaded_files.extend(new_files)
+        update_knowledge_base_files()
+        logger.info(f"Uploaded files: {new_files}")
+        return update_file_list(), new_files, "<div style='color: green; padding: 10px; border: 2px solid green; border-radius: 5px;'>Upload successful!</div>"
+    update_knowledge_base_files()
+    return update_file_list(), [], "<div style='color: red; padding: 10px; border: 2px solid red; border-radius: 5px;'>Upload failed!</div>"
+
+def delete_files(selected_files):
+    global uploaded_files
+    uploaded_files = [f for f in uploaded_files if f not in selected_files]
+    if selected_files:
+        update_knowledge_base_files()
+        logger.info(f"Deleted files: {selected_files}")
+        return update_file_list(), "<div style='color: green; padding: 10px; border: 2px solid green; border-radius: 5px;'>Delete successful!</div>"
+    update_knowledge_base_files()
+    return update_file_list(), "<div style='color: red; padding: 10px; border: 2px solid red; border-radius: 5px;'>Delete failed!</div>"
+
+def delete_collection(selected_knowledge_base):
+    if selected_knowledge_base and selected_knowledge_base != "创建知识库":
+        vectordb.delete_collection(selected_knowledge_base)
+        update_knowledge_base_files()
+        logger.info(f"Deleted collection: {selected_knowledge_base}")
+        return update_knowledge_base_dropdown(), "<div style='color: green; padding: 10px; border: 2px solid green; border-radius: 5px;'>Collection deleted successfully!</div>"
+    return update_knowledge_base_dropdown(), "<div style='color: red; padding: 10px; border: 2px solid red; border-radius: 5px;'>Delete collection failed!</div>"
+
+async def async_vectorize_files(selected_files, selected_knowledge_base, new_kb_name, chunk_size, chunk_overlap):
+    if selected_files:
+        if selected_knowledge_base == "创建知识库":
+            knowledge_base = new_kb_name
+            vectordb.create_collection(selected_files, knowledge_base, chunk_size=chunk_size, chunk_overlap=chunk_overlap)
+        else:
+            knowledge_base = selected_knowledge_base
+            vectordb.add_chroma(selected_files, knowledge_base, chunk_size=chunk_size, chunk_overlap=chunk_overlap)
+
+        if knowledge_base not in knowledge_base_files:
+            knowledge_base_files[knowledge_base] = []
+        knowledge_base_files[knowledge_base].extend(selected_files)
+
+        logger.info(f"Vectorized files: {selected_files} for knowledge base: {knowledge_base}")
+        await asyncio.sleep(0)  # yield control so other tasks can run
+        return f"Vectorized files: {', '.join(selected_files)}\nKnowledge Base: {knowledge_base}\nUploaded Files: {', '.join(uploaded_files)}", "<div style='color: green; padding: 10px; border: 2px solid green; border-radius: 5px;'>Vectorization successful!</div>"
+    return "", "<div style='color: red; padding: 10px; border: 2px solid red; border-radius: 5px;'>Vectorization failed!</div>"
+
+def update_file_list():
+    return gr.update(choices=uploaded_files, value=[])
+
+def search_knowledge_base(selected_knowledge_base):
+    if selected_knowledge_base in knowledge_base_files:
+        kb_files = knowledge_base_files[selected_knowledge_base]
+        return gr.update(choices=kb_files, value=[])
+    return gr.update(choices=[], value=[])
+
+def update_knowledge_base_files():
+    global knowledge_base_files
+    get_knowledge_base_files.cache_clear()  # drop the lru_cache entry so the listing is actually refreshed
+    knowledge_base_files = get_knowledge_base_files()
+
+# Chat message handling
+chat_history = []
+
+def safe_chat_response(model_dropdown, vector_dropdown, chat_knowledge_base_dropdown, chain_dropdown, message):
+    try:
+        return chat_response(model_dropdown, vector_dropdown, chat_knowledge_base_dropdown, chain_dropdown, message)
+    except Exception as e:
+        logger.error(f"Error in chat response: {str(e)}")
+        return f"<div style='color: red;'>Error: {str(e)}</div>", ""
+
+def chat_response(model_dropdown, vector_dropdown, chat_knowledge_base_dropdown, chain_dropdown, message):
+    global chat_history
+    answer = ""
+    if message:
+        chat_history.append(("User", message))
+        if chat_knowledge_base_dropdown == "仅使用模型":
+            rag = RAG_class(model=model_dropdown, persist_directory=DB_directory)
+            answer = rag.mult_chat(chat_history)
+        elif chat_knowledge_base_dropdown:  # a knowledge base is selected
+            rag = RAG_class(model=model_dropdown, embed=vector_dropdown, c_name=chat_knowledge_base_dropdown, persist_directory=DB_directory)
+            if chain_dropdown == "复杂召回方式":
+                questions = rag.decomposition_chain(message)
+                answer = rag.rag_chain(questions)
+            elif chain_dropdown == "简单召回方式":
+                answer = rag.simple_chain(message)
+            else:
+                answer = rag.rerank_chain(message)
+
+        response = f" {answer}"
+        chat_history.append(("Bot", response))
+    return format_chat_history(chat_history), ""
+
+def clear_chat():
+    global chat_history
+    chat_history = []
+    return format_chat_history(chat_history)
+
+def format_chat_history(history):
+    formatted_history = ""
+    for user, msg in history:
+        if user == "User":
+            formatted_history += f'''
+            <div style="text-align: right; margin: 10px;">
+                <div style="display: inline-block; background-color: #DCF8C6; padding: 10px; border-radius: 10px; max-width: 60%;">
+                    {msg}
+                </div>
+                <b>User</b>
+            </div>
+            '''
+        else:
+            if "```" in msg:  # message contains a code snippet
+                code_content = msg.split("```")[1]
+                formatted_history += f'''
+                <div style="text-align: left; margin: 10px;">
+                    <b>Bot:</b>
+                    <div style="display: inline-block; background-color: #F1F0F0; padding: 10px; border-radius: 10px; max-width: 60%;">
+                        <pre><code>{code_content}</code></pre>
+                    </div>
+                </div>
+                '''
+            else:
+                formatted_history += f'''
+                <div style="text-align: left; margin: 10px;">
+                    <b>Bot:</b>
+                    <div style="display: inline-block; background-color: #F1F0F0; padding: 10px; border-radius: 10px; max-width: 60%;">
+                        {msg}
+                    </div>
+                </div>
+                '''
+    return formatted_history
+
+def clear_status():
+    # Note: calling .update() on a component outside an event handler is a no-op
+    # in current Gradio; the status banners simply persist until the next event.
+    upload_status.update("")
+    delete_status.update("")
+    vectorize_status.update("")
+    delete_collection_status.update("")
+
+def handle_knowledge_base_selection(selected_knowledge_base):
+    if selected_knowledge_base == "创建知识库":
+        return gr.update(visible=True, interactive=True), gr.update(choices=[], value=[]), gr.update(visible=False)
+    elif selected_knowledge_base == "仅使用模型":
+        return gr.update(visible=False, interactive=False), gr.update(choices=[], value=[]), gr.update(visible=False)
+    else:
+        return gr.update(visible=False, interactive=False), search_knowledge_base(selected_knowledge_base), gr.update(visible=True)
+
+def update_knowledge_base_dropdown():
+    global knowledge_base_files
+    choices = ["创建知识库"] + list(knowledge_base_files.keys())
+    return gr.update(choices=choices)
+
+def update_chat_knowledge_base_dropdown():
+    global knowledge_base_files
+    choices = ["仅使用模型"] + list(knowledge_base_files.keys())
+    return gr.update(choices=choices)
+
+
+# SearxNG search
+def search_searxng(query):
+    searxng_url = 'http://localhost:8080/search'  # replace with your SearxNG instance URL
+    params = {
+        'q': query,
+        'format': 'json'
+    }
+    response = requests.get(searxng_url, params=params, timeout=30)
+    response.raise_for_status()
+    return response.json()
+
+
+# Summarize with Ollama
+def summarize_with_ollama(model_dropdown, text, question):
+    prompt = """
+    根据下边的内容,回答用户问题,
+    内容为:'{0}'\n
+    问题为:{1}
+    """.format(text, question)
+    ollama_url = 'http://localhost:11434/api/generate'  # replace with your Ollama instance URL
+    data = {
+        'model': model_dropdown,
+        "prompt": prompt,
+        "stream": False
+    }
+    response = requests.post(ollama_url, json=data, timeout=300)
+    response.raise_for_status()
+    return response.json()
+
+
+# AI web search: fetch results via SearxNG, then summarize them with Ollama
+def ai_web_search(model_dropdown, user_query):
+    # search via SearxNG
+    search_results = search_searxng(user_query)
+    search_texts = [result['title'] + "\n" + result['content'] for result in search_results['results']]
+    combined_text = "\n\n".join(search_texts)
+
+    # summarize the hits with Ollama and return the answer text
+    summary = summarize_with_ollama(model_dropdown, combined_text, user_query)
+    return summary['response']
+
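`ai_web_search` above glues the SearxNG hits into one context string before prompting Ollama. A sketch of just that aggregation step, assuming the `title`/`content` keys of the SearxNG JSON results (the helper name is illustrative, not part of the app):

```python
def combine_search_results(results):
    """Join SearxNG hits into a single context string: each hit rendered as
    'title\\ncontent', hits separated by a blank line, as ai_web_search does."""
    search_texts = [r["title"] + "\n" + r["content"] for r in results]
    return "\n\n".join(search_texts)
```

Keeping this step separate makes it easy to cap the context length before it reaches the model, e.g. by slicing `results` to the top few hits.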
+# Build the Gradio UI
+with gr.Blocks() as demo:
+    with gr.Column():
+        # Page title
+        title = gr.HTML("<h1 style='text-align: center; font-size: 32px; font-weight: bold;'>RAG精致系统</h1>")
+        # Announcement banner
+        announcement = gr.HTML("<div style='text-align: center; font-size: 18px; color: red;'>公告栏: 欢迎使用RAG精致系统,一个适合学习、使用、自主扩展的【检索增强生成】系统!<br/>公众号:世界大模型</div>")
+
+    with gr.Tabs():
+        with gr.TabItem("知识库"):
+            knowledge_base_dropdown = gr.Dropdown(choices=["创建知识库"] + list(knowledge_base_files.keys()),
+                                                  label="选择知识库")
+            new_kb_input = gr.Textbox(label="输入新的知识库名称", visible=False, interactive=True)
+            file_input = gr.Files(label="Upload files")
+            upload_btn = gr.Button("Upload")
+            file_list = gr.CheckboxGroup(label="Uploaded Files")
+            delete_btn = gr.Button("Delete Selected Files")
+            with gr.Row():
+                chunk_size_dropdown = gr.Dropdown(choices=[50, 100, 200, 300, 500, 700], label="chunk_size", value=200)
+                chunk_overlap_dropdown = gr.Dropdown(choices=[20, 50, 100, 200], label="chunk_overlap", value=50)
+            vectorize_btn = gr.Button("Vectorize Selected Files")
+            delete_collection_btn = gr.Button("Delete Collection")
+            upload_status = gr.HTML()
+            delete_status = gr.HTML()
+            vectorize_status = gr.HTML()
+            delete_collection_status = gr.HTML()
+
+        with gr.TabItem("Chat"):
+            with gr.Row():
+                model_dropdown = gr.Dropdown(choices=get_llm(), label="模型")
+                vector_dropdown = gr.Dropdown(choices=get_embeding_model(), label="向量")
+                chat_knowledge_base_dropdown = gr.Dropdown(choices=["仅使用模型"] + vectordb.get_all_collections_name(), label="知识库")
+                chain_dropdown = gr.Dropdown(choices=["复杂召回方式", "简单召回方式", "rerank"], label="chain方式", visible=False)
+            chat_display = gr.HTML(label="Chat History")
+            chat_input = gr.Textbox(label="Type a message")
+            chat_btn = gr.Button("Send")
+            clear_btn = gr.Button("Clear Chat History")
+
+        with gr.TabItem("AI网络搜索"):
+            with gr.Row():
+                web_search_model_dropdown = gr.Dropdown(choices=get_llm(), label="模型")
+            web_search_output = gr.Textbox(label="搜索结果和AI回答", lines=10)
+            web_search_input = gr.Textbox(label="输入搜索查询")
+
+            web_search_btn = gr.Button("搜索")
+
+    def handle_upload(files):
+        upload_result, new_files, status = upload_files(files)
+        threading.Thread(target=clear_status).start()
+        return upload_result, new_files, status, update_chat_knowledge_base_dropdown()
+
+    def handle_delete(selected_knowledge_base, selected_files):
+        tmp = []
+        cols_files_tmp = vectordb.get_collcetion_content_files(c_name=selected_knowledge_base)
+        for i in selected_files:
+            if i in cols_files_tmp:
+                tmp.append(i)
+        del cols_files_tmp
+        if tmp:
+            vectordb.del_files(tmp, c_name=selected_knowledge_base)
+        del tmp
+        delete_result, status = delete_files(selected_files)
+        threading.Thread(target=clear_status).start()
+        return delete_result, status, update_chat_knowledge_base_dropdown()
+
+    def handle_vectorize(selected_files, selected_knowledge_base, new_kb_name, chunk_size, chunk_overlap):
+        vectorize_result, status = asyncio.run(async_vectorize_files(selected_files, selected_knowledge_base, new_kb_name, chunk_size, chunk_overlap))
+        threading.Thread(target=clear_status).start()
+        return vectorize_result, status, update_knowledge_base_dropdown(), update_chat_knowledge_base_dropdown()
+
+    def handle_delete_collection(selected_knowledge_base):
+        result, status = delete_collection(selected_knowledge_base)
+        threading.Thread(target=clear_status).start()
+        return result, status, update_chat_knowledge_base_dropdown()
+
+    knowledge_base_dropdown.change(
+        handle_knowledge_base_selection,
+        inputs=knowledge_base_dropdown,
+        outputs=[new_kb_input, file_list, chain_dropdown]
+    )
+    upload_btn.click(handle_upload, inputs=file_input, outputs=[file_list, file_list, upload_status, chat_knowledge_base_dropdown])
+    delete_btn.click(handle_delete, inputs=[knowledge_base_dropdown, file_list], outputs=[file_list, delete_status, chat_knowledge_base_dropdown])
+    vectorize_btn.click(handle_vectorize, inputs=[file_list, knowledge_base_dropdown, new_kb_input, chunk_size_dropdown, chunk_overlap_dropdown],
+                        outputs=[gr.Textbox(visible=False), vectorize_status, knowledge_base_dropdown, chat_knowledge_base_dropdown])
+    delete_collection_btn.click(handle_delete_collection, inputs=knowledge_base_dropdown,
+                                outputs=[knowledge_base_dropdown, delete_collection_status, chat_knowledge_base_dropdown])
+
+    # route chat through the error-wrapping safe_chat_response defined above
+    chat_btn.click(safe_chat_response, inputs=[model_dropdown, vector_dropdown, chat_knowledge_base_dropdown, chain_dropdown, chat_input], outputs=[chat_display, chat_input])
+    clear_btn.click(clear_chat, outputs=chat_display)
+
+    chat_knowledge_base_dropdown.change(
+        fn=lambda selected: gr.update(visible=selected != "仅使用模型"),
+        inputs=chat_knowledge_base_dropdown,
+        outputs=chain_dropdown
+    )
+
+    # wire up the web-search button
+    web_search_btn.click(
+        ai_web_search,
+        inputs=[web_search_model_dropdown, web_search_input],
+        outputs=web_search_output
+    )
+
+demo.launch(debug=True, share=True)