atolat30 committed
Commit 15620ae · 1 Parent(s): b5ace3c

Improve context retrieval and prompt templates

Files changed (2)
  1. README.md +16 -0
  2. app.py +18 -9
README.md CHANGED
@@ -359,6 +359,22 @@ DeepSeek-R1 and DeepSeek-R1-Zero differ primarily in their performance capabilit
  (BAD VIBES!!!)
  I don't know the answer.

+ More details
+
+ ```
+ Using 457 words of context
+
+ Final messages being sent to the model:
+
+ System prompt:
+ {'role': 'system', 'content': 'You are a helpful AI assistant that answers questions based on the provided context. \nYour task is to:\n1. Carefully read and understand the context\n2. Answer the user\'s question using ONLY the information from the context\n3. If the answer cannot be found in the context, say "I cannot find the answer in the provided context"\n4. If you find partial information, share what you found and indicate if more information might be needed\n\nRemember: Only use information from the provided context to answer questions.'}
+
+ User prompt:
+ {'role': 'user', 'content': 'Context:\nKumar, F. Song, N. Siegel, L. Wang, A. Creswell, G. Irving, and I. Higgins. Solving math word problems with process-and outcome-based feedback. arXiv preprint arXiv:2211.14275, 2022. P . Wang, L. Li, Z. Shao, R. Xu, D. Dai, Y. Li, D. Chen, Y. Wu, and Z. Sui. Math-shepherd: A label- free step-by-step verifier for llms in mathematical reasoning. arXiv preprint arXiv:2312.08935 , 2023. X. Wang, J. Wei, D. Schuurmans, Q. Le, E. Chi, S. Narang, A. Chowdhery, and D. Zhou. Self-consistency improves chain of thought reasoning in language models. arXiv preprint arXiv:2203.11171, 2022. Y. Wang, X. Ma, G. Zhang, Y. Ni, A. Chandra, S. Guo, W. Ren, A. Arulraj, X. He, Z. Jiang, T. Li, M. Ku, K. Wang, A. Zhuang, R. Fan, X. Yue, and W. Chen. Mmlu-pro: A more robust and challenging multi-task language understanding benchmark. CoRR , abs/2406.01574, 2024. URL https://doi.org/10.48550/arXiv.2406.01574 . C. S. Xia, Y. Deng, S. Dunn, and L. Zhang. Agentless: Demystifying llm-based software engineering\nagents. arXiv preprint, 2024. H. Xin, Z. Z. Ren, J. Song, Z. Shao, W. Zhao, H. Wang, B. Liu, L. Zhang, X. Lu, Q. Du, W. Gao, Q. Zhu, D. Yang, Z. Gou, Z. F. Wu, F. Luo, and C. Ruan. Deepseek-prover-v1.5: Harnessing proof assistant feedback for reinforcement learning and monte-carlo tree search, 2024. URL https://arxiv.org/abs/2408.08152 . J. Zhou, T. Lu, S. Mishra, S. Brahma, S. Basu, Y. Luan, D. Zhou, and L. Hou. Instruction-following evaluation for large language models. arXiv preprint arXiv:2311.07911, 2023. 19 Appendix A. Contributions and Acknowledgments Core Contributors Daya Guo Dejian Yang Haowei Zhang Junxiao Song Ruoyu Zhang Runxin Xu Qihao Zhu Shirong Ma Peiyi Wang Xiao Bi Xiaokang Zhang Xingkai Yu Yu Wu Z.F. Wu Zhibin Gou Zhihong Shao Zhuoshu Li Ziyi Gao Contributors Aixin Liu Bing Xue Bingxuan Wang Bochao Wu Bei Feng Chengda Lu Chenggang Zhao Chengqi Deng Chong Ruan Damai Dai Deli Chen Dongjie Ji Erhang Li Fangyun Lin Fucong Dai Fuli Luo* Guangbo Hao Guanting Chen Guowei Li\nGong, N. Duan, and T. Baldwin. CMMLU: Measur- ing massive multitask language understanding in Chinese. arXiv preprint arXiv:2306.09212 , 2023. T. Li, W.-L. Chiang, E. Frick, L. Dunlap, T. Wu, B. Zhu, J. E. Gonzalez, and I. Stoica. From crowdsourced data to high-quality benchmarks: Arena-hard and benchbuilder pipeline. arXiv preprint arXiv:2406.11939, 2024. H. Lightman, V . Kosaraju, Y. Burda, H. Edwards, B. Baker, T. Lee, J. Leike, J. Schulman, I. Sutskever, and K. Cobbe. Let’s verify step by step. arXiv preprint arXiv:2305.20050, 2023. B. Y. Lin. ZeroEval: A Unified Framework for Evaluating Language Models, July 2024. URL https://github.com/WildEval/ZeroEval . MAA. American invitational mathematics examination - aime. In American Invitational Mathematics Examination -AIME 2024 , February 2024. URL https://maa.org/math -competitions/american-invitational-mathematics-examination-aime . OpenAI. Hello GPT-4o, 2024a. URL https://openai.com/index/hello-gpt-4o/ . OpenAI. Learning to reason\n\n\nQuestion:\nWhat is this paper about?\n'}
+ Retrieved 3 relevant contexts
+ 2025-04-15 02:06:19 - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
+ ```
+

  Does this application pass your vibe check? Are there any immediate pitfalls you're noticing?

app.py CHANGED
@@ -27,9 +27,12 @@ user_prompt_template = """\
  Context:
  {context}

+ Based on the above context, please answer the following question. If the answer cannot be found in the context, say "I cannot find the answer in the provided context." If you find partial information, share what you found and indicate if more information might be needed.
+
  Question:
  {question}
- """
+
+ Please provide a clear and concise answer based ONLY on the information in the context above."""
  user_role_prompt = UserRolePrompt(user_prompt_template)

  class RetrievalAugmentedQAPipeline:
@@ -38,23 +41,29 @@ class RetrievalAugmentedQAPipeline:
          self.vector_db_retriever = vector_db_retriever

      async def arun_pipeline(self, user_query: str):
-         # Get more contexts but limit the total length
-         context_list = self.vector_db_retriever.search_by_text(user_query, k=3)  # Reduced from 6 to 3
+         # Get more contexts with a broader search
+         print("\nSearching for relevant contexts...")
+         context_list = self.vector_db_retriever.search_by_text(user_query, k=5)  # Increased from 3 to 5
+
          print("\nRetrieved contexts:")
          for i, (context, score) in enumerate(context_list):
              print(f"\nContext {i+1} (score: {score:.3f}):")
-             print(context[:200] + "..." if len(context) > 200 else context)
+             print(context[:500] + "..." if len(context) > 500 else context)  # Show more context

          # Limit total context length to approximately 3000 tokens (12000 characters)
          context_prompt = ""
          total_length = 0
          max_length = 12000  # Reduced from 24000 to 12000

-         for context in context_list:
-             if total_length + len(context[0]) > max_length:
-                 break
-             context_prompt += context[0] + "\n"
-             total_length += len(context[0])
+         # Sort contexts by score before truncating
+         sorted_contexts = sorted(context_list, key=lambda x: x[1], reverse=True)
+
+         for context, score in sorted_contexts:
+             if total_length + len(context) > max_length:
+                 print(f"\nSkipping context with score {score:.3f} due to length limit")
+                 continue
+             context_prompt += context + "\n"
+             total_length += len(context)

          print(f"\nUsing {len(context_prompt.split())} words of context")
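For reference, the revised selection logic can be exercised in isolation. Below is a minimal, self-contained sketch of the same idea: the retriever is stubbed out with a hypothetical list of `(text, score)` tuples, the 12000-character budget mirrors the diff, and the helper name `build_context_prompt` is illustrative rather than part of app.py:

```python
# Standalone sketch of the score-sorted, length-capped context assembly added above.
# `fake_results` stands in for vector_db_retriever.search_by_text(query, k=5),
# which is assumed to return (chunk_text, similarity_score) tuples.

def build_context_prompt(results, max_length=12000):
    """Concatenate the highest-scoring chunks until the character budget is exhausted."""
    context_prompt = ""
    total_length = 0
    # Sort by score descending so the most relevant chunks are considered first.
    for text, score in sorted(results, key=lambda x: x[1], reverse=True):
        if total_length + len(text) > max_length:
            # `continue` (not `break`): a later, shorter chunk can still fit
            # after a long one has been skipped.
            print(f"Skipping context with score {score:.3f} due to length limit")
            continue
        context_prompt += text + "\n"
        total_length += len(text)
    return context_prompt

fake_results = [
    ("First retrieved chunk ... " * 10, 0.91),
    ("Second retrieved chunk ...", 0.87),
    ("Third retrieved chunk ...", 0.62),
]
prompt = build_context_prompt(fake_results)
print(f"Using {len(prompt.split())} words of context")
```

The 12000-character cap corresponds to the "approximately 3000 tokens" comment in the diff via the common rule of thumb of roughly 4 characters per token.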