Improve context retrieval and prompt templates
README.md CHANGED

@@ -359,6 +359,22 @@ DeepSeek-R1 and DeepSeek-R1-Zero differ primarily in their performance capabilit
 (BAD VIBES!!!)
 I don't know the answer.
 
+More details
+
+```
+Using 457 words of context
+
+Final messages being sent to the model:
+
+System prompt:
+{'role': 'system', 'content': 'You are a helpful AI assistant that answers questions based on the provided context. \nYour task is to:\n1. Carefully read and understand the context\n2. Answer the user\'s question using ONLY the information from the context\n3. If the answer cannot be found in the context, say "I cannot find the answer in the provided context"\n4. If you find partial information, share what you found and indicate if more information might be needed\n\nRemember: Only use information from the provided context to answer questions.'}
+
+User prompt:
+{'role': 'user', 'content': 'Context:\nKumar, F. Song, N. Siegel, L. Wang, A. Creswell, G. Irving, and I. Higgins. Solving math word problems with process-and outcome-based feedback. arXiv preprint arXiv:2211.14275, 2022. P . Wang, L. Li, Z. Shao, R. Xu, D. Dai, Y. Li, D. Chen, Y. Wu, and Z. Sui. Math-shepherd: A label- free step-by-step verifier for llms in mathematical reasoning. arXiv preprint arXiv:2312.08935 , 2023. X. Wang, J. Wei, D. Schuurmans, Q. Le, E. Chi, S. Narang, A. Chowdhery, and D. Zhou. Self-consistency improves chain of thought reasoning in language models. arXiv preprint arXiv:2203.11171, 2022. Y. Wang, X. Ma, G. Zhang, Y. Ni, A. Chandra, S. Guo, W. Ren, A. Arulraj, X. He, Z. Jiang, T. Li, M. Ku, K. Wang, A. Zhuang, R. Fan, X. Yue, and W. Chen. Mmlu-pro: A more robust and challenging multi-task language understanding benchmark. CoRR , abs/2406.01574, 2024. URL https://doi.org/10.48550/arXiv.2406.01574 . C. S. Xia, Y. Deng, S. Dunn, and L. Zhang. Agentless: Demystifying llm-based software engineering\nagents. arXiv preprint, 2024. H. Xin, Z. Z. Ren, J. Song, Z. Shao, W. Zhao, H. Wang, B. Liu, L. Zhang, X. Lu, Q. Du, W. Gao, Q. Zhu, D. Yang, Z. Gou, Z. F. Wu, F. Luo, and C. Ruan. Deepseek-prover-v1.5: Harnessing proof assistant feedback for reinforcement learning and monte-carlo tree search, 2024. URL https://arxiv.org/abs/2408.08152 . J. Zhou, T. Lu, S. Mishra, S. Brahma, S. Basu, Y. Luan, D. Zhou, and L. Hou. Instruction-following evaluation for large language models. arXiv preprint arXiv:2311.07911, 2023. 19 Appendix A. Contributions and Acknowledgments Core Contributors Daya Guo Dejian Yang Haowei Zhang Junxiao Song Ruoyu Zhang Runxin Xu Qihao Zhu Shirong Ma Peiyi Wang Xiao Bi Xiaokang Zhang Xingkai Yu Yu Wu Z.F. Wu Zhibin Gou Zhihong Shao Zhuoshu Li Ziyi Gao Contributors Aixin Liu Bing Xue Bingxuan Wang Bochao Wu Bei Feng Chengda Lu Chenggang Zhao Chengqi Deng Chong Ruan Damai Dai Deli Chen Dongjie Ji Erhang Li Fangyun Lin Fucong Dai Fuli Luo* Guangbo Hao Guanting Chen Guowei Li\nGong, N. Duan, and T. Baldwin. CMMLU: Measur- ing massive multitask language understanding in Chinese. arXiv preprint arXiv:2306.09212 , 2023. T. Li, W.-L. Chiang, E. Frick, L. Dunlap, T. Wu, B. Zhu, J. E. Gonzalez, and I. Stoica. From crowdsourced data to high-quality benchmarks: Arena-hard and benchbuilder pipeline. arXiv preprint arXiv:2406.11939, 2024. H. Lightman, V . Kosaraju, Y. Burda, H. Edwards, B. Baker, T. Lee, J. Leike, J. Schulman, I. Sutskever, and K. Cobbe. Let’s verify step by step. arXiv preprint arXiv:2305.20050, 2023. B. Y. Lin. ZeroEval: A Unified Framework for Evaluating Language Models, July 2024. URL https://github.com/WildEval/ZeroEval . MAA. American invitational mathematics examination - aime. In American Invitational Mathematics Examination -AIME 2024 , February 2024. URL https://maa.org/math -competitions/american-invitational-mathematics-examination-aime . OpenAI. Hello GPT-4o, 2024a. URL https://openai.com/index/hello-gpt-4o/ . OpenAI. Learning to reason\n\n\nQuestion:\nWhat is this paper about?\n'}
+Retrieved 3 relevant contexts
+2025-04-15 02:06:19 - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
+```
+
 
 Does this application pass your vibe check? Are there any immediate pitfalls you're noticing?
 
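For reference, the "Final messages being sent to the model" log above corresponds to a single chat-completions request. Below is a minimal sketch of that call, assuming the current openai Python client; the model name and the truncated message bodies are placeholders, not values read from the app.

```python
# Sketch only: "gpt-4o-mini" and the elided message bodies are assumptions,
# not taken from the app's configuration.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

messages = [
    {"role": "system", "content": "You are a helpful AI assistant that answers questions based on the provided context. ..."},
    {"role": "user", "content": "Context:\n...\n\nQuestion:\nWhat is this paper about?\n"},
]

# This is the POST to https://api.openai.com/v1/chat/completions seen in the log line above.
response = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
print(response.choices[0].message.content)
```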
app.py CHANGED

@@ -27,9 +27,12 @@ user_prompt_template = """\
 Context:
 {context}
 
+Based on the above context, please answer the following question. If the answer cannot be found in the context, say "I cannot find the answer in the provided context." If you find partial information, share what you found and indicate if more information might be needed.
+
 Question:
 {question}
-
+
+Please provide a clear and concise answer based ONLY on the information in the context above."""
 user_role_prompt = UserRolePrompt(user_prompt_template)
 
 class RetrievalAugmentedQAPipeline:
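To see what the revised user prompt renders to, here is a small sketch assuming plain str.format-style substitution (the repo's UserRolePrompt wrapper may fill the template differently); the context and question values are made up.

```python
# Only the template text comes from the diff above; the example values are hypothetical.
user_prompt_template = """\
Context:
{context}

Based on the above context, please answer the following question. If the answer cannot be found in the context, say "I cannot find the answer in the provided context." If you find partial information, share what you found and indicate if more information might be needed.

Question:
{question}

Please provide a clear and concise answer based ONLY on the information in the context above."""

print(user_prompt_template.format(
    context="DeepSeek-R1 is a reasoning model trained with large-scale reinforcement learning.",
    question="What is this paper about?",
))
```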
@@ -38,23 +41,29 @@ class RetrievalAugmentedQAPipeline:
         self.vector_db_retriever = vector_db_retriever
 
     async def arun_pipeline(self, user_query: str):
-        # Get more contexts
-
+        # Get more contexts with a broader search
+        print("\nSearching for relevant contexts...")
+        context_list = self.vector_db_retriever.search_by_text(user_query, k=5) # Increased from 3 to 5
+
         print("\nRetrieved contexts:")
         for i, (context, score) in enumerate(context_list):
             print(f"\nContext {i+1} (score: {score:.3f}):")
-            print(context[:
+            print(context[:500] + "..." if len(context) > 500 else context) # Show more context
 
         # Limit total context length to approximately 3000 tokens (12000 characters)
         context_prompt = ""
         total_length = 0
         max_length = 12000 # Reduced from 24000 to 12000
 
-
-
-
-
-        total_length
+        # Sort contexts by score before truncating
+        sorted_contexts = sorted(context_list, key=lambda x: x[1], reverse=True)
+
+        for context, score in sorted_contexts:
+            if total_length + len(context) > max_length:
+                print(f"\nSkipping context with score {score:.3f} due to length limit")
+                continue
+            context_prompt += context + "\n"
+            total_length += len(context)
 
         print(f"\nUsing {len(context_prompt.split())} words of context")
 
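The budgeting logic in this hunk can be exercised on its own. Here is a self-contained sketch with made-up (context, score) pairs; in the app these come from self.vector_db_retriever.search_by_text(user_query, k=5).

```python
# Made-up (context, score) pairs stand in for the retriever's output.
context_list = [
    ("short but highly relevant chunk", 0.91),
    ("x" * 13000, 0.88),  # longer than the whole budget, will be skipped
    ("another moderately relevant chunk", 0.73),
]

max_length = 12000  # roughly 3000 tokens at ~4 characters per token
context_prompt = ""
total_length = 0

# Highest-scoring chunks get first claim on the character budget.
for context, score in sorted(context_list, key=lambda x: x[1], reverse=True):
    if total_length + len(context) > max_length:
        # `continue` (not `break`): skip the oversized chunk but keep trying
        # shorter, lower-scoring ones.
        print(f"Skipping context with score {score:.3f} due to length limit")
        continue
    context_prompt += context + "\n"
    total_length += len(context)

print(f"Using {len(context_prompt.split())} words of context")
```

Because the loop uses continue rather than break, a single oversized chunk does not end context collection; lower-scoring chunks that still fit within the 12,000-character budget are kept.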