# Sorbobot: Expert Finder Chatbot Documentation
## Overview
Sorbobot is a chatbot designed for Sorbonne Université to help its administration locate academic experts within the university. This document outlines the structure, functionality, and implementation details of Sorbobot.
### Context
Sorbobot focuses on identifying experts precisely, without confusing individuals who share similar names. It uses HAL unique identifiers to distinguish between experts.
## System Architecture
Sorbobot is built on a Retrieval-Augmented Generation (RAG) pipeline with two main steps:
1. **Retrieval**: Identifies the publications most similar to the user's query.
2. **Generation**: Produces responses based on the context extracted from relevant publications.
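
The two steps above can be sketched as a small pipeline. This is a minimal illustration, not Sorbobot's actual code: `answer_query`, `toy_retrieve`, and `toy_generate` are hypothetical names, and the toy functions stand in for the real similarity search and LLM call.

```python
from typing import Callable, List

def answer_query(
    query: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
) -> str:
    """Run the two RAG steps in order: retrieval, then generation."""
    passages = retrieve(query)        # step 1: find relevant publications
    return generate(query, passages)  # step 2: answer from that context

# Toy stand-ins (assumptions for illustration only).
def toy_retrieve(query: str) -> List[str]:
    corpus = [
        "Quantum optics in cold atoms",
        "Roman law in late antiquity",
    ]
    words = set(query.lower().split())
    # Keep documents that share at least one word with the query.
    return [doc for doc in corpus if words & set(doc.lower().split())]

def toy_generate(query: str, passages: List[str]) -> str:
    return f"Found {len(passages)} relevant publication(s) for '{query}'."

result = answer_query("quantum optics", toy_retrieve, toy_generate)
```

Passing the retriever and generator as arguments keeps the two steps independent, so either one can be swapped out or tested in isolation.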
## Implementation Details
### Programming Language and Libraries
- **Language**: Python
- **Frontend**: Streamlit
- **Database**: PostgreSQL with pgvector for similarity search
- **NLP Processing**: langchain and GPT4all libraries
### Database
- **Postgres with pgvector**: Used for storing data and performing similarity searches based on cosine similarity metrics.
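
As a rough illustration of the metric, the sketch below computes cosine distance in plain Python and shows what a nearest-neighbour query might look like. The `<=>` operator is pgvector's cosine distance operator; the table name `publications` and the columns `id`, `abstract`, and `embedding` are assumptions, since the actual schema is not described here.

```python
import math

def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity (0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)

# Hypothetical schema: table and column names are illustrative only.
# The query embedding is passed as a pgvector literal such as '[0.1, 0.2, ...]'.
NEAREST_PUBLICATIONS_SQL = """
SELECT id, abstract, embedding <=> %(query_embedding)s::vector AS distance
FROM publications
ORDER BY distance
LIMIT %(k)s;
"""
```

Cosine distance ranges from 0 (same direction) to 2 (opposite direction), so ordering by ascending distance returns the most similar publications first.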
### Natural Language Processing
- **Abstracts as Data Source**: The chatbot utilizes publication abstracts to identify experts.
- **GPT4all for Text Embedding**: Converts publication abstracts into embedding vectors, so that queries and publications can be compared by semantic similarity.
### Retrieval Process
1. **Query Processing**: User queries are processed to extract key terms.
2. **Similarity Search**: The system searches the database using pgvector to find publications with low cosine distance to the query.
3. **Expert Identification**: The system identifies authors of these publications, ensuring unique identification of experts.
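
The expert identification step can be sketched as grouping the retrieved publications by author identifier rather than by name. This is a minimal sketch under stated assumptions: the shape of `hits`, the function name `identify_experts`, and the identifier strings are all hypothetical and are not real HAL identifiers.

```python
from collections import defaultdict

def identify_experts(hits):
    """Group retrieved publications by author identifier.

    `hits` is a list of dicts with hypothetical keys: "title", "distance",
    and "authors" (a list of (hal_id, name) pairs).
    Returns (hal_id, info) pairs, closest match first.
    """
    experts = defaultdict(
        lambda: {"name": None, "publications": [], "best_distance": float("inf")}
    )
    for hit in hits:
        for hal_id, name in hit["authors"]:
            entry = experts[hal_id]
            entry["name"] = name
            entry["publications"].append(hit["title"])
            entry["best_distance"] = min(entry["best_distance"], hit["distance"])
    return sorted(experts.items(), key=lambda item: item[1]["best_distance"])

# Two different people share the display name "J. Martin".
hits = [
    {"title": "Graph neural networks for chemistry", "distance": 0.12,
     "authors": [("jean-martin-1", "J. Martin")]},
    {"title": "Medieval trade routes", "distance": 0.48,
     "authors": [("jean-martin-2", "J. Martin")]},
    {"title": "Message passing on molecules", "distance": 0.20,
     "authors": [("jean-martin-1", "J. Martin"), ("a-dupont", "A. Dupont")]},
]
ranked = identify_experts(hits)
```

Because the grouping key is the identifier, the two authors named "J. Martin" remain separate entries, each with their own publication list.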
### Generation Process
1. **Context Extraction**: Relevant information is extracted from the identified publications.
2. **Response Generation**: Uses an LLM to generate informative responses based on the extracted context.
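
One common way to pass extracted context to an LLM is to assemble it into a prompt. The sketch below is an assumption about how that might look, not Sorbobot's actual prompt: the function name `build_prompt`, the wording of the instructions, and the `max_chars` budget are all illustrative.

```python
def build_prompt(question, abstracts, max_chars=2000):
    """Assemble an LLM prompt from retrieved abstracts, within a size budget."""
    context_parts = []
    used = 0
    for i, abstract in enumerate(abstracts, start=1):
        snippet = f"[{i}] {abstract.strip()}"
        if used + len(snippet) > max_chars:
            break  # stop before exceeding the context budget
        context_parts.append(snippet)
        used += len(snippet)
    context = "\n".join(context_parts)
    return (
        "Answer the question using only the publication abstracts below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

prompt = build_prompt(
    "Who works on cold atoms?",
    ["We study quantum optics in cold atoms.", "A survey of Roman law."],
)
```

Numbering the abstracts makes it possible to trace a statement in the generated answer back to a specific publication.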
## User Interaction Flow
1. **Query Submission**: Users submit queries related to their expert search.
2. **Chatbot Processing**: Sorbobot processes the query, retrieves relevant publications, and identifies experts.
3. **Response Presentation**: The system presents a list of experts, including unique identifiers and relevant publication abstracts.
## Conclusion
Sorbobot streamlines the process of finding academic experts at Sorbonne Université. It combines embedding-based similarity search over publication abstracts, unique HAL identifiers for authors, and LLM-generated responses to identify experts accurately and efficiently.