# Sorbobot: Expert Finder Chatbot Documentation
## Overview
Sorbobot is a chatbot designed for Sorbonne Université to help its administration locate academic experts within the university. This document outlines the structure, functionality, and implementation details of Sorbobot.
### Context
Sorbobot focuses on identifying experts precisely, without confusing individuals who share similar names. It relies on HAL unique identifiers to distinguish between experts.
## System Architecture
Sorbobot is built as a Retrieval-Augmented Generation (RAG) system, composed of two primary steps:
1. **Retrieval**: Identifies the publications most similar to the user's query.
2. **Generation**: Produces a response based on the context extracted from the relevant publications.
## Implementation Details
### Programming Language and Libraries
- **Language**: Python
- **Frontend**: Streamlit
- **Database**: PostgreSQL with pgvector for similarity search
- **NLP Processing**: LangChain and GPT4All libraries
### Database
- **Postgres with pgvector**: Stores the data and performs similarity searches using cosine distance.
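As an illustration of the cosine-based search, here is a minimal sketch of a pgvector query built from Python. The table `publications`, its columns, and the helper names are hypothetical and not taken from the Sorbobot code base; the `%(name)s` placeholders assume a psycopg-style driver. The `<=>` operator is pgvector's cosine distance operator.

```python
from typing import Dict, Sequence, Union


def to_pgvector_literal(vector: Sequence[float]) -> str:
    """Format a vector in pgvector's text input form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(x)) for x in vector) + "]"


# Hypothetical schema: publications(id, title, abstract, embedding vector(n)).
# `<=>` returns the cosine distance, so ascending order puts the closest
# publications first.
SIMILARITY_QUERY = """
SELECT id, title, abstract,
       embedding <=> %(query)s::vector AS cosine_distance
FROM publications
ORDER BY embedding <=> %(query)s::vector
LIMIT %(k)s
"""


def similarity_params(
    query_vector: Sequence[float], k: int = 5
) -> Dict[str, Union[str, int]]:
    """Build the named parameters expected by SIMILARITY_QUERY."""
    return {"query": to_pgvector_literal(query_vector), "k": k}
```

With a psycopg-style cursor, this would be run as `cursor.execute(SIMILARITY_QUERY, similarity_params(query_embedding, k=5))`, where `query_embedding` is the embedded user query.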
### Natural Language Processing
- **Abstracts as Data Source**: The chatbot uses publication abstracts to identify experts.
- **GPT4All for Text Embedding**: Converts the text of each author's publications into embedding vectors, so that publications can be compared with a query by vector similarity.
### Retrieval Process
1. **Query Processing**: User queries are processed to extract key terms.
2. **Similarity Search**: The system searches the database using pgvector to find the publications with the lowest cosine distance to the query.
3. **Expert Identification**: The system identifies the authors of these publications, using their HAL identifiers so that each expert is identified uniquely.
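The retrieval steps above can be sketched in plain Python, without a database, to show the logic. This is an illustrative sketch under assumptions: the `Publication` class, its fields, and `find_experts` are hypothetical names, and the in-memory ranking stands in for the pgvector search.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Publication:
    """Hypothetical record: a publication with its embedding and author HAL ids."""
    title: str
    embedding: Tuple[float, ...]
    author_hal_ids: Tuple[str, ...]


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def find_experts(
    query_embedding: Sequence[float],
    publications: Sequence[Publication],
    k: int = 3,
) -> Dict[str, List[str]]:
    """Map each author HAL id to the titles of their top-k matching publications."""
    ranked = sorted(
        publications,
        key=lambda pub: cosine_distance(query_embedding, pub.embedding),
    )[:k]
    experts: Dict[str, List[str]] = {}
    for pub in ranked:
        # Keying on the HAL id keeps authors with similar names separate.
        for hal_id in pub.author_hal_ids:
            experts.setdefault(hal_id, []).append(pub.title)
    return experts
```

Grouping by HAL identifier rather than by name is what keeps two researchers with the same name from being merged into one expert.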
### Generation Process
1. **Context Extraction**: Relevant information is extracted from the identified publications.
2. **Response Generation**: An LLM generates an informative response based on the extracted context.
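A minimal sketch of the generation step, assuming the context is a list of abstracts and the model is any callable that maps a prompt string to a completion. The prompt wording and the names `build_prompt` and `generate_answer` are illustrative assumptions, not Sorbobot's actual prompt or API.

```python
from typing import Callable, Sequence


def build_prompt(question: str, abstracts: Sequence[str]) -> str:
    """Assemble a prompt that grounds the answer in the retrieved abstracts."""
    context = "\n\n".join(
        f"[{i}] {abstract.strip()}" for i, abstract in enumerate(abstracts, start=1)
    )
    return (
        "Answer the question using only the publication abstracts below.\n\n"
        f"Abstracts:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def generate_answer(
    question: str,
    abstracts: Sequence[str],
    llm: Callable[[str], str],
) -> str:
    """Send the grounded prompt to the model (any prompt-to-text callable)."""
    return llm(build_prompt(question, abstracts))
```

Passing the model in as a callable keeps the sketch independent of a specific LLM library, so a stub function can replace the model in tests.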
## User Interaction Flow
1. **Query Submission**: Users submit queries related to their expert search.
2. **Chatbot Processing**: Sorbobot processes the query, retrieves relevant publications, and identifies experts.
3. **Response Presentation**: The system presents a list of experts, including unique identifiers and relevant publication abstracts.
## Conclusion
Sorbobot streamlines the search for academic experts at Sorbonne Université. It combines embeddings of publication abstracts, pgvector similarity search, and LLM-based response generation, and it relies on HAL identifiers to keep experts with similar names distinct.