leo-bourrel

!feat: Import new sorbobot version

68a9b68 over 1 year ago

	# Sorbobot: Expert Finder Chatbot Documentation

	## Overview

	Sorbobot is a chatbot built for Sorbonne Université to help its administration locate academic experts within the university. This document outlines the structure, functionality, and implementation details of Sorbobot.

	### Context

	Sorbobot's central requirement is to identify experts precisely, without confusing individuals who share similar names. It uses HAL unique identifiers to distinguish between experts.

	## System Architecture

	Sorbobot is built as a Retrieval-Augmented Generation (RAG) system with two primary steps:

	1. Retrieval: Identifies the publications most similar to the user's query.
	2. Generation: Produces responses based on the context extracted from relevant publications.
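
The two steps above can be sketched as a small pipeline. This is a minimal illustration, not Sorbobot's actual code: the names `answer`, `toy_retrieve`, and `toy_generate` are hypothetical, and the toy functions stand in for the database search and the LLM so that the control flow is visible on its own.

```python
from typing import Callable, List

# Hypothetical type aliases: a retriever maps a query to context passages,
# a generator maps (query, passages) to an answer string.
Retriever = Callable[[str], List[str]]
Generator = Callable[[str, List[str]], str]


def answer(query: str, retrieve: Retriever, generate: Generator) -> str:
    """Run the two RAG steps in order: retrieval, then generation."""
    passages = retrieve(query)        # step 1: find relevant publications
    return generate(query, passages)  # step 2: answer from that context


# Toy stand-ins so the sketch runs without a database or a model.
def toy_retrieve(query: str) -> List[str]:
    corpus = ["Abstract about protein folding.", "Abstract about graph theory."]
    words = query.lower().split()
    return [doc for doc in corpus if any(w in doc.lower() for w in words)]


def toy_generate(query: str, passages: List[str]) -> str:
    return f"{len(passages)} relevant publication(s) found for: {query}"
```

Passing the retriever and generator as arguments keeps the two steps independent, so either one could be swapped out without touching the other.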

	## Implementation Details

	### Programming Language and Libraries

	- Language: Python
	- Frontend: Streamlit
	- Database: PostgreSQL with pgvector for similarity search
	- NLP Processing: LangChain and GPT4All libraries

	### Database

	- Postgres with pgvector: Stores the publication data and runs similarity searches ranked by cosine similarity (equivalently, by lowest cosine distance).
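
For reference, cosine distance is one minus cosine similarity, and it is the quantity that pgvector's `<=>` operator computes. The sketch below shows the arithmetic in plain Python alongside an illustrative query. The table name `publication` and the columns `id`, `abstract`, and `embedding` are assumptions, since the real schema is not described in this document.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity of two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical table and column names. The two placeholders are the query
# embedding (as a vector literal such as '[0.1, 0.2]') and the row limit.
TOP_K_QUERY = """
SELECT id, abstract, embedding <=> %s::vector AS distance
FROM publication
ORDER BY distance
LIMIT %s;
"""
```

A distance of 0 means the two vectors point in the same direction, 1 means they are orthogonal, and 2 means they point in opposite directions.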

	### Natural Language Processing

	- Abstracts as Data Source: The chatbot utilizes publication abstracts to identify experts.
	- GPT4All for Text Embedding: Converts the text of author publications into embedding vectors, which lets the system compare abstracts with a query by vector similarity.
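
To make the idea of an embedding concrete, here is a toy stand-in that maps text to a fixed-length vector by hashing tokens into buckets. It is not the GPT4All model and captures no semantics. The function `toy_embed` and its `dim` parameter are hypothetical and only illustrate the interface: text in, fixed-length vector out.

```python
import hashlib
import math
from typing import List


def toy_embed(text: str, dim: int = 16) -> List[float]:
    """Map text to a unit-length vector by hashing each token into a bucket."""
    vec = [0.0] * dim
    for token in text.lower().split():
        # A stable hash keeps the result identical across runs and machines.
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    # Empty input has no tokens, so the zero vector is returned unchanged.
    return [v / norm for v in vec] if norm else vec
```

A real embedding model replaces the hashing step with a learned encoder, so that texts with related meaning land close together even when they share no words.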

	### Retrieval Process

	1. Query Processing: User queries are processed to extract key terms.
	2. Similarity Search: The system searches the database using pgvector to find publications with low cosine distance to the query.
	3. Expert Identification: The system identifies authors of these publications, ensuring unique identification of experts.
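
Steps 2 and 3 can be sketched as follows, assuming the similarity search has already returned each publication with its cosine distance. The `Author` and `Hit` classes, their fields, and `identify_experts` are hypothetical names chosen for illustration. The point is that authors are grouped by HAL identifier and not by name.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Author:
    hal_id: str  # hypothetical field holding the HAL identifier
    name: str


@dataclass
class Hit:
    title: str
    distance: float  # cosine distance returned by the similarity search
    authors: List[Author]


def identify_experts(hits: List[Hit], k: int = 3) -> List[Tuple[Author, List[str]]]:
    """Keep the k closest publications and group their titles by author HAL id."""
    closest = sorted(hits, key=lambda h: h.distance)[:k]
    by_id: Dict[str, Tuple[Author, List[str]]] = {}
    for hit in closest:
        for author in hit.authors:
            # Keying on hal_id keeps two people with the same name apart.
            by_id.setdefault(author.hal_id, (author, []))[1].append(hit.title)
    return list(by_id.values())
```

With this grouping, two researchers both named "J. Martin" appear as separate experts as long as their HAL identifiers differ.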

	### Generation Process

	1. Context Extraction: Relevant information is extracted from the identified publications.
	2. Response Generation: An LLM generates an informative response based on the extracted context.
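
One common way to pass extracted context to an LLM is to place the retrieved abstracts in the prompt ahead of the question. The sketch below assumes that approach. The template wording, `build_prompt`, and the `max_chars` truncation are all hypothetical, since the prompt Sorbobot actually uses is not documented here.

```python
from typing import List

# Hypothetical prompt wording, for illustration only.
PROMPT_TEMPLATE = (
    "Answer the question using only the abstracts below.\n\n"
    "{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def build_prompt(question: str, abstracts: List[str], max_chars: int = 500) -> str:
    """Number the abstracts, truncate each one, and fill the template."""
    context = "\n".join(
        f"[{i}] {abstract[:max_chars]}"
        for i, abstract in enumerate(abstracts, start=1)
    )
    return PROMPT_TEMPLATE.format(context=context, question=question)
```

Truncating each abstract is a simple way to keep the prompt within the model's context window, and the numbering gives the model a way to refer back to a specific publication.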

	## User Interaction Flow

	1. Query Submission: Users submit queries related to their expert search.
	2. Chatbot Processing: Sorbobot processes the query, retrieves relevant publications, and identifies experts.
	3. Response Presentation: The system presents a list of experts, including unique identifiers and relevant publication abstracts.

	## Conclusion

	Sorbobot streamlines the search for academic experts at Sorbonne Université. It combines embedding-based retrieval over publication abstracts, stored in PostgreSQL with pgvector, with LLM-generated responses, and it relies on HAL identifiers to identify each expert unambiguously.
