# Sorbobot: Expert Finder Chatbot Documentation

## Overview

Sorbobot is a chatbot designed for Sorbonne Université to help its administration locate academic experts within the university. This document describes the structure, functionality, and implementation of Sorbobot.

### Context

Sorbobot must identify experts unambiguously, without confusing individuals who share similar names. It uses HAL unique identifiers to tell experts apart.

## System Architecture

Sorbobot is built as a Retrieval-Augmented Generation (RAG) system with two main steps:

1. **Retrieval**: Identifies publications most similar to the user queries.
2. **Generation**: Produces responses based on the context extracted from relevant publications.
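
The two steps can be sketched as a small pipeline. This is a minimal illustration, not Sorbobot's actual code: `answer`, `retrieve`, and `generate` are hypothetical names, and the retriever and generator are passed in as plain callables so the flow is visible without a database or a model.

```python
from typing import Callable, List


def answer(
    query: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
) -> str:
    """Run retrieval, then generation, for one user query."""
    # Step 1 (Retrieval): fetch the abstracts most similar to the query.
    abstracts = retrieve(query)
    # Step 2 (Generation): produce a response grounded in those abstracts.
    return generate(query, abstracts)


# Stand-ins for the real retriever and LLM (assumptions for illustration).
def fake_retrieve(query: str) -> List[str]:
    return ["Abstract about " + query]


def fake_generate(query: str, abstracts: List[str]) -> str:
    return f"{len(abstracts)} publication(s) found for '{query}'."


result = answer("graph neural networks", fake_retrieve, fake_generate)
```

Keeping the two steps behind separate callables means either one could be swapped or tested in isolation.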

## Implementation Details

### Programming Language and Libraries

- **Language**: Python
- **Frontend**: Streamlit
- **Database**: PostgreSQL with pgvector for similarity search
- **NLP Processing**: LangChain and GPT4All libraries

### Database

- **PostgreSQL with pgvector**: Stores the data and runs similarity searches ranked by cosine distance.
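
To make the metric concrete, here is a minimal sketch of cosine distance in plain Python, followed by the kind of query pgvector supports through its `<=>` (cosine distance) operator. The table name `publication` and the columns `id`, `abstract`, and `embedding` are assumptions for illustration; the actual schema is not described in this document.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine distance = 1 - cosine similarity (0 means same direction).

    Assumes both vectors are non-zero and have the same length.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# pgvector computes the same metric with its `<=>` operator.
# Hypothetical table and column names; the %s placeholder would receive
# the query embedding from a database driver.
NEAREST_ABSTRACTS_SQL = """
SELECT id, abstract
FROM publication
ORDER BY embedding <=> %s::vector
LIMIT 5;
"""
```

Ordering by ascending distance returns the most similar abstracts first.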

### Natural Language Processing

- **Abstracts as Data Source**: The chatbot utilizes publication abstracts to identify experts.
- **GPT4All for Word Embedding**: Converts the text of authors' publications into embedding vectors, which the similarity search compares against the query.

### Retrieval Process

1. **Query Processing**: User queries are processed to extract key terms.
2. **Similarity Search**: The system searches the database using pgvector to find publications with low cosine distance to the query.
3. **Expert Identification**: The system identifies authors of these publications, ensuring unique identification of experts.
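
The last step can be sketched as grouping search hits by author identifier. This is an illustrative sketch only: the `Hit` tuple layout, the `identify_experts` helper, the `max_distance` threshold of 0.5, and the identifier strings are all assumptions, not Sorbobot's actual data model.

```python
from typing import Dict, List, Tuple

# Hypothetical search hit: (cosine distance, publication title, HAL author ids).
Hit = Tuple[float, str, List[str]]


def identify_experts(hits: List[Hit], max_distance: float = 0.5) -> Dict[str, List[str]]:
    """Map each HAL author identifier to the close-enough publications it authored."""
    experts: Dict[str, List[str]] = {}
    # Visit the closest publications first so each list is ordered by relevance.
    for distance, title, author_ids in sorted(hits, key=lambda h: h[0]):
        if distance > max_distance:
            break  # hits are sorted, so every remaining hit is farther away
        for hal_id in author_ids:
            experts.setdefault(hal_id, []).append(title)
    return experts


# Made-up identifiers: two distinct authors with similar names stay separate.
hits = [
    (0.42, "Paper B", ["hal-jdupont"]),
    (0.10, "Paper A", ["hal-jdupont", "hal-jdupont-2"]),
    (0.90, "Paper C", ["hal-amartin"]),
]
experts = identify_experts(hits)
```

Keying the result on the identifier rather than the author's name is what keeps homonyms apart.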

### Generation Process

1. **Context Extraction**: Relevant information is extracted from the identified publications.
2. **Response Generation**: An LLM generates an informative response based on the extracted context.
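
One common way to pass extracted context to an LLM is to assemble it into a prompt. The sketch below assumes the context consists of publication abstracts; `build_prompt` and the prompt wording are hypothetical, and the call to the model itself is left out.

```python
from typing import List


def build_prompt(question: str, abstracts: List[str]) -> str:
    """Assemble an LLM prompt that restricts the answer to the given abstracts."""
    # Number each abstract so the response can point back to its source.
    context = "\n\n".join(
        f"[{i}] {abstract.strip()}" for i, abstract in enumerate(abstracts, start=1)
    )
    return (
        "Answer using only the publication abstracts below.\n\n"
        f"Abstracts:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_prompt(
    "Who works on protein folding?",
    ["  Study of protein folding dynamics. ", "Survey of folding simulations."],
)
```

The resulting string would then be sent to whichever model the deployment uses.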

## User Interaction Flow

1. **Query Submission**: Users submit queries related to their expert search.
2. **Chatbot Processing**: Sorbobot processes the query, retrieves relevant publications, and identifies experts.
3. **Response Presentation**: The system presents a list of experts, including unique identifiers and relevant publication abstracts.

## Conclusion

Sorbobot streamlines the search for academic experts at Sorbonne Université. It combines embedding-based retrieval over publication abstracts, stored in PostgreSQL with pgvector, with LLM-generated responses, and it relies on HAL unique identifiers to keep experts with similar names distinct.