Feelings to Emoji: Technical Reference

This document provides technical details about the implementation of the Feelings to Emoji application.

Project Structure

The application is organized into several Python modules:

  • app.py - Main application file with Gradio interface
  • emoji_processor.py - Core processing logic for emoji matching
  • config.py - Configuration settings
  • utils.py - Utility functions
  • generate_embeddings.py - Standalone tool to pre-generate embeddings

Embedding Models

The system uses the following sentence embedding models, loaded via the Sentence Transformers library:

| Model Key | Model ID | Size (parameters) | Description |
|-----------|----------|-------------------|-------------|
| mpnet | all-mpnet-base-v2 | 110M | Balanced, great general-purpose model |
| gte | thenlper/gte-large | 335M | Context-rich, good for emotion & nuance |
| bge | BAAI/bge-large-en-v1.5 | 350M | Tuned for ranking & high-precision similarity |

Emoji Matching Algorithm

The application uses cosine similarity between sentence embeddings to match text with emojis (a minimal sketch follows the list below):

  1. For each emoji category (emotion and event):

    • Embed descriptions using the selected model
    • Calculate cosine similarity between the input text embedding and each emoji description embedding
    • Return the emoji with the highest similarity score
  2. The embeddings are pre-computed and cached to improve performance:

    • Stored as pickle files in the embeddings/ directory
    • Generated using generate_embeddings.py
    • Loaded at startup to minimize processing time
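
The per-category matching step can be illustrated with a short sketch built on the Sentence Transformers API. The dictionary contents and variable names below are illustrative only, not the actual implementation:

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-mpnet-base-v2")
# Illustrative emoji dictionary: emoji -> description
emotion_emojis = {"😀": "a happy, smiling face", "😢": "a sad, crying face"}

# Embed the descriptions once, then compare the input text against each one.
description_embeddings = model.encode(list(emotion_emojis.values()))
text_embedding = model.encode("I'm feeling happy today!")

scores = util.cos_sim(text_embedding, description_embeddings)[0]
best_index = int(scores.argmax())
best_emoji = list(emotion_emojis.keys())[best_index]
print(best_emoji)  # emotion emoji with the highest similarity score

In the application itself the description embeddings are not recomputed on every request; they come from the pre-generated pickle cache described below.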

Module Reference

config.py

Contains configuration settings including:

  • CONFIG: Dictionary with basic application settings (model name, file paths, etc.)
  • EMBEDDING_MODELS: Dictionary defining the available embedding models
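
The exact keys are defined in the source; a hypothetical shape of these two dictionaries, shown here for orientation only, might look like:

# Illustrative shapes only; the real keys and values live in config.py.
CONFIG = {
    "default_model": "mpnet",
    "emotion_file": "google-emoji-kitchen-emotion.txt",
    "item_file": "google-emoji-kitchen-item.txt",
    "embeddings_dir": "embeddings",
}

EMBEDDING_MODELS = {
    "mpnet": {"id": "all-mpnet-base-v2", "description": "Balanced, general-purpose"},
    "gte": {"id": "thenlper/gte-large", "description": "Context-rich, good for nuance"},
    "bge": {"id": "BAAI/bge-large-en-v1.5", "description": "Tuned for ranking"},
}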

utils.py

Utility functions including:

  • setup_logging(): Configures application logging
  • kitchen_txt_to_dict(filepath): Parses emoji dictionary files
  • save_embeddings_to_pickle(embeddings, filepath): Saves embeddings to pickle files
  • load_embeddings_from_pickle(filepath): Loads embeddings from pickle files
  • get_embeddings_pickle_path(model_id, emoji_type): Generates consistent paths for embedding files
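
The pickle helpers are straightforward; a minimal sketch of what save_embeddings_to_pickle and load_embeddings_from_pickle could look like, assuming embeddings are kept in a plain dict, is shown below (the actual implementations may differ in detail):

import os
import pickle

def save_embeddings_to_pickle(embeddings, filepath):
    # Persist an embeddings dict so it can be reloaded without re-encoding.
    directory = os.path.dirname(filepath)
    if directory:
        os.makedirs(directory, exist_ok=True)
    with open(filepath, "wb") as f:
        pickle.dump(embeddings, f)

def load_embeddings_from_pickle(filepath):
    # Return None when the cache is missing so callers can fall back to encoding.
    if not os.path.exists(filepath):
        return None
    with open(filepath, "rb") as f:
        return pickle.load(f)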

emoji_processor.py

Core processing logic:

  • EmojiProcessor: Main class for emoji matching and processing
    • __init__(model_name=None, model_key=None, use_cached_embeddings=True): Initializes the processor with a specific model
    • load_emoji_dictionaries(emotion_file, item_file): Loads emoji dictionaries from text files
    • switch_model(model_key): Switches to a different embedding model
    • sentence_to_emojis(sentence): Processes text to find matching emojis and generate the emoji mashup
    • find_top_emojis(embedding, emoji_embeddings, top_n=1): Finds top matching emojis using cosine similarity

app.py

Gradio interface:

  • EmojiMashupApp: Main application class
    • create_interface(): Creates the Gradio interface
    • process_with_model(model_selection, text, use_cached_embeddings): Processes text with the selected model
    • get_random_example(): Gets a random example sentence for demonstration
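
A heavily simplified Gradio sketch of how such an interface can be wired up is shown below. The component layout and handler body are placeholders, not the app's actual implementation:

import gradio as gr

def process_with_model(model_selection, text):
    # Simplified stand-in for the real handler, which runs EmojiProcessor
    # and returns the matched emojis plus the mashup image.
    return f"model={model_selection}, text={text}"

demo = gr.Interface(
    fn=process_with_model,
    inputs=[
        gr.Dropdown(["mpnet", "gte", "bge"], value="mpnet", label="Model"),
        gr.Textbox(label="How are you feeling?"),
    ],
    outputs=gr.Textbox(label="Result"),
    title="Feelings to Emoji",
)

if __name__ == "__main__":
    demo.launch()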

generate_embeddings.py

Standalone utility to pre-generate embeddings:

  • generate_embeddings_for_model(model_key, model_info): Generates embeddings for a specific model
  • main(): Main function that processes all models and saves embeddings
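
A simplified sketch of the pre-generation loop, reusing the module names listed above, might look like the following. The model_info["id"] key and the shape of the parsed dictionaries are assumptions for illustration:

from sentence_transformers import SentenceTransformer

from config import EMBEDDING_MODELS
from utils import (kitchen_txt_to_dict, save_embeddings_to_pickle,
                   get_embeddings_pickle_path)

def generate_embeddings_for_model(model_key, model_info):
    # Encode every emoji description once and cache the results on disk.
    model = SentenceTransformer(model_info["id"])
    for emoji_type, path in [("emotion", "google-emoji-kitchen-emotion.txt"),
                             ("event", "google-emoji-kitchen-item.txt")]:
        descriptions = kitchen_txt_to_dict(path)  # assumed: emoji -> description
        embeddings = {emoji: model.encode(desc) for emoji, desc in descriptions.items()}
        save_embeddings_to_pickle(embeddings,
                                  get_embeddings_pickle_path(model_info["id"], emoji_type))

def main():
    for model_key, model_info in EMBEDDING_MODELS.items():
        generate_embeddings_for_model(model_key, model_info)

if __name__ == "__main__":
    main()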

Emoji Data Files

  • google-emoji-kitchen-emotion.txt: Emotion emojis with descriptions
  • google-emoji-kitchen-item.txt: Event/object emojis with descriptions
  • google-emoji-kitchen-compatible.txt: Compatibility information for emoji combinations

Embedding Cache Structure

The embeddings/ directory contains pre-generated embeddings in pickle format:

  • [model_id]_emotion.pkl: Embeddings for emotion emojis
  • [model_id]_event.pkl: Embeddings for event/object emojis
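
The naming convention above implies a small path helper; a minimal sketch is shown below. Note that slashes in Hugging Face model IDs would need to be sanitized for filenames; the exact scheme is defined in utils.py:

import os

def get_embeddings_pickle_path(model_id, emoji_type, embeddings_dir="embeddings"):
    # e.g. ("thenlper/gte-large", "emotion") -> embeddings/thenlper_gte-large_emotion.pkl
    safe_id = model_id.replace("/", "_")
    return os.path.join(embeddings_dir, f"{safe_id}_{emoji_type}.pkl")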

API Usage Examples

Using the EmojiProcessor Directly

from emoji_processor import EmojiProcessor

# Initialize with default model (mpnet)
processor = EmojiProcessor()
processor.load_emoji_dictionaries()

# Process a sentence
emotion, event, image = processor.sentence_to_emojis("I'm feeling happy today!")
print(f"Emotion emoji: {emotion}")
print(f"Event emoji: {event}")
# image contains the PIL Image object of the mashup

Switching Models

# Switch to a different model
processor.switch_model("gte")

# Process with the new model
emotion, event, image = processor.sentence_to_emojis("I'm feeling anxious about tomorrow.")

Performance Considerations

  • Embedding generation is computationally intensive but only happens once per model
  • Using cached embeddings significantly improves response time
  • Larger models (GTE, BGE) may provide better accuracy but require more resources
  • The MPNet model offers a good balance of performance and accuracy for most use cases