Feelings to Emoji: Technical Reference
This document provides technical details about the implementation of the Feelings to Emoji application.
Project Structure
The application is organized into several Python modules:
- `app.py` - Main application file with the Gradio interface
- `emoji_processor.py` - Core processing logic for emoji matching
- `config.py` - Configuration settings
- `utils.py` - Utility functions
- `generate_embeddings.py` - Standalone tool to pre-generate embeddings
Embedding Models
The system uses the following sentence embedding models from the Sentence Transformers library:
| Model Key | Model ID | Size | Description |
|---|---|---|---|
| mpnet | all-mpnet-base-v2 | 110M | Balanced, great general-purpose model |
| gte | thenlper/gte-large | 335M | Context-rich, good for emotion & nuance |
| bge | BAAI/bge-large-en-v1.5 | 350M | Tuned for ranking & high-precision similarity |
Emoji Matching Algorithm
The application uses cosine similarity between sentence embeddings to match text with emojis:
For each emoji category (emotion and event):
- Embed descriptions using the selected model
- Calculate cosine similarity between the input text embedding and each emoji description embedding
- Return the emoji with the highest similarity score
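The matching step above can be sketched in isolation. This is an illustrative pure-Python version: in the application itself this logic lives in `EmojiProcessor.find_top_emojis` and operates on Sentence Transformers vectors, so the toy 3-dimensional embeddings below are stand-ins.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def best_match(text_embedding, emoji_embeddings):
    """Return the emoji whose description embedding is most similar
    to the input text embedding (the highest-similarity rule above)."""
    return max(
        emoji_embeddings,
        key=lambda emoji: cosine_similarity(text_embedding, emoji_embeddings[emoji]),
    )

# Toy vectors: the input leans toward the "happy" direction
embeddings = {
    "😊": [0.9, 0.1, 0.0],
    "😢": [0.0, 0.9, 0.1],
}
print(best_match([0.8, 0.2, 0.1], embeddings))  # → 😊
```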
The embeddings are pre-computed and cached to improve performance:
- Stored as pickle files in the `embeddings/` directory
- Generated using `generate_embeddings.py`
- Loaded at startup to minimize processing time
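The caching pattern can be sketched as follows. The real helpers in `utils.py` are `save_embeddings_to_pickle` and `load_embeddings_from_pickle`; the return-`None`-on-miss behavior and the embedding shapes here are assumptions for illustration.

```python
import os
import pickle

def save_embeddings_to_pickle(embeddings, filepath):
    """Serialize an embeddings dict (emoji -> vector) to disk."""
    os.makedirs(os.path.dirname(filepath), exist_ok=True)
    with open(filepath, "wb") as f:
        pickle.dump(embeddings, f)

def load_embeddings_from_pickle(filepath):
    """Load cached embeddings, or None if no cache exists yet
    (assumed miss behavior; the real helper may differ)."""
    if not os.path.exists(filepath):
        return None
    with open(filepath, "rb") as f:
        return pickle.load(f)

# A cache hit at startup avoids re-embedding every description
path = "embeddings/mpnet_emotion.pkl"
cached = load_embeddings_from_pickle(path)
if cached is None:
    cached = {"😊": [0.9, 0.1]}  # placeholder for model.encode(...) output
    save_embeddings_to_pickle(cached, path)
```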
Module Reference
config.py
Contains configuration settings including:
- `CONFIG`: Dictionary with basic application settings (model name, file paths, etc.)
- `EMBEDDING_MODELS`: Dictionary defining the available embedding models
utils.py
Utility functions including:
- `setup_logging()`: Configures application logging
- `kitchen_txt_to_dict(filepath)`: Parses emoji dictionary files
- `save_embeddings_to_pickle(embeddings, filepath)`: Saves embeddings to pickle files
- `load_embeddings_from_pickle(filepath)`: Loads embeddings from pickle files
- `get_embeddings_pickle_path(model_id, emoji_type)`: Generates consistent paths for embedding files
emoji_processor.py
Core processing logic:
- `EmojiProcessor`: Main class for emoji matching and processing
- `__init__(model_name=None, model_key=None, use_cached_embeddings=True)`: Initializes the processor with a specific model
- `load_emoji_dictionaries(emotion_file, item_file)`: Loads emoji dictionaries from text files
- `switch_model(model_key)`: Switches to a different embedding model
- `sentence_to_emojis(sentence)`: Processes text to find matching emojis and generate the mashup
- `find_top_emojis(embedding, emoji_embeddings, top_n=1)`: Finds top matching emojis using cosine similarity
app.py
Gradio interface:
- `EmojiMashupApp`: Main application class
- `create_interface()`: Creates the Gradio interface
- `process_with_model(model_selection, text, use_cached_embeddings)`: Processes text with the selected model
- `get_random_example()`: Gets a random example sentence for demonstration
generate_embeddings.py
Standalone utility to pre-generate embeddings:
- `generate_embeddings_for_model(model_key, model_info)`: Generates embeddings for a specific model
- `main()`: Main function that processes all models and saves embeddings
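The overall flow of the script can be sketched as below. This is a toy version: the registry mirrors `EMBEDDING_MODELS` from `config.py`, and `fake_encode` stands in for `SentenceTransformer.encode`, so the vectors and the exact signatures are illustrative, not the script's real code.

```python
import os
import pickle

# Illustrative registry mirroring EMBEDDING_MODELS in config.py
EMBEDDING_MODELS = {
    "mpnet": {"model_id": "all-mpnet-base-v2"},
    "gte": {"model_id": "thenlper/gte-large"},
}

def fake_encode(text):
    """Stand-in for SentenceTransformer.encode(); returns a toy vector."""
    return [len(text) % 7, len(text) % 5]

def generate_embeddings_for_model(model_key, descriptions, out_dir="embeddings"):
    """Embed every emoji description for one model and pickle the result."""
    embeddings = {emoji: fake_encode(desc) for emoji, desc in descriptions.items()}
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, f"{model_key}_emotion.pkl")
    with open(path, "wb") as f:
        pickle.dump(embeddings, f)
    return path

# main() iterates over all models so every cache file exists before startup
descriptions = {"😊": "smiling face", "😢": "crying face"}
paths = [generate_embeddings_for_model(key, descriptions) for key in EMBEDDING_MODELS]
```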
Emoji Data Files
- `google-emoji-kitchen-emotion.txt`: Emotion emojis with descriptions
- `google-emoji-kitchen-item.txt`: Event/object emojis with descriptions
- `google-emoji-kitchen-compatible.txt`: Compatibility information for emoji combinations
Embedding Cache Structure
The `embeddings/` directory contains pre-generated embeddings in pickle format:
- `[model_id]_emotion.pkl`: Embeddings for emotion emojis
- `[model_id]_event.pkl`: Embeddings for event/object emojis
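This naming convention can be captured in a small helper, along the lines of `get_embeddings_pickle_path` in `utils.py`. The default directory name and the flattening of slashes in hub IDs are assumptions here, not confirmed details of the real helper.

```python
import os

def get_embeddings_pickle_path(model_id, emoji_type, base_dir="embeddings"):
    """Build the cache path for a model/emoji-type pair, following the
    [model_id]_emotion.pkl / [model_id]_event.pkl scheme. Slashes in hub
    IDs like 'thenlper/gte-large' are flattened so each cache is one file."""
    safe_id = model_id.replace("/", "_")
    return os.path.join(base_dir, f"{safe_id}_{emoji_type}.pkl")

print(get_embeddings_pickle_path("all-mpnet-base-v2", "emotion"))
# → embeddings/all-mpnet-base-v2_emotion.pkl  (on POSIX)
```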
API Usage Examples
Using the EmojiProcessor Directly
```python
from emoji_processor import EmojiProcessor

# Initialize with the default model (mpnet)
processor = EmojiProcessor()
processor.load_emoji_dictionaries()

# Process a sentence
emotion, event, image = processor.sentence_to_emojis("I'm feeling happy today!")
print(f"Emotion emoji: {emotion}")
print(f"Event emoji: {event}")
# image contains the PIL Image object of the mashup
```
Switching Models
```python
# Switch to a different model
processor.switch_model("gte")

# Process with the new model
emotion, event, image = processor.sentence_to_emojis("I'm feeling anxious about tomorrow.")
```
Performance Considerations
- Embedding generation is computationally intensive but only happens once per model
- Using cached embeddings significantly improves response time
- Larger models (GTE, BGE) may provide better accuracy but require more resources
- The MPNet model offers a good balance of performance and accuracy for most use cases