Feelings to Emoji: Technical Reference

This document provides technical details about the implementation of the Feelings to Emoji application.

Project Structure

The application is organized into several Python modules:

  • app.py - Main application file with Gradio interface
  • emoji_processor.py - Core processing logic for emoji matching
  • config.py - Configuration settings
  • utils.py - Utility functions
  • generate_embeddings.py - Standalone tool to pre-generate embeddings

Embedding Models

The system uses the following sentence embedding models, loaded via the Sentence Transformers library:

| Model Key | Model ID | Size (parameters) | Description |
|-----------|----------|-------------------|-------------|
| mpnet | all-mpnet-base-v2 | 110M | Balanced, great general-purpose model |
| gte | thenlper/gte-large | 335M | Context-rich, good for emotion & nuance |
| bge | BAAI/bge-large-en-v1.5 | 350M | Tuned for ranking & high-precision similarity |

Emoji Matching Algorithm

The application uses cosine similarity between sentence embeddings to match text with emojis (a minimal sketch follows the list below):

  1. For each emoji category (emotion and event):

    • Embed descriptions using the selected model
    • Calculate cosine similarity between the input text embedding and each emoji description embedding
    • Return the emoji with the highest similarity score
  2. The embeddings are pre-computed and cached to improve performance:

    • Stored as pickle files in the embeddings/ directory
    • Generated using generate_embeddings.py
    • Loaded at startup to minimize processing time
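
The per-category matching step can be illustrated with a short sketch built on the Sentence Transformers API. The dictionary contents and variable names below are illustrative only, not the actual implementation:

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-mpnet-base-v2")
# Illustrative emoji dictionary: emoji -> description
emotion_emojis = {"😀": "a happy, smiling face", "😢": "a sad, crying face"}

# Embed the descriptions once, then compare the input text against each one.
description_embeddings = model.encode(list(emotion_emojis.values()))
text_embedding = model.encode("I'm feeling happy today!")

scores = util.cos_sim(text_embedding, description_embeddings)[0]
best_index = int(scores.argmax())
best_emoji = list(emotion_emojis.keys())[best_index]
print(best_emoji)  # emotion emoji with the highest similarity score

In the application itself the description embeddings are not recomputed on every request; they come from the pre-generated pickle cache described below.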

Module Reference

config.py

Contains configuration settings including:

  • CONFIG: Dictionary with basic application settings (model name, file paths, etc.)
  • EMBEDDING_MODELS: Dictionary defining the available embedding models
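
The exact keys are defined in the source; a hypothetical shape of these two dictionaries, shown here for orientation only, might look like:

# Illustrative shapes only; the real keys and values live in config.py.
CONFIG = {
    "default_model": "mpnet",
    "emotion_file": "google-emoji-kitchen-emotion.txt",
    "item_file": "google-emoji-kitchen-item.txt",
    "embeddings_dir": "embeddings",
}

EMBEDDING_MODELS = {
    "mpnet": {"id": "all-mpnet-base-v2", "description": "Balanced, general-purpose"},
    "gte": {"id": "thenlper/gte-large", "description": "Context-rich, good for nuance"},
    "bge": {"id": "BAAI/bge-large-en-v1.5", "description": "Tuned for ranking"},
}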

utils.py

Utility functions including:

  • setup_logging(): Configures application logging
  • kitchen_txt_to_dict(filepath): Parses emoji dictionary files
  • save_embeddings_to_pickle(embeddings, filepath): Saves embeddings to pickle files
  • load_embeddings_from_pickle(filepath): Loads embeddings from pickle files
  • get_embeddings_pickle_path(model_id, emoji_type): Generates consistent paths for embedding files
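
The pickle helpers are straightforward; a minimal sketch of what save_embeddings_to_pickle and load_embeddings_from_pickle could look like, assuming embeddings are kept in a plain dict, is shown below (the actual implementations may differ in detail):

import os
import pickle

def save_embeddings_to_pickle(embeddings, filepath):
    # Persist an embeddings dict so it can be reloaded without re-encoding.
    directory = os.path.dirname(filepath)
    if directory:
        os.makedirs(directory, exist_ok=True)
    with open(filepath, "wb") as f:
        pickle.dump(embeddings, f)

def load_embeddings_from_pickle(filepath):
    # Return None when the cache is missing so callers can fall back to encoding.
    if not os.path.exists(filepath):
        return None
    with open(filepath, "rb") as f:
        return pickle.load(f)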

emoji_processor.py

Core processing logic:

  • EmojiProcessor: Main class for emoji matching and processing
    • __init__(model_name=None, model_key=None, use_cached_embeddings=True): Initializes the processor with a specific model
    • load_emoji_dictionaries(emotion_file, item_file): Loads emoji dictionaries from text files
    • switch_model(model_key): Switches to a different embedding model
    • sentence_to_emojis(sentence): Processes text to find matching emojis and generate the emoji mashup
    • find_top_emojis(embedding, emoji_embeddings, top_n=1): Finds top matching emojis using cosine similarity

app.py

Gradio interface:

  • EmojiMashupApp: Main application class
    • create_interface(): Creates the Gradio interface
    • process_with_model(model_selection, text, use_cached_embeddings): Processes text with the selected model
    • get_random_example(): Gets a random example sentence for demonstration
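
A heavily simplified Gradio sketch of how such an interface can be wired up is shown below. The component layout and handler body are placeholders, not the app's actual implementation:

import gradio as gr

def process_with_model(model_selection, text):
    # Simplified stand-in for the real handler, which runs EmojiProcessor
    # and returns the matched emojis plus the mashup image.
    return f"model={model_selection}, text={text}"

demo = gr.Interface(
    fn=process_with_model,
    inputs=[
        gr.Dropdown(["mpnet", "gte", "bge"], value="mpnet", label="Model"),
        gr.Textbox(label="How are you feeling?"),
    ],
    outputs=gr.Textbox(label="Result"),
    title="Feelings to Emoji",
)

if __name__ == "__main__":
    demo.launch()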

generate_embeddings.py

Standalone utility to pre-generate embeddings:

  • generate_embeddings_for_model(model_key, model_info): Generates embeddings for a specific model
  • main(): Main function that processes all models and saves embeddings
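
A simplified sketch of the pre-generation loop, reusing the module names listed above, might look like the following. The model_info["id"] key and the shape of the parsed dictionaries are assumptions for illustration:

from sentence_transformers import SentenceTransformer

from config import EMBEDDING_MODELS
from utils import (kitchen_txt_to_dict, save_embeddings_to_pickle,
                   get_embeddings_pickle_path)

def generate_embeddings_for_model(model_key, model_info):
    # Encode every emoji description once and cache the results on disk.
    model = SentenceTransformer(model_info["id"])
    for emoji_type, path in [("emotion", "google-emoji-kitchen-emotion.txt"),
                             ("event", "google-emoji-kitchen-item.txt")]:
        descriptions = kitchen_txt_to_dict(path)  # assumed: emoji -> description
        embeddings = {emoji: model.encode(desc) for emoji, desc in descriptions.items()}
        save_embeddings_to_pickle(embeddings,
                                  get_embeddings_pickle_path(model_info["id"], emoji_type))

def main():
    for model_key, model_info in EMBEDDING_MODELS.items():
        generate_embeddings_for_model(model_key, model_info)

if __name__ == "__main__":
    main()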

Emoji Data Files

  • google-emoji-kitchen-emotion.txt: Emotion emojis with descriptions
  • google-emoji-kitchen-item.txt: Event/object emojis with descriptions
  • google-emoji-kitchen-compatible.txt: Compatibility information for emoji combinations

Embedding Cache Structure

The embeddings/ directory contains pre-generated embeddings in pickle format:

  • [model_id]_emotion.pkl: Embeddings for emotion emojis
  • [model_id]_event.pkl: Embeddings for event/object emojis
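
The naming convention above implies a small path helper; a minimal sketch is shown below. Note that slashes in Hugging Face model IDs would need to be sanitized for filenames; the exact scheme is defined in utils.py:

import os

def get_embeddings_pickle_path(model_id, emoji_type, embeddings_dir="embeddings"):
    # e.g. ("thenlper/gte-large", "emotion") -> embeddings/thenlper_gte-large_emotion.pkl
    safe_id = model_id.replace("/", "_")
    return os.path.join(embeddings_dir, f"{safe_id}_{emoji_type}.pkl")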

API Usage Examples

Using the EmojiProcessor Directly

from emoji_processor import EmojiProcessor

# Initialize with default model (mpnet)
processor = EmojiProcessor()
processor.load_emoji_dictionaries()

# Process a sentence
emotion, event, image = processor.sentence_to_emojis("I'm feeling happy today!")
print(f"Emotion emoji: {emotion}")
print(f"Event emoji: {event}")
# image contains the PIL Image object of the mashup

Switching Models

# Switch to a different model
processor.switch_model("gte")

# Process with the new model
emotion, event, image = processor.sentence_to_emojis("I'm feeling anxious about tomorrow.")

Performance Considerations

  • Embedding generation is computationally intensive but only happens once per model
  • Using cached embeddings significantly improves response time
  • Larger models (GTE, BGE) may provide better accuracy but require more resources
  • The MPNet model offers a good balance of performance and accuracy for most use cases