---
title: Multi-Modal AI Demo
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.20.1
app_file: app.py
pinned: false
---
# Multi-Modal AI Demo

This project demonstrates multi-modal AI capabilities using pretrained models from Hugging Face. The application provides the following features:

1. **Image Captioning**: Generate descriptive captions for images
2. **Visual Question Answering**: Answer questions about the content of images
3. **Sentiment Analysis**: Analyze the sentiment of text inputs
## Requirements

- Python 3.8+
- Dependencies listed in `requirements.txt`
## Local Installation

To run this project locally:

1. Clone this repository
2. Install dependencies:
   ```
   pip install -r requirements.txt
   ```
3. Run the application:
   ```
   python app.py
   ```

Then open your browser and navigate to the URL shown in the terminal (typically http://127.0.0.1:7860).
## Deploying to Hugging Face Spaces

This project is configured for direct deployment to Hugging Face Spaces. The core files needed for deployment are:

- `app.py` - Main application file
- `model_utils.py` - Utility functions for model operations
- `requirements.txt` - Project dependencies
- `README.md` - This documentation file with Spaces configuration
## Models Used

This demo uses the following pretrained models from Hugging Face:

- Image Captioning: `nlpconnect/vit-gpt2-image-captioning`
- Visual Question Answering: `nlpconnect/vit-gpt2-image-captioning` (simplified)
- Sentiment Analysis: `distilbert-base-uncased-finetuned-sst-2-english`
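As a rough illustration, the models listed above can be loaded with the `transformers` pipeline API. This is a minimal sketch, assuming `transformers` and `torch` are installed; the actual loading code lives in `model_utils.py` and may be organized differently:

```python
# Minimal sketch: loading the models named above via transformers pipelines.
# Assumes transformers + torch are installed; model_utils.py may differ.
from transformers import pipeline

# Image captioning (this demo also reuses it for the simplified VQA feature)
captioner = pipeline(
    "image-to-text",
    model="nlpconnect/vit-gpt2-image-captioning",
)

# Sentiment analysis
sentiment = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

result = sentiment("This demo is great!")
print(result[0]["label"], result[0]["score"])
```

Each pipeline downloads its model weights from the Hugging Face Hub on first use and caches them locally, so the first run takes noticeably longer than later ones.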