metadata
title: Visual Question Answering (VQA) System
emoji: 🏞️
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: 1.43.1
app_file: app.py
pinned: false
Visual Question Answering (VQA) System
A multi-modal AI application that allows users to upload images and ask questions about them. This project uses pre-trained models from Hugging Face to analyze images and answer natural language questions.
Features
- Upload images in common formats (jpg, png, etc.)
- Ask questions about image content in natural language
- Get AI-generated answers based on image content
- User-friendly Streamlit interface
- Support for various types of questions (objects, attributes, counting, etc.)
Technical Stack
- Python: Main programming language
- PyTorch & Transformers: Deep learning frameworks for running the models
- Streamlit: Interactive web application framework
- HuggingFace Models: Pre-trained visual question answering models
- PIL: Image processing
Setup Instructions
Clone this repository:
git clone <repository-url>
cd visual-question-answering
Create a virtual environment (recommended):
python -m venv venv
# On Windows
venv\Scripts\activate
# On macOS/Linux
source venv/bin/activate
Install dependencies:
pip install -r requirements.txt
Run the application:
python app.py
Or directly with Streamlit:
streamlit run app.py
Open a web browser and go to http://localhost:8501
Usage
- Upload an image using the file upload area
- Type your question about the image in the text field
- Select a model from the sidebar (BLIP or ViLT)
- Click "Get Answer" to get an AI-generated response
- View the answer displayed on the right side of the screen
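The steps above could map onto Streamlit widgets roughly as follows. This is a minimal sketch, not the project's actual app.py: the names missing_inputs, render_app, and the answer_fn hook are hypothetical, and the accepted file types and widget labels are assumptions.

```python
from typing import Callable, List


def missing_inputs(has_image: bool, question: str) -> List[str]:
    """Return the names of the inputs the user still has to provide."""
    missing = []
    if not has_image:
        missing.append("image")
    if not question.strip():
        missing.append("question")
    return missing


def render_app(answer_fn: Callable[[object, str, str], str]) -> None:
    """Draw the page. answer_fn(image, question, model_name) is a hypothetical hook."""
    # Imported lazily so the helper above can be used without Streamlit installed.
    import streamlit as st
    from PIL import Image

    st.title("Visual Question Answering (VQA) System")
    # Model choice lives in the sidebar, as described in the usage steps.
    model_name = st.sidebar.selectbox("Model", ["BLIP", "ViLT"])

    left, right = st.columns(2)
    with left:
        uploaded = st.file_uploader("Upload an image", type=["jpg", "jpeg", "png"])
        question = st.text_input("Your question about the image")
        clicked = st.button("Get Answer")
        image = None
        if uploaded is not None:
            image = Image.open(uploaded).convert("RGB")
            st.image(image, caption="Uploaded image")

    # The answer (or a hint about what is missing) appears on the right side.
    with right:
        if clicked:
            missing = missing_inputs(image is not None, question)
            if missing:
                st.warning("Please provide: " + ", ".join(missing))
            else:
                with st.spinner("Thinking..."):
                    answer = answer_fn(image, question, model_name)
                st.success(answer)
```

To try it, save the sketch as a script, add a final call such as render_app(my_answer_fn) with your own answer function, and launch it with streamlit run.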
Models Used
This application uses the following pre-trained models from Hugging Face:
- BLIP: For general visual question answering with free-form answers
- ViLT: For detailed understanding of image content and yes/no questions
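As an illustration of how the two models differ in practice, here is a hedged sketch using the Transformers library. The checkpoint names (Salesforce/blip-vqa-base and dandelin/vilt-b32-finetuned-vqa) are assumptions, since this README does not say which checkpoints the app loads, and answer_question is a hypothetical helper. BLIP generates its answer as text, while the ViLT VQA head picks the most likely label from a fixed answer vocabulary.

```python
# Assumed checkpoints; the project may use different ones.
MODEL_CHECKPOINTS = {
    "BLIP": "Salesforce/blip-vqa-base",
    "ViLT": "dandelin/vilt-b32-finetuned-vqa",
}


def answer_question(image, question: str, model_name: str = "BLIP") -> str:
    """Answer `question` about a PIL `image` with the chosen model."""
    if model_name not in MODEL_CHECKPOINTS:
        raise ValueError(f"Unknown model: {model_name!r}")
    checkpoint = MODEL_CHECKPOINTS[model_name]

    # Heavy imports are deferred until a model is actually needed.
    import torch
    from transformers import (
        BlipForQuestionAnswering,
        BlipProcessor,
        ViltForQuestionAnswering,
        ViltProcessor,
    )

    if model_name == "BLIP":
        # BLIP decodes a free-form answer token by token.
        processor = BlipProcessor.from_pretrained(checkpoint)
        model = BlipForQuestionAnswering.from_pretrained(checkpoint)
        inputs = processor(image, question, return_tensors="pt")
        with torch.no_grad():
            output_ids = model.generate(**inputs)
        return processor.decode(output_ids[0], skip_special_tokens=True)

    # ViLT scores every label in its answer vocabulary and returns the best one.
    processor = ViltProcessor.from_pretrained(checkpoint)
    model = ViltForQuestionAnswering.from_pretrained(checkpoint)
    inputs = processor(image, question, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return model.config.id2label[logits.argmax(-1).item()]
```

Calling answer_question(Image.open("photo.jpg").convert("RGB"), "What color is the car?", "ViLT") would download the checkpoint on first use. The sketch reloads the model on every call for simplicity; a real app would likely cache the loaded model and processor between requests.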
Project Structure
- models/: Contains model handling code
- utils/: Utility functions for image processing and more
- static/: Static files including uploaded images
- app.py: Script to run the application
Acknowledgments
- Hugging Face for their excellent pre-trained models
- The open-source community for various libraries used in this project