---
title: Visual Question Answering (VQA) System
emoji: 🏞️
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: 1.43.1
app_file: app.py
pinned: false
---
# Visual Question Answering (VQA) System
A multi-modal AI application that allows users to upload images and ask questions about them. This project uses pre-trained models from Hugging Face to analyze images and answer natural language questions.
## Features
- Upload images in common formats (jpg, png, etc.)
- Ask questions about image content in natural language
- Get AI-generated answers based on image content
- User-friendly Streamlit interface
- Support for various types of questions (objects, attributes, counting, etc.)
## Technical Stack
- **Python**: Main programming language
- **PyTorch & Transformers**: Deep learning frameworks for running the models
- **Streamlit**: Interactive web application framework
- **HuggingFace Models**: Pre-trained visual question answering models
- **PIL**: Image processing
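As a concrete example of the PIL-based image handling, a small helper like the one below can normalize uploads before they reach a model. The function name `prepare_image` and the 640-pixel cap are illustrative assumptions, not code taken from this repo's `utils/`:

```python
from PIL import Image

MAX_SIDE = 640  # illustrative cap, not a value from this repo


def prepare_image(img: Image.Image, max_side: int = MAX_SIDE) -> Image.Image:
    """Convert to RGB and shrink so the longest side fits within max_side."""
    img = img.convert("RGB")  # models expect 3-channel input
    w, h = img.size
    scale = max_side / max(w, h)
    if scale < 1.0:  # only downscale, never enlarge small images
        img = img.resize((round(w * scale), round(h * scale)))
    return img
```

Keeping resizing in one place means both models see consistently sized RGB input regardless of what format the user uploads.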
## Setup Instructions
1. Clone this repository:
```
git clone
cd visual-question-answering
```
2. Create a virtual environment (recommended):
```
python -m venv venv
# On Windows
venv\Scripts\activate
# On macOS/Linux
source venv/bin/activate
```
3. Install dependencies:
```
pip install -r requirements.txt
```
4. Run the application:
```
python app.py
```
Or launch directly with Streamlit (the standard way to run a Streamlit app):
```
streamlit run app.py
```
5. Open a web browser and go to `http://localhost:8501`
## Usage
1. Upload an image using the file upload area
2. Type your question about the image in the text field
3. Select a model from the sidebar (BLIP or ViLT)
4. Click "Get Answer" to get an AI-generated response
5. View the answer displayed on the right side of the screen
## Models Used
This application uses the following pre-trained models from Hugging Face:
- **BLIP**: For general visual question answering with free-form answers
- **ViLT**: For detailed understanding of image content and yes/no questions
## Project Structure
- `models/`: Contains model handling code
- `utils/`: Utility functions for image processing and more
- `static/`: Static files including uploaded images
- `app.py`: Script to run the application
## Acknowledgments
- Hugging Face for their excellent pre-trained models
- The open-source community for various libraries used in this project