---
title: Visual Question Answering (VQA) System
emoji: 🏞️
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: 1.43.1
app_file: app.py
pinned: false
---

Visual Question Answering (VQA) System

A multimodal AI application that lets users upload an image and ask natural language questions about it. The project uses pre-trained models from Hugging Face to analyze the image and produce an answer.

Features

Upload images in common formats (jpg, png, etc.)
Ask questions about image content in natural language
Get AI-generated answers based on image content
User-friendly Streamlit interface
Support for various types of questions (objects, attributes, counting, etc.)
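
As a rough illustration of the upload step, the sketch below checks a file's extension and decodes the raw bytes into an RGB image with Pillow. The names `SUPPORTED_EXTENSIONS`, `is_supported_image` and `load_image` are hypothetical and not taken from this repository, and the exact set of accepted formats is an assumption.

```python
import io
from pathlib import Path

# Assumed set of accepted formats; the feature list only says "jpg, png, etc."
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def is_supported_image(filename: str) -> bool:
    """Return True if the file name ends in one of the assumed extensions."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def load_image(data: bytes):
    """Decode raw upload bytes into an RGB PIL image.

    Pillow is imported lazily so the extension check works without it.
    """
    from PIL import Image

    image = Image.open(io.BytesIO(data))
    # Convert palette, grayscale or alpha images to plain 3-channel RGB,
    # which is the input most vision models are trained on.
    return image.convert("RGB")
```

Converting to RGB up front avoids surprises with PNGs that carry an alpha channel or a palette.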

Technical Stack

Python: Main programming language
PyTorch & Transformers: Deep learning frameworks for running the models
Streamlit: Interactive web application framework
Hugging Face models: Pre-trained visual question answering models
Pillow (PIL): Image loading and processing

Setup Instructions

Clone this repository:

git clone 
cd visual-question-answering

Create a virtual environment (recommended):

python -m venv venv
# On Windows
venv\Scripts\activate
# On macOS/Linux
source venv/bin/activate

Install dependencies:
```
pip install -r requirements.txt
```
Run the application:
```
python app.py
```
Or directly with Streamlit:
```
streamlit run app.py
```
Open a web browser and go to http://localhost:8501 (Streamlit's default address).

Usage

Upload an image using the file upload area
Type your question about the image in the text field
Select a model from the sidebar (BLIP or ViLT)
Click "Get Answer" to get an AI-generated response
View the answer displayed on the right side of the screen
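
The steps above could map onto Streamlit widgets roughly as follows. This is a minimal sketch and not the project's actual `app.py`: `check_inputs`, `render_page`, the widget labels and the `answer_fn` callback are all hypothetical names introduced for illustration.

```python
from typing import Callable, Optional

MODEL_CHOICES = ["BLIP", "ViLT"]  # the two options named in the usage steps


def check_inputs(uploaded_file, question: str) -> Optional[str]:
    """Return a user-facing message if something is missing, else None."""
    if uploaded_file is None:
        return "Please upload an image first."
    if not question or not question.strip():
        return "Please type a question about the image."
    return None


def render_page(answer_fn: Callable[[object, str, str], str]) -> None:
    """Draw the page. answer_fn(image_file, question, model_name) is a
    hypothetical callback that returns the answer text."""
    # Imported here so check_inputs can be used without Streamlit installed.
    import streamlit as st

    model_name = st.sidebar.selectbox("Model", MODEL_CHOICES)
    left, right = st.columns(2)

    with left:
        uploaded_file = st.file_uploader(
            "Upload an image", type=["jpg", "jpeg", "png"]
        )
        if uploaded_file is not None:
            st.image(uploaded_file)
        question = st.text_input("Your question")
        clicked = st.button("Get Answer")

    with right:
        if clicked:
            problem = check_inputs(uploaded_file, question)
            if problem:
                st.warning(problem)
            else:
                with st.spinner("Thinking..."):
                    answer = answer_fn(uploaded_file, question, model_name)
                st.write(answer)
```

In a script laid out this way, a module-level call such as `render_page(my_answer_fn)` would draw the page when the file is started with `streamlit run`. Validating the inputs before calling the model keeps an empty question or a missing image from triggering a slow inference call.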

Models Used

This application uses the following pre-trained models from Hugging Face:

BLIP: For general visual question answering with free-form answers
ViLT: For detailed understanding of image content and yes/no questions
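
To make the difference between the two models concrete, here is a hedged sketch of inference with the Transformers library. The checkpoint names are assumptions (this README does not say which ones the app loads), and `CHECKPOINTS` and `answer_question` are hypothetical names. With these checkpoints, BLIP generates the answer as text, while ViLT picks the highest-scoring label from a fixed answer vocabulary.

```python
# Assumed checkpoints; the README does not name the ones the app uses.
CHECKPOINTS = {
    "BLIP": "Salesforce/blip-vqa-base",
    "ViLT": "dandelin/vilt-b32-finetuned-vqa",
}


def answer_question(image, question: str, model_name: str) -> str:
    """Answer `question` about a PIL `image` with the chosen model."""
    if model_name not in CHECKPOINTS:
        raise ValueError(f"Unknown model: {model_name!r}")
    checkpoint = CHECKPOINTS[model_name]

    # Heavy imports are deferred until a model is actually needed.
    import torch
    from transformers import (
        BlipForQuestionAnswering,
        BlipProcessor,
        ViltForQuestionAnswering,
        ViltProcessor,
    )

    if model_name == "BLIP":
        processor = BlipProcessor.from_pretrained(checkpoint)
        model = BlipForQuestionAnswering.from_pretrained(checkpoint)
        inputs = processor(image, question, return_tensors="pt")
        with torch.no_grad():
            output_ids = model.generate(**inputs)  # free-form answer tokens
        return processor.decode(output_ids[0], skip_special_tokens=True)

    # ViLT: classification over a fixed set of candidate answers.
    processor = ViltProcessor.from_pretrained(checkpoint)
    model = ViltForQuestionAnswering.from_pretrained(checkpoint)
    inputs = processor(image, question, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return model.config.id2label[logits.argmax(-1).item()]
```

This sketch reloads the model on every call for simplicity; a real app would likely load each model once and reuse it.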

Project Structure

models/: Contains model handling code
utils/: Utility functions for image processing and more
static/: Static files including uploaded images
app.py: Script to run the application

Acknowledgments

Hugging Face for their excellent pre-trained models
The open-source community for various libraries used in this project