---
title: Visual Question Answering (VQA) System
emoji: 🏞️
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: 1.43.1
app_file: app.py
pinned: false
---
# Visual Question Answering (VQA) System

A multi-modal AI application that allows users to upload images and ask questions about them. This project uses pre-trained models from Hugging Face to analyze images and answer natural language questions.

## Features

- Upload images in common formats (jpg, png, etc.)
- Ask questions about image content in natural language
- Get AI-generated answers based on image content
- User-friendly Streamlit interface
- Support for various types of questions (objects, attributes, counting, etc.)

## Technical Stack

- **Python**: Main programming language
- **PyTorch & Transformers**: Deep learning frameworks for running the models
- **Streamlit**: Interactive web application framework
- **Hugging Face Models**: Pre-trained visual question answering models (BLIP, ViLT)
- **PIL**: Image processing
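
For reference, a minimal dependency listing matching the stack above might look like the following. This is an illustrative, unpinned sketch; the project's actual `requirements.txt` may pin versions or include additional packages:

```
streamlit
torch
transformers
Pillow
```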

## Setup Instructions

1. Clone this repository:
   ```
   git clone 
   cd visual-question-answering
   ```

2. Create a virtual environment (recommended):
   ```
   python -m venv venv
   # On Windows
   venv\Scripts\activate
   # On macOS/Linux
   source venv/bin/activate
   ```

3. Install dependencies:
   ```
   pip install -r requirements.txt
   ```

4. Run the application:
   ```
   python app.py
   ```
   
   Or directly with Streamlit:
   ```
   streamlit run app.py
   ```

5. Open a web browser and go to `http://localhost:8501`

## Usage

1. Upload an image using the file upload area
2. Type your question about the image in the text field
3. Select a model from the sidebar (BLIP or ViLT)
4. Click "Get Answer" to get an AI-generated response
5. View the answer displayed on the right side of the screen
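
As a rough illustration of how this flow maps onto Streamlit widgets, here is a minimal sketch. It is not the project's actual `app.py`; the checkpoint names and the `load_vqa` helper are assumptions for illustration only:

```
import streamlit as st
from PIL import Image
from transformers import pipeline

@st.cache_resource
def load_vqa(model_name: str):
    # Hypothetical checkpoint choices for the two model families.
    checkpoints = {
        "BLIP": "Salesforce/blip-vqa-base",
        "ViLT": "dandelin/vilt-b32-finetuned-vqa",
    }
    return pipeline("visual-question-answering", model=checkpoints[model_name])

model_name = st.sidebar.selectbox("Model", ["BLIP", "ViLT"])
uploaded = st.file_uploader("Upload an image", type=["jpg", "jpeg", "png"])
question = st.text_input("Ask a question about the image")

if uploaded and question and st.button("Get Answer"):
    image = Image.open(uploaded).convert("RGB")
    st.image(image)
    # The VQA pipeline returns a list of {"answer": ...} dicts; show the top answer.
    result = load_vqa(model_name)(image=image, question=question)
    st.write("**Answer:**", result[0]["answer"])
```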

## Models Used

This application uses the following pre-trained models from Hugging Face:
- **BLIP**: For general visual question answering with free-form answers
- **ViLT**: For detailed understanding of image content and yes/no questions
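
For reference, both model families can also be queried directly through `transformers`. The sketch below uses common public VQA checkpoints as an assumption; they are not necessarily the exact ones this app loads, and the example image path is hypothetical:

```
import torch
from PIL import Image
from transformers import (
    BlipForQuestionAnswering,
    BlipProcessor,
    ViltForQuestionAnswering,
    ViltProcessor,
)

image = Image.open("static/example.jpg").convert("RGB")  # hypothetical example image
question = "How many people are in the picture?"

# ViLT treats VQA as classification over a fixed answer vocabulary.
vilt_proc = ViltProcessor.from_pretrained("dandelin/vilt-b32-finetuned-vqa")
vilt = ViltForQuestionAnswering.from_pretrained("dandelin/vilt-b32-finetuned-vqa")
with torch.no_grad():
    logits = vilt(**vilt_proc(image, question, return_tensors="pt")).logits
print("ViLT:", vilt.config.id2label[logits.argmax(-1).item()])

# BLIP generates a free-form answer instead of picking from a fixed label set.
blip_proc = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
blip = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base")
with torch.no_grad():
    out = blip.generate(**blip_proc(image, question, return_tensors="pt"))
print("BLIP:", blip_proc.decode(out[0], skip_special_tokens=True))
```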

## Project Structure

- `models/`: Contains model handling code
- `utils/`: Utility functions for image processing and more
- `static/`: Static files including uploaded images
- `app.py`: Script to run the application
## Acknowledgments

- Hugging Face for their excellent pre-trained models
- The open-source community for various libraries used in this project