Commit aa14c8b by Amarthya7 · verified · 1 parent: 54c1eb6

Update README.md

Files changed (1)
  1. README.md +93 -93
README.md CHANGED
@@ -1,94 +1,94 @@
- ---
- title: Visual Question Answering (VQA) System
- emoji: 🏞️
- colorFrom: blue
- colorTo: purple
- sdk: gradio
- sdk_version: 5.20.1
- app_file: run.py
- pinned: false
- ---
- # Visual Question Answering (VQA) System
-
- A multi-modal AI application that allows users to upload images and ask questions about them. This project uses pre-trained models from Hugging Face to analyze images and answer natural language questions.
-
- ## Features
-
- - Upload images in common formats (jpg, png, etc.)
- - Ask questions about image content in natural language
- - Get AI-generated answers based on image content
- - User-friendly Streamlit interface
- - Support for various types of questions (objects, attributes, counting, etc.)
-
- ## Technical Stack
-
- - **Python**: Main programming language
- - **PyTorch & Transformers**: Deep learning frameworks for running the models
- - **Streamlit**: Interactive web application framework
- - **HuggingFace Models**: Pre-trained visual question answering models
- - **PIL**: Image processing
-
- ## Setup Instructions
-
- 1. Clone this repository:
- ```
- git clone https://github.com/your-username/visual-question-answering.git
- cd visual-question-answering
- ```
-
- 2. Create a virtual environment (recommended):
- ```
- python -m venv venv
- # On Windows
- venv\Scripts\activate
- # On macOS/Linux
- source venv/bin/activate
- ```
-
- 3. Install dependencies:
- ```
- pip install -r requirements.txt
- ```
-
- 4. Run the application:
- ```
- python run.py
- ```
-
- Or directly with Streamlit:
- ```
- streamlit run app.py
- ```
-
- 5. Open a web browser and go to `http://localhost:8501`
-
- ## Usage
-
- 1. Upload an image using the file upload area
- 2. Type your question about the image in the text field
- 3. Select a model from the sidebar (BLIP or ViLT)
- 4. Click "Get Answer" to get an AI-generated response
- 5. View the answer displayed on the right side of the screen
-
- ## Models Used
-
- This application uses the following pre-trained models from Hugging Face:
- - **BLIP**: For general visual question answering with free-form answers
- - **ViLT**: For detailed understanding of image content and yes/no questions
-
- ## Project Structure
-
- - `app.py`: Main Streamlit application
- - `models/`: Contains model handling code
- - `utils/`: Utility functions for image processing and more
- - `static/`: Static files including uploaded images
- - `run.py`: Script to run the application
-
- ## License
-
- This project is licensed under the MIT License - see the LICENSE file for details.
-
- ## Acknowledgments
-
- - Hugging Face for their excellent pre-trained models
+ ---
+ title: Visual Question Answering (VQA) System
+ emoji: 🏞️
+ colorFrom: blue
+ colorTo: purple
+ sdk: gradio
+ sdk_version: 5.20.1
+ app_file: app.py
+ pinned: false
+ ---
+ # Visual Question Answering (VQA) System
+
+ A multi-modal AI application that allows users to upload images and ask questions about them. This project uses pre-trained models from Hugging Face to analyze images and answer natural language questions.
+
+ ## Features
+
+ - Upload images in common formats (jpg, png, etc.)
+ - Ask questions about image content in natural language
+ - Get AI-generated answers based on image content
+ - User-friendly Streamlit interface
+ - Support for various types of questions (objects, attributes, counting, etc.)
+
+ ## Technical Stack
+
+ - **Python**: Main programming language
+ - **PyTorch & Transformers**: Deep learning frameworks for running the models
+ - **Streamlit**: Interactive web application framework
+ - **HuggingFace Models**: Pre-trained visual question answering models
+ - **PIL**: Image processing
+
+ ## Setup Instructions
+
+ 1. Clone this repository:
+ ```
+ git clone https://github.com/your-username/visual-question-answering.git
+ cd visual-question-answering
+ ```
+
+ 2. Create a virtual environment (recommended):
+ ```
+ python -m venv venv
+ # On Windows
+ venv\Scripts\activate
+ # On macOS/Linux
+ source venv/bin/activate
+ ```
+
+ 3. Install dependencies:
+ ```
+ pip install -r requirements.txt
+ ```
+
+ 4. Run the application:
+ ```
+ python run.py
+ ```
+
+ Or directly with Streamlit:
+ ```
+ streamlit run app.py
+ ```
+
+ 5. Open a web browser and go to `http://localhost:8501`
+
+ ## Usage
+
+ 1. Upload an image using the file upload area
+ 2. Type your question about the image in the text field
+ 3. Select a model from the sidebar (BLIP or ViLT)
+ 4. Click "Get Answer" to get an AI-generated response
+ 5. View the answer displayed on the right side of the screen
+
+ ## Models Used
+
+ This application uses the following pre-trained models from Hugging Face:
+ - **BLIP**: For general visual question answering with free-form answers
+ - **ViLT**: For detailed understanding of image content and yes/no questions
+
+ ## Project Structure
+
+ - `app.py`: Main Streamlit application
+ - `models/`: Contains model handling code
+ - `utils/`: Utility functions for image processing and more
+ - `static/`: Static files including uploaded images
+ - `run.py`: Script to run the application
+
+ ## License
+
+ This project is licensed under the MIT License - see the LICENSE file for details.
+
+ ## Acknowledgments
+
+ - Hugging Face for their excellent pre-trained models
  - The open-source community for various libraries used in this project
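
The only textual difference between the two sides of this diff is the `app_file` key in the front matter (`run.py` becomes `app.py`). As a rough way to confirm that kind of change, the sketch below reads the `---`-delimited header from two README versions and reports which keys differ. It is a minimal illustration, not part of the repository: the helper names `read_front_matter` and `changed_keys` are hypothetical, the parsing is a naive line-based split rather than a real YAML parser, and the `OLD` sample is an abbreviated excerpt of the README shown above.

```python
def read_front_matter(readme_text: str) -> dict[str, str]:
    """Return the key/value pairs between the leading '---' fences.

    Naive line-based parsing (an assumption for this sketch); nested or
    multi-line YAML values are not supported.
    """
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter at the top of the file
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields  # closing fence reached
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {}  # closing fence missing: treat the header as malformed


def changed_keys(old: dict[str, str], new: dict[str, str]) -> dict[str, tuple]:
    """Map each key whose value differs to an (old, new) pair."""
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(old.keys() | new.keys())
        if old.get(key) != new.get(key)
    }


# Abbreviated excerpt of the README before this commit.
OLD = """---
title: Visual Question Answering (VQA) System
sdk: gradio
sdk_version: 5.20.1
app_file: run.py
pinned: false
---
# Visual Question Answering (VQA) System
"""
# The same excerpt after this commit.
NEW = OLD.replace("app_file: run.py", "app_file: app.py")

print(changed_keys(read_front_matter(OLD), read_front_matter(NEW)))
# prints {'app_file': ('run.py', 'app.py')}
```

Comparing parsed keys rather than raw lines ignores whitespace-only or line-ending differences, which is one plausible reason a diff can report every line as changed when only one value differs.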