Update README.md
README.md
CHANGED
@@ -1,143 +1,14 @@
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-```
-.
-├── app.py # Main Streamlit application
-├── api.py # API endpoints
-├── utils.py # Utility functions for scraping, sentiment analysis, etc.
-├── healthcheck.py # Script to verify all dependencies and services
-├── requirements.txt # Project dependencies
-├── Dockerfile # Docker configuration for deployment
-├── Spacefile # Hugging Face Spaces configuration
-└── README.md # Project documentation
-```
-
-## Setup Instructions
-
-1. **Clone the repository**:
-```
-git clone https://github.com/yourusername/news-summarization-tts.git
-cd news-summarization-tts
-```
-
-2. **Create a virtual environment** (recommended):
-```
-python -m venv venv
-source venv/bin/activate # On Windows: venv\Scripts\activate
-```
-
-3. **Install dependencies**:
-```
-pip install -r requirements.txt
-```
-
-4. **Install system dependencies** (for text-to-speech functionality):
-- On Ubuntu/Debian:
-```
-sudo apt-get install espeak ffmpeg
-```
-- On Windows:
-Download and install espeak from http://espeak.sourceforge.net/download.html
-
-5. **Run the healthcheck** (to verify all dependencies are working):
-```
-python healthcheck.py
-```
-
-6. **Run the API server**:
-```
-uvicorn api:app --reload
-```
-
-7. **Run the Streamlit application** (in a separate terminal):
-```
-streamlit run app.py
-```
-
-## Models Used
-
-- **News Summarization**: Extractive summarization using NLTK and NetworkX
-- **Sentiment Analysis**: VADER for sentiment analysis and Hugging Face Transformers
-- **Translation**: Google Translate API via deep-translator library
-- **Text-to-Speech**: Google Text-to-Speech (gTTS) and pyttsx3 as fallback for Hindi conversion
-
-## API Documentation
-
-### Endpoints
-
-- `POST /api/get_news`: Fetches news articles about a company
-- Request body: `{"company_name": "Tesla"}`
-- Returns a list of articles with metadata
-
-- `POST /api/analyze_sentiment`: Performs sentiment analysis on articles
-- Request body: `{"articles": [article_list]}`
-- Returns sentiment analysis for each article
-
-- `POST /api/generate_speech`: Converts text to Hindi speech
-- Request body: `{"text": "summarized_text"}`
-- Returns a URL to the generated audio file
-
-- `POST /api/complete_analysis`: Performs complete analysis including fetching news, sentiment analysis, and generating speech
-- Request body: `{"company_name": "Tesla"}`
-- Returns complete analysis results
-
-## Assumptions & Limitations
-
-- The application scrapes publicly available news articles that don't require JavaScript rendering
-- Sentiment analysis accuracy depends on the model used and may not capture context-specific nuances
-- Hindi translation and TTS quality may vary based on technical terms
-- The application requires an internet connection to fetch news articles and use cloud-based services
-
-## Troubleshooting
-
-If you encounter any issues:
-
-1. Run the healthcheck script to verify all dependencies are working:
-```
-python healthcheck.py
-```
-
-2. Check that you have all the required system dependencies installed (espeak, ffmpeg).
-
-3. If you encounter issues with specific components:
-- Translation service requires an internet connection
-- Text-to-speech uses gTTS by default, but falls back to pyttsx3 if needed
-- Transformer models may take time to download on first run
-
-## Deployment
-
-This application is deployed on Hugging Face Spaces: [Link to deployment]
-
-### Using Docker
-
-You can also run the application using Docker:
-
-```
-docker build -t news-summarization-tts .
-docker run -p 8501:8501 -p 8000:8000 news-summarization-tts
-```
-
-## Future Improvements
-
-- Add support for more languages
-- Implement advanced NLP techniques for better summarization
-- Improve the user interface with more interactive visualizations
-- Add historical data analysis for tracking sentiment over time
-- Enhance TTS quality with dedicated Hindi speech models
-
-## License
-
-MIT
+---
+title: News Summarization and TTS
+emoji: 📰
+colorFrom: blue
+colorTo: indigo
+sdk: streamlit
+sdk_version: 1.27.0
+app_file: app_spaces.py
+pinned: false
+---
+
+# News Summarization and Text-to-Speech Application
+
+This application extracts news articles about companies, performs sentiment analysis, conducts comparative analysis, and generates a text-to-speech output in Hindi.
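
The endpoint list removed by this commit gives request bodies but no client example. As a rough sketch only, the snippet below shows how one of those `POST` endpoints might be called from Python's standard library. It assumes the API is started locally with `uvicorn api:app`, which binds to `127.0.0.1:8000` by default; `BASE_URL`, `build_request`, and `call` are hypothetical names used for illustration and do not come from the repository. The response shape is not documented in the README, so it is returned as decoded JSON without further interpretation.

```python
import json
import urllib.request

# Assumption: the API described in the removed README is running locally via
# `uvicorn api:app`, which listens on 127.0.0.1:8000 unless configured otherwise.
BASE_URL = "http://127.0.0.1:8000"


def build_request(endpoint: str, payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a documented endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}{endpoint}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call(endpoint: str, payload: dict, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response (hypothetical helper)."""
    with urllib.request.urlopen(build_request(endpoint, payload), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example call, using the request body given in the removed README:
#   result = call("/api/complete_analysis", {"company_name": "Tesla"})
```

Separating `build_request` from `call` keeps the request construction checkable without a running server.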