Commit 21dd833 by BSJ2004 · verified · 1 Parent(s): b0ab805

Update README.md

Files changed (1)
  1. README.md +14 -143
README.md CHANGED
@@ -1,143 +1,14 @@
- # News Summarization and Text-to-Speech Application
-
- A web-based application that extracts news articles related to companies, performs sentiment analysis, conducts comparative analysis, and generates a text-to-speech output in Hindi.
-
- ## Features
-
- - **News Extraction**: Scrapes at least 10 unique news articles about a given company using BeautifulSoup
- - **Sentiment Analysis**: Analyzes the sentiment of each article (positive, negative, neutral)
- - **Comparative Analysis**: Compares sentiment across articles to derive insights
- - **Text-to-Speech**: Converts summarized content to Hindi speech
- - **User Interface**: Simple web interface built with Streamlit
- - **API Communication**: Backend and frontend communicate through APIs
-
- ## Project Structure
-
- ```
- .
- ├── app.py # Main Streamlit application
- ├── api.py # API endpoints
- ├── utils.py # Utility functions for scraping, sentiment analysis, etc.
- ├── healthcheck.py # Script to verify all dependencies and services
- ├── requirements.txt # Project dependencies
- ├── Dockerfile # Docker configuration for deployment
- ├── Spacefile # Hugging Face Spaces configuration
- └── README.md # Project documentation
- ```
-
- ## Setup Instructions
-
- 1. **Clone the repository**:
- ```
- git clone https://github.com/yourusername/news-summarization-tts.git
- cd news-summarization-tts
- ```
-
- 2. **Create a virtual environment** (recommended):
- ```
- python -m venv venv
- source venv/bin/activate # On Windows: venv\Scripts\activate
- ```
-
- 3. **Install dependencies**:
- ```
- pip install -r requirements.txt
- ```
-
- 4. **Install system dependencies** (for text-to-speech functionality):
- - On Ubuntu/Debian:
- ```
- sudo apt-get install espeak ffmpeg
- ```
- - On Windows:
- Download and install espeak from http://espeak.sourceforge.net/download.html
-
- 5. **Run the healthcheck** (to verify all dependencies are working):
- ```
- python healthcheck.py
- ```
-
- 6. **Run the API server**:
- ```
- uvicorn api:app --reload
- ```
-
- 7. **Run the Streamlit application** (in a separate terminal):
- ```
- streamlit run app.py
- ```
-
- ## Models Used
-
- - **News Summarization**: Extractive summarization using NLTK and NetworkX
- - **Sentiment Analysis**: VADER for sentiment analysis and Hugging Face Transformers
- - **Translation**: Google Translate API via deep-translator library
- - **Text-to-Speech**: Google Text-to-Speech (gTTS) and pyttsx3 as fallback for Hindi conversion
-
- ## API Documentation
-
- ### Endpoints
-
- - `POST /api/get_news`: Fetches news articles about a company
- - Request body: `{"company_name": "Tesla"}`
- - Returns a list of articles with metadata
-
- - `POST /api/analyze_sentiment`: Performs sentiment analysis on articles
- - Request body: `{"articles": [article_list]}`
- - Returns sentiment analysis for each article
-
- - `POST /api/generate_speech`: Converts text to Hindi speech
- - Request body: `{"text": "summarized_text"}`
- - Returns a URL to the generated audio file
-
- - `POST /api/complete_analysis`: Performs complete analysis including fetching news, sentiment analysis, and generating speech
- - Request body: `{"company_name": "Tesla"}`
- - Returns complete analysis results
-
- ## Assumptions & Limitations
-
- - The application scrapes publicly available news articles that don't require JavaScript rendering
- - Sentiment analysis accuracy depends on the model used and may not capture context-specific nuances
- - Hindi translation and TTS quality may vary based on technical terms
- - The application requires an internet connection to fetch news articles and use cloud-based services
-
- ## Troubleshooting
-
- If you encounter any issues:
-
- 1. Run the healthcheck script to verify all dependencies are working:
- ```
- python healthcheck.py
- ```
-
- 2. Check that you have all the required system dependencies installed (espeak, ffmpeg).
-
- 3. If you encounter issues with specific components:
- - Translation service requires an internet connection
- - Text-to-speech uses gTTS by default, but falls back to pyttsx3 if needed
- - Transformer models may take time to download on first run
-
- ## Deployment
-
- This application is deployed on Hugging Face Spaces: [Link to deployment]
-
- ### Using Docker
-
- You can also run the application using Docker:
-
- ```
- docker build -t news-summarization-tts .
- docker run -p 8501:8501 -p 8000:8000 news-summarization-tts
- ```
-
- ## Future Improvements
-
- - Add support for more languages
- - Implement advanced NLP techniques for better summarization
- - Improve the user interface with more interactive visualizations
- - Add historical data analysis for tracking sentiment over time
- - Enhance TTS quality with dedicated Hindi speech models
-
- ## License
-
- MIT
+ ---
+ title: News Summarization and TTS
+ emoji: 📰
+ colorFrom: blue
+ colorTo: indigo
+ sdk: streamlit
+ sdk_version: 1.27.0
+ app_file: app_spaces.py
+ pinned: false
+ ---
+
+ # News Summarization and Text-to-Speech Application
+
+ This application extracts news articles about companies, performs sentiment analysis, conducts comparative analysis, and generates a text-to-speech output in Hindi.
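
This commit drops the endpoint documentation from the README, so a small client sketch may help anyone still calling the backend directly. It is a minimal illustration, not the project's own code: it assumes the API server is reachable at `http://localhost:8000` (uvicorn's default port, and the one mapped in the removed Docker command), and `BASE_URL`, `build_request`, and `post_json` are hypothetical names. The response shape was not specified in the README, so the sketch only decodes it as JSON.

```python
import json
import urllib.request

# Assumption: the API server started with `uvicorn api:app` listens on
# localhost:8000; change BASE_URL for any other deployment.
BASE_URL = "http://localhost:8000"


def build_request(endpoint: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for one of the endpoints listed in the old README."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}{endpoint}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(endpoint: str, payload: dict, timeout: float = 60.0):
    """Send the request and decode the JSON response (its shape is not documented here)."""
    request = build_request(endpoint, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (requires the API server to be running):
# result = post_json("/api/complete_analysis", {"company_name": "Tesla"})
```

Keeping request construction separate from sending makes the payload easy to inspect without a running server; the same helper should work for `/api/get_news`, `/api/analyze_sentiment`, and `/api/generate_speech` with the request bodies shown in the removed section.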