import streamlit as st
# Custom CSS for better styling
st.markdown("""
<style>
.main-title {
font-size: 36px;
color: #4A90E2;
font-weight: bold;
text-align: center;
}
.sub-title {
font-size: 24px;
color: #4A90E2;
margin-top: 20px;
}
.section {
background-color: #f9f9f9;
padding: 15px;
border-radius: 10px;
margin-top: 20px;
}
.section p, .section ul {
color: #666666;
}
.link {
color: #4A90E2;
text-decoration: none;
}
.benchmark-table {
width: 100%;
border-collapse: collapse;
margin-top: 20px;
}
.benchmark-table th, .benchmark-table td {
border: 1px solid #ddd;
padding: 8px;
text-align: left;
}
.benchmark-table th {
background-color: #4A90E2;
color: white;
}
.benchmark-table td {
background-color: #f2f2f2;
}
</style>
""", unsafe_allow_html=True)
# Main Title
st.markdown('<div class="main-title">Whisper: Advanced Speech Recognition</div>', unsafe_allow_html=True)
# Overview Section
st.markdown("""
<div class="section">
<p>The <strong>Whisper</strong> model, developed by OpenAI, was introduced in the paper <em>Robust Speech Recognition via Large-Scale Weak Supervision</em>. Whisper is a speech recognition model trained on 680,000 hours of multilingual and multitask supervised audio data collected from the web.</p>
<p>This scale of training lets Whisper transfer zero-shot to a wide range of speech processing tasks, generalizing across domains and recording conditions without fine-tuning, which makes it a versatile tool for developers and researchers alike.</p>
</div>
""", unsafe_allow_html=True)
# Use Cases Section
st.markdown('<div class="sub-title">Use Cases</div>', unsafe_allow_html=True)
st.markdown("""
<div class="section">
<ul>
<li><strong>Transcription Services:</strong> Automate transcription of audio files in English for media, legal, and academic purposes.</li>
<li><strong>Voice-Activated Assistants:</strong> Enhance voice command recognition in smart devices and applications.</li>
<li><strong>Broadcast Media:</strong> Provide real-time transcription and subtitling for live broadcasts.</li>
<li><strong>Multilingual Translation:</strong> Use multilingual Whisper variants as a base for speech-to-text and speech translation services.</li>
</ul>
</div>
""", unsafe_allow_html=True)
# How to Use Section
st.markdown('<div class="sub-title">How to Use Whisper</div>', unsafe_allow_html=True)
st.code('''
from sparknlp.base import AudioAssembler
from sparknlp.annotator import WhisperForCTC
from pyspark.ml import Pipeline

# Convert raw audio (arrays of floats sampled at 16 kHz) into Spark NLP audio annotations
audioAssembler = AudioAssembler() \\
    .setInputCol("audio_content") \\
    .setOutputCol("audio_assembler")

# Load the pretrained Whisper model and transcribe the assembled audio
speechToText = WhisperForCTC \\
    .pretrained("asr_whisper_small_english") \\
    .setInputCols(["audio_assembler"]) \\
    .setOutputCol("text")

# `data` is a DataFrame with an "audio_content" column of float arrays
pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
''', language='python')
st.markdown("""
<div class="section">
<p>This example shows how to use Whisper in a Spark NLP pipeline to convert raw audio content into text. The AudioAssembler wraps arrays of audio samples (expected at a 16 kHz sampling rate) into audio annotations, and WhisperForCTC transcribes them into the <code>text</code> output column. A sketch of how the input DataFrame can be prepared follows below.</p>
</div>
""", unsafe_allow_html=True)
# Model Information Section
st.markdown('<div class="sub-title">Model Information</div>', unsafe_allow_html=True)
st.markdown("""
<div class="section">
<table class="benchmark-table">
<tr>
<th>Attribute</th>
<th>Description</th>
</tr>
<tr>
<td><strong>Model Name</strong></td>
<td>asr_whisper_small_english</td>
</tr>
<tr>
<td><strong>Compatibility</strong></td>
<td>Spark NLP 5.1.4+, PySpark 3.4+</td>
</tr>
<tr>
<td><strong>License</strong></td>
<td>Open Source</td>
</tr>
<tr>
<td><strong>Edition</strong></td>
<td>Official</td>
</tr>
<tr>
<td><strong>Input Labels</strong></td>
<td>[audio_assembler]</td>
</tr>
<tr>
<td><strong>Output Labels</strong></td>
<td>[text]</td>
</tr>
<tr>
<td><strong>Language</strong></td>
<td>en</td>
</tr>
<tr>
<td><strong>Model Size</strong></td>
<td>1.1 GB</td>
</tr>
</table>
</div>
""", unsafe_allow_html=True)
# References Section
st.markdown('<div class="sub-title">References</div>', unsafe_allow_html=True)
st.markdown("""
<div class="section">
<ul>
<li><a class="link" href="https://sparknlp.org/2023/10/17/asr_whisper_small_english_en.html" target="_blank">Whisper Model on Spark NLP</a></li>
<li><a class="link" href="https://huggingface.co./openai/whisper-small.en" target="_blank">Whisper Model on Hugging Face</a></li>
<li><a class="link" href="https://arxiv.org/abs/2212.04356" target="_blank">Whisper Paper</a></li>
<li><a class="link" href="https://github.com/openai/whisper" target="_blank">Whisper GitHub Repository</a></li>
</ul>
</div>
""", unsafe_allow_html=True)
# Community & Support
st.markdown('<div class="sub-title">Community & Support</div>', unsafe_allow_html=True)
st.markdown("""
<div class="section">
<ul>
<li><a class="link" href="https://sparknlp.org/" target="_blank">Official Website</a>: Documentation and examples</li>
<li><a class="link" href="https://join.slack.com/t/spark-nlp/shared_invite/zt-198dipu77-L3UWNe_AJ8xqDk0ivmih5Q" target="_blank">Slack</a>: Live discussion with the community and team</li>
<li><a class="link" href="https://github.com/JohnSnowLabs/spark-nlp" target="_blank">GitHub</a>: Bug reports, feature requests, and contributions</li>
<li><a class="link" href="https://medium.com/spark-nlp" target="_blank">Medium</a>: Spark NLP articles</li>
<li><a class="link" href="https://www.youtube.com/channel/UCmFOjlpYEhxf_wJUDuz6xxQ/videos" target="_blank">YouTube</a>: Video tutorials</li>
</ul>
</div>
""", unsafe_allow_html=True)