import streamlit as st
# Custom CSS for better styling
st.markdown("""
""", unsafe_allow_html=True)
# Main Title
st.markdown('Whisper: Advanced Speech Recognition', unsafe_allow_html=True)
# Overview Section
st.markdown("""
Whisper is a speech recognition model developed by OpenAI and introduced in the paper "Robust Speech Recognition via Large-Scale Weak Supervision". It was trained on 680,000 hours of multilingual, multitask audio paired with transcripts, which lets it handle a wide range of speech processing tasks.

Because of the scale and diversity of that training data, Whisper performs well across these tasks without fine-tuning. Its zero-shot transfer lets it generalize to new datasets and domains, which makes it useful to developers and researchers alike.
""", unsafe_allow_html=True)
# Use Cases Section
st.markdown('Use Cases', unsafe_allow_html=True)
st.markdown("""
- Transcription Services: Automate transcription of audio files in English for media, legal, and academic purposes.
- Voice-Activated Assistants: Enhance voice command recognition in smart devices and applications.
- Broadcast Media: Provide real-time transcription and subtitling for live broadcasts.
- Multilingual Translation: Use as a base for developing multilingual speech-to-text and translation services.
""", unsafe_allow_html=True)
# How to Use Section
st.markdown('How to Use Whisper', unsafe_allow_html=True)
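st.code('''
from pyspark.ml import Pipeline
from sparknlp.base import AudioAssembler
from sparknlp.annotator import WhisperForCTC

# Wrap the raw audio (an array of floats) in Spark NLP's audio annotation format
audioAssembler = AudioAssembler() \\
    .setInputCol("audio_content") \\
    .setOutputCol("audio_assembler")

# Load the pretrained English Whisper model and transcribe the assembled audio
speechToText = WhisperForCTC \\
    .pretrained("asr_whisper_small_english") \\
    .setInputCols(["audio_assembler"]) \\
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])

# `data` is a Spark DataFrame with an "audio_content" column of float arrays
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
''', language='python')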
st.markdown("""
This example shows how to use Whisper in a Spark NLP pipeline to convert raw audio into text. The model expects input audio sampled at 16 kHz and outputs the corresponding transcription, which suits tasks such as transcription and voice command recognition.
""", unsafe_allow_html=True)
# Model Information Section
st.markdown('Model Information', unsafe_allow_html=True)
st.markdown("""
| Attribute | Description |
|---|---|
| Model Name | asr_whisper_small_english |
| Compatibility | Spark NLP 5.1.4+, PySpark 3.4+ |
| License | Open Source |
| Edition | Official |
| Input Labels | [audio_assembler] |
| Output Labels | [text] |
| Language | en |
| Model Size | 1.1 GB |
""", unsafe_allow_html=True)
# References Section
st.markdown('References', unsafe_allow_html=True)
st.markdown("""
""", unsafe_allow_html=True)
# Community & Support
st.markdown('Community & Support', unsafe_allow_html=True)
st.markdown("""
- Official Website: Documentation and examples
- Slack: Live discussion with the community and team
- GitHub: Bug reports, feature requests, and contributions
- Medium: Spark NLP articles
- YouTube: Video tutorials
""", unsafe_allow_html=True)