title: Transcription
emoji: π
colorFrom: yellow
colorTo: pink
sdk: gradio
sdk_version: 5.15.0
app_file: app.py
pinned: false
short_description: This tool is intended to help transcribe interviews.
Audio Transcription App
A Gradio-based web application for transcribing audio files (MP3 or M4A) with OpenAI's Whisper model. It is aimed at interviews and other long recordings and offers optional silence removal and audio chunking.
Features
- Multiple Audio File Support: Upload and process several MP3 or M4A files in one run
- Silence Removal: Optionally remove silent stretches, which shortens processing time and can improve accuracy
- Audio Chunking: Split long audio files into chunks of a configurable duration
- Multiple Language Support: Supports German (de), English (en), French (fr), Spanish (es), and Italian (it)
- Multiple Whisper Models: Choose from various Whisper model sizes (tiny to large-v3-turbo) based on your needs
- Detailed Output: Get both full transcriptions and segment-wise transcriptions with timestamps
- Download Results: All processed files and transcripts are provided in a convenient ZIP file
Setup
- Clone the repository
- Install the required dependencies:
pip install -r requirements.txt
- Make sure you have ffmpeg installed on your system
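Because the chunking step calls an ffmpeg executable (see the "FFmpeg Path" setting), it can help to confirm that ffmpeg is reachable before launching the app. A minimal sketch using only the standard library; the helper name `find_ffmpeg` is hypothetical and not part of app.py:

```python
import shutil
from typing import Optional


def find_ffmpeg(ffmpeg_path: str = "ffmpeg") -> Optional[str]:
    """Return the resolved path of the ffmpeg executable, or None if it cannot be found.

    `ffmpeg_path` may be a bare command name (looked up on PATH) or an
    explicit path to the executable, like the app's "FFmpeg Path" setting.
    """
    return shutil.which(ffmpeg_path)


if __name__ == "__main__":
    location = find_ffmpeg()
    if location is None:
        print("ffmpeg not found: install it or point the FFmpeg Path setting at it.")
    else:
        print(f"ffmpeg found at {location}")
```

If the check fails, installing ffmpeg through your system package manager usually puts it on PATH.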
Usage
- Run the application:
python app.py
- Open the provided local URL in your web browser
- Upload your audio file(s)
- Configure the settings:
  - Enable/disable silence removal
  - Enable/disable audio chunking
  - Select the Whisper model size
  - Choose the target language
- Click "Process" to start transcription
- View the results and download the ZIP file containing all processed files
Settings
Silence Removal
- Minimum Silence Length: 100-2000 ms (default: 500 ms)
- Silence Threshold: -70 to -30 dB (default: -50 dB)
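The two silence settings work together: audio quieter than the threshold counts as silence, and only silent stretches at least the minimum length are cut, so short pauses between words survive. The sketch below illustrates that rule on raw float samples in [-1.0, 1.0] using only the standard library. It is an illustration of the technique, not the app's actual implementation; the `remove_silence` name and the 10 ms analysis window are assumptions.

```python
import math
from typing import List, Sequence


def window_dbfs(window: Sequence[float]) -> float:
    """RMS level of a window in dBFS, where 1.0 is full scale."""
    if not window:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in window) / len(window))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")


def remove_silence(
    samples: Sequence[float],
    sample_rate: int,
    min_silence_len_ms: int = 500,     # README default
    silence_thresh_db: float = -50.0,  # README default
    window_ms: int = 10,               # assumed analysis window size
) -> List[float]:
    """Drop every run of silent windows lasting at least min_silence_len_ms."""
    win = max(1, sample_rate * window_ms // 1000)
    windows = [samples[i:i + win] for i in range(0, len(samples), win)]
    silent = [window_dbfs(w) < silence_thresh_db for w in windows]
    min_windows = math.ceil(min_silence_len_ms / window_ms)

    kept: List[float] = []
    i = 0
    while i < len(windows):
        # Find the end of the current run of windows that are all silent or all non-silent.
        j = i
        while j < len(windows) and silent[j] == silent[i]:
            j += 1
        # Keep sound, and keep pauses shorter than the minimum silence length.
        if not (silent[i] and j - i >= min_windows):
            for w in windows[i:j]:
                kept.extend(w)
        i = j
    return kept
```

Practical implementations usually keep a little padding around each cut so word onsets are not clipped; pydub's `split_on_silence`, for example, has a `keep_silence` parameter for this.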
Chunking
- Chunk Duration: 60-3600 seconds (default: 600 seconds/10 minutes)
- FFmpeg Path: Path to ffmpeg executable (default: "ffmpeg")
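Chunking amounts to cutting the file into consecutive pieces of the chosen duration, the last one possibly shorter. A sketch under stated assumptions: `chunk_bounds` and `ffmpeg_chunk_cmd` are hypothetical helpers, and stream copy (`-c copy`) is one possible way to cut with ffmpeg, not necessarily what app.py does.

```python
from typing import List, Tuple


def chunk_bounds(total_seconds: float, chunk_duration: int = 600) -> List[Tuple[float, float]]:
    """Return (start, length) pairs in seconds covering the whole file."""
    if not 60 <= chunk_duration <= 3600:  # range allowed by the Chunk Duration setting
        raise ValueError("chunk_duration must be between 60 and 3600 seconds")
    bounds: List[Tuple[float, float]] = []
    start = 0.0
    while start < total_seconds:
        # The last chunk is shorter when the total is not an exact multiple.
        length = float(min(chunk_duration, total_seconds - start))
        bounds.append((start, length))
        start += chunk_duration
    return bounds


def ffmpeg_chunk_cmd(src: str, dst: str, start: float, length: float,
                     ffmpeg_path: str = "ffmpeg") -> List[str]:
    """Build an ffmpeg command that copies one chunk of `src` into `dst` without re-encoding."""
    return [
        ffmpeg_path, "-y",          # overwrite dst if it exists
        "-ss", f"{start:.3f}",      # seek to the chunk start (input option)
        "-t", f"{length:.3f}",      # read at most `length` seconds
        "-i", src,
        "-c", "copy",               # stream copy: fast, no quality loss
        dst,
    ]
```

For a 1500-second file and the default 600-second duration, `chunk_bounds` yields three chunks: 0 to 600 s, 600 to 1200 s and 1200 to 1500 s. Each command can then be run with `subprocess.run(cmd, check=True)`. Stream copy cuts on frame boundaries, so chunk edges may shift slightly; re-encoding avoids that at the cost of speed.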
Transcription
- Model Size: Choose from tiny, base, small, medium, large, large-v2, large-v3, turbo, or large-v3-turbo
- Language: German (de), English (en), French (fr), Spanish (es), Italian (it)
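The model and language settings map directly onto a Whisper call. The sketch below assumes the `openai-whisper` package (whose `whisper.load_model` and `model.transcribe` accept these model names and a `language` argument); the README does not say which Whisper package app.py uses, and the `transcribe` wrapper is hypothetical.

```python
from typing import Any, Dict

# Options listed in the Settings section of this README.
MODEL_SIZES = ("tiny", "base", "small", "medium", "large",
               "large-v2", "large-v3", "turbo", "large-v3-turbo")
LANGUAGES = ("de", "en", "fr", "es", "it")


def transcribe(path: str, model_size: str, language: str) -> Dict[str, Any]:
    """Transcribe one audio file with openai-whisper (assumed to be installed)."""
    if model_size not in MODEL_SIZES:
        raise ValueError(f"unsupported model size: {model_size!r}")
    if language not in LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")
    import whisper  # imported lazily: the package is heavy and pulls in PyTorch
    model = whisper.load_model(model_size)
    # Passing `language` skips Whisper's automatic language detection.
    # The result dict holds the full "text" plus a list of timed "segments".
    return model.transcribe(path, language=language)
```

Larger models are generally more accurate but slower and need more memory. When processing several files, loading the model once and reusing it avoids reloading the weights for every file.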
Output
- Full Transcription: Complete text of the audio file
- Segmented Transcription: Text segments with timestamps
- ZIP File containing:
  - Processed audio files
  - Individual transcript files
  - Combined transcript file
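Segment-wise output and the ZIP archive can be produced with the standard library alone. A sketch assuming segment dicts with `start`, `end` and `text` keys (the shape openai-whisper returns); the timestamp format, the `combined_transcript.txt` name and the helper names are hypothetical, not taken from app.py.

```python
import io
import os
import zipfile
from typing import Dict, List, Sequence


def fmt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS.mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"


def format_segments(segments: List[dict]) -> str:
    """One line per segment: [start --> end] text."""
    return "\n".join(
        f"[{fmt_timestamp(seg['start'])} --> {fmt_timestamp(seg['end'])}] {seg['text'].strip()}"
        for seg in segments
    )


def build_zip(transcripts: Dict[str, str], audio_paths: Sequence[str] = ()) -> bytes:
    """Pack processed audio, one transcript per input and a combined transcript into an in-memory ZIP."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in audio_paths:
            zf.write(path, arcname=os.path.basename(path))  # processed audio files
        for name, text in transcripts.items():
            zf.writestr(f"{name}.txt", text)                # individual transcripts
        combined = "\n\n".join(f"=== {name} ===\n{text}" for name, text in transcripts.items())
        zf.writestr("combined_transcript.txt", combined)    # combined transcript
    return buf.getvalue()
```

Building the archive in memory avoids temporary files; writing it to disk for download is then a single `open(..., "wb").write(data)`.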
Deployment on Hugging Face Spaces
- Create a new Space on Hugging Face
- Choose "Gradio" as the SDK
- Upload the following files:
  - app.py
  - requirements.txt
- The app will automatically deploy and be available at your Space's URL
Requirements
- Python 3.10+ (required by Gradio 5)
- ffmpeg
- See requirements.txt for Python package dependencies
License
This project is open source and available under the MIT License.