metadata

title: Transcription
emoji: 👀
colorFrom: yellow
colorTo: pink
sdk: gradio
sdk_version: 5.15.0
app_file: app.py
pinned: false
short_description: This tool is intended to help transcribe interviews.

Audio Transcription App

A Gradio-based web application for transcribing audio files (MP3 or M4A) using OpenAI's Whisper model. It is suited to interviews and long audio recordings, and offers optional silence removal and audio chunking.

Features

Multiple Audio File Support: Process multiple MP3 or M4A files simultaneously
Silence Removal: Option to remove silence from audio to reduce processing time and improve accuracy
Audio Chunking: Split long audio files into manageable chunks for better processing
Multiple Language Support: Supports German (de), English (en), French (fr), Spanish (es), and Italian (it)
Multiple Whisper Models: Choose from various Whisper model sizes (tiny to large-v3-turbo) based on your needs
Detailed Output: Get both full transcriptions and segment-wise transcriptions with timestamps
Download Results: All processed files and transcripts are provided in a convenient ZIP file

Setup

Clone the repository
Install the required dependencies:
```
pip install -r requirements.txt
```
Make sure you have ffmpeg installed on your system
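
To confirm that ffmpeg is reachable before starting the app, a small check like the following may help. It is only a sketch: `find_ffmpeg` is a hypothetical helper name and is not part of app.py.

```python
import shutil
from typing import Optional


def find_ffmpeg(ffmpeg_path: str = "ffmpeg") -> Optional[str]:
    """Return the resolved location of the ffmpeg executable, or None if missing.

    Hypothetical helper for illustration; not taken from app.py.
    """
    # shutil.which searches PATH for a bare name and also accepts an explicit path.
    return shutil.which(ffmpeg_path)
```

Calling `find_ffmpeg()` returns `None` when ffmpeg is not on your PATH; in that case, install it or pass the full path to the executable.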

Usage

Run the application:
```
python app.py
```
Open the provided local URL in your web browser
Upload your audio file(s)
Configure the settings:
- Enable/disable silence removal
- Enable/disable audio chunking
- Select the Whisper model size
- Choose the target language
Click "Process" to start transcription
View the results and download the ZIP file containing all processed files

Settings

Silence Removal

Minimum Silence Length: 100-2000ms (default: 500ms)
Silence Threshold: -70 to -30dB (default: -50dB)
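
To illustrate how the two settings interact, here is a minimal sketch that finds silent spans in a list of loudness readings. It assumes one dB reading per fixed-length frame; the function name, the frame length, and the input format are illustrative assumptions, and the app itself may detect silence differently.

```python
from typing import List, Tuple


def find_silent_spans(
    levels_db: List[float],
    frame_ms: int = 10,
    min_silence_len_ms: int = 500,
    silence_thresh_db: float = -50.0,
) -> List[Tuple[int, int]]:
    """Return (start_ms, end_ms) spans that stay below the threshold
    for at least min_silence_len_ms. Illustrative only."""
    spans = []
    run_start = None  # index of the first frame in the current quiet run
    # A sentinel "infinitely loud" frame closes any quiet run at the end.
    for i, level in enumerate(levels_db + [float("inf")]):
        quiet = level < silence_thresh_db
        if quiet and run_start is None:
            run_start = i
        elif not quiet and run_start is not None:
            # Keep the run only if it is long enough to count as silence.
            if (i - run_start) * frame_ms >= min_silence_len_ms:
                spans.append((run_start * frame_ms, i * frame_ms))
            run_start = None
    return spans
```

A lower (more negative) threshold treats only very quiet audio as silence, while a longer minimum length leaves short pauses between words untouched.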

Chunking

Chunk Duration: 60-3600 seconds (default: 600 seconds/10 minutes)
FFmpeg Path: Path to ffmpeg executable (default: "ffmpeg")
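
One possible way to split a file into fixed-length chunks is ffmpeg's segment muxer. The sketch below only builds the command line; `build_chunk_command` and the output pattern are hypothetical, and app.py may invoke ffmpeg with different options.

```python
from typing import List


def build_chunk_command(
    input_path: str,
    output_pattern: str,
    chunk_seconds: int = 600,
    ffmpeg_path: str = "ffmpeg",
) -> List[str]:
    """Build an ffmpeg command that splits audio into chunks of chunk_seconds.

    Hypothetical helper; the range check mirrors the 60-3600 second setting.
    """
    if not 60 <= chunk_seconds <= 3600:
        raise ValueError("chunk_seconds must be between 60 and 3600")
    return [
        ffmpeg_path,
        "-i", input_path,
        "-f", "segment",                      # use the segment muxer
        "-segment_time", str(chunk_seconds),  # target length of each chunk
        "-c", "copy",                         # copy the stream without re-encoding
        output_pattern,                       # e.g. "chunk_%03d.mp3"
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, for example with `build_chunk_command("interview.mp3", "chunk_%03d.mp3")`.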

Transcription

Model Size: Choose from tiny, base, small, medium, large, large-v2, large-v3, turbo, or large-v3-turbo
Language: German (de), English (en), French (fr), Spanish (es), Italian (it)
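
For readers who want to reproduce the transcription step outside the UI, the following sketch assumes the `openai-whisper` package is the backend, which this README does not confirm. The helper names and the default arguments are arbitrary choices for illustration.

```python
from typing import Any, Dict

# Values copied from the settings listed above.
MODEL_SIZES = (
    "tiny", "base", "small", "medium", "large",
    "large-v2", "large-v3", "turbo", "large-v3-turbo",
)
LANGUAGES = {"de": "German", "en": "English", "fr": "French", "es": "Spanish", "it": "Italian"}


def check_settings(model_size: str, language: str) -> None:
    """Raise ValueError if a setting is outside the documented choices."""
    if model_size not in MODEL_SIZES:
        raise ValueError(f"unknown model size: {model_size}")
    if language not in LANGUAGES:
        raise ValueError(f"unsupported language code: {language}")


def transcribe_file(audio_path: str, model_size: str = "base", language: str = "de") -> Dict[str, Any]:
    """Transcribe one file; assumes `pip install openai-whisper` (not confirmed by this README)."""
    check_settings(model_size, language)
    import whisper  # imported lazily so the checks above work without the package

    model = whisper.load_model(model_size)
    # The result is a dict with the full "text" and a list of timed "segments".
    return model.transcribe(audio_path, language=language)
```

Larger models are generally more accurate but slower and need more memory, so a smaller model is a reasonable starting point for a quick test.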

Output

Full Transcription: Complete text of the audio file
Segmented Transcription: Text segments with timestamps
ZIP File: An archive containing:
- Processed audio files
- Individual transcript files
- Combined transcript file
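
The outputs above could be produced along these lines. This is a sketch under assumptions: segments are taken to be dicts with `start`, `end` (in seconds) and `text` keys, and the timestamp format and helper names are illustrative rather than the app's actual output format.

```python
import zipfile
from pathlib import Path
from typing import Dict, Iterable, List


def format_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as HH:MM:SS (fractions are dropped)."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def format_segments(segments: List[Dict]) -> str:
    """Render one line per segment: "[start - end] text"."""
    lines = [
        f"[{format_timestamp(s['start'])} - {format_timestamp(s['end'])}] {s['text'].strip()}"
        for s in segments
    ]
    return "\n".join(lines)


def write_zip(zip_path: str, files: Iterable[str]) -> None:
    """Bundle the given files into a ZIP archive, stored under their base names."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for file_path in files:
            archive.write(file_path, arcname=Path(file_path).name)
```

Storing files under their base names keeps the archive flat, which assumes that no two bundled files share the same name.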

Deployment on Hugging Face Spaces

Create a new Space on Hugging Face
Choose "Gradio" as the SDK
Upload the following files:
- app.py
- requirements.txt
The app will automatically deploy and be available at your Space's URL

Requirements

Python 3.10+ (required by Gradio 5.x, the SDK version set in the metadata)
ffmpeg
See requirements.txt for Python package dependencies

License

This project is open source and available under the MIT License.
