---
title: Qwen2.5 Omni 7B Demo
emoji: 🏆
colorFrom: indigo
colorTo: gray
sdk: gradio
sdk_version: 5.23.1
app_file: app.py
pinned: false
license: mit
short_description: A space exploring omni modality capabilities
---
# Qwen2.5-Omni Multimodal Chat Demo

This Space demonstrates Qwen2.5-Omni, an end-to-end multimodal model that perceives text, images, audio, and video, and generates both text and natural speech responses.
## Features

- **Omni-modal Understanding**: Process text, images, audio, and video inputs
- **Multimodal Responses**: Generate both text and natural speech outputs
- **Real-time Interaction**: Stream responses as they're generated
- **Customizable Voice**: Choose between male and female voice outputs
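
For reference, a single multimodal turn exercising these features looks roughly like the sketch below. This is a minimal sketch adapted from the Qwen2.5-Omni model card, not this Space's actual app.py: the `Qwen2_5OmniModel`/`Qwen2_5OmniProcessor` class names, the `process_mm_info` helper from the `qwen_omni_utils` package, and the exact keyword names (`audio`, `spk`) follow the initial release and may differ across transformers versions; `photo.jpg` is a placeholder path.

```python
import soundfile as sf
from transformers import Qwen2_5OmniModel, Qwen2_5OmniProcessor
from qwen_omni_utils import process_mm_info  # helper published alongside the model

model = Qwen2_5OmniModel.from_pretrained(
    "Qwen/Qwen2.5-Omni-7B", torch_dtype="auto", device_map="auto"
)
processor = Qwen2_5OmniProcessor.from_pretrained("Qwen/Qwen2.5-Omni-7B")

# One user turn mixing an image with text ("photo.jpg" is a placeholder path).
conversation = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "photo.jpg"},
        {"type": "text", "text": "Describe what you see."},
    ],
}]

prompt = processor.apply_chat_template(conversation, add_generation_prompt=True, tokenize=False)
audios, images, videos = process_mm_info(conversation, use_audio_in_video=True)
inputs = processor(
    text=prompt, audio=audios, images=images, videos=videos,
    return_tensors="pt", padding=True,
).to(model.device)

# generate() returns token ids for the text reply plus a waveform for the
# spoken reply; "Chelsie"/"Ethan" are the documented female/male voices
# (the keyword was `spk` in the initial release and may have been renamed since).
text_ids, audio = model.generate(**inputs, use_audio_in_video=True, spk="Chelsie")
print(processor.batch_decode(text_ids, skip_special_tokens=True)[0])
sf.write("reply.wav", audio.reshape(-1).detach().cpu().numpy(), samplerate=24000)
```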
## How to Use

1. **Text Input**: Type your message in the text box and click "Send Text"
2. **Multimodal Input**:
   - Upload images, audio files, or videos
   - Optionally add accompanying text
   - Click "Send Multimodal Input"
3. **Voice Settings** (see the sketch after this list):
   - Toggle audio output on/off
   - Select preferred voice type
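
These controls map onto ordinary Gradio components. The snippet below is a hypothetical, stripped-down wiring with a placeholder echo handler, not the Space's actual app.py:

```python
import gradio as gr

def respond(message, history, audio_on, voice):
    # Placeholder handler: a real handler would call the model and, when
    # audio_on is set, also return generated speech in the selected voice.
    history = history + [
        {"role": "user", "content": message},
        {"role": "assistant",
         "content": f"(placeholder reply, voice={voice}, audio {'on' if audio_on else 'off'})"},
    ]
    return history, ""

with gr.Blocks() as demo:
    chatbot = gr.Chatbot(type="messages")
    msg = gr.Textbox(label="Message")
    audio_on = gr.Checkbox(value=True, label="Enable audio output")
    voice = gr.Radio(["Female", "Male"], value="Female", label="Voice type")
    send = gr.Button("Send Text")
    send.click(respond, [msg, chatbot, audio_on, voice], [chatbot, msg])

demo.launch()
```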
## Examples

Try these interactions (each maps onto one conversation entry, sketched after this list):
- Upload an image and ask "Describe what you see"
- Upload an audio clip and ask "What is being said here?"
- Upload a video and ask "What's happening in this video?"
- Ask complex questions like "Explain quantum computing in simple terms"
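
Each example corresponds to one user turn in the conversation format consumed by the processor's chat template. The sketch below shows plausible entries under that assumption, with placeholder file paths:

```python
# Each example above becomes one user turn; paths are placeholders and the
# schema follows the Qwen2.5-Omni chat format used by process_mm_info.
image_turn = {"role": "user", "content": [
    {"type": "image", "image": "photo.jpg"},
    {"type": "text", "text": "Describe what you see"},
]}

audio_turn = {"role": "user", "content": [
    {"type": "audio", "audio": "clip.wav"},
    {"type": "text", "text": "What is being said here?"},
]}

video_turn = {"role": "user", "content": [
    {"type": "video", "video": "clip.mp4"},
    {"type": "text", "text": "What's happening in this video?"},
]}

# A text-only question needs no media entry at all.
text_turn = {"role": "user", "content": [
    {"type": "text", "text": "Explain quantum computing in simple terms"},
]}
```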
## Technical Details

This demo uses:
- the Qwen2.5-Omni-7B model
- FlashAttention-2 for accelerated inference
- Gradio for the interactive interface
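
FlashAttention-2 is enabled at load time through transformers' `attn_implementation` argument, which requires the `flash-attn` package and half-precision weights. A minimal sketch, again assuming the `Qwen2_5OmniModel` class name from the initial release:

```python
import torch
from transformers import Qwen2_5OmniModel

# FlashAttention-2 requires the flash-attn package and fp16/bf16 weights.
model = Qwen2_5OmniModel.from_pretrained(
    "Qwen/Qwen2.5-Omni-7B",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)
```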