---
title: Qwen2.5 Omni 7B Demo
emoji: 🏆
colorFrom: indigo
colorTo: gray
sdk: gradio
sdk_version: 5.23.1
app_file: app.py
pinned: false
license: mit
short_description: A Space exploring omni-modal capabilities
---
# Qwen2.5-Omni Multimodal Chat Demo
This Space demonstrates the capabilities of Qwen2.5-Omni, an end-to-end multimodal model that can perceive text, images, audio, and video, and respond with both text and natural speech.
## Features
- **Omni-modal Understanding**: Process text, images, audio, and video inputs
- **Multimodal Responses**: Generate both text and natural speech outputs
- **Real-time Interaction**: Stream responses as they're generated
- **Customizable Voice**: Choose between male and female voice outputs
## How to Use
1. **Text Input**: Type your message in the text box and click "Send Text"
2. **Multimodal Input**:
- Upload images, audio files, or videos
- Optionally add accompanying text
- Click "Send Multimodal Input"
3. **Voice Settings**:
- Toggle audio output on/off
- Select preferred voice type
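Under the hood, a multimodal turn like the ones above is typically passed to the model as a structured conversation list. The sketch below shows one plausible way to assemble such a turn; `build_user_turn` is a hypothetical helper, and the role/content dict layout follows the chat-template message format used on the Qwen model cards.

```python
def build_user_turn(text=None, image=None, audio=None, video=None):
    """Assemble one user message in the Qwen-style conversation format.

    `image`, `audio`, and `video` are file paths or URLs; `text` is the
    optional accompanying prompt. (Hypothetical helper; the dict layout
    mirrors the chat-template format from the model card.)
    """
    content = []
    if image is not None:
        content.append({"type": "image", "image": image})
    if audio is not None:
        content.append({"type": "audio", "audio": audio})
    if video is not None:
        content.append({"type": "video", "video": video})
    if text:
        content.append({"type": "text", "text": text})
    if not content:
        raise ValueError("at least one modality is required")
    return {"role": "user", "content": content}


# e.g. upload an image and ask "Describe what you see":
turn = build_user_turn(text="Describe what you see", image="photo.jpg")
```

Text-only turns work the same way, with a single `{"type": "text", ...}` entry in `content`.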
## Examples
Try these interactions:
- Upload an image and ask "Describe what you see"
- Upload an audio clip and ask "What is being said here?"
- Upload a video and ask "What's happening in this video?"
- Ask complex questions like "Explain quantum computing in simple terms"
## Technical Details
This demo uses:
- Qwen2.5-Omni-7B model
- FlashAttention-2 for accelerated inference
- Gradio for the interactive interface
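As a rough sketch of how these pieces fit together (assuming the `transformers` integration described on the Qwen2.5-Omni model card; class names and the FlashAttention-2 flag may differ across library versions):

```python
import importlib.util

MODEL_ID = "Qwen/Qwen2.5-Omni-7B"


def loading_kwargs(use_flash_attn: bool = True) -> dict:
    """Keyword arguments for from_pretrained().

    FlashAttention-2 requires a CUDA GPU and the flash-attn package, so
    this falls back to the default attention implementation when the
    package is not installed.
    """
    kwargs = {"torch_dtype": "auto", "device_map": "auto"}
    if use_flash_attn and importlib.util.find_spec("flash_attn") is not None:
        kwargs["attn_implementation"] = "flash_attention_2"
    return kwargs


def load_model():
    # Class names follow the model card at the time of writing; newer
    # transformers releases may expose them under different names.
    from transformers import Qwen2_5OmniModel, Qwen2_5OmniProcessor

    model = Qwen2_5OmniModel.from_pretrained(MODEL_ID, **loading_kwargs())
    processor = Qwen2_5OmniProcessor.from_pretrained(MODEL_ID)
    return model, processor
```

The Gradio app then wraps `load_model()` once at startup and streams generated text and audio back through the chat interface.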