---
title: Qwen2.5 Omni 7B Demo
emoji: 🏆
colorFrom: indigo
colorTo: gray
sdk: gradio
sdk_version: 5.23.1
app_file: app.py
pinned: false
license: mit
short_description: A space exploring omni modality capabilities
---

# Qwen2.5-Omni Multimodal Chat Demo

This Space demonstrates the capabilities of Qwen2.5-Omni, an end-to-end multimodal model that perceives text, images, audio, and video, and generates both text and natural speech.

## Features

- **Omni-modal Understanding**: Process text, image, audio, and video inputs
- **Multimodal Responses**: Generate both text and natural speech outputs
- **Real-time Interaction**: Stream responses as they're generated
- **Customizable Voice**: Choose between male and female voice outputs

## How to Use

1. **Text Input**: Type your message in the text box and click "Send Text"
2. **Multimodal Input**:
   - Upload images, audio files, or videos
   - Optionally add accompanying text
   - Click "Send Multimodal Input"
3. **Voice Settings**:
   - Toggle audio output on/off
   - Select preferred voice type

## Examples

Try these interactions:

- Upload an image and ask "Describe what you see"
- Upload an audio clip and ask "What is being said here?"
- Upload a video and ask "What's happening in this video?"
- Ask complex questions like "Explain quantum computing in simple terms"

## Technical Details

This demo uses:

- Qwen2.5-Omni-7B model
- FlashAttention-2 for accelerated inference
- Gradio for the interactive interface
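
The multimodal inputs described above are typically assembled into a chat-style message list before being handed to the model's processor. Below is a minimal sketch of that structure; `build_conversation` is a hypothetical helper, and the `type` field names follow the common transformers multimodal convention rather than this Space's actual `app.py`, so verify against the real code.

```python
# Hypothetical helper illustrating the conversation payload shape
# commonly used by Qwen-family multimodal chat templates.
# Field names ("type", "image", "audio", "video", "text") are
# assumptions based on the usual transformers convention.

def build_conversation(text=None, image=None, audio=None, video=None):
    """Assemble a single-turn user message mixing modalities."""
    content = []
    if image is not None:
        content.append({"type": "image", "image": image})
    if audio is not None:
        content.append({"type": "audio", "audio": audio})
    if video is not None:
        content.append({"type": "video", "video": video})
    if text:
        content.append({"type": "text", "text": text})
    return [{"role": "user", "content": content}]

# Example: an image upload with an accompanying text prompt
conversation = build_conversation(
    text="Describe what you see", image="photo.jpg"
)
```

The resulting list would then be passed through the model's chat template and processor to produce tensors for generation.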