---
title: Qwen2.5 Omni 7B Demo
emoji: 🏆
colorFrom: indigo
colorTo: gray
sdk: gradio
sdk_version: 5.23.1
app_file: app.py
pinned: false
license: mit
short_description: A Space exploring omni-modal capabilities
---
# Qwen2.5-Omni Multimodal Chat Demo
This Space demonstrates the capabilities of Qwen2.5-Omni, an end-to-end multimodal model that can perceive text, images, audio, and video, and respond with both text and natural speech.
## Features
- **Omni-modal Understanding**: Process text, images, audio, and video inputs
- **Multimodal Responses**: Generate both text and natural speech outputs
- **Real-time Interaction**: Stream responses as they're generated
- **Customizable Voice**: Choose between male and female voice outputs
## How to Use
1. **Text Input**: Type your message in the text box and click "Send Text"
2. **Multimodal Input**:
- Upload images, audio files, or videos
- Optionally add accompanying text
- Click "Send Multimodal Input"
3. **Voice Settings**:
- Toggle audio output on/off
- Select preferred voice type
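Under the hood, a multimodal turn like the ones above is typically passed to the model as a structured conversation list. The sketch below shows one plausible way to assemble such a turn; `build_user_turn` is a hypothetical helper, and the role/content dict layout follows the chat-template message format used on the Qwen model cards.

```python
def build_user_turn(text=None, image=None, audio=None, video=None):
    """Assemble one user message in the Qwen-style conversation format.

    `image`, `audio`, and `video` are file paths or URLs; `text` is the
    optional accompanying prompt. (Hypothetical helper; the dict layout
    mirrors the chat-template format from the model card.)
    """
    content = []
    if image is not None:
        content.append({"type": "image", "image": image})
    if audio is not None:
        content.append({"type": "audio", "audio": audio})
    if video is not None:
        content.append({"type": "video", "video": video})
    if text:
        content.append({"type": "text", "text": text})
    if not content:
        raise ValueError("at least one modality is required")
    return {"role": "user", "content": content}


# e.g. upload an image and ask "Describe what you see":
turn = build_user_turn(text="Describe what you see", image="photo.jpg")
```

Text-only turns work the same way, with a single `{"type": "text", ...}` entry in `content`.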
## Examples
Try these interactions:
- Upload an image and ask "Describe what you see"
- Upload an audio clip and ask "What is being said here?"
- Upload a video and ask "What's happening in this video?"
- Ask complex questions like "Explain quantum computing in simple terms"
## Technical Details
This demo uses:
- Qwen2.5-Omni-7B model
- FlashAttention-2 for accelerated inference
- Gradio for the interactive interface
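As a rough sketch of how these pieces fit together (assuming the `transformers` integration described on the Qwen2.5-Omni model card; class names and the FlashAttention-2 flag may differ across library versions):

```python
import importlib.util

MODEL_ID = "Qwen/Qwen2.5-Omni-7B"


def loading_kwargs(use_flash_attn: bool = True) -> dict:
    """Keyword arguments for from_pretrained().

    FlashAttention-2 requires a CUDA GPU and the flash-attn package, so
    this falls back to the default attention implementation when the
    package is not installed.
    """
    kwargs = {"torch_dtype": "auto", "device_map": "auto"}
    if use_flash_attn and importlib.util.find_spec("flash_attn") is not None:
        kwargs["attn_implementation"] = "flash_attention_2"
    return kwargs


def load_model():
    # Class names follow the model card at the time of writing; newer
    # transformers releases may expose them under different names.
    from transformers import Qwen2_5OmniModel, Qwen2_5OmniProcessor

    model = Qwen2_5OmniModel.from_pretrained(MODEL_ID, **loading_kwargs())
    processor = Qwen2_5OmniProcessor.from_pretrained(MODEL_ID)
    return model, processor
```

The Gradio app then wraps `load_model()` once at startup and streams generated text and audio back through the chat interface.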