nari-labs/Dia-1.6B


Commit bd25172 (verified), 1 parent: ed6eb51
clem (HF Staff) committed 5 days ago

add to the readme

Files changed (1): README.md (+110 -4)

@@ -5,8 +5,114 @@ tags:
 - model_hub_mixin
 - pytorch_model_hub_mixin
 ---
-This model has been pushed to the Hub using the [PytorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration:
-- Code: https://github.com/nari-labs/dia
-- Paper: [More Information Needed]
-- Docs: [More Information Needed]

+# Dia: Open-Weight Text-to-Speech Dialogue Model (1.6B)
+**Dia** is a 1.6B-parameter open-weight text-to-speech model developed by Nari Labs.
+It generates highly realistic *dialogue* directly from transcripts, with support for both spoken and **nonverbal** cues (e.g., `(laughs)`, `(sighs)`), and can be **conditioned on audio** for emotional tone or voice consistency.
+Currently, Dia supports **English** and is optimized for GPU inference. This model is designed for research and educational purposes only.
+
+---
+## 🔥 Try It Out
+- 🖥️ [ZeroGPU demo on Spaces](https://huggingface.co/spaces/nari-labs/Dia-1.6B)
+- 📊 [Comparison demos](https://yummy-fir-7a4.notion.site/dia) with ElevenLabs and Sesame CSM-1B
+- 🎧 Try voice remixing and conversations with a larger version — [join the waitlist](https://tally.so/r/meokbo)
+- 💬 [Join the community on Discord](https://discord.gg/pgdB5YRe)
+---
+## 🧠 Capabilities
+- Multispeaker support using `[S1]`, `[S2]`, etc.
+- Rich nonverbal cue synthesis: `(laughs)`, `(clears throat)`, `(gasps)`, etc.
+- Voice conditioning (via transcript + audio example)
+- Outputs high-fidelity `.mp3` files directly from text
+
+Example input:
+```text
+[S1] Dia is an open weights text-to-dialogue model. [S2] You get full control over scripts and voices. (laughs)
+```
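The speaker-tag format in the example input can be assembled programmatically. The following is a minimal sketch, not part of the Dia package: `build_transcript` is a hypothetical helper introduced here for illustration, and it assumes only the `[S1]`, `[S2]` tagging convention and parenthesized cues described above.

```python
from typing import Iterable, Tuple


def build_transcript(turns: Iterable[Tuple[int, str]]) -> str:
    """Join (speaker_number, line) pairs into one tagged transcript string.

    Hypothetical helper: it only formats text and does not call Dia.
    """
    parts = []
    for speaker, line in turns:
        if speaker < 1:
            raise ValueError("speaker numbers start at 1")
        # Prefix each turn with its speaker tag, e.g. "[S1] ...".
        parts.append(f"[S{speaker}] {line.strip()}")
    return " ".join(parts)


transcript = build_transcript([
    (1, "Dia is an open weights text-to-dialogue model."),
    (2, "You get full control over scripts and voices. (laughs)"),
])
print(transcript)
```

Nonverbal cues such as `(laughs)` stay inline in the line text, so the helper passes them through unchanged.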
+---
+## 🚀 Quickstart
+Install via pip:
+```bash
+pip install git+https://github.com/nari-labs/dia.git
+```
+Launch the Gradio UI:
+```bash
+git clone https://github.com/nari-labs/dia.git
+cd dia && uv run app.py
+```
+Or manually set up:
+```bash
+git clone https://github.com/nari-labs/dia.git
+cd dia
+python -m venv .venv
+source .venv/bin/activate
+pip install -e .
+python app.py
+```
+---
+## 🐍 Python Example
+```python
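+from dia.model import Dia
+
+# Load the checkpoint by its Hub repo ID, using float16 as the compute dtype.
+model = Dia.from_pretrained("nari-labs/Dia-1.6B", compute_dtype="float16")
+# [S1]/[S2] tag the speakers; parenthesized cues such as (laughs) are nonverbal.
+text = "[S1] Hello! This is Dia. [S2] Nice to meet you. (laughs)"
+# Generate audio from the transcript, then write it to disk.
+output = model.generate(text, use_torch_compile=True, verbose=True)
+model.save_audio("output.mp3", output)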
+```
+> Coming soon: PyPI package and CLI support
+---
+## 💻 Inference Performance (on RTX 4090)
+| Precision | Realtime Factor (w/ compile) | Realtime Factor (w/o compile) | VRAM Usage |
+|-----------|------------------------------|-------------------------------|------------|
+| bfloat16  | 2.1×                         | 1.5×                          | ~10 GB     |
+| float16   | 2.2×                         | 1.3×                          | ~10 GB     |
+| float32   | 1.0×                         | 0.9×                          | ~13 GB     |
+> CPU support and quantized version coming soon.
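To read the table in practical terms, the sketch below converts a realtime factor into an estimated wall-clock generation time. It rests on an assumption the README does not state: that the realtime factor means seconds of audio produced per second of compute, so values above 1.0 are faster than real time. The function name and the 60-second clip length are illustrative, and the factors are copied from the RTX 4090 table above.

```python
def generation_seconds(audio_seconds: float, realtime_factor: float) -> float:
    """Estimate wall-clock time needed to generate `audio_seconds` of audio.

    Assumes realtime_factor = audio seconds produced per compute second.
    """
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return audio_seconds / realtime_factor


# (with compile, without compile) realtime factors from the table above.
REALTIME_FACTORS = {
    "bfloat16": (2.1, 1.5),
    "float16": (2.2, 1.3),
    "float32": (1.0, 0.9),
}

clip_seconds = 60.0  # illustrative clip length
for precision, (compiled, uncompiled) in REALTIME_FACTORS.items():
    print(
        f"{precision}: ~{generation_seconds(clip_seconds, compiled):.1f} s compiled, "
        f"~{generation_seconds(clip_seconds, uncompiled):.1f} s uncompiled"
    )
```

Under this reading, a factor below 1.0 (float32 without compile) means generation takes longer than the audio it produces.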
+---
+## ⚠️ Ethical Use
+This model is for **research and educational use only**. Prohibited uses include:
+- Impersonating individuals (e.g., cloning real voices without consent)
+- Generating misleading or malicious content
+- Illegal or harmful activities
+
+Please use responsibly.
+---
+## 📄 License
+This model is released under the Apache 2.0 license. See the [LICENSE](https://github.com/nari-labs/dia/blob/main/LICENSE) for details.
+
+---
+## 🛠️ Roadmap
+- 🔧 Inference speed optimization
+- 💾 CPU & quantized model support
+- 📦 PyPI + CLI tools