add to the readme
README.md
CHANGED
@@ -5,8 +5,114 @@ tags:
- model_hub_mixin
- pytorch_model_hub_mixin
---
# Dia: Open-Weight Text-to-Speech Dialogue Model (1.6B)

**Dia** is a 1.6B-parameter open-weight text-to-speech model developed by Nari Labs.
It generates highly realistic *dialogue* directly from transcripts, with support for both spoken and **nonverbal** cues (e.g., `(laughs)`, `(sighs)`), and can be **conditioned on audio** for emotional tone or voice consistency.

Dia currently supports **English** and is optimized for GPU inference. This model is intended for research and educational purposes only.

---

## 🔥 Try It Out

- 🖥️ [ZeroGPU demo on Spaces](https://huggingface.co/spaces/nari-labs/Dia-1.6B)
- 📊 [Comparison demos](https://yummy-fir-7a4.notion.site/dia) with ElevenLabs and Sesame CSM-1B
- 🎧 Try voice remixing and conversations with a larger version: [join the waitlist](https://tally.so/r/meokbo)
- 💬 [Join the community on Discord](https://discord.gg/pgdB5YRe)

---

## 🧠 Capabilities

- Multispeaker support using `[S1]`, `[S2]`, etc.
- Rich nonverbal cue synthesis: `(laughs)`, `(clears throat)`, `(gasps)`, etc.
- Voice conditioning (via transcript + audio example)
- Outputs high-fidelity `.mp3` files directly from text

Example input:

```text
[S1] Dia is an open weights text-to-dialogue model. [S2] You get full control over scripts and voices. (laughs)
```
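
To make the tag format concrete, here is a minimal Python sketch that splits a transcript like the example input into speaker turns and lists the nonverbal cues in each turn. It is not part of the Dia codebase: `parse_script` and both regular expressions are hypothetical helpers written for illustration, and they assume that speaker tags always look like `[S<number>]` and that cues are lowercase words in parentheses.

```python
import re

# Hypothetical helpers for illustration only; not part of the Dia library.
SPEAKER_TAG = re.compile(r"\[(S\d+)\]")   # matches [S1], [S2], ...
CUE = re.compile(r"\(([a-z ]+)\)")        # matches (laughs), (clears throat), ...


def parse_script(script):
    """Split a tagged transcript into (speaker, text, cues) tuples."""
    # re.split with one capture group yields [prefix, tag, text, tag, text, ...].
    parts = SPEAKER_TAG.split(script)
    turns = []
    for speaker, text in zip(parts[1::2], parts[2::2]):
        cues = CUE.findall(text)
        turns.append((speaker, text.strip(), cues))
    return turns


example = (
    "[S1] Dia is an open weights text-to-dialogue model. "
    "[S2] You get full control over scripts and voices. (laughs)"
)
for speaker, text, cues in parse_script(example):
    print(speaker, "|", text, "|", cues)
```
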

---

## 🚀 Quickstart

Install via pip:

```bash
pip install git+https://github.com/nari-labs/dia.git
```

Launch the Gradio UI:

```bash
git clone https://github.com/nari-labs/dia.git
cd dia && uv run app.py
```

Or set up manually:

```bash
git clone https://github.com/nari-labs/dia.git
cd dia
python -m venv .venv
source .venv/bin/activate
pip install -e .
python app.py
```
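
After either install path, a quick sanity check can confirm that the environment is usable before launching the app. The sketch below is an assumption-laden illustration, not something shipped with Dia: `check_packages` and `cuda_available` are hypothetical helper names, and the check assumes the package installs under the import name `dia` (as in `from dia.model import Dia`) and that PyTorch is the backend.

```python
import importlib.util


def check_packages(names=("dia", "torch")):
    """Map each top-level package name to True if it can be found for import."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


def cuda_available():
    """Return True only if torch is installed and reports a usable CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch  # imported lazily so the check also runs without torch installed

    return torch.cuda.is_available()


print(check_packages())
print("CUDA available:", cuda_available())
```
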

---

## 🐍 Python Example

```python
from dia.model import Dia

# Load the pretrained weights from the Hugging Face Hub in half precision.
model = Dia.from_pretrained("nari-labs/Dia-1.6B", compute_dtype="float16")

# [S1]/[S2] mark speaker turns; parenthesized words are nonverbal cues.
text = "[S1] Hello! This is Dia. [S2] Nice to meet you. (laughs)"
output = model.generate(text, use_torch_compile=True, verbose=True)
model.save_audio("output.mp3", output)
```
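
When a dialogue is stored as structured data rather than as one string, the tagged input can be assembled programmatically. The following sketch shows one way to do that with plain string formatting. `build_script` is a hypothetical helper, not a Dia API, and it assumes speakers are numbered from 1 and written as `[S<number>]`; the resulting string is what would be passed to `model.generate`.

```python
def build_script(turns):
    """Join (speaker_number, line) pairs into one tagged transcript string."""
    pieces = []
    for speaker, line in turns:
        if speaker < 1:
            raise ValueError("speaker numbers start at 1")
        pieces.append(f"[S{speaker}] {line.strip()}")
    return " ".join(pieces)


# Hypothetical dialogue data; nonverbal cues stay inline in the spoken line.
turns = [
    (1, "Hello! This is Dia."),
    (2, "Nice to meet you. (laughs)"),
]
text = build_script(turns)
print(text)
```
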

> Coming soon: PyPI package and CLI support

---

## 💻 Inference Performance (on RTX 4090)

| Precision | Realtime factor (with compile) | Realtime factor (without compile) | VRAM usage |
|-----------|--------------------------------|-----------------------------------|------------|
| bfloat16  | 2.1×                           | 1.5×                              | ~10 GB     |
| float16   | 2.2×                           | 1.3×                              | ~10 GB     |
| float32   | 1.0×                           | 0.9×                              | ~13 GB     |

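
The table can be turned into rough planning numbers. The sketch below assumes that the realtime factor means seconds of audio produced per second of wall-clock time (the table does not define it explicitly), so a 2.2× factor would yield 30 seconds of audio in roughly 13.6 seconds. The `BENCHMARKS` dictionary simply copies the RTX 4090 figures from the table; the function names are hypothetical, and other GPUs will behave differently.

```python
# Figures copied from the RTX 4090 table; VRAM values are approximate.
BENCHMARKS = {
    "bfloat16": {"compiled": 2.1, "uncompiled": 1.5, "vram_gb": 10},
    "float16": {"compiled": 2.2, "uncompiled": 1.3, "vram_gb": 10},
    "float32": {"compiled": 1.0, "uncompiled": 0.9, "vram_gb": 13},
}


def estimate_generation_seconds(audio_seconds, precision, compiled=True):
    """Estimate wall-clock time, assuming factor = audio seconds per wall second."""
    factor = BENCHMARKS[precision]["compiled" if compiled else "uncompiled"]
    return audio_seconds / factor


def precisions_that_fit(vram_gb):
    """List precisions whose approximate VRAM usage is within the given budget."""
    return [name for name, row in BENCHMARKS.items() if row["vram_gb"] <= vram_gb]


print(round(estimate_generation_seconds(30, "float16"), 1))
print(precisions_that_fit(12))
```
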

> CPU support and a quantized version are coming soon.

---

## ⚠️ Ethical Use

This model is for **research and educational use only**. Prohibited uses include:

- Impersonating individuals (e.g., cloning real voices without consent)
- Generating misleading or malicious content
- Illegal or harmful activities

Please use responsibly.

---

## 📄 License

Apache 2.0. See the [LICENSE](https://github.com/nari-labs/dia/blob/main/LICENSE) for details.

---

## 🛠️ Roadmap

- 🔧 Inference speed optimization
- 💾 CPU & quantized model support
- 📦 PyPI + CLI tools