model card completion

#23
by clem HF Staff - opened
Files changed (1)
  1. README.md +110 -4
README.md CHANGED
@@ -5,8 +5,114 @@ tags:
  - model_hub_mixin
  - pytorch_model_hub_mixin
  ---
-
- This model has been pushed to the Hub using the [PytorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration:
- - Code: https://github.com/nari-labs/dia
- - Paper: [More Information Needed]
- - Docs: [More Information Needed]
+ # Dia: Open-Weight Text-to-Speech Dialogue Model (1.6B)
+
+ **Dia** is a 1.6B-parameter open-weight text-to-speech model developed by Nari Labs.
+ It generates highly realistic *dialogue* directly from transcripts, supports **nonverbal** cues such as `(laughs)` and `(sighs)` alongside spoken text, and can be **conditioned on audio** for emotional tone or voice consistency.
+
+ Currently, Dia supports **English** and is optimized for GPU inference. The model is intended for research and educational purposes only.
+
+ ---
+
+ ## 🔥 Try It Out
+
+ - 🖥️ [ZeroGPU demo on Spaces](https://huggingface.co/spaces/nari-labs/Dia-1.6B)
+ - 📊 [Comparison demos](https://yummy-fir-7a4.notion.site/dia) with ElevenLabs and Sesame CSM-1B
+ - 🎧 Try voice remixing and conversations with a larger version by [joining the waitlist](https://tally.so/r/meokbo)
+ - 💬 [Join the community on Discord](https://discord.gg/pgdB5YRe)
+
+ ---
+
+ ## 🧠 Capabilities
+
+ - Multispeaker support using `[S1]`, `[S2]`, etc.
+ - Rich nonverbal cue synthesis: `(laughs)`, `(clears throat)`, `(gasps)`, etc.
+ - Voice conditioning via a transcript plus a matching audio example (see the sketch after the example input below)
+ - Outputs high-fidelity `.mp3` files directly from text
+
+ Example input:
+ ```text
+ [S1] Dia is an open weights text-to-dialogue model. [S2] You get full control over scripts and voices. (laughs)
+ ```
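+
+ Voice conditioning works by giving the model a short reference clip together with its transcript, then appending the new lines to be rendered in that voice. The sketch below illustrates the idea; the `audio_prompt` argument name, the transcript-concatenation convention, and the file names are assumptions for illustration, so check the GitHub repository for the exact, current API.
+
+ ```python
+ from dia.model import Dia
+
+ model = Dia.from_pretrained("nari-labs/Dia-1.6B", compute_dtype="float16")
+
+ # Transcript of the reference clip, followed by the new dialogue to generate.
+ clone_from_text = "[S1] This sentence is a transcript of the reference recording."
+ text_to_generate = "[S1] And this is new dialogue spoken in that same voice. (laughs)"
+
+ # Assumption: `audio_prompt` accepts a path to the reference audio file.
+ output = model.generate(
+     clone_from_text + " " + text_to_generate,
+     audio_prompt="reference.mp3",
+     use_torch_compile=True,
+     verbose=True,
+ )
+ model.save_audio("voice_clone.mp3", output)
+ ```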
+
+ ---
+
+ ## 🚀 Quickstart
+
+ Install directly from GitHub with pip:
+
+ ```bash
+ pip install git+https://github.com/nari-labs/dia.git
+ ```
+
+ Launch the Gradio UI (requires [uv](https://github.com/astral-sh/uv)):
+ ```bash
+ git clone https://github.com/nari-labs/dia.git
+ cd dia && uv run app.py
+ ```
+
+ Or set it up manually:
+
+ ```bash
+ git clone https://github.com/nari-labs/dia.git
+ cd dia
+ python -m venv .venv
+ source .venv/bin/activate
+ pip install -e .
+ python app.py
+ ```
+
+ ---
+
+ ## 🐍 Python Example
+
+ ```python
+ from dia.model import Dia
+
+ # Load the 1.6B checkpoint from the Hub; float16 keeps VRAM at roughly 10 GB (see the table below).
+ model = Dia.from_pretrained("nari-labs/Dia-1.6B", compute_dtype="float16")
+
+ # [S1]/[S2] mark speaker turns; parenthesized cues like (laughs) are rendered as nonverbal sounds.
+ text = "[S1] Hello! This is Dia. [S2] Nice to meet you. (laughs)"
+ output = model.generate(text, use_torch_compile=True, verbose=True)
+ model.save_audio("output.mp3", output)
+ ```
+
+ > Coming soon: PyPI package and CLI support.
+
+ ---
+
+ ## 💻 Inference Performance (RTX 4090)
+
+ | Precision | Realtime factor (w/ `torch.compile`) | Realtime factor (w/o compile) | VRAM usage |
+ |-----------|--------------------------------------|-------------------------------|------------|
+ | bfloat16  | 2.1×                                 | 1.5×                          | ~10 GB     |
+ | float16   | 2.2×                                 | 1.3×                          | ~10 GB     |
+ | float32   | 1.0×                                 | 0.9×                          | ~13 GB     |
+
+ > CPU support and a quantized version are coming soon.
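+
+ The precision column corresponds to the `compute_dtype` argument from the Python example above, and the compiled column to `use_torch_compile`. A minimal sketch of selecting the bfloat16 row (assuming `compute_dtype` accepts the string `"bfloat16"` spelled as in the table):
+
+ ```python
+ from dia.model import Dia
+
+ # bfloat16 + torch.compile targets the ~2.1x realtime / ~10 GB row above.
+ model = Dia.from_pretrained("nari-labs/Dia-1.6B", compute_dtype="bfloat16")
+ output = model.generate(
+     "[S1] Checking throughput at bfloat16. [S2] Sounds quick. (laughs)",
+     use_torch_compile=True,
+     verbose=True,
+ )
+ model.save_audio("bfloat16_test.mp3", output)
+ ```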
+
+ ---
+
+ ## ⚠️ Ethical Use
+
+ This model is for **research and educational use only**. Prohibited uses include:
+
+ - Impersonating individuals (e.g., cloning real voices without consent)
+ - Generating misleading or malicious content
+ - Illegal or harmful activities
+
+ Please use responsibly.
+
+ ---
+
+ ## 📄 License
+
+ Apache 2.0.
+ See the [LICENSE](https://github.com/nari-labs/dia/blob/main/LICENSE) file for details.
+
+ ---
+
+ ## 🛠️ Roadmap
+
+ - 🔧 Inference speed optimization
+ - 💾 CPU & quantized model support
+ - 📦 PyPI + CLI tools