metacritical commited on
Commit
73705e5
·
verified ·
1 Parent(s): bf1e8f3

Update index.html

Browse files
Files changed (1) hide show
  1. index.html +11 -97
index.html CHANGED
@@ -61,81 +61,6 @@
61
  <div class="columns is-centered">
62
  <div class="column is-10">
63
 
64
- <!-- Native Sparse Attention -->
65
- <div class="card paper-card">
66
- <div class="card-content">
67
- <h3 class="title is-4">
68
- <a href="https://arxiv.org/abs/2502.11089">Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention</a>
69
- <span class="coming-soon-badge">Deep Dive Coming Soon</span>
70
- </h3>
71
- <p class="release-date">Released: February 2025</p>
72
- <p class="paper-description">
73
- Introduces a new approach to sparse attention that is both hardware-efficient and natively trainable,
74
- improving the performance of large language models.
75
- </p>
76
- </div>
77
- </div>
78
-
79
- <!-- DeepSeek-R1 -->
80
- <div class="card paper-card">
81
- <div class="card-content">
82
- <h3 class="title is-4">
83
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
84
- <span class="coming-soon-badge">Deep Dive Coming Soon</span>
85
- </h3>
86
- <p class="release-date">Released: January 20, 2025</p>
87
- <p class="paper-description">
88
- The R1 model builds on previous work to enhance reasoning capabilities through large-scale
89
- reinforcement learning, competing directly with leading models like OpenAI's o1.
90
- </p>
91
- </div>
92
- </div>
93
-
94
- <!-- DeepSeek-V3 -->
95
- <div class="card paper-card">
96
- <div class="card-content">
97
- <h3 class="title is-4">
98
- DeepSeek-V3 Technical Report
99
- <span class="coming-soon-badge">Deep Dive Coming Soon</span>
100
- </h3>
101
- <p class="release-date">Released: December 2024</p>
102
- <p class="paper-description">
103
- Discusses the scaling of sparse MoE networks to 671 billion parameters, utilizing mixed precision
104
- training and high-performance computing (HPC) co-design strategies.
105
- </p>
106
- </div>
107
- </div>
108
-
109
- <!-- DeepSeek-V2 -->
110
- <div class="card paper-card">
111
- <div class="card-content">
112
- <h3 class="title is-4">
113
- DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
114
- <span class="coming-soon-badge">Deep Dive Coming Soon</span>
115
- </h3>
116
- <p class="release-date">Released: May 2024</p>
117
- <p class="paper-description">
118
- Introduces a Mixture-of-Experts (MoE) architecture, enhancing performance while reducing
119
- training costs by 42%. Emphasizes strong performance characteristics and efficiency improvements.
120
- </p>
121
- </div>
122
- </div>
123
-
124
- <!-- DeepSeekMath -->
125
- <div class="card paper-card">
126
- <div class="card-content">
127
- <h3 class="title is-4">
128
- DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
129
- <span class="coming-soon-badge">Deep Dive Coming Soon</span>
130
- </h3>
131
- <p class="release-date">Released: April 2024</p>
132
- <p class="paper-description">
133
- This paper presents methods to improve mathematical reasoning in LLMs, introducing the
134
- Group Relative Policy Optimization (GRPO) algorithm during reinforcement learning stages.
135
- </p>
136
- </div>
137
- </div>
138
-
139
  <!-- DeepSeekLLM -->
140
  <div class="card paper-card">
141
  <div class="card-content">
@@ -151,48 +76,37 @@
151
  </div>
152
  </div>
153
 
154
- <!-- Papers without specific dates -->
155
- <!-- DeepSeek-Prover -->
156
  <div class="card paper-card">
157
  <div class="card-content">
158
  <h3 class="title is-4">
159
- DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data
160
  <span class="coming-soon-badge">Deep Dive Coming Soon</span>
161
  </h3>
 
162
  <p class="paper-description">
163
- Focuses on enhancing theorem proving capabilities in language models using synthetic data
164
- for training, establishing new benchmarks in automated mathematical reasoning.
165
  </p>
166
  </div>
167
  </div>
168
 
169
- <!-- DeepSeek-Coder-V2 -->
170
  <div class="card paper-card">
171
  <div class="card-content">
172
  <h3 class="title is-4">
173
- DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence
174
  <span class="coming-soon-badge">Deep Dive Coming Soon</span>
175
  </h3>
 
176
  <p class="paper-description">
177
- This paper details advancements in code-related tasks with an emphasis on open-source
178
- methodologies, improving upon earlier coding models with enhanced capabilities.
179
  </p>
180
  </div>
181
  </div>
182
 
183
- <!-- DeepSeekMoE -->
184
- <div class="card paper-card">
185
- <div class="card-content">
186
- <h3 class="title is-4">
187
- DeepSeekMoE: Advancing Mixture-of-Experts Architecture
188
- <span class="coming-soon-badge">Deep Dive Coming Soon</span>
189
- </h3>
190
- <p class="paper-description">
191
- Discusses the integration and benefits of the Mixture-of-Experts approach within the
192
- DeepSeek framework, focusing on scalability and efficiency improvements.
193
- </p>
194
- </div>
195
- </div>
196
 
197
  </div>
198
  </div>
 
61
  <div class="columns is-centered">
62
  <div class="column is-10">
63
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
64
  <!-- DeepSeekLLM -->
65
  <div class="card paper-card">
66
  <div class="card-content">
 
76
  </div>
77
  </div>
78
 
79
+ <!-- DeepSeek-V2 -->
 
80
  <div class="card paper-card">
81
  <div class="card-content">
82
  <h3 class="title is-4">
83
+ DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
84
  <span class="coming-soon-badge">Deep Dive Coming Soon</span>
85
  </h3>
86
+ <p class="release-date">Released: May 2024</p>
87
  <p class="paper-description">
88
+ Introduces a Mixture-of-Experts (MoE) architecture, enhancing performance while reducing
89
+ training costs by 42%. Emphasizes strong performance characteristics and efficiency improvements.
90
  </p>
91
  </div>
92
  </div>
93
 
94
+ <!-- Continue with other papers... -->
95
  <div class="card paper-card">
96
  <div class="card-content">
97
  <h3 class="title is-4">
98
+ DeepSeek-V3 Technical Report
99
  <span class="coming-soon-badge">Deep Dive Coming Soon</span>
100
  </h3>
101
+ <p class="release-date">Released: December 2024</p>
102
  <p class="paper-description">
103
+ Discusses the scaling of sparse MoE networks to 671 billion parameters, utilizing mixed precision
104
+ training and high-performance computing (HPC) co-design strategies.
105
  </p>
106
  </div>
107
  </div>
108
 
109
+ <!-- Add remaining papers following the same pattern -->
 
 
 
 
 
 
 
 
 
 
 
 
110
 
111
  </div>
112
  </div>