metacritical committed
Commit bf1e8f3 · verified · 1 parent: 4af19e1

Updated Links and chronology

Files changed (1):
  index.html  +156 -129
index.html CHANGED
@@ -2,37 +2,53 @@
  <html>
  <head>
  <meta charset="utf-8">
- <meta name="description" content="DeepSeek: Advancing Open-Source Language Models">
- <meta name="keywords" content="DeepSeek, LLM, AI">
+ <meta name="description" content="DeepSeek Papers: Advancing Open-Source Language Models">
+ <meta name="keywords" content="DeepSeek, LLM, AI, Research">
  <meta name="viewport" content="width=device-width, initial-scale=1">
- <title>DeepSeek: Advancing Open-Source Language Models</title>
+ <title>DeepSeek Papers: Advancing Open-Source Language Models</title>

  <link href="https://fonts.googleapis.com/css?family=Google+Sans|Noto+Sans|Castoro" rel="stylesheet">
- <link rel="stylesheet" href="./static/css/bulma.min.css">
- <link rel="stylesheet" href="./static/css/bulma-carousel.min.css">
- <link rel="stylesheet" href="./static/css/bulma-slider.min.css">
- <link rel="stylesheet" href="./static/css/fontawesome.all.min.css">
- <link rel="stylesheet" href="https://cdn.jsdelivr.net/gh/jpswalsh/academicons@1/css/academicons.min.css">
- <link rel="stylesheet" href="./static/css/index.css">
- <link rel="icon" href="./static/images/favicon.svg">
+ <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/bulma/0.9.3/css/bulma.min.css">
+ <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.0.0/css/all.min.css">

- <script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script>
- <script defer src="./static/js/fontawesome.all.min.js"></script>
- <script src="./static/js/bulma-carousel.min.js"></script>
- <script src="./static/js/bulma-slider.min.js"></script>
- <script src="./static/js/index.js"></script>
+ <style>
+ .publication-title {
+ color: #363636;
+ }
+ .paper-card {
+ margin-bottom: 2rem;
+ transition: transform 0.2s;
+ }
+ .paper-card:hover {
+ transform: translateY(-5px);
+ }
+ .coming-soon-badge {
+ background-color: #3273dc;
+ color: white;
+ padding: 0.25rem 0.75rem;
+ border-radius: 4px;
+ font-size: 0.8rem;
+ margin-left: 1rem;
+ }
+ .paper-description {
+ color: #4a4a4a;
+ margin-top: 0.5rem;
+ }
+ .release-date {
+ color: #7a7a7a;
+ font-size: 0.9rem;
+ }
+ </style>
  </head>
  <body>

- <section class="hero">
+ <section class="hero is-light">
  <div class="hero-body">
  <div class="container is-max-desktop">
  <div class="columns is-centered">
  <div class="column has-text-centered">
- <h1 class="title is-1 publication-title">DeepSeek: Advancing Open-Source Language Models</h1>
- <div class="is-size-5 publication-authors">
- A collection of groundbreaking research papers in AI and language models
- </div>
+ <h1 class="title is-1 publication-title">DeepSeek Papers</h1>
+ <h2 class="subtitle is-3">Advancing Open-Source Language Models</h2>
  </div>
  </div>
  </div>
@@ -41,123 +57,143 @@

  <section class="section">
  <div class="container is-max-desktop">
- <!-- Abstract. -->
- <div class="columns is-centered has-text-centered">
- <div class="column is-four-fifths">
- <h2 class="title is-3">Overview</h2>
- <div class="content has-text-justified">
- <p>
- DeepSeek has released a series of significant papers detailing advancements in large language models (LLMs).
- Each paper represents a step forward in making AI more capable, efficient, and accessible.
- </p>
- </div>
- </div>
- </div>
- <!--/ Abstract. -->
-
- <!-- Paper Collection -->
- <div class="columns is-centered has-text-centered">
- <div class="column is-four-fifths">
- <h2 class="title is-3">Research Papers</h2>
-
- <!-- Paper 1 -->
- <div class="publication-block">
- <div class="publication-header">
- <h3 class="title is-4">DeepSeekLLM: Scaling Open-Source Language Models with Longer-termism</h3>
- <span class="tag is-primary is-medium">Deep Dive Coming Soon</span>
- <div class="is-size-5 publication-authors">
- Released: November 29, 2023
+ <div class="content">
+ <div class="columns is-centered">
+ <div class="column is-10">
+
+ <!-- Native Sparse Attention -->
+ <div class="card paper-card">
+ <div class="card-content">
+ <h3 class="title is-4">
+ <a href="https://arxiv.org/abs/2502.11089">Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention</a>
+ <span class="coming-soon-badge">Deep Dive Coming Soon</span>
+ </h3>
+ <p class="release-date">Released: February 2025</p>
+ <p class="paper-description">
+ Introduces a new approach to sparse attention that is both hardware-efficient and natively trainable,
+ improving the performance of large language models.
+ </p>
  </div>
  </div>
- <div class="content has-text-justified">
- <p>This foundational paper explores scaling laws and the trade-offs between data and model size,
- establishing the groundwork for subsequent models.</p>
- </div>
- </div>

- <!-- Paper 2 -->
- <div class="publication-block">
- <div class="publication-header">
- <h3 class="title is-4">DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model</h3>
- <span class="tag is-primary is-medium">Deep Dive Coming Soon</span>
- <div class="is-size-5 publication-authors">
- Released: May 2024
+ <!-- DeepSeek-R1 -->
+ <div class="card paper-card">
+ <div class="card-content">
+ <h3 class="title is-4">
+ DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
+ <span class="coming-soon-badge">Deep Dive Coming Soon</span>
+ </h3>
+ <p class="release-date">Released: January 20, 2025</p>
+ <p class="paper-description">
+ The R1 model builds on previous work to enhance reasoning capabilities through large-scale
+ reinforcement learning, competing directly with leading models like OpenAI's o1.
+ </p>
  </div>
  </div>
- <div class="content has-text-justified">
- <p>Introduces a Mixture-of-Experts (MoE) architecture, enhancing performance while reducing
- training costs by 42%.</p>
- </div>
- </div>

- <!-- Additional papers following same structure -->
- <div class="publication-block">
- <div class="publication-header">
- <h3 class="title is-4">DeepSeek-V3 Technical Report</h3>
- <span class="tag is-primary is-medium">Deep Dive Coming Soon</span>
- <div class="is-size-5 publication-authors">
- Released: December 2024
+ <!-- DeepSeek-V3 -->
+ <div class="card paper-card">
+ <div class="card-content">
+ <h3 class="title is-4">
+ DeepSeek-V3 Technical Report
+ <span class="coming-soon-badge">Deep Dive Coming Soon</span>
+ </h3>
+ <p class="release-date">Released: December 2024</p>
+ <p class="paper-description">
+ Discusses the scaling of sparse MoE networks to 671 billion parameters, utilizing mixed precision
+ training and high-performance computing (HPC) co-design strategies.
+ </p>
  </div>
  </div>
- <div class="content has-text-justified">
- <p>Discusses the scaling of sparse MoE networks to 671 billion parameters.</p>
- </div>
- </div>

- <div class="publication-block">
- <div class="publication-header">
- <h3 class="title is-4">DeepSeek-R1: Incentivizing Reasoning Capability in LLMs</h3>
- <span class="tag is-primary is-medium">Deep Dive Coming Soon</span>
- <div class="is-size-5 publication-authors">
- Released: January 20, 2025
+ <!-- DeepSeek-V2 -->
+ <div class="card paper-card">
+ <div class="card-content">
+ <h3 class="title is-4">
+ DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
+ <span class="coming-soon-badge">Deep Dive Coming Soon</span>
+ </h3>
+ <p class="release-date">Released: May 2024</p>
+ <p class="paper-description">
+ Introduces a Mixture-of-Experts (MoE) architecture, enhancing performance while reducing
+ training costs by 42%. Emphasizes strong performance characteristics and efficiency improvements.
+ </p>
  </div>
  </div>
- <div class="content has-text-justified">
- <p>Enhances reasoning capabilities through large-scale reinforcement learning.</p>
- </div>
- </div>

- <div class="publication-block">
- <div class="publication-header">
- <h3 class="title is-4">DeepSeekMath: Pushing the Limits of Mathematical Reasoning</h3>
- <span class="tag is-primary is-medium">Deep Dive Coming Soon</span>
- <div class="is-size-5 publication-authors">
- Released: April 2024
+ <!-- DeepSeekMath -->
+ <div class="card paper-card">
+ <div class="card-content">
+ <h3 class="title is-4">
+ DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
+ <span class="coming-soon-badge">Deep Dive Coming Soon</span>
+ </h3>
+ <p class="release-date">Released: April 2024</p>
+ <p class="paper-description">
+ This paper presents methods to improve mathematical reasoning in LLMs, introducing the
+ Group Relative Policy Optimization (GRPO) algorithm during reinforcement learning stages.
+ </p>
  </div>
  </div>
- <div class="content has-text-justified">
- <p>Presents methods to improve mathematical reasoning in LLMs.</p>
- </div>
- </div>

- <div class="publication-block">
- <div class="publication-header">
- <h3 class="title is-4">DeepSeek-Prover: Advancing Theorem Proving in LLMs</h3>
- <span class="tag is-primary is-medium">Deep Dive Coming Soon</span>
- </div>
- <div class="content has-text-justified">
- <p>Focuses on enhancing theorem proving capabilities using synthetic data for training.</p>
+ <!-- DeepSeekLLM -->
+ <div class="card paper-card">
+ <div class="card-content">
+ <h3 class="title is-4">
+ DeepSeekLLM: Scaling Open-Source Language Models with Longer-termism
+ <span class="coming-soon-badge">Deep Dive Coming Soon</span>
+ </h3>
+ <p class="release-date">Released: November 29, 2023</p>
+ <p class="paper-description">
+ This foundational paper explores scaling laws and the trade-offs between data and model size,
+ establishing the groundwork for subsequent models.
+ </p>
+ </div>
  </div>
- </div>

- <div class="publication-block">
- <div class="publication-header">
- <h3 class="title is-4">DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models</h3>
- <span class="tag is-primary is-medium">Deep Dive Coming Soon</span>
- </div>
- <div class="content has-text-justified">
- <p>Details advancements in code-related tasks with emphasis on open-source methodologies.</p>
+ <!-- Papers without specific dates -->
+ <!-- DeepSeek-Prover -->
+ <div class="card paper-card">
+ <div class="card-content">
+ <h3 class="title is-4">
+ DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data
+ <span class="coming-soon-badge">Deep Dive Coming Soon</span>
+ </h3>
+ <p class="paper-description">
+ Focuses on enhancing theorem proving capabilities in language models using synthetic data
+ for training, establishing new benchmarks in automated mathematical reasoning.
+ </p>
+ </div>
  </div>
- </div>

- <div class="publication-block">
- <div class="publication-header">
- <h3 class="title is-4">DeepSeekMoE: Advancing Mixture-of-Experts Architecture</h3>
- <span class="tag is-primary is-medium">Deep Dive Coming Soon</span>
+ <!-- DeepSeek-Coder-V2 -->
+ <div class="card paper-card">
+ <div class="card-content">
+ <h3 class="title is-4">
+ DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence
+ <span class="coming-soon-badge">Deep Dive Coming Soon</span>
+ </h3>
+ <p class="paper-description">
+ This paper details advancements in code-related tasks with an emphasis on open-source
+ methodologies, improving upon earlier coding models with enhanced capabilities.
+ </p>
+ </div>
  </div>
- <div class="content has-text-justified">
- <p>Discusses the integration and benefits of the Mixture-of-Experts approach.</p>
+
+ <!-- DeepSeekMoE -->
+ <div class="card paper-card">
+ <div class="card-content">
+ <h3 class="title is-4">
+ DeepSeekMoE: Advancing Mixture-of-Experts Architecture
+ <span class="coming-soon-badge">Deep Dive Coming Soon</span>
+ </h3>
+ <p class="paper-description">
+ Discusses the integration and benefits of the Mixture-of-Experts approach within the
+ DeepSeek framework, focusing on scalability and efficiency improvements.
+ </p>
+ </div>
  </div>
+
  </div>
  </div>
  </div>
@@ -167,19 +203,10 @@
  <footer class="footer">
  <div class="container">
  <div class="content has-text-centered">
- <a class="icon-link" href="https://github.com/deepseek-ai" target="_blank" class="external-link">
- <i class="fab fa-github"></i>
- </a>
- </div>
- <div class="columns is-centered">
- <div class="column is-8">
- <div class="content">
- <p>
- This website is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative
- Commons Attribution-ShareAlike 4.0 International License</a>.
- </p>
- </div>
- </div>
+ <p>
+ This website is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">
+ Creative Commons Attribution-ShareAlike 4.0 International License</a>.
+ </p>
  </div>
  </div>
  </footer>