GoZion committed · Commit 00366a8 · verified · 1 Parent(s): a1d760a

Update README.md

  1. README.md +482 -0
README.md CHANGED
@@ -41,6 +41,488 @@ You can download the following table to see the various parameters for your use
 
  </div>
 
+ ## Rodimus+-Coder Evaluation
+
+ We re-evaluated the Qwen series models ourselves; the metrics for the other model series are quoted from their original papers.
+
+ ### Rodimus+-Coder-Base
+
+ <table>
+ <tr align="center">
+ <th>Datasets</th>
+ <th>Qwen2.5-Coder-1.5B</th>
+ <th>Rodimus+-Coder-1.6B-Base</th>
+ <th>Gemma2-2B-PT</th>
+ <th>Qwen2.5-Coder-3B</th>
+ <th>Rodimus+-Coder-4B-Base</th>
+ <th>Gemma3-4B-PT</th>
+ <th>Qwen2.5-Coder-7B</th>
+ </tr>
+ <tr align="center">
+ <td colspan="8">Coding Tasks</td>
+ </tr>
+ <tr align="center">
+ <td>HumanEval</td>
+ <td>41.5</td>
+ <td>51.2</td>
+ <td>19.5</td>
+ <td>51.8</td>
+ <th>60.4</th>
+ <td>36.0</td>
+ <th>60.4</th>
+ </tr>
+ <tr align="center">
+ <td>HumanEval+</td>
+ <td>34.8</td>
+ <td>45.1</td>
+ <td>-</td>
+ <td>40.9</td>
+ <th>52.4</th>
+ <td>-</td>
+ <td>50.6</td>
+ </tr>
+ <tr align="center">
+ <td>MBPP</td>
+ <td>57.2</td>
+ <td>51.2</td>
+ <td>31.0</td>
+ <td>62.6</td>
+ <td>64.6</td>
+ <td>46.0</td>
+ <th>70.0</th>
+ </tr>
+ <tr align="center">
+ <td>MBPP+</td>
+ <td>66.1</td>
+ <td>62.2</td>
+ <td>-</td>
+ <td>65.9</td>
+ <th>71.4</th>
+ <td>-</td>
+ <td>70.1</td>
+ </tr>
+ <tr align="center">
+ <td>BCB<sub>COMPLETION</sub></td>
+ <td>21.6</td>
+ <td>17.9</td>
+ <td>-</td>
+ <td>26.2</td>
+ <th>30.8</th>
+ <td>-</td>
+ <td>30.4</td>
+ </tr>
+ <tr align="center">
+ <td>MultiPL-E</td>
+ <td>46.1</td>
+ <td>52.5</td>
+ <td>-</td>
+ <td>49.4</td>
+ <th>60.7</th>
+ <td>-</td>
+ <td>56.9</td>
+ </tr>
+ <tr align="center">
+ <td>CRUXEval</td>
+ <td>38.5</td>
+ <td>45.1</td>
+ <td>-</td>
+ <td>44.6</td>
+ <td>56.4</td>
+ <td>-</td>
+ <th>56.8</th>
+ </tr>
+ <tr align="center">
+ <th>Coding Avg.</th>
+ <td>43.7</td>
+ <td>46.5</td>
+ <td>-</td>
+ <td>48.8</td>
+ <th>56.7</th>
+ <td>-</td>
+ <td>56.4</td>
+ </tr>
+ <tr align="center">
+ <td colspan="8">General Tasks</td>
+ </tr>
+ <tr align="center">
+ <td>C-EVAL</td>
+ <td>55.2</td>
+ <td>56.7</td>
+ <td>-</td>
+ <td>65.3</td>
+ <th>70.2</th>
+ <td>-</td>
+ <td>69.1</td>
+ </tr>
+ <tr align="center">
+ <td>CMMLU</td>
+ <td>54.5</td>
+ <td>52.3</td>
+ <td>-</td>
+ <td>65.4</td>
+ <td>68.3</td>
+ <td>-</td>
+ <th>72.7</th>
+ </tr>
+ <tr align="center">
+ <td>MMLU</td>
+ <td>55.5</td>
+ <td>51.1</td>
+ <td>52.2</td>
+ <td>63.3</td>
+ <td>62.6</td>
+ <td>59.6</td>
+ <th>70.5</th>
+ </tr>
+ <tr align="center">
+ <td>BBH</td>
+ <td>21.8</td>
+ <td>46.8</td>
+ <td>42.4</td>
+ <td>32.5</td>
+ <td>61.9</td>
+ <td>50.9</td>
+ <th>67.3</th>
+ </tr>
+ <tr align="center">
+ <th>General Avg.</th>
+ <td>46.8</td>
+ <td>51.7</td>
+ <td>-</td>
+ <td>56.6</td>
+ <td>65.8</td>
+ <td>-</td>
+ <td>69.9</td>
+ </tr>
+ <tr align="center">
+ <td colspan="8">Mathematics Tasks</td>
+ </tr>
+ <tr align="center">
+ <td>GSM8K</td>
+ <td>60.4</td>
+ <td>68.7</td>
+ <td>25.0</td>
+ <td>72.1</td>
+ <td>78.5</td>
+ <td>38.4</td>
+ <td>83.4</td>
+ </tr>
+ <tr align="center">
+ <td>MATH</td>
+ <td>23.7</td>
+ <td>29.0</td>
+ <td>16.4</td>
+ <td>31.9</td>
+ <td>37.0</td>
+ <td>24.2</td>
+ <td>42.2</td>
+ </tr>
+ <tr align="center">
+ <th>Math Avg.</th>
+ <td>41.9</td>
+ <td>48.9</td>
+ <td>20.7</td>
+ <td>52.0</td>
+ <td>57.8</td>
+ <td>31.3</td>
+ <td>62.8</td>
+ </tr>
+ <tr align="center">
+ <td colspan="8">Overall</td>
+ </tr>
+ <tr align="center">
+ <th>Overall</th>
+ <td>44.4</td>
+ <td>48.4</td>
+ <td>-</td>
+ <td>51.7</td>
+ <th>59.6</th>
+ <td>-</td>
+ <th>61.6</th>
+ </tr>
+ </table>
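
As a rough cross-check of the summary rows, the sketch below recomputes the category and overall averages for the Rodimus+-Coder-4B-Base column. It assumes that each `Avg.` row is an unweighted mean of the tasks in its category and that `Overall` is an unweighted mean over all 13 tasks; the README does not state this, and the names `SCORES`, `category_averages`, and `overall_average` are purely illustrative. Under that assumption the coding and overall values match the table to within rounding.

```python
from statistics import mean

# Scores for Rodimus+-Coder-4B-Base, copied from the base-model table.
SCORES = {
    "coding": [60.4, 52.4, 64.6, 71.4, 30.8, 60.7, 56.4],
    "general": [70.2, 68.3, 62.6, 61.9],
    "math": [78.5, 37.0],
}


def category_averages(scores):
    """Unweighted mean per category (assumed definition of the Avg. rows)."""
    return {name: mean(values) for name, values in scores.items()}


def overall_average(scores):
    """Unweighted mean over every task (assumed definition of Overall)."""
    return mean(v for values in scores.values() for v in values)


if __name__ == "__main__":
    for name, avg in category_averages(SCORES).items():
        print(f"{name}: {avg:.2f}")
    print(f"overall: {overall_average(SCORES):.2f}")
```

Not every published cell reproduces exactly this way, so treat the unweighted-mean definition as a working assumption rather than the authors' stated method.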
+
+ ### Rodimus+-Coder-Chat
+
+ <table>
+ <tr align="center">
+ <th>Datasets</th>
+ <th>Qwen2.5-Coder-1.5B-Instruct</th>
+ <th>Rodimus+-Coder-1.6B-Chat</th>
+ <th>Gemma2-2B-IT</th>
+ <th>Qwen2.5-Coder-Instruct</th>
+ <th>Phi-4-Mini-3.8B</th>
+ <th>Rodimus+-Coder-4B-Chat</th>
+ <th>Gemma3-4B-IT</th>
+ <th>Qwen2.5-Coder-7B-Instruct</th>
+ </tr>
+ <tr align="center">
+ <td colspan="9">Coding Tasks</td>
+ </tr>
+ <tr align="center">
+ <td>HumanEval</td>
+ <td>64.6</td>
+ <td>76.8</td>
+ <td>20.1</td>
+ <td>79.9</td>
+ <td>74.4</td>
+ <td>86.6</td>
+ <td>71.3</td>
+ <td>87.2</td>
+ </tr>
+ <tr align="center">
+ <td>HumanEval+</td>
+ <td>63.4</td>
+ <td>73.8</td>
+ <td>-</td>
+ <td>80.5</td>
+ <td>68.3</td>
+ <td>82.9</td>
+ <td>-</td>
+ <td>82.3</td>
+ </tr>
+ <tr align="center">
+ <td>MBPP</td>
+ <td>51.0</td>
+ <td>59.0</td>
+ <td>36.6</td>
+ <td>59.2</td>
+ <td>65.3</td>
+ <td>68.0</td>
+ <td>63.2</td>
+ <td>75.8</td>
+ </tr>
+ <tr align="center">
+ <td>MBPP+</td>
+ <td>53.0</td>
+ <td>66.4</td>
+ <td>-</td>
+ <td>61.9</td>
+ <td>63.8</td>
+ <td>68.5</td>
+ <td>-</td>
+ <td>75.1</td>
+ </tr>
+ <tr align="center">
+ <td>LCB<sub>(24.08-24.11)</sub></td>
+ <td>4.0</td>
+ <td>10.9</td>
+ <td>-</td>
+ <td>13.0</td>
+ <td>-</td>
+ <td>13.9</td>
+ <td>-</td>
+ <td>22.8</td>
+ </tr>
+ <tr align="center">
+ <td>BCB<sub>INSTRUCT</sub></td>
+ <td>10.8</td>
+ <td>21.5</td>
+ <td>-</td>
+ <td>21.7</td>
+ <td>33.8</td>
+ <td>26.6</td>
+ <td>-</td>
+ <td>30.6</td>
+ </tr>
+ <tr align="center">
+ <td>HumanEval-Mul</td>
+ <td>50.8</td>
+ <td>57.3</td>
+ <td>-</td>
+ <td>67.4</td>
+ <td>-</td>
+ <td>70.6</td>
+ <td>-</td>
+ <td>76.1</td>
+ </tr>
+ <tr align="center">
+ <td>MBPP-Mul</td>
+ <td>43.4</td>
+ <td>52.4</td>
+ <td>-</td>
+ <td>53.4</td>
+ <td>-</td>
+ <td>59.6</td>
+ <td>-</td>
+ <td>61.4</td>
+ </tr>
+ <tr align="center">
+ <td>MBXP-EN</td>
+ <td>55.8</td>
+ <td>75.5</td>
+ <td>-</td>
+ <td>76.0</td>
+ <td>-</td>
+ <td>87.3</td>
+ <td>-</td>
+ <td>87.7</td>
+ </tr>
+ <tr align="center">
+ <td>MBXP-CN</td>
+ <td>48.8</td>
+ <td>75.0</td>
+ <td>-</td>
+ <td>68.7</td>
+ <td>-</td>
+ <td>84.3</td>
+ <td>-</td>
+ <td>83.5</td>
+ </tr>
+ <tr align="center">
+ <td>CRUXEval</td>
+ <td>28.6</td>
+ <td>55.0</td>
+ <td>-</td>
+ <td>51.6</td>
+ <td>-</td>
+ <td>63.2</td>
+ <td>-</td>
+ <td>69.3</td>
+ </tr>
+ <tr align="center">
+ <td>HumanEvalFix</td>
+ <td>38.9</td>
+ <td>52.6</td>
+ <td>-</td>
+ <td>55.5</td>
+ <td>-</td>
+ <td>68.8</td>
+ <td>-</td>
+ <td>69.3</td>
+ </tr>
+ <tr align="center">
+ <td>Spider</td>
+ <td>61.2</td>
+ <td>71.4</td>
+ <td>-</td>
+ <td>71.8</td>
+ <td>42.2</td>
+ <td>73.5</td>
+ <td>-</td>
+ <td>82.0</td>
+ </tr>
+ <tr align="center">
+ <th>Coding Avg.</th>
+ <td>44.2</td>
+ <td>57.5</td>
+ <td>-</td>
+ <td>58.5</td>
+ <td>-</td>
+ <th>65.7</th>
+ <td>-</td>
+ <th>69.5</th>
+ </tr>
+ <tr align="center">
+ <td colspan="9">General Tasks</td>
+ </tr>
+ <tr align="center">
+ <td>C-EVAL</td>
+ <td>51.5</td>
+ <td>50.8</td>
+ <td>-</td>
+ <td>62.0</td>
+ <td>-</td>
+ <td>61.6</td>
+ <td>-</td>
+ <td>66.4</td>
+ </tr>
+ <tr align="center">
+ <td>CMMLU</td>
+ <td>45.2</td>
+ <td>50.5</td>
+ <td>-</td>
+ <td>60.1</td>
+ <td>-</td>
+ <td>62.0</td>
+ <td>-</td>
+ <td>64.9</td>
+ </tr>
+ <tr align="center">
+ <td>MMLU</td>
+ <td>52.0</td>
+ <td>49.3</td>
+ <td>56.1</td>
+ <td>61.7</td>
+ <td>67.3</td>
+ <td>57.5</td>
+ <td>58.1</td>
+ <td>66.1</td>
+ </tr>
+ <tr align="center">
+ <td>BBH</td>
+ <td>24.2</td>
+ <td>58.7</td>
+ <td>41.4</td>
+ <td>57.3</td>
+ <td>70.4</td>
+ <td>63.7</td>
+ <td>72.2</td>
+ <td>59.1</td>
+ </tr>
+ <tr align="center">
+ <th>General Avg.</th>
+ <td>43.2</td>
+ <td>52.3</td>
+ <td>-</td>
+ <td>60.3</td>
+ <td>-</td>
+ <td>61.2</td>
+ <td>-</td>
+ <td>64.1</td>
+ </tr>
+ <tr align="center">
+ <td colspan="9">Mathematics Tasks</td>
+ </tr>
+ <tr align="center">
+ <td>GSM8K</td>
+ <td>54.4</td>
+ <td>68.5</td>
+ <td>62.6</td>
+ <td>73.5</td>
+ <td>88.6</td>
+ <td>79.2</td>
+ <td>89.2</td>
+ <td>79.5</td>
+ </tr>
+ <tr align="center">
+ <td>MATH</td>
+ <td>38.1</td>
+ <td>33.5</td>
+ <td>27.2</td>
+ <td>44.1</td>
+ <td>64.0</td>
+ <td>44.1</td>
+ <td>75.6</td>
+ <td>60.8</td>
+ </tr>
+ <tr align="center">
+ <th>Math Avg.</th>
+ <td>46.2</td>
+ <td>51.0</td>
+ <td>44.9</td>
+ <td>58.8</td>
+ <td>68.8</td>
+ <td>61.7</td>
+ <td>82.4</td>
+ <td>70.1</td>
+ </tr>
+ <tr align="center">
+ <td colspan="9">Overall</td>
+ </tr>
+ <tr align="center">
+ <th>Overall</th>
+ <td>44.2</td>
+ <td>55.8</td>
+ <td>-</td>
+ <td>58.9</td>
+ <td>-</td>
+ <th>64.3</th>
+ <td>-</td>
+ <th>68.4</th>
+ </tr>
+ </table>
+
  ## Usage
 
  **Installation**