CodeV-R1-Distill-Qwen-7B
The paper is coming soon!
1. Introduction
The post-training phase of large language models (LLMs) has advanced rapidly, with models like OpenAI’s o1, DeepSeek-R1, and Kimi k1.5 showcasing remarkable reasoning capabilities. Notably, DeepSeek-R1 introduced a simple yet powerful rule-based reinforcement learning (RL) approach that enables reasoning patterns to emerge. While these advances have primarily targeted software programming languages, there is growing interest in adapting LLMs to hardware description languages (HDLs), which are critical tools for chip design and hardware verification.
However, HDLs such as Verilog face challenges akin to low-resource languages, including limited high-quality instruction-following data and constrained model capabilities in generating accurate Register Transfer Level (RTL) code. These limitations hinder the performance and cross-language generalization of specialized code LLMs. To address this, we propose leveraging knowledge distillation to equip smaller, efficient models with DeepSeek-R1-like reasoning abilities.
As a continuation of the work initiated with CodeV, we introduce CodeV-R1-Distill-Qwen-7B, a model distilled from DeepSeek-R1 using our CodeV dataset. This model outperforms prior non-reasoning LLMs across major Verilog benchmarks, demonstrating superior code synthesis and problem-solving capabilities. Intriguingly, distilling Verilog code also enhances the model’s mathematical reasoning, suggesting broader synergies between hardware-centric training and general logical reasoning.
2. Model Summary
- Data Preparation: We first re-summarize and reformulate questions from the original CodeV dataset using DeepSeek-V3. We then filter out straightforward problems (those solvable by Qwen2.5-Coder-7B-Instruct or Qwen2.5-Coder-32B-Instruct within five attempts), as well as problems whose code is not synthesizable. For the remaining data, we use DeepSeek-R1 to generate one response per question. To avoid benchmark contamination, problems with a ROUGE-L score greater than 0.5 against any benchmark problem are also filtered out (a minimal sketch of this step follows the list). After these steps, approximately 87,000 (problem, code) pairs remain.
- Training: We employ LLaMAFactory to apply supervised fine-tuning (SFT) to Qwen2.5-Coder-7B-Instruct using this refined dataset of 87,000 pairs. Training is conducted over six epochs with a learning rate of 1e-5 and a batch size of 64.
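The ROUGE-L decontamination step can be implemented along the following lines. This is a minimal sketch assuming the `rouge-score` Python package and toy stand-in data; the actual pipeline may differ in tokenization and scoring details.

```python
# Sketch of the ROUGE-L decontamination filter (assumed implementation).
# `train_problems` and `benchmark_problems` below are hypothetical toy examples.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)

def is_contaminated(problem: str, benchmark_problems: list[str], threshold: float = 0.5) -> bool:
    """Return True if `problem` overlaps too strongly with any benchmark problem."""
    return any(
        scorer.score(bench, problem)["rougeL"].fmeasure > threshold
        for bench in benchmark_problems
    )

# Toy data standing in for the CodeV questions and the evaluation benchmarks.
train_problems = ["Design a 4-bit ripple-carry adder.", "Write a 2-to-1 multiplexer."]
benchmark_problems = ["Implement a 2-to-1 multiplexer module."]

kept = [p for p in train_problems if not is_contaminated(p, benchmark_problems)]
print(kept)
```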
3. Evaluation Results
During evaluation, the maximum generation length is set to 16,384 tokens, the sampling temperature is 0.6, and 20 responses are generated per query to estimate the pass@1 score.
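For concreteness, pass@1 over 20 samples per problem can be computed with the standard unbiased pass@k estimator, which reduces to c/n when k = 1; the per-problem pass counts below are hypothetical.

```python
# Standard unbiased pass@k estimator; with k=1 this reduces to c/n.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """n: samples per problem, c: samples passing the testbench, k: evaluation budget."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical per-problem pass counts out of 20 samples (temperature 0.6).
pass_counts = [20, 13, 0, 7]
pass_at_1 = sum(pass_at_k(20, c, 1) for c in pass_counts) / len(pass_counts)
print(f"pass@1 = {pass_at_1:.3f}")
```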
We evaluate on the Verilog benchmarks VerilogEval and RTLLM. For VerilogEval v2, we examine zero-shot specification-to-RTL translation and code-completion tasks. For RTLLM, we report results on version 1.1, which offers a broader set of comparison baselines. We also find that acquiring the reasoning process for Verilog problems, as distilled from DeepSeek-R1, improves the model's out-of-domain mathematical capabilities.
VerilogEval (v2)
Model | Model size | Type | Spec-to-RTL | Completion |
---|---|---|---|---|
GPT-4o | Undisclosed | General | 62.5% | 59.0% |
GPT-4 Turbo | Undisclosed | General | 61.1% | 53.9% |
GPT-4 | Undisclosed | General | 32.0% | 42.3% |
Mistral Large | Undisclosed | General | 37.5% | 34.0% |
Llama3.1 | 405B | General | 57.2% | 56.4% |
Llama3.1 | 70B | General | 42.8% | 35.3% |
Llama3 | 70B | General | 43.9% | 37.8% |
Llama2 | 70B | General | 5.3% | 1.3% |
Llama3.1 | 8B | General | 19.1% | 2.6% |
CodeLlama | 70B | Coding | 34.9% | 37.2% |
DeepSeek Coder | 33B | Coding | 21.7% | 25.0% |
CodeGemma | 7B | Coding | 9.5% | 8.3% |
DeepSeek Coder | 6.7B | Coding | 29.6% | 24.4% |
RTL-Coder | 6.7B | Verilog RTL | 36.8% | 35.9% |
CodeV-R1-Distill (ours) | 7B | Verilog RTL | 65.4% | 65.1% |
RTLLM (v1.1)
Model | Model size | Type | Pass@1 |
---|---|---|---|
GPT-4o | Undisclosed | General | 33.8% |
GPT-3.5 Turbo | Undisclosed | General | 28.3% |
Llama3.1 | 405B | General | 38.9% |
Nemotron-4 | 340B | General | 18.9% |
Llama3.1 | 8B | General | 19.1% |
CodeLlama | 7B | Coding | 17.9% |
CodeQwen | 7B | Coding | 24.1% |
Starcoder2 | 15B | Coding | 15.5% |
DeepSeek Coder | 6.7B | Coding | 23.1% |
DeepSeek-Coder-V2 | 16B | Coding | 33.1% |
DeepSeek-Coder-V2 | 236B | Coding | 34.5% |
RTL-Coder | 6.7B | Verilog RTL | 36.8% |
CraftRTL | 6.7B | Verilog RTL | 53.1% |
CodeV-R1-Distill (ours) | 7B | Verilog RTL | 56.2% |
Math
Model | AIME | Math | AMC | Minerva | Olympiad Bench | Average |
---|---|---|---|---|---|---|
Qwen2.5-7B-Instruct-1M | 11.25% | 72.61% | 41.11% | 25.92% | 34.66% | 37.11% |
Qwen2.5-Math-7B-Instruct | 12.08% | 82.25% | 49.4% | 27.64% | 37.31% | 41.74% |
Qwen2.5-Coder-7B-Instruct (baseline) | 5.63% | 63.5% | 35.62% | 21.02% | 28.64% | 30.88% |
CodeV-R1-Distill (ours) | 11.04% | 74.35% | 45.86% | 25.79% | 38.7% | 39.15% |
4. Usage
CodeV-R1-Distill-Qwen-7B can be used in the same way as Qwen or Llama models.
For instance, you can easily start a service using vLLM:
```bash
vllm serve zhuyaoyu/CodeV-R1-Distill-Qwen-7B --tensor-parallel-size 2 --max-model-len 16384 --enforce-eager
```
Usage Recommendations
During training and evaluation, we use the following system prompt:
````
You are a helpful assistant. The assistant first thinks about the reasoning process in the mind and then provides the user with the answer. The reasoning process and answer are enclosed within <think> </think> and <answer> </answer> tags, respectively, i.e., <think> reasoning process here </think><answer> answer here </answer>. Now the user asks you to write verilog code. After thinking, when you finally reach a conclusion, enclose the final verilog code in ```verilog ``` within <answer> </answer> tags. i.e., <answer> ```verilog\n module top_module(in, out, ...) ... ``` </answer>.\n
````
It is recommended to use this prompt.
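As a minimal usage sketch (not part of the official evaluation harness), the served model can be queried through vLLM's OpenAI-compatible API, assumed here at the default endpoint http://localhost:8000/v1, using this system prompt and the sampling settings from Section 3; the answer-extraction regex is illustrative.

````python
# Minimal sketch: query the vLLM server via its OpenAI-compatible API
# (assumed at the default http://localhost:8000/v1) using the system prompt above.
import re
from openai import OpenAI

SYSTEM_PROMPT = (
    "You are a helpful assistant. The assistant first thinks about the reasoning process in the "
    "mind and then provides the user with the answer. The reasoning process and answer are "
    "enclosed within <think> </think> and <answer> </answer> tags, respectively, i.e., "
    "<think> reasoning process here </think><answer> answer here </answer>. Now the user asks "
    "you to write verilog code. After thinking, when you finally reach a conclusion, enclose the "
    "final verilog code in ```verilog ``` within <answer> </answer> tags. i.e., "
    "<answer> ```verilog\n module top_module(in, out, ...) ... ``` </answer>.\n"
)

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="zhuyaoyu/CodeV-R1-Distill-Qwen-7B",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "Write a Verilog module implementing a 2-to-1 multiplexer."},
    ],
    temperature=0.6,
    max_tokens=16384,
)

text = response.choices[0].message.content
# Pull the final Verilog block out of the <answer> section (illustrative parsing).
match = re.search(r"```verilog\s*(.*?)```", text, re.DOTALL)
print(match.group(1).strip() if match else text)
````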
5. License
CodeV-R1-Distill-Qwen-7B is derived from the Qwen2.5 series, which is originally licensed under the Apache 2.0 License, and is fine-tuned on 87k samples curated with DeepSeek-R1.
6. Citation
```bibtex
@misc{CodeV-R1-Distill-Qwen-7B,
  author = {IPRC-DIP},
  title = {CodeV Model Distilled from DeepSeek-R1},
  url = {https://huggingface.co./zhuyaoyu/CodeV-R1-Distill-Qwen-7B},
  year = {2025}
}
```