CodeV-R1-Distill-Qwen-7B

The paper is coming soon!

1. Introduction

The post-training phase of large language models (LLMs) has advanced rapidly, with models like OpenAI’s o1, DeepSeek-R1, and Kimi k1.5 showcasing remarkable reasoning capabilities. Notably, DeepSeek-R1 introduced a simple yet powerful rule-based reinforcement learning (RL) approach that enables reasoning patterns to emerge. While these advances have primarily targeted software programming languages, there is growing interest in adapting LLMs to hardware description languages (HDLs), the critical tools for chip design and hardware verification.

However, HDLs such as Verilog face challenges akin to those of low-resource languages, including limited high-quality instruction-following data and weaker model capability in generating correct Register Transfer Level (RTL) code. These limitations hinder the performance and cross-language generalization of specialized code LLMs. To address this, we propose leveraging knowledge distillation to equip smaller, more efficient models with DeepSeek-R1-like reasoning abilities.

As a continuation of the work initiated with CodeV, we introduce CodeV-R1-Distill-Qwen-7B, a model distilled from DeepSeek-R1 using our CodeV dataset. It outperforms prior non-reasoning LLMs across major Verilog benchmarks, demonstrating superior code synthesis and problem-solving capabilities. Intriguingly, distillation on Verilog data also enhances the model’s mathematical reasoning, suggesting broader synergies between hardware-centric training and general logical reasoning.

2. Model Summary

  • Data Preparation: We first re-summarize and reformulate questions from the original CodeV dataset using DeepSeek-V3. We then filter out easy problems, namely those that Qwen2.5-Coder-7B-Instruct or Qwen2.5-Coder-32B-Instruct can solve within five attempts, as well as problems whose reference code is not synthesizable. For the remaining data, we use DeepSeek-R1 to generate one response per question. To avoid benchmark contamination, we also remove problems whose Rouge-L similarity to any benchmark problem exceeds 0.5 (a minimal filtering sketch follows this list). After these steps, approximately 87,000 (problem, code) pairs remain.
  • Training: We employ LLaMA-Factory to apply supervised fine-tuning (SFT) to Qwen2.5-Coder-7B-Instruct on this refined dataset of 87,000 pairs. Training runs for six epochs with a learning rate of 1e-5 and a batch size of 64.
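
As an illustration of the decontamination step above, here is a minimal sketch, assuming the rouge_score package and plain-string problem lists; the actual pipeline may differ in batching and data handling.

```python
# Minimal sketch of the Rouge-L decontamination filter described above.
# Assumptions: the `rouge_score` package and plain-string problem lists;
# the actual pipeline may batch or parallelize this step.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)

def is_contaminated(problem, benchmark_problems, threshold=0.5):
    """True if `problem` is too close to any benchmark problem."""
    return any(
        scorer.score(ref, problem)["rougeL"].fmeasure > threshold
        for ref in benchmark_problems
    )

# Toy example: keep only (problem, code) pairs that do not overlap
# with the evaluation benchmarks.
dataset = [("Design a 2-to-1 multiplexer.", "module mux2 ... endmodule")]
benchmark_problems = ["Design a 4-bit ripple-carry adder."]
clean = [(p, c) for p, c in dataset if not is_contaminated(p, benchmark_problems)]
print(len(clean))  # 1
```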

3. Evaluation Results

During the evaluation phase, the maximum generation length is set to 16,384 tokens, the sampling temperature to 0.6, and 20 responses are generated per query to estimate the pass@1 score (see the sketch below).
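
For reference, pass@k is commonly estimated from n samples per problem with the unbiased estimator of Chen et al. (2021); a minimal sketch follows, and we assume this standard formula is what is meant here.

```python
# Unbiased pass@k estimator (Chen et al., 2021): with n samples per
# problem, c of which pass, pass@k = 1 - C(n-c, k) / C(n, k).
# For k = 1 this reduces to c / n.
from math import comb

def pass_at_k(n, c, k):
    """Unbiased estimate of pass@k from n samples with c correct."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 20 samples per query (as above), 13 of which pass.
print(pass_at_k(n=20, c=13, k=1))  # 0.65, i.e. 65% pass@1
```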

Our evaluation covers the Verilog benchmarks VerilogEval and RTLLM. For VerilogEval v2, we examine zero-shot performance on both specification-to-RTL translation and code completion. For RTLLM, we report results on version 1.1, which offers a broader set of published baselines for comparison. Furthermore, we find that learning the reasoning process on Verilog problems, as distilled from DeepSeek-R1, also improves the model's out-of-domain mathematical capabilities.

VerilogEval (v2)

| Model | Model size | Type | Spec-to-RTL | Completion |
| --- | --- | --- | --- | --- |
| GPT-4o | Undisclosed | General | 62.5% | 59.0% |
| GPT-4 Turbo | Undisclosed | General | 61.1% | 53.9% |
| GPT-4 | Undisclosed | General | 32.0% | 42.3% |
| Mistral Large | Undisclosed | General | 37.5% | 34.0% |
| Llama3.1 | 405B | General | 57.2% | 56.4% |
| Llama3.1 | 70B | General | 42.8% | 35.3% |
| Llama3 | 70B | General | 43.9% | 37.8% |
| Llama2 | 70B | General | 5.3% | 1.3% |
| Llama3.1 | 8B | General | 19.1% | 2.6% |
| CodeLlama | 70B | Coding | 34.9% | 37.2% |
| DeepSeek Coder | 33B | Coding | 21.7% | 25.0% |
| CodeGemma | 7B | Coding | 9.5% | 8.3% |
| DeepSeek Coder | 6.7B | Coding | 29.6% | 24.4% |
| RTL-Coder | 6.7B | Verilog RTL | 36.8% | 35.9% |
| CodeV-R1-distill (ours) | 7B | Verilog RTL | 65.4% | 65.1% |

RTLLM (v1.1)

| Model | Model size | Type | Pass@1 |
| --- | --- | --- | --- |
| GPT-4o | Undisclosed | General | 33.8% |
| GPT-3.5 Turbo | Undisclosed | General | 28.3% |
| Llama3.1 | 405B | General | 38.9% |
| Nemotron-4 | 340B | General | 18.9% |
| Llama3.1 | 8B | General | 19.1% |
| CodeLlama | 7B | Coding | 17.9% |
| CodeQwen | 7B | Coding | 24.1% |
| Starcoder2 | 15B | Coding | 15.5% |
| DeepSeek Coder | 6.7B | Coding | 23.1% |
| DeepSeek-Coder-V2 | 16B | Coding | 33.1% |
| DeepSeek-Coder-V2 | 236B | Coding | 34.5% |
| RTL-Coder | 6.7B | Verilog RTL | 36.8% |
| CraftRTL | 6.7B | Verilog RTL | 53.1% |
| CodeV-R1-distill (ours) | 7B | Verilog RTL | 56.2% |

Math

| Model | AIME | MATH | AMC | Minerva | OlympiadBench | Average |
| --- | --- | --- | --- | --- | --- | --- |
| Qwen2.5-7B-Instruct-1M | 11.25% | 72.61% | 41.11% | 25.92% | 34.66% | 37.11% |
| Qwen2.5-Math-7B-Instruct | 12.08% | 82.25% | 49.40% | 27.64% | 37.31% | 41.74% |
| Qwen2.5-Coder-7B-Instruct (baseline) | 5.63% | 63.50% | 35.62% | 21.02% | 28.64% | 30.88% |
| CodeV-R1-distill (ours) | 11.04% | 74.35% | 45.86% | 25.79% | 38.70% | 39.15% |

4. Usage

CodeV-R1-Distill-Qwen-7B can be used in the same way as Qwen or Llama models.

For instance, you can easily start a service using vLLM:

vllm serve zhuyaoyu/CodeV-R1-Distill-Qwen-7B --tensor-parallel-size 2 --max-model-len 16384 --enforce-eager
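
Once the server is running, you can query it through vLLM's OpenAI-compatible API. A minimal client sketch, assuming the default port 8000 and the openai Python package (the user prompt is a toy example):

```python
# Minimal client sketch for vLLM's OpenAI-compatible endpoint.
# Assumptions: default port 8000; temperature and token budget mirror
# the evaluation settings described above.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

SYSTEM_PROMPT = "..."  # use the system prompt from "Usage Recommendations" below

response = client.chat.completions.create(
    model="zhuyaoyu/CodeV-R1-Distill-Qwen-7B",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "Write a Verilog module for a 2-to-1 multiplexer."},
    ],
    temperature=0.6,
    max_tokens=16384,
)
print(response.choices[0].message.content)
```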

Usage Recommendations

During training and evaluation, we use the following system prompt:

You are a helpful assistant. The assistant first thinks about the reasoning process in the mind and then provides the user with the answer. The reasoning process and answer are enclosed within <think> </think> and<answer> </answer> tags, respectively, i.e., <think> reasoning process here </think><answer> answer here </answer>.  Now the user asks you to write verilog code. After thinking, when you finally reach a conclusion, enclose the final verilog code in ```verilog ``` within <answer> </answer> tags. i.e., <answer> ```verilog\n module top_module(in, out, ...) ... ``` </answer>.\n

We recommend using this system prompt at inference time as well.
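
Because responses follow the <think>/<answer> format, the final Verilog code can be recovered with simple post-processing. A minimal sketch (our own illustration, not part of any released tooling):

```python
# Minimal sketch: extract the final Verilog source from an <answer> block.
# Assumes the model followed the recommended format; callers should
# handle the None case for malformed outputs.
import re

def extract_verilog(response):
    """Return the code inside <answer> ```verilog ... ``` tags, or None."""
    answer = re.search(r"<answer>(.*?)</answer>", response, re.DOTALL)
    if answer is None:
        return None
    code = re.search(r"```verilog\s*(.*?)```", answer.group(1), re.DOTALL)
    return code.group(1).strip() if code else None

sample = (
    "<think>select between inputs</think>"
    "<answer>```verilog\nmodule mux2(input a, b, sel, output y);\n"
    "  assign y = sel ? b : a;\nendmodule\n```</answer>"
)
print(extract_verilog(sample))
```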

5. License

CodeV-R1-Distill-Qwen-7B is derived from the Qwen2.5 series, which is originally licensed under the Apache 2.0 License, and is fine-tuned on 87k samples curated with DeepSeek-R1.

6. Citation

@misc{CodeV-R1-Distill-Qwen-7B,
  author = {IPRC-DIP},
  title = {CodeV Model Distilled from DeepSeek-R1},
  url = {https://huggingface.co./zhuyaoyu/CodeV-R1-Distill-Qwen-7B},
  year = {2025}
}