GreenMind-Medium-14B-R1

We release GreenMind-Medium-14B-R1, a medium-sized Vietnamese language model that answers questions requiring intermediate-level reasoning, covering general knowledge, mathematics, natural science, and social science. The model is fine-tuned with Group Relative Policy Optimization (GRPO), which guides it toward logically coherent responses.
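For readers unfamiliar with Group Relative Policy Optimization, the central idea is that several responses are sampled for the same prompt and each one is scored relative to the others in its group. The sketch below illustrates only that group-relative advantage step in plain Python. The reward values are invented for illustration, the use of the population standard deviation is an assumption, and the reward functions and training setup actually used for GreenMind are not described in this card.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of responses to the same prompt.

    Each advantage is (reward - group mean) / (group std + eps), so responses
    that beat the group average get a positive value and the rest a negative one.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std is an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical rewards for four sampled responses to one prompt.
rewards = [1.0, 0.0, 0.5, 0.5]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Because the advantages are centered on the group mean, no separate value model is needed to provide a baseline; this is the property usually cited as the motivation for the method.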

Model Description

Model Type: Causal Language Model
Base Model: Qwen/Qwen2.5-14B-Instruct
Parameters: 14.7B
Context Length: 131,072 tokens in total, with up to 8,192 generated tokens
Language: Vietnamese
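The two context figures above can be turned into a simple budget check before calling generate. The helper below is a minimal sketch: the function name and constants are hypothetical, and it assumes that the 131,072-token figure covers prompt plus generated tokens while 8,192 is a separate cap on generation.

```python
MAX_CONTEXT_TOKENS = 131_072   # total context length listed above
MAX_GENERATION_TOKENS = 8_192  # generation limit listed above

def clamp_max_new_tokens(prompt_tokens, requested_new_tokens):
    """Return a max_new_tokens value that respects both limits (hypothetical helper)."""
    if prompt_tokens >= MAX_CONTEXT_TOKENS:
        raise ValueError("prompt does not fit in the context window")
    remaining = MAX_CONTEXT_TOKENS - prompt_tokens
    return min(requested_new_tokens, MAX_GENERATION_TOKENS, remaining)

# A short prompt is limited by the generation cap.
print(clamp_max_new_tokens(prompt_tokens=2_000, requested_new_tokens=16_000))   # 8192
# A very long prompt is limited by the space left in the window.
print(clamp_max_new_tokens(prompt_tokens=130_000, requested_new_tokens=4_096))  # 1072
```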

Quickstart

The following snippet shows how to load the tokenizer and model, build a prompt with apply_chat_template, and generate a response.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "GreenNode/GreenMind-Medium-14B-R1"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    revision='main',
    trust_remote_code=False,
)
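
# A classic Vietnamese word problem: chickens and dogs together make
# 36 animals with exactly 100 legs. How many chickens and how many dogs?
prompt = r"""Vừa gà vừa chó
Bó lại cho tròn
Ba mươi sáu con
Một trăm chân chẵn
Hỏi có bao nhiêu con gà, bao nhiêu con chó?"""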

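messages = [
    {
        # System prompt: "You are a helpful virtual assistant for answering questions.
        # Reason step by step and give the answer inside <answer> </answer> tags."
        "role": "system",
        "content": "Bạn là một trợ lý ảo hữu ích trong việc trả lời câu hỏi. Hãy suy luận từng bước, và đưa ra đáp án trong thẻ <answer> </answer>."
    },
    {
        # The user turn asks for reasoning inside <think> </think>
        # and the final answer inside <answer> </answer>.
        "role": "user",
        "content": f"{prompt} Hãy suy luận từng bước trong thẻ <think> </think>. Và trả về đáp án trong thẻ <answer> </answer>."
    },
    {
        # Prefilled start of the assistant turn ("Let me solve this step by step."),
        # ending with an open <think> tag for the model to continue.
        "role": "assistant",
        "content": "Hãy để tôi giải quyết từng bước.\n<think>"
    }
]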

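# continue_final_message=True leaves the last assistant message open,
# so the model continues it instead of starting a new turn.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    continue_final_message=True
)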

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=1024
)

# Drop the prompt tokens so that only the newly generated tokens are decoded.
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
# Example output (in Vietnamese; the model concludes that there are 22 chickens and 14 dogs):
# Đầu tiên, chúng ta cần thiết lập hai phương trình dựa trên thông tin đề bài:
# 1. Tổng số con gà và chó là 36: x + y = 36
# 2. Tổng số chân là 100: 2x + 4y = 100
# Trong đó, x là số con gà và y là số con chó.
# Tiếp theo, chúng ta giải hệ phương trình này:
# Từ phương trình thứ nhất, ta có: x = 36 - y
# Thay vào phương trình thứ hai: 2(36 - y) + 4y = 100
# => 72 - 2y + 4y = 100
# => 2y = 28
# => y = 14 (số con chó)
# Thay y = 14 vào phương trình x + y = 36:
# => x = 36 - 14 = 22 (số con gà)
# Vậy, có 22 con gà và 14 con chó.
# </think>
# <answer>Có 22 con gà và 14 con chó.</answer>
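Since the response follows the tag format requested in the prompt, the final answer can be separated from the reasoning with a small amount of parsing. The helpers below are a minimal sketch, not part of the released code: the function names are hypothetical, and they assume that the decoded response starts after the prefilled <think> tag, so only the closing </think> tag appears in it.

```python
import re

def extract_answer(response):
    """Return the text inside the last <answer>...</answer> pair, or None."""
    matches = re.findall(r"<answer>(.*?)</answer>", response, flags=re.DOTALL)
    return matches[-1].strip() if matches else None

def extract_reasoning(response):
    """Return the text before </think>, or None if the tag is absent."""
    head, sep, _ = response.partition("</think>")
    return head.strip() if sep else None

# Tail of the example output shown above.
sample = (
    "Vậy, có 22 con gà và 14 con chó.\n"
    "</think>\n"
    "<answer>Có 22 con gà và 14 con chó.</answer>"
)
print(extract_answer(sample))  # Có 22 con gà và 14 con chó.
```

Returning None when a tag is missing lets the caller decide how to handle generations that were cut off by max_new_tokens before the answer was written.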

Evaluation

Table 1. SeaExam dataset. GreenMind-Medium-14B-R1 compared with its base model and several larger models.

Model	SeaExam-ID	SeaExam-TH	SeaExam-VI	Avg
Meta-Llama-3.1-70B-Instruct	65.8	70.6	72.6	69.7
gemma3-27b-it	64.4	67.5	73.1	68.4
Qwen2.5-14B-Instruct	67.6	68.8	73.1	69.8
GreenMind-Medium-14B-R1	74.36	69.75	74.44	72.79

Table 2. VLSP 2023 Challenge. GreenMind-Medium-14B-R1 scores best among the listed models on every task (↑ higher is better, ↓ lower is better).

Model	ComprehensionQA-vi ↑	Exams-vi ↑	LAMBADA-vi ↓	WikiQA-vi ↑	MMLU-vi ↑
cpt-smartbot-13b	0.6633	0.3473	21.9864	0.4455	0.414
ura-llama-13b	0.6556	0.342	17.5614	0.438	0.3973
greennode-7b (prior work)	0.6122	0.2892	189.7782	0.3335	0.387
greennode-14b (prior work)	0.6711	0.3672	29.5967	0.468	0.5281
GreenMind-Medium-14B-R1 (Ours)	0.8689	0.7796	10.7609	0.7915	0.7124

Table 3. VMLU dataset. GreenMind-Medium-14B-R1 compared with other fine-tuned models.

Model	Access	STEM	Social Science	Humanities	Others	Avg
VNPTAI.IO-Medium-R1	Private	77.09	82.3	78.85	69.98	77.43
MISA-Llama3-v1.1	Private	77.5	80.75	76.62	71.6	76.87
BnK-AI-Medium-v2	Private	80.94	80.76	70.7	74.06	76.66
VNPTAI.IO-Large-v4	Private	78.05	79.05	75.39	70.37	76.21
GreenNode-xMedium-v1	Private	75.7	81.09	75.25	69.33	75.5
GreenMind-Medium-14B-R1 (Ours)	Weight	76.78	77.36	72.32	69.03	74.29
CakebyVPBank-Large	Private	77.75	78.11	70.38	67.82	73.99
DeepSeek-R1-Distill-Llama-70B	Weight	76.77	76.23	67.98	66.82	72.41

Follow us

https://x.com/greennode23

Support

https://discord.gg/B6MJFM3J3a

License

This repository and the model weights are licensed under the MIT License.

Citation

If you find our work helpful, please cite it:

@misc{tung2025greenmindnextgenerationvietnameselarge,
      title={GreenMind: A Next-Generation Vietnamese Large Language Model for Structured and Logical Reasoning}, 
      author={Luu Quy Tung and Hoang Quoc Viet and Vo Trong Thu},
      year={2025},
      eprint={2504.16832},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2504.16832}, 
}

Contact Us

General & Collaboration: [email protected], [email protected]
Technical: [email protected]
