|
[INFO|2025-04-28 12:29:05] configuration_utils.py:696 >> loading configuration file config.json from cache at /home/kiho/.cache/huggingface/hub/models--Qwen--Qwen2.5-Coder-7B-Instruct/snapshots/c03e6d358207e414f1eca0bb1891e29f1db0e242/config.json |
|
|
|
[INFO|2025-04-28 12:29:05] configuration_utils.py:768 >> Model config Qwen2Config {
  "_name_or_path": "Qwen/Qwen2.5-Coder-7B-Instruct",
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 3584,
  "initializer_range": 0.02,
  "intermediate_size": 18944,
  "max_position_embeddings": 32768,
  "max_window_layers": 28,
  "model_type": "qwen2",
  "num_attention_heads": 28,
  "num_hidden_layers": 28,
  "num_key_value_heads": 4,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000.0,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.48.2",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 152064
}
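Editor's note: the configuration printed above can be fetched directly from the Hub; a minimal sketch using the standard transformers API (model ID taken from the log):

```python
# Minimal sketch: load the same Qwen2Config that the log prints above.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("Qwen/Qwen2.5-Coder-7B-Instruct")
print(config.hidden_size)          # 3584
print(config.num_key_value_heads)  # 4 (grouped-query attention with 28 query heads)
```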
|
|
|
|
|
[INFO|2025-04-28 12:29:05] tokenization_utils_base.py:2034 >> loading file vocab.json from cache at /home/kiho/.cache/huggingface/hub/models--Qwen--Qwen2.5-Coder-7B-Instruct/snapshots/c03e6d358207e414f1eca0bb1891e29f1db0e242/vocab.json |
|
|
|
[INFO|2025-04-28 12:29:05] tokenization_utils_base.py:2034 >> loading file merges.txt from cache at /home/kiho/.cache/huggingface/hub/models--Qwen--Qwen2.5-Coder-7B-Instruct/snapshots/c03e6d358207e414f1eca0bb1891e29f1db0e242/merges.txt |
|
|
|
[INFO|2025-04-28 12:29:05] tokenization_utils_base.py:2034 >> loading file tokenizer.json from cache at /home/kiho/.cache/huggingface/hub/models--Qwen--Qwen2.5-Coder-7B-Instruct/snapshots/c03e6d358207e414f1eca0bb1891e29f1db0e242/tokenizer.json |
|
|
|
[INFO|2025-04-28 12:29:05] tokenization_utils_base.py:2034 >> loading file added_tokens.json from cache at None |
|
|
|
[INFO|2025-04-28 12:29:05] tokenization_utils_base.py:2034 >> loading file special_tokens_map.json from cache at None |
|
|
|
[INFO|2025-04-28 12:29:05] tokenization_utils_base.py:2034 >> loading file tokenizer_config.json from cache at /home/kiho/.cache/huggingface/hub/models--Qwen--Qwen2.5-Coder-7B-Instruct/snapshots/c03e6d358207e414f1eca0bb1891e29f1db0e242/tokenizer_config.json |
|
|
|
[INFO|2025-04-28 12:29:05] tokenization_utils_base.py:2034 >> loading file chat_template.jinja from cache at None |
|
|
|
[INFO|2025-04-28 12:29:05] tokenization_utils_base.py:2304 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. |
|
|
|
[INFO|2025-04-28 12:29:05] logging.py:157 >> Add <|im_end|> to stop words. |
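Editor's note: <|im_end|> closes each turn in the Qwen chat template, which is why it is registered as a stop word here. A sketch of the tokenizer side, assuming nothing beyond the standard transformers API:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-7B-Instruct")
# <|im_end|> is the chat-turn terminator; its id matches eos_token_id (151645) above.
print(tokenizer.convert_tokens_to_ids("<|im_end|>"))  # 151645
```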
|
|
|
[INFO|2025-04-28 12:29:05] logging.py:157 >> Loading dataset Codes_query_filtered_330k_ns.json... |
|
|
|
[INFO|2025-04-28 12:29:09] configuration_utils.py:696 >> loading configuration file config.json from cache at /home/kiho/.cache/huggingface/hub/models--Qwen--Qwen2.5-Coder-7B-Instruct/snapshots/c03e6d358207e414f1eca0bb1891e29f1db0e242/config.json |
|
|
|
[INFO|2025-04-28 12:29:09] configuration_utils.py:768 >> Model config Qwen2Config {
  "_name_or_path": "Qwen/Qwen2.5-Coder-7B-Instruct",
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 3584,
  "initializer_range": 0.02,
  "intermediate_size": 18944,
  "max_position_embeddings": 32768,
  "max_window_layers": 28,
  "model_type": "qwen2",
  "num_attention_heads": 28,
  "num_hidden_layers": 28,
  "num_key_value_heads": 4,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000.0,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.48.2",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 152064
}
|
|
|
|
|
[WARNING|2025-04-28 12:29:09] logging.py:162 >> Input length is smaller than max length. Consider increase input length. |
|
|
|
[INFO|2025-04-28 12:29:09] logging.py:157 >> Using llama3 scaling strategy and setting scaling factor to 1.0. |
|
|
|
[INFO|2025-04-28 12:29:09] logging.py:157 >> Using block diagonal attention for sequence packing without cross-attention. |
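Editor's note: with sequence packing, several examples share one row of the batch, and a block-diagonal causal mask keeps them from attending to each other. An illustrative sketch of the idea (not LLaMA-Factory's actual implementation):

```python
import torch

# Each packed example gets its own lower-triangular block; there is no
# cross-attention between examples packed into the same row.
def block_diagonal_causal_mask(seq_lens):
    total = sum(seq_lens)
    mask = torch.zeros(total, total, dtype=torch.bool)
    start = 0
    for n in seq_lens:
        mask[start:start + n, start:start + n] = torch.tril(torch.ones(n, n, dtype=torch.bool))
        start += n
    return mask

print(block_diagonal_causal_mask([3, 2]).int())
```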
|
|
|
[INFO|2025-04-28 12:29:09] logging.py:157 >> Liger kernel has been applied to the model. |
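Editor's note: a hedged sketch of how the Liger kernel patch is typically applied to a Qwen2 model; the function name comes from the liger-kernel package, and the exact call path inside LLaMA-Factory may differ.

```python
# Assumed API from the liger-kernel package; patches fused RoPE, RMSNorm,
# SwiGLU and loss kernels into the Qwen2 modeling code in place.
from liger_kernel.transformers import apply_liger_kernel_to_qwen2

apply_liger_kernel_to_qwen2()
```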
|
|
|
[INFO|2025-04-28 12:29:09] modeling_utils.py:3904 >> loading weights file model.safetensors from cache at /home/kiho/.cache/huggingface/hub/models--Qwen--Qwen2.5-Coder-7B-Instruct/snapshots/c03e6d358207e414f1eca0bb1891e29f1db0e242/model.safetensors.index.json |
|
|
|
[INFO|2025-04-28 12:29:09] modeling_utils.py:1582 >> Instantiating Qwen2ForCausalLM model under default dtype torch.bfloat16. |
|
|
|
[INFO|2025-04-28 12:29:09] configuration_utils.py:1140 >> Generate config GenerationConfig {
  "bos_token_id": 151643,
  "eos_token_id": 151645
}
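Editor's note: a sketch of the equivalent plain-transformers load, matching the dtype and attention backend reported in this log (standard transformers arguments, not the exact LLaMA-Factory call):

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Coder-7B-Instruct",
    torch_dtype=torch.bfloat16,   # matches "torch_dtype": "bfloat16" in the config above
    attn_implementation="sdpa",   # the log later reports torch SDPA being used
)
```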
|
|
|
|
|
[INFO|2025-04-28 12:29:14] modeling_utils.py:4888 >> All model checkpoint weights were used when initializing Qwen2ForCausalLM. |
|
|
|
|
|
[INFO|2025-04-28 12:29:14] modeling_utils.py:4896 >> All the weights of Qwen2ForCausalLM were initialized from the model checkpoint at Qwen/Qwen2.5-Coder-7B-Instruct. |
|
If your task is similar to the task the model of the checkpoint was trained on, you can already use Qwen2ForCausalLM for predictions without further training. |
|
|
|
[INFO|2025-04-28 12:29:14] configuration_utils.py:1095 >> loading configuration file generation_config.json from cache at /home/kiho/.cache/huggingface/hub/models--Qwen--Qwen2.5-Coder-7B-Instruct/snapshots/c03e6d358207e414f1eca0bb1891e29f1db0e242/generation_config.json |
|
|
|
[INFO|2025-04-28 12:29:14] configuration_utils.py:1140 >> Generate config GenerationConfig {
  "bos_token_id": 151643,
  "do_sample": true,
  "eos_token_id": [
    151645,
    151643
  ],
  "pad_token_id": 151643,
  "repetition_penalty": 1.1,
  "temperature": 0.7,
  "top_k": 20,
  "top_p": 0.8
}
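Editor's note: these sampling defaults come from the model's generation_config.json. A sketch of applying them at inference, reusing the model and tokenizer from the sketches above (the prompt is illustrative):

```python
messages = [{"role": "user", "content": "Write a function that checks whether a string is a palindrome."}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
output = model.generate(
    input_ids,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_p=0.8,
    top_k=20,
    repetition_penalty=1.1,
)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```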
|
|
|
|
|
[INFO|2025-04-28 12:29:14] logging.py:157 >> Gradient checkpointing enabled. |
|
|
|
[INFO|2025-04-28 12:29:14] logging.py:157 >> Using torch SDPA for faster training and inference. |
|
|
|
[INFO|2025-04-28 12:29:14] logging.py:157 >> Upcasting trainable params to float32. |
|
|
|
[INFO|2025-04-28 12:29:14] logging.py:157 >> Fine-tuning method: Freeze |
|
|
|
[INFO|2025-04-28 12:29:14] logging.py:157 >> Set trainable layers: .13.,.27. |
|
|
|
[INFO|2025-04-28 12:29:14] logging.py:157 >> trainable params: 466,115,584 || all params: 7,615,616,512 || trainable%: 6.1205 |
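Editor's note: the "Freeze" method with trainable layers .13.,.27. means only decoder layers 13 and 27 are updated (2 of 28 layers, each roughly 233M parameters, which reproduces the 466,115,584 trainable parameters reported). A sketch of the equivalent manual freeze:

```python
# Hedged sketch: freeze everything except decoder layers 13 and 27, as the log reports.
for name, param in model.named_parameters():
    param.requires_grad = ".13." in name or ".27." in name

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable params: {trainable:,} || all params: {total:,} || trainable%: {100 * trainable / total:.4f}")
```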
|
|
|
[INFO|2025-04-28 12:29:14] trainer.py:741 >> Using auto half precision backend |
|
|
|
[INFO|2025-04-28 12:29:15] logging.py:157 >> Found linear modules: up_proj,gate_proj,q_proj,k_proj,down_proj,v_proj,o_proj |
|
|
|
[INFO|2025-04-28 12:29:15] logging.py:157 >> Using APOLLO optimizer with args: {'rank': 256, 'proj': 'random', 'proj_type': 'std', 'update_proj_gap': 200, 'scale': 1, 'scale_type': 'channel', 'scale_front': False}. |
|
|
|
[INFO|2025-04-28 12:29:16] trainer.py:2369 >> ***** Running training ***** |
|
|
|
[INFO|2025-04-28 12:29:16] trainer.py:2370 >> Num examples = 51,880 |
|
|
|
[INFO|2025-04-28 12:29:16] trainer.py:2371 >> Num Epochs = 1 |
|
|
|
[INFO|2025-04-28 12:29:16] trainer.py:2372 >> Instantaneous batch size per device = 16 |
|
|
|
[INFO|2025-04-28 12:29:16] trainer.py:2375 >> Total train batch size (w. parallel, distributed & accumulation) = 512 |
|
|
|
[INFO|2025-04-28 12:29:16] trainer.py:2376 >> Gradient Accumulation steps = 8 |
|
|
|
[INFO|2025-04-28 12:29:16] trainer.py:2377 >> Total optimization steps = 101 |
|
|
|
[INFO|2025-04-28 12:29:16] trainer.py:2378 >> Number of trainable parameters = 466,115,584 |
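Editor's note: the batch-size arithmetic above implies a 4-device run; a quick consistency check (the device count is inferred, not stated in the log):

```python
per_device_batch = 16
grad_accum = 8
total_batch = 512
num_devices = total_batch // (per_device_batch * grad_accum)  # -> 4
steps_per_epoch = 51_880 // total_batch                       # -> 101 optimization steps
print(num_devices, steps_per_epoch)
```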
|
|
|
[INFO|2025-04-28 12:32:00] logging.py:157 >> {'loss': 1.0213, 'learning_rate': 4.9988e-05, 'epoch': 0.01, 'throughput': 12907.24} |
|
|
|
[INFO|2025-04-28 12:34:34] logging.py:157 >> {'loss': 1.0136, 'learning_rate': 4.9952e-05, 'epoch': 0.02, 'throughput': 13247.81} |
|
|
|
[INFO|2025-04-28 12:37:08] logging.py:157 >> {'loss': 0.9404, 'learning_rate': 4.9891e-05, 'epoch': 0.03, 'throughput': 13369.60} |
|
|
|
[INFO|2025-04-28 12:39:42] logging.py:157 >> {'loss': 0.9530, 'learning_rate': 4.9807e-05, 'epoch': 0.04, 'throughput': 13426.82} |
|
|
|
[INFO|2025-04-28 12:42:16] logging.py:157 >> {'loss': 0.9453, 'learning_rate': 4.9698e-05, 'epoch': 0.05, 'throughput': 13463.63} |
|
|
|
[INFO|2025-04-28 12:44:50] logging.py:157 >> {'loss': 0.9111, 'learning_rate': 4.9566e-05, 'epoch': 0.06, 'throughput': 13485.87} |
|
|
|
[INFO|2025-04-28 12:47:24] logging.py:157 >> {'loss': 0.8780, 'learning_rate': 4.9410e-05, 'epoch': 0.07, 'throughput': 13503.29} |
|
|
|
[INFO|2025-04-28 12:49:59] logging.py:157 >> {'loss': 0.9140, 'learning_rate': 4.9230e-05, 'epoch': 0.08, 'throughput': 13515.40} |
|
|
|
[INFO|2025-04-28 12:52:33] logging.py:157 >> {'loss': 0.8649, 'learning_rate': 4.9027e-05, 'epoch': 0.09, 'throughput': 13523.38} |
|
|
|
[INFO|2025-04-28 12:55:07] logging.py:157 >> {'loss': 0.9029, 'learning_rate': 4.8800e-05, 'epoch': 0.10, 'throughput': 13530.26} |
|
|
|
[INFO|2025-04-28 12:57:42] logging.py:157 >> {'loss': 0.8820, 'learning_rate': 4.8551e-05, 'epoch': 0.11, 'throughput': 13535.32} |
|
|
|
[INFO|2025-04-28 13:00:16] logging.py:157 >> {'loss': 0.8438, 'learning_rate': 4.8279e-05, 'epoch': 0.12, 'throughput': 13536.90} |
|
|
|
[INFO|2025-04-28 13:02:51] logging.py:157 >> {'loss': 0.8743, 'learning_rate': 4.7984e-05, 'epoch': 0.13, 'throughput': 13535.86} |
|
|
|
[INFO|2025-04-28 13:05:26] logging.py:157 >> {'loss': 0.8736, 'learning_rate': 4.7667e-05, 'epoch': 0.14, 'throughput': 13539.44} |
|
|
|
[INFO|2025-04-28 13:08:00] logging.py:157 >> {'loss': 0.8687, 'learning_rate': 4.7328e-05, 'epoch': 0.15, 'throughput': 13542.72} |
|
|
|
[INFO|2025-04-28 13:10:34] logging.py:157 >> {'loss': 0.8640, 'learning_rate': 4.6967e-05, 'epoch': 0.16, 'throughput': 13545.64} |
|
|
|
[INFO|2025-04-28 13:13:09] logging.py:157 >> {'loss': 0.8877, 'learning_rate': 4.6586e-05, 'epoch': 0.17, 'throughput': 13548.59} |
|
|
|
[INFO|2025-04-28 13:15:43] logging.py:157 >> {'loss': 0.8749, 'learning_rate': 4.6183e-05, 'epoch': 0.18, 'throughput': 13551.29} |
|
|
|
[INFO|2025-04-28 13:18:17] logging.py:157 >> {'loss': 0.8338, 'learning_rate': 4.5760e-05, 'epoch': 0.19, 'throughput': 13551.85} |
|
|
|
[INFO|2025-04-28 13:20:52] logging.py:157 >> {'loss': 0.8294, 'learning_rate': 4.5316e-05, 'epoch': 0.20, 'throughput': 13552.35} |
|
|
|
[INFO|2025-04-28 13:23:27] logging.py:157 >> {'loss': 0.8666, 'learning_rate': 4.4854e-05, 'epoch': 0.21, 'throughput': 13553.49} |
|
|
|
[INFO|2025-04-28 13:26:01] logging.py:157 >> {'loss': 0.8038, 'learning_rate': 4.4371e-05, 'epoch': 0.22, 'throughput': 13554.09} |
|
|
|
[INFO|2025-04-28 13:28:36] logging.py:157 >> {'loss': 0.8492, 'learning_rate': 4.3871e-05, 'epoch': 0.23, 'throughput': 13554.87} |
|
|
|
[INFO|2025-04-28 13:31:10] logging.py:157 >> {'loss': 0.8047, 'learning_rate': 4.3351e-05, 'epoch': 0.24, 'throughput': 13555.65} |
|
|
|
[INFO|2025-04-28 13:33:45] logging.py:157 >> {'loss': 0.8521, 'learning_rate': 4.2815e-05, 'epoch': 0.25, 'throughput': 13556.20} |
|
|
|
[INFO|2025-04-28 13:36:19] logging.py:157 >> {'loss': 0.8133, 'learning_rate': 4.2261e-05, 'epoch': 0.26, 'throughput': 13556.46} |
|
|
|
[INFO|2025-04-28 13:38:53] logging.py:157 >> {'loss': 0.8365, 'learning_rate': 4.1690e-05, 'epoch': 0.27, 'throughput': 13559.29} |
|
|
|
[INFO|2025-04-28 13:41:27] logging.py:157 >> {'loss': 0.8094, 'learning_rate': 4.1103e-05, 'epoch': 0.28, 'throughput': 13563.03} |
|
|
|
[INFO|2025-04-28 13:44:00] logging.py:157 >> {'loss': 0.8174, 'learning_rate': 4.0500e-05, 'epoch': 0.29, 'throughput': 13565.42} |
|
|
|
[INFO|2025-04-28 13:46:35] logging.py:157 >> {'loss': 0.8278, 'learning_rate': 3.9883e-05, 'epoch': 0.30, 'throughput': 13566.62} |
|
|
|
[INFO|2025-04-28 13:49:08] logging.py:157 >> {'loss': 0.8425, 'learning_rate': 3.9251e-05, 'epoch': 0.31, 'throughput': 13569.29} |
|
|
|
[INFO|2025-04-28 13:51:42] logging.py:157 >> {'loss': 0.8145, 'learning_rate': 3.8605e-05, 'epoch': 0.32, 'throughput': 13570.89} |
|
|
|
[INFO|2025-04-28 13:54:16] logging.py:157 >> {'loss': 0.8322, 'learning_rate': 3.7946e-05, 'epoch': 0.33, 'throughput': 13573.07} |
|
|
|
[INFO|2025-04-28 13:56:51] logging.py:157 >> {'loss': 0.8114, 'learning_rate': 3.7275e-05, 'epoch': 0.34, 'throughput': 13572.53} |
|
|
|
[INFO|2025-04-28 13:59:25] logging.py:157 >> {'loss': 0.8091, 'learning_rate': 3.6592e-05, 'epoch': 0.35, 'throughput': 13572.13} |
|
|
|
[INFO|2025-04-28 14:01:59] logging.py:157 >> {'loss': 0.7743, 'learning_rate': 3.5897e-05, 'epoch': 0.36, 'throughput': 13573.61} |
|
|
|
[INFO|2025-04-28 14:04:33] logging.py:157 >> {'loss': 0.8368, 'learning_rate': 3.5192e-05, 'epoch': 0.36, 'throughput': 13575.22} |
|
|
|
[INFO|2025-04-28 14:07:07] logging.py:157 >> {'loss': 0.8177, 'learning_rate': 3.4477e-05, 'epoch': 0.37, 'throughput': 13576.95} |
|
|
|
[INFO|2025-04-28 14:09:41] logging.py:157 >> {'loss': 0.8109, 'learning_rate': 3.3753e-05, 'epoch': 0.38, 'throughput': 13577.04} |
|
|
|
[INFO|2025-04-28 14:12:15] logging.py:157 >> {'loss': 0.8270, 'learning_rate': 3.3021e-05, 'epoch': 0.39, 'throughput': 13578.09} |
|
|
|
[INFO|2025-04-28 14:14:49] logging.py:157 >> {'loss': 0.8167, 'learning_rate': 3.2280e-05, 'epoch': 0.40, 'throughput': 13579.75} |
|
|
|
[INFO|2025-04-28 14:17:22] logging.py:157 >> {'loss': 0.8073, 'learning_rate': 3.1533e-05, 'epoch': 0.41, 'throughput': 13581.88} |
|
|
|
[INFO|2025-04-28 14:19:56] logging.py:157 >> {'loss': 0.7793, 'learning_rate': 3.0779e-05, 'epoch': 0.42, 'throughput': 13583.46} |
|
|
|
[INFO|2025-04-28 14:22:30] logging.py:157 >> {'loss': 0.8096, 'learning_rate': 3.0020e-05, 'epoch': 0.43, 'throughput': 13585.16} |
|
|
|
[INFO|2025-04-28 14:25:03] logging.py:157 >> {'loss': 0.8212, 'learning_rate': 2.9256e-05, 'epoch': 0.44, 'throughput': 13586.73} |
|
|
|
[INFO|2025-04-28 14:27:38] logging.py:157 >> {'loss': 0.8151, 'learning_rate': 2.8488e-05, 'epoch': 0.45, 'throughput': 13586.13} |
|
|
|
[INFO|2025-04-28 14:30:11] logging.py:157 >> {'loss': 0.8331, 'learning_rate': 2.7716e-05, 'epoch': 0.46, 'throughput': 13587.29} |
|
|
|
[INFO|2025-04-28 14:32:46] logging.py:157 >> {'loss': 0.8003, 'learning_rate': 2.6942e-05, 'epoch': 0.47, 'throughput': 13587.74} |
|
|
|
[INFO|2025-04-28 14:35:19] logging.py:157 >> {'loss': 0.8214, 'learning_rate': 2.6166e-05, 'epoch': 0.48, 'throughput': 13588.82} |
|
|
|
[INFO|2025-04-28 14:37:53] logging.py:157 >> {'loss': 0.8118, 'learning_rate': 2.5389e-05, 'epoch': 0.49, 'throughput': 13589.60} |
|
|
|
[INFO|2025-04-28 14:40:27] logging.py:157 >> {'loss': 0.8382, 'learning_rate': 2.4611e-05, 'epoch': 0.50, 'throughput': 13590.81} |
|
|
|
[INFO|2025-04-28 14:43:01] logging.py:157 >> {'loss': 0.8099, 'learning_rate': 2.3834e-05, 'epoch': 0.51, 'throughput': 13590.72} |
|
|
|
[INFO|2025-04-28 14:45:35] logging.py:157 >> {'loss': 0.7914, 'learning_rate': 2.3058e-05, 'epoch': 0.52, 'throughput': 13591.47} |
|
|
|
[INFO|2025-04-28 14:48:09] logging.py:157 >> {'loss': 0.8104, 'learning_rate': 2.2284e-05, 'epoch': 0.53, 'throughput': 13592.65} |
|
|
|
[INFO|2025-04-28 14:50:42] logging.py:157 >> {'loss': 0.8125, 'learning_rate': 2.1512e-05, 'epoch': 0.54, 'throughput': 13593.72} |
|
|
|
[INFO|2025-04-28 14:53:16] logging.py:157 >> {'loss': 0.8198, 'learning_rate': 2.0744e-05, 'epoch': 0.55, 'throughput': 13594.46} |
|
|
|
[INFO|2025-04-28 14:55:50] logging.py:157 >> {'loss': 0.8019, 'learning_rate': 1.9980e-05, 'epoch': 0.56, 'throughput': 13594.62} |
|
|
|
[INFO|2025-04-28 14:58:24] logging.py:157 >> {'loss': 0.8141, 'learning_rate': 1.9221e-05, 'epoch': 0.57, 'throughput': 13595.48} |
|
|
|
[INFO|2025-04-28 15:00:58] logging.py:157 >> {'loss': 0.7985, 'learning_rate': 1.8467e-05, 'epoch': 0.58, 'throughput': 13596.37} |
|
|
|
[INFO|2025-04-28 15:03:31] logging.py:157 >> {'loss': 0.7998, 'learning_rate': 1.7720e-05, 'epoch': 0.59, 'throughput': 13597.32} |
|
|
|
[INFO|2025-04-28 15:06:05] logging.py:157 >> {'loss': 0.7988, 'learning_rate': 1.6979e-05, 'epoch': 0.60, 'throughput': 13597.98} |
|
|
|
[INFO|2025-04-28 15:08:39] logging.py:157 >> {'loss': 0.8016, 'learning_rate': 1.6247e-05, 'epoch': 0.61, 'throughput': 13598.66} |
|
|
|
[INFO|2025-04-28 15:11:12] logging.py:157 >> {'loss': 0.8162, 'learning_rate': 1.5523e-05, 'epoch': 0.62, 'throughput': 13599.73} |
|
|
|
[INFO|2025-04-28 15:13:46] logging.py:157 >> {'loss': 0.8258, 'learning_rate': 1.4808e-05, 'epoch': 0.63, 'throughput': 13600.66} |
|
|
|
[INFO|2025-04-28 15:16:19] logging.py:157 >> {'loss': 0.8063, 'learning_rate': 1.4103e-05, 'epoch': 0.64, 'throughput': 13601.35} |
|
|
|
[INFO|2025-04-28 15:18:53] logging.py:157 >> {'loss': 0.8116, 'learning_rate': 1.3408e-05, 'epoch': 0.65, 'throughput': 13601.79} |
|
|
|
[INFO|2025-04-28 15:21:27] logging.py:157 >> {'loss': 0.7850, 'learning_rate': 1.2725e-05, 'epoch': 0.66, 'throughput': 13602.47} |
|
|
|
[INFO|2025-04-28 15:24:01] logging.py:157 >> {'loss': 0.8049, 'learning_rate': 1.2054e-05, 'epoch': 0.67, 'throughput': 13602.60} |
|
|
|
[INFO|2025-04-28 15:26:35] logging.py:157 >> {'loss': 0.8034, 'learning_rate': 1.1395e-05, 'epoch': 0.68, 'throughput': 13603.14} |
|
|
|
[INFO|2025-04-28 15:29:08] logging.py:157 >> {'loss': 0.7949, 'learning_rate': 1.0749e-05, 'epoch': 0.69, 'throughput': 13603.86} |
|
|
|
[INFO|2025-04-28 15:31:42] logging.py:157 >> {'loss': 0.8024, 'learning_rate': 1.0117e-05, 'epoch': 0.70, 'throughput': 13603.82} |
|
|
|
[INFO|2025-04-28 15:34:17] logging.py:157 >> {'loss': 0.7608, 'learning_rate': 9.4998e-06, 'epoch': 0.71, 'throughput': 13603.58} |
|
|
|
[INFO|2025-04-28 15:36:51] logging.py:157 >> {'loss': 0.8012, 'learning_rate': 8.8972e-06, 'epoch': 0.72, 'throughput': 13603.80} |
|
|
|
[INFO|2025-04-28 15:39:25] logging.py:157 >> {'loss': 0.7688, 'learning_rate': 8.3103e-06, 'epoch': 0.73, 'throughput': 13604.36} |
|
|
|
[INFO|2025-04-28 15:41:58] logging.py:157 >> {'loss': 0.8023, 'learning_rate': 7.7395e-06, 'epoch': 0.74, 'throughput': 13604.88} |
|
|
|
[INFO|2025-04-28 15:44:32] logging.py:157 >> {'loss': 0.7809, 'learning_rate': 7.1854e-06, 'epoch': 0.75, 'throughput': 13605.47} |
|
|
|
[INFO|2025-04-28 15:47:05] logging.py:157 >> {'loss': 0.8083, 'learning_rate': 6.6485e-06, 'epoch': 0.76, 'throughput': 13606.21} |
|
|
|
[INFO|2025-04-28 15:49:39] logging.py:157 >> {'loss': 0.7903, 'learning_rate': 6.1294e-06, 'epoch': 0.77, 'throughput': 13606.72} |
|
|
|
[INFO|2025-04-28 15:52:13] logging.py:157 >> {'loss': 0.7904, 'learning_rate': 5.6286e-06, 'epoch': 0.78, 'throughput': 13607.18} |
|
|
|
[INFO|2025-04-28 15:54:46] logging.py:157 >> {'loss': 0.7970, 'learning_rate': 5.1465e-06, 'epoch': 0.79, 'throughput': 13607.74} |
|
|
|
[INFO|2025-04-28 15:57:20] logging.py:157 >> {'loss': 0.7636, 'learning_rate': 4.6836e-06, 'epoch': 0.80, 'throughput': 13608.08} |
|
|
|
[INFO|2025-04-28 15:59:55] logging.py:157 >> {'loss': 0.7818, 'learning_rate': 4.2403e-06, 'epoch': 0.81, 'throughput': 13607.00} |
|
|
|
[INFO|2025-04-28 16:02:29] logging.py:157 >> {'loss': 0.7914, 'learning_rate': 3.8171e-06, 'epoch': 0.82, 'throughput': 13607.35} |
|
|
|
[INFO|2025-04-28 16:05:03] logging.py:157 >> {'loss': 0.7985, 'learning_rate': 3.4145e-06, 'epoch': 0.83, 'throughput': 13607.72} |
|
|
|
[INFO|2025-04-28 16:07:36] logging.py:157 >> {'loss': 0.7868, 'learning_rate': 3.0327e-06, 'epoch': 0.84, 'throughput': 13608.19} |
|
|
|
[INFO|2025-04-28 16:10:10] logging.py:157 >> {'loss': 0.7972, 'learning_rate': 2.6721e-06, 'epoch': 0.85, 'throughput': 13608.54} |
|
|
|
[INFO|2025-04-28 16:12:44] logging.py:157 >> {'loss': 0.7933, 'learning_rate': 2.3332e-06, 'epoch': 0.86, 'throughput': 13609.13} |
|
|
|
[INFO|2025-04-28 16:15:17] logging.py:157 >> {'loss': 0.7695, 'learning_rate': 2.0162e-06, 'epoch': 0.87, 'throughput': 13609.89} |
|
|
|
[INFO|2025-04-28 16:17:51] logging.py:157 >> {'loss': 0.7929, 'learning_rate': 1.7214e-06, 'epoch': 0.88, 'throughput': 13609.74} |
|
|
|
[INFO|2025-04-28 16:20:25] logging.py:157 >> {'loss': 0.7995, 'learning_rate': 1.4491e-06, 'epoch': 0.89, 'throughput': 13610.21} |
|
|
|
[INFO|2025-04-28 16:22:59] logging.py:157 >> {'loss': 0.7857, 'learning_rate': 1.1997e-06, 'epoch': 0.90, 'throughput': 13610.41} |
|
|
|
[INFO|2025-04-28 16:25:33] logging.py:157 >> {'loss': 0.8215, 'learning_rate': 9.7323e-07, 'epoch': 0.91, 'throughput': 13610.73} |
|
|
|
[INFO|2025-04-28 16:28:06] logging.py:157 >> {'loss': 0.7931, 'learning_rate': 7.7003e-07, 'epoch': 0.92, 'throughput': 13611.14} |
|
|
|
[INFO|2025-04-28 16:30:40] logging.py:157 >> {'loss': 0.7927, 'learning_rate': 5.9026e-07, 'epoch': 0.93, 'throughput': 13611.37} |
|
|
|
[INFO|2025-04-28 16:33:14] logging.py:157 >> {'loss': 0.7787, 'learning_rate': 4.3412e-07, 'epoch': 0.94, 'throughput': 13611.55} |
|
|
|
[INFO|2025-04-28 16:35:49] logging.py:157 >> {'loss': 0.7771, 'learning_rate': 3.0174e-07, 'epoch': 0.95, 'throughput': 13610.50} |
|
|
|
[INFO|2025-04-28 16:38:24] logging.py:157 >> {'loss': 0.7931, 'learning_rate': 1.9325e-07, 'epoch': 0.96, 'throughput': 13610.20} |
|
|
|
[INFO|2025-04-28 16:40:58] logging.py:157 >> {'loss': 0.7902, 'learning_rate': 1.0877e-07, 'epoch': 0.97, 'throughput': 13609.88} |
|
|
|
[INFO|2025-04-28 16:43:32] logging.py:157 >> {'loss': 0.7906, 'learning_rate': 4.8360e-08, 'epoch': 0.98, 'throughput': 13610.30} |
|
|
|
[INFO|2025-04-28 16:46:05] logging.py:157 >> {'loss': 0.8083, 'learning_rate': 1.2093e-08, 'epoch': 0.99, 'throughput': 13610.82} |
|
|
|
[INFO|2025-04-28 16:48:39] logging.py:157 >> {'loss': 0.7963, 'learning_rate': 0.0000e+00, 'epoch': 1.00, 'throughput': 13611.38} |
|
|
|
[INFO|2025-04-28 16:48:39] trainer.py:3910 >> Saving model checkpoint to saves/Qwen2.5-Coder-7B-Instruct/freeze/qwen_ns/checkpoint-101 |
|
|
|
[INFO|2025-04-28 16:48:39] configuration_utils.py:420 >> Configuration saved in saves/Qwen2.5-Coder-7B-Instruct/freeze/qwen_ns/checkpoint-101/config.json |
|
|
|
[INFO|2025-04-28 16:48:39] configuration_utils.py:909 >> Configuration saved in saves/Qwen2.5-Coder-7B-Instruct/freeze/qwen_ns/checkpoint-101/generation_config.json |
|
|
|
[INFO|2025-04-28 16:49:02] modeling_utils.py:2996 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 4 checkpoint shards. You can find where each parameters has been saved in the index located at saves/Qwen2.5-Coder-7B-Instruct/freeze/qwen_ns/checkpoint-101/model.safetensors.index.json. |
|
|
|
[INFO|2025-04-28 16:49:02] tokenization_utils_base.py:2491 >> tokenizer config file saved in saves/Qwen2.5-Coder-7B-Instruct/freeze/qwen_ns/checkpoint-101/tokenizer_config.json |
|
|
|
[INFO|2025-04-28 16:49:02] tokenization_utils_base.py:2500 >> Special tokens file saved in saves/Qwen2.5-Coder-7B-Instruct/freeze/qwen_ns/checkpoint-101/special_tokens_map.json |
|
|
|
[INFO|2025-04-28 16:49:03] trainer.py:2643 >> |
|
|
|
Training completed. Do not forget to share your model on huggingface.co/models =) |
|
|
|
|
|
|
|
[INFO|2025-04-28 16:49:03] trainer.py:3910 >> Saving model checkpoint to saves/Qwen2.5-Coder-7B-Instruct/freeze/qwen_ns |
|
|
|
[INFO|2025-04-28 16:49:03] configuration_utils.py:420 >> Configuration saved in saves/Qwen2.5-Coder-7B-Instruct/freeze/qwen_ns/config.json |
|
|
|
[INFO|2025-04-28 16:49:03] configuration_utils.py:909 >> Configuration saved in saves/Qwen2.5-Coder-7B-Instruct/freeze/qwen_ns/generation_config.json |
|
|
|
[INFO|2025-04-28 16:49:26] modeling_utils.py:2996 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 4 checkpoint shards. You can find where each parameters has been saved in the index located at saves/Qwen2.5-Coder-7B-Instruct/freeze/qwen_ns/model.safetensors.index.json. |
|
|
|
[INFO|2025-04-28 16:49:26] tokenization_utils_base.py:2491 >> tokenizer config file saved in saves/Qwen2.5-Coder-7B-Instruct/freeze/qwen_ns/tokenizer_config.json |
|
|
|
[INFO|2025-04-28 16:49:26] tokenization_utils_base.py:2500 >> Special tokens file saved in saves/Qwen2.5-Coder-7B-Instruct/freeze/qwen_ns/special_tokens_map.json |
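Editor's note: the final model is saved as four safetensors shards under saves/Qwen2.5-Coder-7B-Instruct/freeze/qwen_ns. A sketch of reloading it for inference (paths taken from the log above):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt_dir = "saves/Qwen2.5-Coder-7B-Instruct/freeze/qwen_ns"
model = AutoModelForCausalLM.from_pretrained(ckpt_dir, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(ckpt_dir)
```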
|
|
|
[WARNING|2025-04-28 16:49:26] logging.py:162 >> No metric eval_loss to plot. |
|
|
|
[WARNING|2025-04-28 16:49:26] logging.py:162 >> No metric eval_accuracy to plot. |
|
|
|
[INFO|2025-04-28 16:49:26] modelcard.py:449 >> Dropping the following result as it does not have all the necessary fields: |
|
{'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}} |
|
|
|
|