|
[INFO|2025-04-28 12:29:05] configuration_utils.py:696 >> loading configuration file config.json from cache at /home/kiho/.cache/huggingface/hub/models--Qwen--Qwen2.5-Coder-7B-Instruct/snapshots/c03e6d358207e414f1eca0bb1891e29f1db0e242/config.json |
|
|
|
[INFO|2025-04-28 12:29:05] configuration_utils.py:768 >> Model config Qwen2Config {
  "_name_or_path": "Qwen/Qwen2.5-Coder-7B-Instruct",
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 3584,
  "initializer_range": 0.02,
  "intermediate_size": 18944,
  "max_position_embeddings": 32768,
  "max_window_layers": 28,
  "model_type": "qwen2",
  "num_attention_heads": 28,
  "num_hidden_layers": 28,
  "num_key_value_heads": 4,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000.0,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.48.2",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 152064
}
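Editor's note: the configuration printed above can be fetched directly from the Hub; a minimal sketch using the standard transformers API (model ID taken from the log):

```python
# Minimal sketch: load the same Qwen2Config that the log prints above.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("Qwen/Qwen2.5-Coder-7B-Instruct")
print(config.hidden_size)          # 3584
print(config.num_key_value_heads)  # 4 (grouped-query attention with 28 query heads)
```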
|
|
|
|
|
[INFO|2025-04-28 12:29:05] tokenization_utils_base.py:2034 >> loading file vocab.json from cache at /home/kiho/.cache/huggingface/hub/models--Qwen--Qwen2.5-Coder-7B-Instruct/snapshots/c03e6d358207e414f1eca0bb1891e29f1db0e242/vocab.json |
|
|
|
[INFO|2025-04-28 12:29:05] tokenization_utils_base.py:2034 >> loading file merges.txt from cache at /home/kiho/.cache/huggingface/hub/models--Qwen--Qwen2.5-Coder-7B-Instruct/snapshots/c03e6d358207e414f1eca0bb1891e29f1db0e242/merges.txt |
|
|
|
[INFO|2025-04-28 12:29:05] tokenization_utils_base.py:2034 >> loading file tokenizer.json from cache at /home/kiho/.cache/huggingface/hub/models--Qwen--Qwen2.5-Coder-7B-Instruct/snapshots/c03e6d358207e414f1eca0bb1891e29f1db0e242/tokenizer.json |
|
|
|
[INFO|2025-04-28 12:29:05] tokenization_utils_base.py:2034 >> loading file added_tokens.json from cache at None |
|
|
|
[INFO|2025-04-28 12:29:05] tokenization_utils_base.py:2034 >> loading file special_tokens_map.json from cache at None |
|
|
|
[INFO|2025-04-28 12:29:05] tokenization_utils_base.py:2034 >> loading file tokenizer_config.json from cache at /home/kiho/.cache/huggingface/hub/models--Qwen--Qwen2.5-Coder-7B-Instruct/snapshots/c03e6d358207e414f1eca0bb1891e29f1db0e242/tokenizer_config.json |
|
|
|
[INFO|2025-04-28 12:29:05] tokenization_utils_base.py:2034 >> loading file chat_template.jinja from cache at None |
|
|
|
[INFO|2025-04-28 12:29:05] tokenization_utils_base.py:2304 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. |
|
|
|
[INFO|2025-04-28 12:29:05] logging.py:157 >> Add <|im_end|> to stop words. |
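Editor's note: <|im_end|> closes each turn in the Qwen chat template, which is why it is registered as a stop word here. A sketch of the tokenizer side, assuming nothing beyond the standard transformers API:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-7B-Instruct")
# <|im_end|> is the chat-turn terminator; its id matches eos_token_id (151645) above.
print(tokenizer.convert_tokens_to_ids("<|im_end|>"))  # 151645
```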
|
|
|
[INFO|2025-04-28 12:29:05] logging.py:157 >> Loading dataset Codes_query_filtered_330k_ns.json... |
|
|
|
[INFO|2025-04-28 12:29:09] configuration_utils.py:696 >> loading configuration file config.json from cache at /home/kiho/.cache/huggingface/hub/models--Qwen--Qwen2.5-Coder-7B-Instruct/snapshots/c03e6d358207e414f1eca0bb1891e29f1db0e242/config.json |
|
|
|
[INFO|2025-04-28 12:29:09] configuration_utils.py:768 >> Model config Qwen2Config {
  "_name_or_path": "Qwen/Qwen2.5-Coder-7B-Instruct",
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 3584,
  "initializer_range": 0.02,
  "intermediate_size": 18944,
  "max_position_embeddings": 32768,
  "max_window_layers": 28,
  "model_type": "qwen2",
  "num_attention_heads": 28,
  "num_hidden_layers": 28,
  "num_key_value_heads": 4,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000.0,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.48.2",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 152064
}
|
|
|
|
|
[WARNING|2025-04-28 12:29:09] logging.py:162 >> Input length is smaller than max length. Consider increase input length. |
|
|
|
[INFO|2025-04-28 12:29:09] logging.py:157 >> Using llama3 scaling strategy and setting scaling factor to 1.0. |
|
|
|
[INFO|2025-04-28 12:29:09] logging.py:157 >> Using block diagonal attention for sequence packing without cross-attention. |
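Editor's note: with sequence packing, several examples share one row of the batch, and a block-diagonal causal mask keeps them from attending to each other. An illustrative sketch of the idea (not LLaMA-Factory's actual implementation):

```python
import torch

# Each packed example gets its own lower-triangular block; there is no
# cross-attention between examples packed into the same row.
def block_diagonal_causal_mask(seq_lens):
    total = sum(seq_lens)
    mask = torch.zeros(total, total, dtype=torch.bool)
    start = 0
    for n in seq_lens:
        mask[start:start + n, start:start + n] = torch.tril(torch.ones(n, n, dtype=torch.bool))
        start += n
    return mask

print(block_diagonal_causal_mask([3, 2]).int())
```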
|
|
|
[INFO|2025-04-28 12:29:09] logging.py:157 >> Liger kernel has been applied to the model. |
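Editor's note: a hedged sketch of how the Liger kernel patch is typically applied to a Qwen2 model; the function name comes from the liger-kernel package, and the exact call path inside LLaMA-Factory may differ.

```python
# Assumed API from the liger-kernel package; patches fused RoPE, RMSNorm,
# SwiGLU and loss kernels into the Qwen2 modeling code in place.
from liger_kernel.transformers import apply_liger_kernel_to_qwen2

apply_liger_kernel_to_qwen2()
```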
|
|
|
[INFO|2025-04-28 12:29:09] modeling_utils.py:3904 >> loading weights file model.safetensors from cache at /home/kiho/.cache/huggingface/hub/models--Qwen--Qwen2.5-Coder-7B-Instruct/snapshots/c03e6d358207e414f1eca0bb1891e29f1db0e242/model.safetensors.index.json |
|
|
|
[INFO|2025-04-28 12:29:09] modeling_utils.py:1582 >> Instantiating Qwen2ForCausalLM model under default dtype torch.bfloat16. |
|
|
|
[INFO|2025-04-28 12:29:09] configuration_utils.py:1140 >> Generate config GenerationConfig {
  "bos_token_id": 151643,
  "eos_token_id": 151645
}
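Editor's note: a sketch of the equivalent plain-transformers load, matching the dtype and attention backend reported in this log (standard transformers arguments, not the exact LLaMA-Factory call):

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Coder-7B-Instruct",
    torch_dtype=torch.bfloat16,   # matches "torch_dtype": "bfloat16" in the config above
    attn_implementation="sdpa",   # the log later reports torch SDPA being used
)
```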
|
|
|
|
|
[INFO|2025-04-28 12:29:14] modeling_utils.py:4888 >> All model checkpoint weights were used when initializing Qwen2ForCausalLM. |
|
|
|
|
|
[INFO|2025-04-28 12:29:14] modeling_utils.py:4896 >> All the weights of Qwen2ForCausalLM were initialized from the model checkpoint at Qwen/Qwen2.5-Coder-7B-Instruct. |
|
If your task is similar to the task the model of the checkpoint was trained on, you can already use Qwen2ForCausalLM for predictions without further training. |
|
|
|
[INFO|2025-04-28 12:29:14] configuration_utils.py:1095 >> loading configuration file generation_config.json from cache at /home/kiho/.cache/huggingface/hub/models--Qwen--Qwen2.5-Coder-7B-Instruct/snapshots/c03e6d358207e414f1eca0bb1891e29f1db0e242/generation_config.json |
|
|
|
[INFO|2025-04-28 12:29:14] configuration_utils.py:1140 >> Generate config GenerationConfig {
  "bos_token_id": 151643,
  "do_sample": true,
  "eos_token_id": [
    151645,
    151643
  ],
  "pad_token_id": 151643,
  "repetition_penalty": 1.1,
  "temperature": 0.7,
  "top_k": 20,
  "top_p": 0.8
}
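Editor's note: these sampling defaults come from the model's generation_config.json. A sketch of applying them at inference, reusing the model and tokenizer from the sketches above (the prompt is illustrative):

```python
messages = [{"role": "user", "content": "Write a function that checks whether a string is a palindrome."}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
output = model.generate(
    input_ids,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_p=0.8,
    top_k=20,
    repetition_penalty=1.1,
)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```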
|
|
|
|
|
[INFO|2025-04-28 12:29:14] logging.py:157 >> Gradient checkpointing enabled. |
|
|
|
[INFO|2025-04-28 12:29:14] logging.py:157 >> Using torch SDPA for faster training and inference. |
|
|
|
[INFO|2025-04-28 12:29:14] logging.py:157 >> Upcasting trainable params to float32. |
|
|
|
[INFO|2025-04-28 12:29:14] logging.py:157 >> Fine-tuning method: Freeze |
|
|
|
[INFO|2025-04-28 12:29:14] logging.py:157 >> Set trainable layers: .13.,.27. |
|
|
|
[INFO|2025-04-28 12:29:14] logging.py:157 >> trainable params: 466,115,584 || all params: 7,615,616,512 || trainable%: 6.1205 |
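Editor's note: the "Freeze" method with trainable layers .13.,.27. means only decoder layers 13 and 27 are updated (2 of 28 layers, each roughly 233M parameters, which reproduces the 466,115,584 trainable parameters reported). A sketch of the equivalent manual freeze:

```python
# Hedged sketch: freeze everything except decoder layers 13 and 27, as the log reports.
for name, param in model.named_parameters():
    param.requires_grad = ".13." in name or ".27." in name

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable params: {trainable:,} || all params: {total:,} || trainable%: {100 * trainable / total:.4f}")
```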
|
|
|
[INFO|2025-04-28 12:29:14] trainer.py:741 >> Using auto half precision backend |
|
|
|
[INFO|2025-04-28 12:29:15] logging.py:157 >> Found linear modules: up_proj,gate_proj,q_proj,k_proj,down_proj,v_proj,o_proj |
|
|
|
[INFO|2025-04-28 12:29:15] logging.py:157 >> Using APOLLO optimizer with args: {'rank': 256, 'proj': 'random', 'proj_type': 'std', 'update_proj_gap': 200, 'scale': 1, 'scale_type': 'channel', 'scale_front': False}. |
|
|
|
[INFO|2025-04-28 12:29:16] trainer.py:2369 >> ***** Running training ***** |
|
|
|
[INFO|2025-04-28 12:29:16] trainer.py:2370 >> Num examples = 51,880 |
|
|
|
[INFO|2025-04-28 12:29:16] trainer.py:2371 >> Num Epochs = 1 |
|
|
|
[INFO|2025-04-28 12:29:16] trainer.py:2372 >> Instantaneous batch size per device = 16 |
|
|
|
[INFO|2025-04-28 12:29:16] trainer.py:2375 >> Total train batch size (w. parallel, distributed & accumulation) = 512 |
|
|
|
[INFO|2025-04-28 12:29:16] trainer.py:2376 >> Gradient Accumulation steps = 8 |
|
|
|
[INFO|2025-04-28 12:29:16] trainer.py:2377 >> Total optimization steps = 101 |
|
|
|
[INFO|2025-04-28 12:29:16] trainer.py:2378 >> Number of trainable parameters = 466,115,584 |
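Editor's note: the batch-size arithmetic above implies a 4-device run; a quick consistency check (the device count is inferred, not stated in the log):

```python
per_device_batch = 16
grad_accum = 8
total_batch = 512
num_devices = total_batch // (per_device_batch * grad_accum)  # -> 4
steps_per_epoch = 51_880 // total_batch                       # -> 101 optimization steps
print(num_devices, steps_per_epoch)
```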
|
|
|
[INFO|2025-04-28 12:32:00] logging.py:157 >> {'loss': 1.0213, 'learning_rate': 4.9988e-05, 'epoch': 0.01, 'throughput': 12907.24} |
|
|
|
[INFO|2025-04-28 12:34:34] logging.py:157 >> {'loss': 1.0136, 'learning_rate': 4.9952e-05, 'epoch': 0.02, 'throughput': 13247.81} |
|
|
|
[INFO|2025-04-28 12:37:08] logging.py:157 >> {'loss': 0.9404, 'learning_rate': 4.9891e-05, 'epoch': 0.03, 'throughput': 13369.60} |
|
|
|
[INFO|2025-04-28 12:39:42] logging.py:157 >> {'loss': 0.9530, 'learning_rate': 4.9807e-05, 'epoch': 0.04, 'throughput': 13426.82} |
|
|
|
[INFO|2025-04-28 12:42:16] logging.py:157 >> {'loss': 0.9453, 'learning_rate': 4.9698e-05, 'epoch': 0.05, 'throughput': 13463.63} |
|
|
|
[INFO|2025-04-28 12:44:50] logging.py:157 >> {'loss': 0.9111, 'learning_rate': 4.9566e-05, 'epoch': 0.06, 'throughput': 13485.87} |
|
|
|
[INFO|2025-04-28 12:47:24] logging.py:157 >> {'loss': 0.8780, 'learning_rate': 4.9410e-05, 'epoch': 0.07, 'throughput': 13503.29} |
|
|
|
[INFO|2025-04-28 12:49:59] logging.py:157 >> {'loss': 0.9140, 'learning_rate': 4.9230e-05, 'epoch': 0.08, 'throughput': 13515.40} |
|
|
|
[INFO|2025-04-28 12:52:33] logging.py:157 >> {'loss': 0.8649, 'learning_rate': 4.9027e-05, 'epoch': 0.09, 'throughput': 13523.38} |
|
|
|
[INFO|2025-04-28 12:55:07] logging.py:157 >> {'loss': 0.9029, 'learning_rate': 4.8800e-05, 'epoch': 0.10, 'throughput': 13530.26} |
|
|
|
[INFO|2025-04-28 12:57:42] logging.py:157 >> {'loss': 0.8820, 'learning_rate': 4.8551e-05, 'epoch': 0.11, 'throughput': 13535.32} |
|
|
|
[INFO|2025-04-28 13:00:16] logging.py:157 >> {'loss': 0.8438, 'learning_rate': 4.8279e-05, 'epoch': 0.12, 'throughput': 13536.90} |
|
|
|
[INFO|2025-04-28 13:02:51] logging.py:157 >> {'loss': 0.8743, 'learning_rate': 4.7984e-05, 'epoch': 0.13, 'throughput': 13535.86} |
|
|
|
[INFO|2025-04-28 13:05:26] logging.py:157 >> {'loss': 0.8736, 'learning_rate': 4.7667e-05, 'epoch': 0.14, 'throughput': 13539.44} |
|
|
|
[INFO|2025-04-28 13:08:00] logging.py:157 >> {'loss': 0.8687, 'learning_rate': 4.7328e-05, 'epoch': 0.15, 'throughput': 13542.72} |
|
|
|
[INFO|2025-04-28 13:10:34] logging.py:157 >> {'loss': 0.8640, 'learning_rate': 4.6967e-05, 'epoch': 0.16, 'throughput': 13545.64} |
|
|
|
[INFO|2025-04-28 13:13:09] logging.py:157 >> {'loss': 0.8877, 'learning_rate': 4.6586e-05, 'epoch': 0.17, 'throughput': 13548.59} |
|
|
|
[INFO|2025-04-28 13:15:43] logging.py:157 >> {'loss': 0.8749, 'learning_rate': 4.6183e-05, 'epoch': 0.18, 'throughput': 13551.29} |
|
|
|
[INFO|2025-04-28 13:18:17] logging.py:157 >> {'loss': 0.8338, 'learning_rate': 4.5760e-05, 'epoch': 0.19, 'throughput': 13551.85} |
|
|
|
[INFO|2025-04-28 13:20:52] logging.py:157 >> {'loss': 0.8294, 'learning_rate': 4.5316e-05, 'epoch': 0.20, 'throughput': 13552.35} |
|
|
|
[INFO|2025-04-28 13:23:27] logging.py:157 >> {'loss': 0.8666, 'learning_rate': 4.4854e-05, 'epoch': 0.21, 'throughput': 13553.49} |
|
|
|
[INFO|2025-04-28 13:26:01] logging.py:157 >> {'loss': 0.8038, 'learning_rate': 4.4371e-05, 'epoch': 0.22, 'throughput': 13554.09} |
|
|
|
[INFO|2025-04-28 13:28:36] logging.py:157 >> {'loss': 0.8492, 'learning_rate': 4.3871e-05, 'epoch': 0.23, 'throughput': 13554.87} |
|
|
|
[INFO|2025-04-28 13:31:10] logging.py:157 >> {'loss': 0.8047, 'learning_rate': 4.3351e-05, 'epoch': 0.24, 'throughput': 13555.65} |
|
|
|
[INFO|2025-04-28 13:33:45] logging.py:157 >> {'loss': 0.8521, 'learning_rate': 4.2815e-05, 'epoch': 0.25, 'throughput': 13556.20} |
|
|
|
[INFO|2025-04-28 13:36:19] logging.py:157 >> {'loss': 0.8133, 'learning_rate': 4.2261e-05, 'epoch': 0.26, 'throughput': 13556.46} |
|
|
|
[INFO|2025-04-28 13:38:53] logging.py:157 >> {'loss': 0.8365, 'learning_rate': 4.1690e-05, 'epoch': 0.27, 'throughput': 13559.29} |
|
|
|
[INFO|2025-04-28 13:41:27] logging.py:157 >> {'loss': 0.8094, 'learning_rate': 4.1103e-05, 'epoch': 0.28, 'throughput': 13563.03} |
|
|
|
[INFO|2025-04-28 13:44:00] logging.py:157 >> {'loss': 0.8174, 'learning_rate': 4.0500e-05, 'epoch': 0.29, 'throughput': 13565.42} |
|
|
|
[INFO|2025-04-28 13:46:35] logging.py:157 >> {'loss': 0.8278, 'learning_rate': 3.9883e-05, 'epoch': 0.30, 'throughput': 13566.62} |
|
|
|
[INFO|2025-04-28 13:49:08] logging.py:157 >> {'loss': 0.8425, 'learning_rate': 3.9251e-05, 'epoch': 0.31, 'throughput': 13569.29} |
|
|
|
[INFO|2025-04-28 13:51:42] logging.py:157 >> {'loss': 0.8145, 'learning_rate': 3.8605e-05, 'epoch': 0.32, 'throughput': 13570.89} |
|
|
|
[INFO|2025-04-28 13:54:16] logging.py:157 >> {'loss': 0.8322, 'learning_rate': 3.7946e-05, 'epoch': 0.33, 'throughput': 13573.07} |
|
|
|
[INFO|2025-04-28 13:56:51] logging.py:157 >> {'loss': 0.8114, 'learning_rate': 3.7275e-05, 'epoch': 0.34, 'throughput': 13572.53} |
|
|
|
[INFO|2025-04-28 13:59:25] logging.py:157 >> {'loss': 0.8091, 'learning_rate': 3.6592e-05, 'epoch': 0.35, 'throughput': 13572.13} |
|
|
|
[INFO|2025-04-28 14:01:59] logging.py:157 >> {'loss': 0.7743, 'learning_rate': 3.5897e-05, 'epoch': 0.36, 'throughput': 13573.61} |
|
|
|
[INFO|2025-04-28 14:04:33] logging.py:157 >> {'loss': 0.8368, 'learning_rate': 3.5192e-05, 'epoch': 0.36, 'throughput': 13575.22} |
|
|
|
[INFO|2025-04-28 14:07:07] logging.py:157 >> {'loss': 0.8177, 'learning_rate': 3.4477e-05, 'epoch': 0.37, 'throughput': 13576.95} |
|
|
|
[INFO|2025-04-28 14:09:41] logging.py:157 >> {'loss': 0.8109, 'learning_rate': 3.3753e-05, 'epoch': 0.38, 'throughput': 13577.04} |
|
|
|
[INFO|2025-04-28 14:12:15] logging.py:157 >> {'loss': 0.8270, 'learning_rate': 3.3021e-05, 'epoch': 0.39, 'throughput': 13578.09} |
|
|
|
[INFO|2025-04-28 14:14:49] logging.py:157 >> {'loss': 0.8167, 'learning_rate': 3.2280e-05, 'epoch': 0.40, 'throughput': 13579.75} |
|
|
|
[INFO|2025-04-28 14:17:22] logging.py:157 >> {'loss': 0.8073, 'learning_rate': 3.1533e-05, 'epoch': 0.41, 'throughput': 13581.88} |
|
|
|
[INFO|2025-04-28 14:19:56] logging.py:157 >> {'loss': 0.7793, 'learning_rate': 3.0779e-05, 'epoch': 0.42, 'throughput': 13583.46} |
|
|
|
[INFO|2025-04-28 14:22:30] logging.py:157 >> {'loss': 0.8096, 'learning_rate': 3.0020e-05, 'epoch': 0.43, 'throughput': 13585.16} |
|
|
|
[INFO|2025-04-28 14:25:03] logging.py:157 >> {'loss': 0.8212, 'learning_rate': 2.9256e-05, 'epoch': 0.44, 'throughput': 13586.73} |
|
|
|
[INFO|2025-04-28 14:27:38] logging.py:157 >> {'loss': 0.8151, 'learning_rate': 2.8488e-05, 'epoch': 0.45, 'throughput': 13586.13} |
|
|
|
[INFO|2025-04-28 14:30:11] logging.py:157 >> {'loss': 0.8331, 'learning_rate': 2.7716e-05, 'epoch': 0.46, 'throughput': 13587.29} |
|
|
|
[INFO|2025-04-28 14:32:46] logging.py:157 >> {'loss': 0.8003, 'learning_rate': 2.6942e-05, 'epoch': 0.47, 'throughput': 13587.74} |
|
|
|
[INFO|2025-04-28 14:35:19] logging.py:157 >> {'loss': 0.8214, 'learning_rate': 2.6166e-05, 'epoch': 0.48, 'throughput': 13588.82} |
|
|
|
[INFO|2025-04-28 14:37:53] logging.py:157 >> {'loss': 0.8118, 'learning_rate': 2.5389e-05, 'epoch': 0.49, 'throughput': 13589.60} |
|
|
|
[INFO|2025-04-28 14:40:27] logging.py:157 >> {'loss': 0.8382, 'learning_rate': 2.4611e-05, 'epoch': 0.50, 'throughput': 13590.81} |
|
|
|
[INFO|2025-04-28 14:43:01] logging.py:157 >> {'loss': 0.8099, 'learning_rate': 2.3834e-05, 'epoch': 0.51, 'throughput': 13590.72} |
|
|
|
[INFO|2025-04-28 14:45:35] logging.py:157 >> {'loss': 0.7914, 'learning_rate': 2.3058e-05, 'epoch': 0.52, 'throughput': 13591.47} |
|
|
|
[INFO|2025-04-28 14:48:09] logging.py:157 >> {'loss': 0.8104, 'learning_rate': 2.2284e-05, 'epoch': 0.53, 'throughput': 13592.65} |
|
|
|
[INFO|2025-04-28 14:50:42] logging.py:157 >> {'loss': 0.8125, 'learning_rate': 2.1512e-05, 'epoch': 0.54, 'throughput': 13593.72} |
|
|
|
[INFO|2025-04-28 14:53:16] logging.py:157 >> {'loss': 0.8198, 'learning_rate': 2.0744e-05, 'epoch': 0.55, 'throughput': 13594.46} |
|
|
|
[INFO|2025-04-28 14:55:50] logging.py:157 >> {'loss': 0.8019, 'learning_rate': 1.9980e-05, 'epoch': 0.56, 'throughput': 13594.62} |
|
|
|
[INFO|2025-04-28 14:58:24] logging.py:157 >> {'loss': 0.8141, 'learning_rate': 1.9221e-05, 'epoch': 0.57, 'throughput': 13595.48} |
|
|
|
[INFO|2025-04-28 15:00:58] logging.py:157 >> {'loss': 0.7985, 'learning_rate': 1.8467e-05, 'epoch': 0.58, 'throughput': 13596.37} |
|
|
|
[INFO|2025-04-28 15:03:31] logging.py:157 >> {'loss': 0.7998, 'learning_rate': 1.7720e-05, 'epoch': 0.59, 'throughput': 13597.32} |
|
|
|
[INFO|2025-04-28 15:06:05] logging.py:157 >> {'loss': 0.7988, 'learning_rate': 1.6979e-05, 'epoch': 0.60, 'throughput': 13597.98} |
|
|
|
[INFO|2025-04-28 15:08:39] logging.py:157 >> {'loss': 0.8016, 'learning_rate': 1.6247e-05, 'epoch': 0.61, 'throughput': 13598.66} |
|
|
|
[INFO|2025-04-28 15:11:12] logging.py:157 >> {'loss': 0.8162, 'learning_rate': 1.5523e-05, 'epoch': 0.62, 'throughput': 13599.73} |
|
|
|
[INFO|2025-04-28 15:13:46] logging.py:157 >> {'loss': 0.8258, 'learning_rate': 1.4808e-05, 'epoch': 0.63, 'throughput': 13600.66} |
|
|
|
[INFO|2025-04-28 15:16:19] logging.py:157 >> {'loss': 0.8063, 'learning_rate': 1.4103e-05, 'epoch': 0.64, 'throughput': 13601.35} |
|
|
|
[INFO|2025-04-28 15:18:53] logging.py:157 >> {'loss': 0.8116, 'learning_rate': 1.3408e-05, 'epoch': 0.65, 'throughput': 13601.79} |
|
|
|
[INFO|2025-04-28 15:21:27] logging.py:157 >> {'loss': 0.7850, 'learning_rate': 1.2725e-05, 'epoch': 0.66, 'throughput': 13602.47} |
|
|
|
[INFO|2025-04-28 15:24:01] logging.py:157 >> {'loss': 0.8049, 'learning_rate': 1.2054e-05, 'epoch': 0.67, 'throughput': 13602.60} |
|
|
|
[INFO|2025-04-28 15:26:35] logging.py:157 >> {'loss': 0.8034, 'learning_rate': 1.1395e-05, 'epoch': 0.68, 'throughput': 13603.14} |
|
|
|
[INFO|2025-04-28 15:29:08] logging.py:157 >> {'loss': 0.7949, 'learning_rate': 1.0749e-05, 'epoch': 0.69, 'throughput': 13603.86} |
|
|
|
[INFO|2025-04-28 15:31:42] logging.py:157 >> {'loss': 0.8024, 'learning_rate': 1.0117e-05, 'epoch': 0.70, 'throughput': 13603.82} |
|
|
|
[INFO|2025-04-28 15:34:17] logging.py:157 >> {'loss': 0.7608, 'learning_rate': 9.4998e-06, 'epoch': 0.71, 'throughput': 13603.58} |
|
|
|
[INFO|2025-04-28 15:36:51] logging.py:157 >> {'loss': 0.8012, 'learning_rate': 8.8972e-06, 'epoch': 0.72, 'throughput': 13603.80} |
|
|
|
[INFO|2025-04-28 15:39:25] logging.py:157 >> {'loss': 0.7688, 'learning_rate': 8.3103e-06, 'epoch': 0.73, 'throughput': 13604.36} |
|
|
|
[INFO|2025-04-28 15:41:58] logging.py:157 >> {'loss': 0.8023, 'learning_rate': 7.7395e-06, 'epoch': 0.74, 'throughput': 13604.88} |
|
|
|
[INFO|2025-04-28 15:44:32] logging.py:157 >> {'loss': 0.7809, 'learning_rate': 7.1854e-06, 'epoch': 0.75, 'throughput': 13605.47} |
|
|
|
[INFO|2025-04-28 15:47:05] logging.py:157 >> {'loss': 0.8083, 'learning_rate': 6.6485e-06, 'epoch': 0.76, 'throughput': 13606.21} |
|
|
|
[INFO|2025-04-28 15:49:39] logging.py:157 >> {'loss': 0.7903, 'learning_rate': 6.1294e-06, 'epoch': 0.77, 'throughput': 13606.72} |
|
|
|
[INFO|2025-04-28 15:52:13] logging.py:157 >> {'loss': 0.7904, 'learning_rate': 5.6286e-06, 'epoch': 0.78, 'throughput': 13607.18} |
|
|
|
[INFO|2025-04-28 15:54:46] logging.py:157 >> {'loss': 0.7970, 'learning_rate': 5.1465e-06, 'epoch': 0.79, 'throughput': 13607.74} |
|
|
|
[INFO|2025-04-28 15:57:20] logging.py:157 >> {'loss': 0.7636, 'learning_rate': 4.6836e-06, 'epoch': 0.80, 'throughput': 13608.08} |
|
|
|
[INFO|2025-04-28 15:59:55] logging.py:157 >> {'loss': 0.7818, 'learning_rate': 4.2403e-06, 'epoch': 0.81, 'throughput': 13607.00} |
|
|
|
[INFO|2025-04-28 16:02:29] logging.py:157 >> {'loss': 0.7914, 'learning_rate': 3.8171e-06, 'epoch': 0.82, 'throughput': 13607.35} |
|
|
|
[INFO|2025-04-28 16:05:03] logging.py:157 >> {'loss': 0.7985, 'learning_rate': 3.4145e-06, 'epoch': 0.83, 'throughput': 13607.72} |
|
|
|
[INFO|2025-04-28 16:07:36] logging.py:157 >> {'loss': 0.7868, 'learning_rate': 3.0327e-06, 'epoch': 0.84, 'throughput': 13608.19} |
|
|
|
[INFO|2025-04-28 16:10:10] logging.py:157 >> {'loss': 0.7972, 'learning_rate': 2.6721e-06, 'epoch': 0.85, 'throughput': 13608.54} |
|
|
|
[INFO|2025-04-28 16:12:44] logging.py:157 >> {'loss': 0.7933, 'learning_rate': 2.3332e-06, 'epoch': 0.86, 'throughput': 13609.13} |
|
|
|
[INFO|2025-04-28 16:15:17] logging.py:157 >> {'loss': 0.7695, 'learning_rate': 2.0162e-06, 'epoch': 0.87, 'throughput': 13609.89} |
|
|
|
[INFO|2025-04-28 16:17:51] logging.py:157 >> {'loss': 0.7929, 'learning_rate': 1.7214e-06, 'epoch': 0.88, 'throughput': 13609.74} |
|
|
|
[INFO|2025-04-28 16:20:25] logging.py:157 >> {'loss': 0.7995, 'learning_rate': 1.4491e-06, 'epoch': 0.89, 'throughput': 13610.21} |
|
|
|
[INFO|2025-04-28 16:22:59] logging.py:157 >> {'loss': 0.7857, 'learning_rate': 1.1997e-06, 'epoch': 0.90, 'throughput': 13610.41} |
|
|
|
[INFO|2025-04-28 16:25:33] logging.py:157 >> {'loss': 0.8215, 'learning_rate': 9.7323e-07, 'epoch': 0.91, 'throughput': 13610.73} |
|
|
|
[INFO|2025-04-28 16:28:06] logging.py:157 >> {'loss': 0.7931, 'learning_rate': 7.7003e-07, 'epoch': 0.92, 'throughput': 13611.14} |
|
|
|
[INFO|2025-04-28 16:30:40] logging.py:157 >> {'loss': 0.7927, 'learning_rate': 5.9026e-07, 'epoch': 0.93, 'throughput': 13611.37} |
|
|
|
[INFO|2025-04-28 16:33:14] logging.py:157 >> {'loss': 0.7787, 'learning_rate': 4.3412e-07, 'epoch': 0.94, 'throughput': 13611.55} |
|
|
|
[INFO|2025-04-28 16:35:49] logging.py:157 >> {'loss': 0.7771, 'learning_rate': 3.0174e-07, 'epoch': 0.95, 'throughput': 13610.50} |
|
|
|
[INFO|2025-04-28 16:38:24] logging.py:157 >> {'loss': 0.7931, 'learning_rate': 1.9325e-07, 'epoch': 0.96, 'throughput': 13610.20} |
|
|
|
[INFO|2025-04-28 16:40:58] logging.py:157 >> {'loss': 0.7902, 'learning_rate': 1.0877e-07, 'epoch': 0.97, 'throughput': 13609.88} |
|
|
|
[INFO|2025-04-28 16:43:32] logging.py:157 >> {'loss': 0.7906, 'learning_rate': 4.8360e-08, 'epoch': 0.98, 'throughput': 13610.30} |
|
|
|
[INFO|2025-04-28 16:46:05] logging.py:157 >> {'loss': 0.8083, 'learning_rate': 1.2093e-08, 'epoch': 0.99, 'throughput': 13610.82} |
|
|
|
[INFO|2025-04-28 16:48:39] logging.py:157 >> {'loss': 0.7963, 'learning_rate': 0.0000e+00, 'epoch': 1.00, 'throughput': 13611.38} |
|
|
|
[INFO|2025-04-28 16:48:39] trainer.py:3910 >> Saving model checkpoint to saves/Qwen2.5-Coder-7B-Instruct/freeze/qwen_ns/checkpoint-101 |
|
|
|
[INFO|2025-04-28 16:48:39] configuration_utils.py:420 >> Configuration saved in saves/Qwen2.5-Coder-7B-Instruct/freeze/qwen_ns/checkpoint-101/config.json |
|
|
|
[INFO|2025-04-28 16:48:39] configuration_utils.py:909 >> Configuration saved in saves/Qwen2.5-Coder-7B-Instruct/freeze/qwen_ns/checkpoint-101/generation_config.json |
|
|
|
[INFO|2025-04-28 16:49:02] modeling_utils.py:2996 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 4 checkpoint shards. You can find where each parameters has been saved in the index located at saves/Qwen2.5-Coder-7B-Instruct/freeze/qwen_ns/checkpoint-101/model.safetensors.index.json. |
|
|
|
[INFO|2025-04-28 16:49:02] tokenization_utils_base.py:2491 >> tokenizer config file saved in saves/Qwen2.5-Coder-7B-Instruct/freeze/qwen_ns/checkpoint-101/tokenizer_config.json |
|
|
|
[INFO|2025-04-28 16:49:02] tokenization_utils_base.py:2500 >> Special tokens file saved in saves/Qwen2.5-Coder-7B-Instruct/freeze/qwen_ns/checkpoint-101/special_tokens_map.json |
|
|
|
[INFO|2025-04-28 16:49:03] trainer.py:2643 >> |
|
|
|
Training completed. Do not forget to share your model on huggingface.co/models =) |
|
|
|
|
|
|
|
[INFO|2025-04-28 16:49:03] trainer.py:3910 >> Saving model checkpoint to saves/Qwen2.5-Coder-7B-Instruct/freeze/qwen_ns |
|
|
|
[INFO|2025-04-28 16:49:03] configuration_utils.py:420 >> Configuration saved in saves/Qwen2.5-Coder-7B-Instruct/freeze/qwen_ns/config.json |
|
|
|
[INFO|2025-04-28 16:49:03] configuration_utils.py:909 >> Configuration saved in saves/Qwen2.5-Coder-7B-Instruct/freeze/qwen_ns/generation_config.json |
|
|
|
[INFO|2025-04-28 16:49:26] modeling_utils.py:2996 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 4 checkpoint shards. You can find where each parameters has been saved in the index located at saves/Qwen2.5-Coder-7B-Instruct/freeze/qwen_ns/model.safetensors.index.json. |
|
|
|
[INFO|2025-04-28 16:49:26] tokenization_utils_base.py:2491 >> tokenizer config file saved in saves/Qwen2.5-Coder-7B-Instruct/freeze/qwen_ns/tokenizer_config.json |
|
|
|
[INFO|2025-04-28 16:49:26] tokenization_utils_base.py:2500 >> Special tokens file saved in saves/Qwen2.5-Coder-7B-Instruct/freeze/qwen_ns/special_tokens_map.json |
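Editor's note: the final model is saved as four safetensors shards under saves/Qwen2.5-Coder-7B-Instruct/freeze/qwen_ns. A sketch of reloading it for inference (paths taken from the log above):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt_dir = "saves/Qwen2.5-Coder-7B-Instruct/freeze/qwen_ns"
model = AutoModelForCausalLM.from_pretrained(ckpt_dir, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(ckpt_dir)
```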
|
|
|
[WARNING|2025-04-28 16:49:26] logging.py:162 >> No metric eval_loss to plot. |
|
|
|
[WARNING|2025-04-28 16:49:26] logging.py:162 >> No metric eval_accuracy to plot. |
|
|
|
[INFO|2025-04-28 16:49:26] modelcard.py:449 >> Dropping the following result as it does not have all the necessary fields: |
|
{'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}} |
|
|
|
|