llm course @ HSE and vk llm
A collection of SmolLM-135M models fine-tuned with DPO, PPO, and Reward Modeling to enhance human-like expressiveness
Daniil Tsesarev
tsessk
AI & ML interests: transformers
Recent Activity
Updated a model 5 days ago: tsessk/Qwen2-0.5B-TLDR
Updated a dataset 5 days ago: tsessk/yetanother_tldr
Published a dataset 5 days ago: tsessk/yetanother_tldr
Organizations: none yet
Collections: 1
Models: 11
tsessk/Qwen2-0.5B-TLDR
tsessk/qwen2-0.5b-tldr-lora
tsessk/llm-course-hw2-dpo • Text Generation • 1
tsessk/llm-course-hw2-reward-model • Text Classification • 2
tsessk/llm-course-hw2-ppo • Text Generation • 1
tsessk/content • Text Classification • 1
tsessk/llm-course-hw1 • 1
tsessk/SmolLM2-FT-ORPO • Text Generation
tsessk/SmolLM2-FT-DPO • Text Generation • 1
tsessk/SmolLM2-FT-PyCodeZone • Text Generation
Datasets: 6
tsessk/yetanother_tldr • 130k • 56
tsessk/tldr-17-truncated-tokenized • 130k • 47
tsessk/tldr-17-t-512 • 3.09M • 66
tsessk/tldr-17-ChatML-tokenized-truncated • 130k • 68
tsessk/tldr-17-ChatML • 3.85M • 148 • 1
tsessk/tldr-17-chat • 3.85M • 143