If you're fine-tuning a model for chat, in addition to setting a chat template, you should probably add any new chat
control tokens as special tokens in the tokenizer. Special tokens are never split,
ensuring that your control tokens are always handled as single tokens rather than being tokenized in pieces. You
should also set the tokenizer's `eos_token` attribute to the token that marks the end of assistant generations in your
template. This will ensure that text generation tools can correctly figure out when to stop generating text.
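In `transformers`, you would typically register these with `tokenizer.add_special_tokens({"additional_special_tokens": [...]})` before training. As a rough illustration of why this matters, here is a toy sketch (not the real tokenizer internals - the splitting rules here are invented for the example) of what happens when a control token is, and isn't, registered as special:

```python
import re

# Hypothetical control tokens for the example
SPECIAL_TOKENS = ["<|im_start|>", "<|im_end|>"]

def naive_tokenize(text):
    # A crude tokenizer with no knowledge of special tokens
    return re.findall(r"<\||\|>|\w+|[^\w\s]", text)

def special_aware_tokenize(text):
    # Split on registered special tokens first, so they survive as
    # single tokens; tokenize the remaining spans naively.
    pattern = "(" + "|".join(re.escape(t) for t in SPECIAL_TOKENS) + ")"
    tokens = []
    for chunk in re.split(pattern, text):
        if chunk in SPECIAL_TOKENS:
            tokens.append(chunk)
        elif chunk:
            tokens.extend(naive_tokenize(chunk))
    return tokens

print(naive_tokenize("Hi<|im_end|>"))          # ['Hi', '<|', 'im_end', '|>']
print(special_aware_tokenize("Hi<|im_end|>"))  # ['Hi', '<|im_end|>']
```

In the first case the control token is shattered into meaningless pieces; in the second it reaches the model as the single token it was trained on.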
## Why do some models have multiple templates?
Some models use different templates for different use cases. For example, they might use one template for normal chat
and another for tool-use, or retrieval-augmented generation. In these cases, `tokenizer.chat_template` is a dictionary.
This can cause some confusion, and where possible, we recommend using a single template for all use-cases. You can use
Jinja statements like `if tools is defined` and `{% macro %}` definitions to easily wrap multiple code paths in a
single template.
When a tokenizer has multiple templates, `tokenizer.chat_template` will be a `dict`, where each key is the name
of a template. The `apply_chat_template` method has special handling for certain template names: specifically, it will
look for a template named `default` in most cases, and will raise an error if it can't find one. However, if a template
named `tool_use` exists when the user has passed a `tools` argument, it will use that instead. To access templates
with other names, pass the name of the template you want to the `chat_template` argument of
`apply_chat_template()`.
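This selection logic can be sketched in plain Python (illustrative only - the real dispatch happens inside `apply_chat_template`):

```python
def select_template(chat_template, tools=None):
    # Sketch of how a named template is chosen from a chat_template
    # dict, mirroring the behaviour described above.
    if not isinstance(chat_template, dict):
        return chat_template  # single template: used for everything
    if tools is not None and "tool_use" in chat_template:
        return chat_template["tool_use"]
    if "default" in chat_template:
        return chat_template["default"]
    raise ValueError("No 'default' template found!")

templates = {"default": "<chat template>", "tool_use": "<tool template>"}
print(select_template(templates))                                # default path
print(select_template(templates, tools=[{"name": "get_time"}]))  # tool path
```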
We find that this can be a bit confusing for users, though - so if you're writing a template yourself, we recommend
trying to put it all in a single template where possible!
## What are "default" templates?
Before the introduction of chat templates, chat handling was hardcoded at the model class level. For backwards
compatibility, we have retained this class-specific handling as default templates, also set at the class level. If a
model does not have a chat template set, but there is a default template for its model class, the `TextGenerationPipeline`
class and methods like `apply_chat_template` will use the class template instead. You can find out what the default
template for your tokenizer is by checking the `tokenizer.default_chat_template` attribute.
This is something we do purely for backward compatibility reasons, to avoid breaking any existing workflows. Even when
the class template is appropriate for your model, we strongly recommend overriding the default template by
setting the `chat_template` attribute explicitly to make it clear to users that your model has been correctly configured
for chat.
Now that actual chat templates have been adopted more widely, default templates have been deprecated and will be
removed in a future release. We strongly recommend setting the `chat_template` attribute for any tokenizers that
still depend on them!
## What template should I use?
When setting the template for a model that's already been trained for chat, you should ensure that the template
exactly matches the message formatting that the model saw during training, or else you will probably experience
performance degradation. This is true even if you're training the model further - you will probably get the best
performance if you keep the chat tokens constant. This is very analogous to tokenization - you generally get the
best performance for inference or fine-tuning when you precisely match the tokenization used during training.
If you're training a model from scratch, or fine-tuning a base language model for chat, on the other hand,
you have a lot of freedom to choose an appropriate template! LLMs are smart enough to learn to handle lots of different
input formats. One popular choice is the ChatML format, and this is a good, flexible choice for many use-cases.
It looks like this:
```
{%- for message in messages %}
    {{- '<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n' }}
{%- endfor %}
```
If you like this one, here it is in one-liner form, ready to copy into your code. The one-liner also includes
handy support for generation prompts, but note that it doesn't add BOS or EOS tokens!
If your model expects those, they won't be added automatically by `apply_chat_template` - in other words, the
text will be tokenized with `add_special_tokens=False`. This is to avoid potential conflicts between the template and
the `add_special_tokens` logic. If your model expects special tokens, make sure to add them to the template!
```python
tokenizer.chat_template = "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}"
```
This template wraps each message in `<|im_start|>` and `<|im_end|>` tokens, and simply writes the role as a string, which
allows for flexibility in the roles you train with. The output looks like this:
```text
<|im_start|>system
You are a helpful chatbot that will do its best not to say anything so stupid that people tweet about it.<|im_end|>
<|im_start|>user
How are you?<|im_end|>
<|im_start|>assistant
I'm doing great!<|im_end|>
```
The "user", "system" and "assistant" roles are the standard for chat, and we recommend using them when it makes sense,
particularly if you want your model to operate well with [`TextGenerationPipeline`]. However, you are not limited
to these roles - templating is extremely flexible, and any string can be a role.
## I want to add some chat templates! How should I get started?
If you have any chat models, you should set their `tokenizer.chat_template` attribute and test it using
[`~PreTrainedTokenizer.apply_chat_template`], then push the updated tokenizer to the Hub. This applies even if you're
not the model owner - if you're using a model with an empty chat template, or one that's still using the default class
template, please open a pull request to the model repository so that this attribute can be set properly!

Once the attribute is set, that's it, you're done! `tokenizer.apply_chat_template` will now work correctly for that
model, which means it is also automatically supported in places like `TextGenerationPipeline`!
By ensuring that models have this attribute, we can make sure that the whole community gets to use the full power of
open-source models. Formatting mismatches have been haunting the field and silently harming performance for too long -
it's time to put an end to them!
## Advanced: Template writing tips
If you're unfamiliar with Jinja, we generally find that the easiest way to write a chat template is to first
write a short Python script that formats messages the way you want, and then convert that script into a template.

Remember that the template handler will receive the conversation history as a variable called `messages`.
You will be able to access `messages` in your template just like you can in Python, which means you can loop over
it with `{% for message in messages %}` or access individual messages with `{{ messages[0] }}`, for example.
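For example, the ChatML template shown above is essentially a direct translation of a Python loop like this one:

```python
def format_chat(messages, add_generation_prompt=False):
    # Python version of the ChatML template: wrap each message in
    # <|im_start|>/<|im_end|> and write the role as a plain string.
    text = ""
    for message in messages:
        text += "<|im_start|>" + message["role"] + "\n" + message["content"] + "<|im_end|>" + "\n"
    # Equivalent of the {% if add_generation_prompt %} block:
    if add_generation_prompt:
        text += "<|im_start|>assistant\n"
    return text

messages = [
    {"role": "user", "content": "How are you?"},
    {"role": "assistant", "content": "I'm doing great!"},
]
print(format_chat(messages))
```

Each `{% %}` statement and `{{ }}` expression in the template corresponds to one line of this loop, which makes the conversion fairly mechanical.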
You can also use the following tips to convert your code to Jinja:
### Trimming whitespace
By default, Jinja will print any whitespace that comes before or after a block. This can be a problem for chat
templates, which generally want to be very precise with whitespace! To avoid this, we strongly recommend writing
your templates like this:
```
{%- for message in messages %}
    {{- message['role'] + message['content'] }}
{%- endfor %}
```
rather than like this:
```
{% for message in messages %}
    {{ message['role'] + message['content'] }}
{% endfor %}
```
Adding `-` will strip any whitespace that comes before the block. The second example looks innocent, but the newline
and indentation may end up being included in the output, which is probably not what you want!
### For loops
For loops in Jinja look like this:
```
{%- for message in messages %}
    {{- message['content'] }}
{%- endfor %}
```
Note that whatever's inside the `{{ expression block }}` will be printed to the output. You can use operators like
`+` to combine strings inside expression blocks.
### If statements
If statements in Jinja look like this:
```
{%- if message['role'] == 'user' %}
    {{- message['content'] }}
{%- endif %}
```
Note how, where Python uses whitespace to mark the beginnings and ends of `for` and `if` blocks, Jinja requires you
to explicitly end them with `{% endfor %}` and `{% endif %}`.
### Special variables
Inside your template, you will have access to the list of `messages`, but you can also access several other special
variables. These include special tokens like `bos_token` and `eos_token`, as well as the `add_generation_prompt`
variable that we discussed above. You can also use the `loop` variable to access information about the current loop
iteration, for example using `{% if loop.last %}` to check if the current message is the last message in the
conversation. Here's an example that puts these ideas together to add a generation prompt at the end of the
conversation if `add_generation_prompt` is `True`:
```
{%- if loop.last and add_generation_prompt %}
    {{- bos_token + 'Assistant:\n' }}
{%- endif %}
```
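In Python terms, `loop.last` behaves like an index check against the final element, so the snippet above is roughly equivalent to the following sketch (the `role: content` message format here is made up for the example):

```python
def render(messages, add_generation_prompt, bos_token="<s>"):
    # Illustrative rendering loop; the loop.last logic mirrors the
    # Jinja snippet above, appending the generation prompt exactly
    # once, after the final message.
    text = ""
    for i, message in enumerate(messages):
        text += message["role"] + ": " + message["content"] + "\n"
        if i == len(messages) - 1 and add_generation_prompt:  # loop.last
            text += bos_token + "Assistant:\n"
    return text

print(render([{"role": "user", "content": "Hi"}], add_generation_prompt=True))
```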
### Compatibility with non-Python Jinja
There are multiple implementations of Jinja in various languages. They generally have the same syntax,
but a key difference is that when you're writing a template in Python you can use Python methods, such as
`.lower()` on strings or `.items()` on dicts. This will break if someone tries to use your template on a non-Python
implementation of Jinja. Non-Python implementations are particularly common in deployment environments, where JS
and Rust are very popular.
Don't panic, though! There are a few easy changes you can make to your templates to ensure they're compatible across
all implementations of Jinja:
- Replace Python methods with Jinja filters. These usually have the same name, for example `string.lower()` becomes
  `string|lower`, and `dict.items()` becomes `dict|items`. One notable change is that `string.strip()` becomes `string|trim`.
  See the list of built-in filters in the Jinja documentation for more.
- Replace `True`, `False` and `None`, which are Python-specific, with `true`, `false` and `none`.
- Directly rendering a dict or list may give different results in other implementations (for example, string entries
  might change from single-quoted to double-quoted). Adding the `tojson` filter can help to ensure consistency here.
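You can see the quoting problem directly from Python: the default string form of a dict is Python-specific, while `tojson` pins the output to standard JSON, which every implementation renders identically:

```python
import json

message = {"role": "user", "content": "Hi"}

# Rendering the dict directly uses Python's repr, with single quotes...
print(str(message))         # {'role': 'user', 'content': 'Hi'}

# ...while tojson produces standard, double-quoted JSON in every
# Jinja implementation:
print(json.dumps(message))  # {"role": "user", "content": "Hi"}
```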