# Templates for Chat Models

## Introduction
An increasingly common use case for LLMs is chat. In a chat context, rather than continuing a single string of text (as is the case with a standard language model), the model instead continues a conversation that consists of one or more messages, each of which includes a role, like "user" or "assistant", as well as message text.

Much like tokenization, different models expect very different input formats for chat. This is the reason we added chat templates as a feature. Chat templates are part of the tokenizer. They specify how to convert conversations, represented as lists of messages, into a single tokenizable string in the format that the model expects.

Let's make this concrete with a quick example using the BlenderBot model. BlenderBot has an extremely simple default template, which mostly just adds whitespace between rounds of dialogue:
```python
>>> from transformers import AutoTokenizer
>>> tokenizer = AutoTokenizer.from_pretrained("facebook/blenderbot-400M-distill")

>>> chat = [
...    {"role": "user", "content": "Hello, how are you?"},
...    {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
...    {"role": "user", "content": "I'd like to show off how chat templating works!"},
... ]

>>> tokenizer.apply_chat_template(chat, tokenize=False)
" Hello, how are you? I'm doing great. How can I help you today? I'd like to show off how chat templating works!"
```
Notice how the entire chat is condensed into a single string. If we use `tokenize=True`, which is the default setting, that string will also be tokenized for us.
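Since `tokenize=True` is the default, calling the method with no extra arguments gives you token IDs directly. Here's a quick sketch of that path, reusing the `tokenizer` and `chat` from the BlenderBot example above (the exact IDs depend on BlenderBot's vocabulary, so they aren't shown):

```python
# With the default tokenize=True, apply_chat_template returns token IDs
# rather than a string.
token_ids = tokenizer.apply_chat_template(chat)

# Decoding the IDs recovers the templated string shown above.
print(tokenizer.decode(token_ids))
```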
To see a more complex template in action, though, let's use the `mistralai/Mistral-7B-Instruct-v0.1` model:
```python
>>> from transformers import AutoTokenizer
>>> tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")

>>> chat = [
...    {"role": "user", "content": "Hello, how are you?"},
...    {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
...    {"role": "user", "content": "I'd like to show off how chat templating works!"},
... ]

>>> tokenizer.apply_chat_template(chat, tokenize=False)
"[INST] Hello, how are you? [/INST]I'm doing great. How can I help you today? [INST] I'd like to show off how chat templating works! [/INST]"
```
Note that this time, the tokenizer has added the control tokens `[INST]` and `[/INST]` to indicate the start and end of user messages (but not assistant messages!). Mistral-instruct was trained with these tokens, but BlenderBot was not.
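If you're curious where this formatting comes from, you can look at the template itself: it lives on the tokenizer as a Jinja template string. A quick way to inspect it (note that on some older checkpoints or `transformers` versions this attribute may be `None`, with the tokenizer falling back to a class-level default):

```python
# The chat template is stored on the tokenizer as a Jinja template string.
print(tokenizer.chat_template)
```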
## How do I use chat templates?
As you can see in the example above, chat templates are easy to use. Simply build a list of messages, with `role` and `content` keys, and then pass it to the [`~PreTrainedTokenizer.apply_chat_template`] method. Once you do that, you'll get output that's ready to go! When using chat templates as input for model generation, it's also a good idea to use `add_generation_prompt=True` to add a generation prompt.
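To see what the generation prompt actually does, here's a short sketch contrasting the two settings. It uses the Zephyr tokenizer from the example below; the flag appends the tokens that open a new assistant turn, so the model answers as the assistant rather than continuing the user's message:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")
chat = [{"role": "user", "content": "Hi there!"}]

# Without a generation prompt, the formatted string ends after the user turn.
print(tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=False))

# With one, the template appends the header for a fresh assistant turn
# (an "<|assistant|>" line, in Zephyr's case), cueing the model to reply.
print(tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True))
```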
Here's an example of preparing input for `model.generate()`, using the Zephyr assistant model:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceH4/zephyr-7b-beta"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)  # You may want to use bfloat16 and/or move to GPU here

messages = [
    {
        "role": "system",
        "content": "You are a friendly chatbot who always responds in the style of a pirate",
    },
    {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
]
tokenized_chat = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")
print(tokenizer.decode(tokenized_chat[0]))
```
This will yield a string in the input format that Zephyr expects.

```text
<|system|> | |
You are a friendly chatbot who always responds in the style of a pirate | |
<|user|> | |
How many helicopters can a human eat in one sitting? | |
<|assistant|>
```
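Note the `<|assistant|>` at the end: that's the generation prompt added by `add_generation_prompt=True`. From here, generation proceeds as with any other input IDs. A minimal sketch of the next step (`max_new_tokens=128` is an arbitrary illustrative choice):

```python
# Generate a reply; thanks to the trailing <|assistant|> header, the new
# tokens will be the assistant's answer.
outputs = model.generate(tokenized_chat, max_new_tokens=128)

# Decode only the newly generated tokens, skipping the prompt we passed in.
print(tokenizer.decode(outputs[0][tokenized_chat.shape[1]:]))
```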