RAG / openai_text-embedding-ada-002 /fixed_chunks /_chat_templating.txt_chunk_0.txt
thenativefox
Added split files and tables
939262b
Templates for Chat Models
Introduction
An increasingly common use case for LLMs is chat. In a chat context, rather than continuing a single string
of text (as is the case with a standard language model), the model instead continues a conversation that consists
of one or more messages, each of which includes a role, like "user" or "assistant", as well as message text.
Much like tokenization, different models expect very different input formats for chat. This is the reason we added
chat templates as a feature. Chat templates are part of the tokenizer. They specify how to convert conversations,
represented as lists of messages, into a single tokenizable string in the format that the model expects.
Let's make this concrete with a quick example using the BlenderBot model. BlenderBot has an extremely simple default
template, which mostly just adds whitespace between rounds of dialogue:
thon
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("facebook/blenderbot-400M-distill")
chat = [
{"role": "user", "content": "Hello, how are you?"},
{"role": "assistant", "content": "I'm doing great. How can I help you today?"},
{"role": "user", "content": "I'd like to show off how chat templating works!"},
]
tokenizer.apply_chat_template(chat, tokenize=False)
" Hello, how are you? I'm doing great. How can I help you today? I'd like to show off how chat templating works!"
Notice how the entire chat is condensed into a single string. If we use tokenize=True, which is the default setting,
that string will also be tokenized for us. To see a more complex template in action, though, let's use the
mistralai/Mistral-7B-Instruct-v0.1 model.
thon
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")
chat = [
{"role": "user", "content": "Hello, how are you?"},
{"role": "assistant", "content": "I'm doing great. How can I help you today?"},
{"role": "user", "content": "I'd like to show off how chat templating works!"},
]
tokenizer.apply_chat_template(chat, tokenize=False)
"[INST] Hello, how are you? [/INST]I'm doing great. How can I help you today? [INST] I'd like to show off how chat templating works! [/INST]"
Note that this time, the tokenizer has added the control tokens [INST] and [/INST] to indicate the start and end of
user messages (but not assistant messages!). Mistral-instruct was trained with these tokens, but BlenderBot was not.
How do I use chat templates?
As you can see in the example above, chat templates are easy to use. Simply build a list of messages, with role
and content keys, and then pass it to the [~PreTrainedTokenizer.apply_chat_template] method. Once you do that,
you'll get output that's ready to go! When using chat templates as input for model generation, it's also a good idea
to use add_generation_prompt=True to add a generation prompt.
Here's an example of preparing input for model.generate(), using the Zephyr assistant model:
thon
from transformers import AutoModelForCausalLM, AutoTokenizer
checkpoint = "HuggingFaceH4/zephyr-7b-beta"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint) # You may want to use bfloat16 and/or move to GPU here
messages = [
{
"role": "system",
"content": "You are a friendly chatbot who always responds in the style of a pirate",
},
{"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
]
tokenized_chat = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")
print(tokenizer.decode(tokenized_chat[0]))
This will yield a string in the input format that Zephyr expects.text
<|system|>
You are a friendly chatbot who always responds in the style of a pirate
<|user|>
How many helicopters can a human eat in one sitting?
<|assistant|>