	Templates for Chat Models
	Introduction
	An increasingly common use case for LLMs is chat. In a chat context, rather than continuing a single string
	of text (as is the case with a standard language model), the model instead continues a conversation that consists
	of one or more messages, each of which includes a role, like "user" or "assistant", as well as message text.
	Much like tokenization, different models expect very different input formats for chat. This is the reason we added
	chat templates as a feature. Chat templates are part of the tokenizer. They specify how to convert conversations,
	represented as lists of messages, into a single tokenizable string in the format that the model expects.
	Let's make this concrete with a quick example using the BlenderBot model. BlenderBot has an extremely simple default
	template, which mostly just adds whitespace between rounds of dialogue:
	```python
	from transformers import AutoTokenizer

	# The tokenizer carries the chat template for this checkpoint
	tokenizer = AutoTokenizer.from_pretrained("facebook/blenderbot-400M-distill")

	chat = [
	    {"role": "user", "content": "Hello, how are you?"},
	    {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
	    {"role": "user", "content": "I'd like to show off how chat templating works!"},
	]

	# tokenize=False returns the formatted string rather than token IDs
	tokenizer.apply_chat_template(chat, tokenize=False)
	# Returns:
	# " Hello, how are you? I'm doing great. How can I help you today? I'd like to show off how chat templating works!"
	```

	Notice how the entire chat is condensed into a single string. If we use tokenize=True, which is the default setting,
	that string will also be tokenized for us. To see a more complex template in action, though, let's use the
	mistralai/Mistral-7B-Instruct-v0.1 model.
	```python
	from transformers import AutoTokenizer

	# Same conversation as before, but a tokenizer with a different chat template
	tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")

	chat = [
	    {"role": "user", "content": "Hello, how are you?"},
	    {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
	    {"role": "user", "content": "I'd like to show off how chat templating works!"},
	]

	# tokenize=False returns the formatted string rather than token IDs
	tokenizer.apply_chat_template(chat, tokenize=False)
	# Returns:
	# "[INST] Hello, how are you? [/INST]I'm doing great. How can I help you today? [INST] I'd like to show off how chat templating works! [/INST]"
	```

	Note that this time, the tokenizer has added the control tokens [INST] and [/INST] to indicate the start and end of
	user messages (but not assistant messages!). Mistral-instruct was trained with these tokens, but BlenderBot was not.
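To illustrate what the two templates above are doing, here is a minimal pure-Python sketch. The functions `render_whitespace_style` and `render_inst_style` are hypothetical helpers written for this illustration and are not part of transformers. They only reproduce the output strings shown above; the real templates are stored with each tokenizer and may add further special tokens that are not shown here.

```python
# Hypothetical helpers for illustration only. Real chat templates ship with
# each tokenizer and are applied through tokenizer.apply_chat_template().

def render_whitespace_style(messages):
    """Mimic the BlenderBot-style output: message texts separated by spaces."""
    return " " + " ".join(m["content"] for m in messages)


def render_inst_style(messages):
    """Mimic the Mistral-instruct-style output: only user turns get control tokens."""
    parts = []
    for m in messages:
        if m["role"] == "user":
            parts.append(f"[INST] {m['content']} [/INST]")
        elif m["role"] == "assistant":
            # Assistant turns are emitted without [INST] markers
            parts.append(m["content"] + " ")
        else:
            raise ValueError(f"Unsupported role in this sketch: {m['role']}")
    return "".join(parts)


chat = [
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
    {"role": "user", "content": "I'd like to show off how chat templating works!"},
]

print(repr(render_whitespace_style(chat)))
print(repr(render_inst_style(chat)))
```

The same list of messages produces two different strings, which is why the template has to travel with the tokenizer rather than being hard-coded in application code.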
	How do I use chat templates?
	As you can see in the examples above, chat templates are easy to use. Simply build a list of messages, each with role
	and content keys, and pass it to the PreTrainedTokenizer.apply_chat_template method. The result is a string (or token
	IDs, when tokenize=True) already in the format the model expects. When using chat templates as input for model
	generation, it's also a good idea to use add_generation_prompt=True, which appends the tokens that mark the start of
	an assistant reply.
	Here's an example of preparing input for model.generate(), using the Zephyr assistant model:
	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	checkpoint = "HuggingFaceH4/zephyr-7b-beta"
	tokenizer = AutoTokenizer.from_pretrained(checkpoint)
	model = AutoModelForCausalLM.from_pretrained(checkpoint)  # You may want to use bfloat16 and/or move to GPU here

	messages = [
	    {
	        "role": "system",
	        "content": "You are a friendly chatbot who always responds in the style of a pirate",
	    },
	    {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
	]

	# Format the messages, append the generation prompt, and return PyTorch tensors
	tokenized_chat = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")
	# Decode the token IDs to inspect the formatted prompt
	print(tokenizer.decode(tokenized_chat[0]))
	```
	This will yield a string in the input format that Zephyr expects.
	```text
	<|system|>
	You are a friendly chatbot who always responds in the style of a pirate
	<|user|>
	How many helicopters can a human eat in one sitting?
	<|assistant|>
	```
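The effect of add_generation_prompt can be sketched in plain Python. The helper `render_zephyr_style` below is hypothetical and only mimics the layout of the output shown above. The real template is stored with the Zephyr tokenizer and may emit extra tokens that are omitted in this sketch.

```python
# Hypothetical helper for illustration only; it is not part of transformers.

def render_zephyr_style(messages, add_generation_prompt=False):
    """Mimic the layout shown above: a role header line, then the message text."""
    lines = []
    for m in messages:
        lines.append(f"<|{m['role']}|>")
        lines.append(m["content"])
    if add_generation_prompt:
        # Open an assistant turn so the model writes a reply next
        lines.append("<|assistant|>")
    return "\n".join(lines)


messages = [
    {
        "role": "system",
        "content": "You are a friendly chatbot who always responds in the style of a pirate",
    },
    {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
]

with_prompt = render_zephyr_style(messages, add_generation_prompt=True)
without_prompt = render_zephyr_style(messages)

print(with_prompt)
```

Without the generation prompt, the string ends right after the user's message, so a model may continue the user's text instead of starting a reply of its own.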