Not all of the tool-calling features shown above are used by all models. Some use tool call IDs, others simply use the function name and match tool calls to results using the ordering, and there are several models that use neither and only issue one tool call at a time to avoid confusion. If you want your code to be compatible across as many models as possible, we recommend structuring your tool calls like we've shown here, and returning tool results in the order that they were issued by the model. The chat templates on each model should handle the rest.
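For example, a conversation that follows this recommendation might be assembled as below. The `multiply` tool and the ID strings here are purely illustrative; templates that don't use IDs will simply match each result to a call by position.

```python
messages = [{"role": "user", "content": "What is 6 times 7, and 8 times 9?"}]

# The model requests two tool calls in a single turn...
messages.append({
    "role": "assistant",
    "tool_calls": [
        {"id": "call00001", "type": "function", "function": {"name": "multiply", "arguments": {"a": 6, "b": 7}}},
        {"id": "call00002", "type": "function", "function": {"name": "multiply", "arguments": {"a": 8, "b": 9}}},
    ],
})

# ...so we append the two results in the same order the calls were issued.
messages.append({"role": "tool", "tool_call_id": "call00001", "name": "multiply", "content": "42"})
messages.append({"role": "tool", "tool_call_id": "call00002", "name": "multiply", "content": "72"})
```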
### Understanding tool schemas
Each function you pass to the `tools` argument of `apply_chat_template` is converted into a JSON schema. These schemas are then passed to the model chat template. In other words, tool-use models do not see your functions directly, and they never see the actual code inside them. What they care about is the function definitions and the arguments they need to pass to them - they care about what the tools do and how to use them, not how they work! It is up to you to read their outputs, detect if they have requested to use a tool, pass their arguments to the tool function, and return the response in the chat.
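A rough sketch of that loop, assuming the tool calls arrive in the dict format shown above and using an illustrative `multiply` tool, might look like this:

```python
def multiply(a: float, b: float):
    """A stand-in for any real tool function."""
    return a * b

# Map tool names to the Python functions that implement them.
available_tools = {"multiply": multiply}

def handle_tool_calls(messages, assistant_message):
    """Run each tool the model requested and append the results to the chat."""
    messages.append(assistant_message)
    for tool_call in assistant_message.get("tool_calls", []):
        fn = available_tools[tool_call["function"]["name"]]
        result = fn(**tool_call["function"]["arguments"])
        messages.append({"role": "tool", "name": tool_call["function"]["name"], "content": str(result)})
    return messages
```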
Generating JSON schemas to pass to the template should be automatic and invisible as long as your functions follow the specification above, but if you encounter problems, or you simply want more control over the conversion, you can handle the conversion manually. Here is an example of a manual schema conversion.
```python
from transformers.utils import get_json_schema

def multiply(a: float, b: float):
    """
    A function that multiplies two numbers

    Args:
        a: The first number to multiply
        b: The second number to multiply
    """
    return a * b

schema = get_json_schema(multiply)
print(schema)
```
This will yield:

```json
{
    "type": "function",
    "function": {
        "name": "multiply",
        "description": "A function that multiplies two numbers",
        "parameters": {
            "type": "object",
            "properties": {
                "a": {
                    "type": "number",
                    "description": "The first number to multiply"
                },
                "b": {
                    "type": "number",
                    "description": "The second number to multiply"
                }
            },
            "required": ["a", "b"]
        }
    }
}
```
If you wish, you can edit these schemas, or even write them from scratch yourself without using `get_json_schema` at all. JSON schemas can be passed directly to the `tools` argument of `apply_chat_template` - this gives you a lot of power to define precise schemas for more complex functions. Be careful, though - the more complex your schemas, the more likely the model is to get confused when dealing with them! We recommend simple function signatures where possible, keeping arguments (and especially complex, nested arguments) to a minimum.
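As a small illustration of the first option, you could tweak a generated schema in place before passing it in - the specific edit below is just an assumption about something you might want to change, not anything the API requires:

```python
schema = get_json_schema(multiply)

# Adjust the generated description before handing the schema to the template.
schema["function"]["parameters"]["properties"]["a"]["description"] = "The first factor"

model_input = tokenizer.apply_chat_template(messages, tools=[schema])
```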
Here is an example of defining schemas by hand, and passing them directly to `apply_chat_template`:

```python
# A simple function that takes no arguments
current_time = {
    "type": "function",
    "function": {
        "name": "current_time",
        "description": "Get the current local time as a string.",
        "parameters": {
            "type": "object",
            "properties": {}
        }
    }
}

# A more complete function that takes two numerical arguments
multiply = {
    "type": "function",
    "function": {
        "name": "multiply",
        "description": "A function that multiplies two numbers",
        "parameters": {
            "type": "object",
            "properties": {
                "a": {
                    "type": "number",
                    "description": "The first number to multiply"
                },
                "b": {
                    "type": "number",
                    "description": "The second number to multiply"
                }
            },
            "required": ["a", "b"]
        }
    }
}

model_input = tokenizer.apply_chat_template(
    messages,
    tools=[current_time, multiply]
)
```
## Advanced: Retrieval-augmented generation
"Retrieval-augmented generation" or "RAG" LLMs can search a corpus of documents for information before responding | |
to a query. This allows models to vastly expand their knowledge base beyond their limited context size. Our | |
recommendation for RAG models is that their template | |
should accept a documents argument. This should be a list of documents, where each "document" | |
is a single dict with title and contents keys, both of which are strings. Because this format is much simpler | |
than the JSON schemas used for tools, no helper functions are necessary. | |
Here's an example of a RAG template in action:

```python
document1 = {
    "title": "The Moon: Our Age-Old Foe",
    "contents": "Man has always dreamed of destroying the moon. In this essay, I shall"
}

document2 = {
    "title": "The Sun: Our Age-Old Friend",
    "contents": "Although often underappreciated, the sun provides several notable benefits"
}

model_input = tokenizer.apply_chat_template(
    messages,
    documents=[document1, document2]
)
```
## Advanced: How do chat templates work?
The chat template for a model is stored on the `tokenizer.chat_template` attribute. If no chat template is set, the default template for that model class is used instead. Let's take a look at the template for BlenderBot:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/blenderbot-400M-distill")
tokenizer.default_chat_template
"{% for message in messages %}{% if message['role'] == 'user' %}{{ ' ' }}{% endif %}{{ message['content'] }}{% if not loop.last %}{{ '  ' }}{% endif %}{% endfor %}{{ eos_token }}"
```
That's kind of intimidating. Let's clean it up a little to make it more readable. In the process, though, we also make sure that the newlines and indentation we add don't end up being included in the template output - see the tip on trimming whitespace below!
```
{%- for message in messages %}
    {%- if message['role'] == 'user' %}
        {{- ' ' }}
    {%- endif %}
    {{- message['content'] }}
    {%- if not loop.last %}
        {{- '  ' }}
    {%- endif %}
{%- endfor %}
{{- eos_token }}
```
If you've never seen one of these before, this is a Jinja template. Jinja is a templating language that allows you to write simple code that generates text. In many ways, the code and syntax resembles Python. In pure Python, this template would look something like this:
```python
for idx, message in enumerate(messages):
    if message['role'] == 'user':
        print(' ')
    print(message['content'])
    if not idx == len(messages) - 1:  # Check for the last message in the conversation
        print('  ')
print(eos_token)
```
Effectively, the template does three things:
1. For each message, if the message is a user message, add a blank space before it, otherwise print nothing.
2. Add the message content.
3. If the message is not the last message, add two spaces after it. After the final message, print the EOS token.
This is a pretty simple template - it doesn't add any control tokens, and it doesn't support "system" messages, which are a common way to give the model directives about how it should behave in the subsequent conversation. But Jinja gives you a lot of flexibility to do those things! Let's see a Jinja template that can format inputs similarly to the way LLaMA formats them (note that the real LLaMA template includes handling for default system messages and slightly different system message handling in general - don't use this one in your actual code!):
```
{%- for message in messages %}
    {%- if message['role'] == 'user' %}
        {{- bos_token + '[INST] ' + message['content'] + ' [/INST]' }}
    {%- elif message['role'] == 'system' %}
        {{- '<<SYS>>\n' + message['content'] + '\n<</SYS>>\n\n' }}
    {%- elif message['role'] == 'assistant' %}
        {{- ' ' + message['content'] + ' ' + eos_token }}
    {%- endif %}
{%- endfor %}
```
Hopefully if you stare at this for a little bit you can see what this template is doing - it adds specific tokens based on the "role" of each message, which represents who sent it. User, assistant and system messages are clearly distinguishable to the model because of the tokens they're wrapped in.
## Advanced: Adding and editing chat templates
### How do I create a chat template?
Simple, just write a Jinja template and set `tokenizer.chat_template`. You may find it easier to start with an existing template from another model and simply edit it for your needs! For example, we could take the LLaMA template above and add "[ASST]" and "[/ASST]" to assistant messages:
```
{%- for message in messages %}
    {%- if message['role'] == 'user' %}
        {{- bos_token + '[INST] ' + message['content'].strip() + ' [/INST]' }}
    {%- elif message['role'] == 'system' %}
        {{- '<<SYS>>\n' + message['content'].strip() + '\n<</SYS>>\n\n' }}
    {%- elif message['role'] == 'assistant' %}
        {{- '[ASST] ' + message['content'] + ' [/ASST]' + eos_token }}
    {%- endif %}
{%- endfor %}
```
Now, simply set the `tokenizer.chat_template` attribute. Next time you use [~PreTrainedTokenizer.apply_chat_template], it will use your new template! This attribute will be saved in the `tokenizer_config.json` file, so you can use [~utils.PushToHubMixin.push_to_hub] to upload your new template to the Hub and make sure everyone's using the right template for your model!
```python
template = tokenizer.chat_template
template = template.replace("SYS", "SYSTEM")  # Change the system token
tokenizer.chat_template = template  # Set the new template
tokenizer.push_to_hub("model_name")  # Upload your new template to the Hub!
```
The method [~PreTrainedTokenizer.apply_chat_template] which uses your chat template is called by the [TextGenerationPipeline] class, so once you set the correct chat template, your model will automatically become compatible with [TextGenerationPipeline].
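As a rough sketch of what that enables (the model name here is just a placeholder for whichever chat model you've set the template on), a plain pipeline call can then consume chat messages directly:

```python
from transformers import pipeline

# Placeholder model ID - substitute the model whose chat template you just set.
pipe = pipeline("text-generation", model="your-username/model_name")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Write me a haiku about chat templates."},
]

# The pipeline applies the tokenizer's chat template before generating.
print(pipe(messages, max_new_tokens=64))
```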
If you're fine-tuning a model for chat, in addition to setting a chat template, you should probably add any new chat control tokens as special tokens in the tokenizer. Special tokens are never split, ensuring that your control tokens are always handled as single tokens rather than being tokenized in pieces. You should also set the tokenizer's `eos_token` attribute to the token that marks the end of assistant generations in your template. This will ensure that text generation tools can correctly figure out when to stop generating text.
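For example, if your template ended each assistant turn with a ChatML-style `<|im_end|>` control token (purely illustrative - use whatever tokens your template actually emits), the setup might look like this:

```python
# Register the chat control tokens as special tokens so they are never split.
tokenizer.add_special_tokens({"additional_special_tokens": ["<|im_start|>", "<|im_end|>"]})

# If the tokens are genuinely new, grow the model's embedding matrix to match.
model.resize_token_embeddings(len(tokenizer))

# Mark the token that ends assistant generations as EOS, so generation knows when to stop.
tokenizer.eos_token = "<|im_end|>"
```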