Not all of the tool-calling features shown above are used by all models. Some use tool call IDs, others simply use the function name and match tool calls to results using the ordering, and there are several models that use neither and only issue one tool call at a time to avoid confusion. If you want your code to be compatible across as many models as possible, we recommend structuring your tool calls like we've shown here, and returning tool results in the order that they were issued by the model. The chat templates on each model should handle the rest.
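For example, a conversation that follows this recommendation might be assembled as below. The `multiply` tool and the ID strings here are purely illustrative; templates that don't use IDs will simply match each result to a call by position.

```python
messages = [{"role": "user", "content": "What is 6 times 7, and 8 times 9?"}]

# The model requests two tool calls in a single turn...
messages.append({
    "role": "assistant",
    "tool_calls": [
        {"id": "call00001", "type": "function", "function": {"name": "multiply", "arguments": {"a": 6, "b": 7}}},
        {"id": "call00002", "type": "function", "function": {"name": "multiply", "arguments": {"a": 8, "b": 9}}},
    ],
})

# ...so we append the two results in the same order the calls were issued.
messages.append({"role": "tool", "tool_call_id": "call00001", "name": "multiply", "content": "42"})
messages.append({"role": "tool", "tool_call_id": "call00002", "name": "multiply", "content": "72"})
```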
### Understanding tool schemas
Each function you pass to the `tools` argument of `apply_chat_template` is converted into a JSON schema. These schemas are then passed to the model chat template. In other words, tool-use models do not see your functions directly, and they never see the actual code inside them. What they care about is the function definitions and the arguments they need to pass to them - they care about what the tools do and how to use them, not how they work! It is up to you to read their outputs, detect if they have requested to use a tool, pass their arguments to the tool function, and return the response in the chat.
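A rough sketch of that loop, assuming the tool calls arrive in the dict format shown above and using an illustrative `multiply` tool, might look like this:

```python
def multiply(a: float, b: float):
    """A stand-in for any real tool function."""
    return a * b

# Map tool names to the Python functions that implement them.
available_tools = {"multiply": multiply}

def handle_tool_calls(messages, assistant_message):
    """Run each tool the model requested and append the results to the chat."""
    messages.append(assistant_message)
    for tool_call in assistant_message.get("tool_calls", []):
        fn = available_tools[tool_call["function"]["name"]]
        result = fn(**tool_call["function"]["arguments"])
        messages.append({"role": "tool", "name": tool_call["function"]["name"], "content": str(result)})
    return messages
```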
Generating JSON schemas to pass to the template should be automatic and invisible as long as your functions follow the specification above, but if you encounter problems, or you simply want more control over the conversion, you can handle the conversion manually. Here is an example of a manual schema conversion.
```python
from transformers.utils import get_json_schema

def multiply(a: float, b: float):
    """
    A function that multiplies two numbers

    Args:
        a: The first number to multiply
        b: The second number to multiply
    """
    return a * b

schema = get_json_schema(multiply)
print(schema)
```
This will yield:

```json
{
    "type": "function",
    "function": {
        "name": "multiply",
        "description": "A function that multiplies two numbers",
        "parameters": {
            "type": "object",
            "properties": {
                "a": {
                    "type": "number",
                    "description": "The first number to multiply"
                },
                "b": {
                    "type": "number",
                    "description": "The second number to multiply"
                }
            },
            "required": ["a", "b"]
        }
    }
}
```
If you wish, you can edit these schemas, or even write them from scratch yourself without using `get_json_schema` at all. JSON schemas can be passed directly to the `tools` argument of `apply_chat_template` - this gives you a lot of power to define precise schemas for more complex functions. Be careful, though - the more complex your schemas, the more likely the model is to get confused when dealing with them! We recommend simple function signatures where possible, keeping arguments (and especially complex, nested arguments) to a minimum.
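As a small illustration of the first option, you could tweak a generated schema in place before passing it in - the specific edit below is just an assumption about something you might want to change, not anything the API requires:

```python
schema = get_json_schema(multiply)

# Adjust the generated description before handing the schema to the template.
schema["function"]["parameters"]["properties"]["a"]["description"] = "The first factor"

model_input = tokenizer.apply_chat_template(messages, tools=[schema])
```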
Here is an example of defining schemas by hand, and passing them directly to `apply_chat_template`:

```python
# A simple function that takes no arguments
current_time = {
    "type": "function",
    "function": {
        "name": "current_time",
        "description": "Get the current local time as a string.",
        "parameters": {
            "type": "object",
            "properties": {}
        }
    }
}

# A more complete function that takes two numerical arguments
multiply = {
    "type": "function",
    "function": {
        "name": "multiply",
        "description": "A function that multiplies two numbers",
        "parameters": {
            "type": "object",
            "properties": {
                "a": {
                    "type": "number",
                    "description": "The first number to multiply"
                },
                "b": {
                    "type": "number",
                    "description": "The second number to multiply"
                }
            },
            "required": ["a", "b"]
        }
    }
}

model_input = tokenizer.apply_chat_template(
    messages,
    tools=[current_time, multiply]
)
```
## Advanced: Retrieval-augmented generation
"Retrieval-augmented generation" or "RAG" LLMs can search a corpus of documents for information before responding | |
to a query. This allows models to vastly expand their knowledge base beyond their limited context size. Our | |
recommendation for RAG models is that their template | |
should accept a documents argument. This should be a list of documents, where each "document" | |
is a single dict with title and contents keys, both of which are strings. Because this format is much simpler | |
than the JSON schemas used for tools, no helper functions are necessary. | |
Here's an example of a RAG template in action:

```python
document1 = {
    "title": "The Moon: Our Age-Old Foe",
    "contents": "Man has always dreamed of destroying the moon. In this essay, I shall"
}

document2 = {
    "title": "The Sun: Our Age-Old Friend",
    "contents": "Although often underappreciated, the sun provides several notable benefits"
}

model_input = tokenizer.apply_chat_template(
    messages,
    documents=[document1, document2]
)
```
## Advanced: How do chat templates work?
The chat template for a model is stored on the `tokenizer.chat_template` attribute. If no chat template is set, the default template for that model class is used instead. Let's take a look at the template for BlenderBot:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/blenderbot-400M-distill")
tokenizer.default_chat_template
"{% for message in messages %}{% if message['role'] == 'user' %}{{ ' ' }}{% endif %}{{ message['content'] }}{% if not loop.last %}{{ '  ' }}{% endif %}{% endfor %}{{ eos_token }}"
```
That's kind of intimidating. Let's clean it up a little to make it more readable. In the process, though, we also make sure that the newlines and indentation we add don't end up being included in the template output - see the tip on trimming whitespace below!
```
{%- for message in messages %}
    {%- if message['role'] == 'user' %}
        {{- ' ' }}
    {%- endif %}
    {{- message['content'] }}
    {%- if not loop.last %}
        {{- '  ' }}
    {%- endif %}
{%- endfor %}
{{- eos_token }}
```
If you've never seen one of these before, this is a Jinja template. Jinja is a templating language that allows you to write simple code that generates text. In many ways, the code and syntax resembles Python. In pure Python, this template would look something like this:
```python
for idx, message in enumerate(messages):
    if message['role'] == 'user':
        print(' ')
    print(message['content'])
    if not idx == len(messages) - 1:  # Check for the last message in the conversation
        print('  ')
print(eos_token)
```
Effectively, the template does three things:
1. For each message, if the message is a user message, add a blank space before it, otherwise print nothing.
2. Add the message content.
3. If the message is not the last message, add two spaces after it. After the final message, print the EOS token.
This is a pretty simple template - it doesn't add any control tokens, and it doesn't support "system" messages, which are a common way to give the model directives about how it should behave in the subsequent conversation. But Jinja gives you a lot of flexibility to do those things! Let's see a Jinja template that can format inputs similarly to the way LLaMA formats them (note that the real LLaMA template includes handling for default system messages and slightly different system message handling in general - don't use this one in your actual code!):
```
{%- for message in messages %}
    {%- if message['role'] == 'user' %}
        {{- bos_token + '[INST] ' + message['content'] + ' [/INST]' }}
    {%- elif message['role'] == 'system' %}
        {{- '<<SYS>>\n' + message['content'] + '\n<</SYS>>\n\n' }}
    {%- elif message['role'] == 'assistant' %}
        {{- ' ' + message['content'] + ' ' + eos_token }}
    {%- endif %}
{%- endfor %}
```
Hopefully if you stare at this for a little bit you can see what this template is doing - it adds specific tokens based on the "role" of each message, which represents who sent it. User, assistant and system messages are clearly distinguishable to the model because of the tokens they're wrapped in.
## Advanced: Adding and editing chat templates
### How do I create a chat template?
Simple, just write a Jinja template and set `tokenizer.chat_template`. You may find it easier to start with an existing template from another model and simply edit it for your needs! For example, we could take the LLaMA template above and add "[ASST]" and "[/ASST]" to assistant messages:
```
{%- for message in messages %}
    {%- if message['role'] == 'user' %}
        {{- bos_token + '[INST] ' + message['content'].strip() + ' [/INST]' }}
    {%- elif message['role'] == 'system' %}
        {{- '<<SYS>>\n' + message['content'].strip() + '\n<</SYS>>\n\n' }}
    {%- elif message['role'] == 'assistant' %}
        {{- '[ASST] ' + message['content'] + ' [/ASST]' + eos_token }}
    {%- endif %}
{%- endfor %}
```
Now, simply set the `tokenizer.chat_template` attribute. Next time you use [~PreTrainedTokenizer.apply_chat_template], it will use your new template! This attribute will be saved in the `tokenizer_config.json` file, so you can use [~utils.PushToHubMixin.push_to_hub] to upload your new template to the Hub and make sure everyone's using the right template for your model!
```python
template = tokenizer.chat_template
template = template.replace("SYS", "SYSTEM")  # Change the system token
tokenizer.chat_template = template  # Set the new template
tokenizer.push_to_hub("model_name")  # Upload your new template to the Hub!
```
The method [~PreTrainedTokenizer.apply_chat_template] which uses your chat template is called by the [TextGenerationPipeline] class, so once you set the correct chat template, your model will automatically become compatible with [TextGenerationPipeline].
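As a rough sketch of what that enables (the model name here is just a placeholder for whichever chat model you've set the template on), a plain pipeline call can then consume chat messages directly:

```python
from transformers import pipeline

# Placeholder model ID - substitute the model whose chat template you just set.
pipe = pipeline("text-generation", model="your-username/model_name")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Write me a haiku about chat templates."},
]

# The pipeline applies the tokenizer's chat template before generating.
print(pipe(messages, max_new_tokens=64))
```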
If you're fine-tuning a model for chat, in addition to setting a chat template, you should probably add any new chat control tokens as special tokens in the tokenizer. Special tokens are never split, ensuring that your control tokens are always handled as single tokens rather than being tokenized in pieces. You should also set the tokenizer's `eos_token` attribute to the token that marks the end of assistant generations in your template. This will ensure that text generation tools can correctly figure out when to stop generating text.
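For example, if your template ended each assistant turn with a ChatML-style `<|im_end|>` control token (purely illustrative - use whatever tokens your template actually emits), the setup might look like this:

```python
# Register the chat control tokens as special tokens so they are never split.
tokenizer.add_special_tokens({"additional_special_tokens": ["<|im_start|>", "<|im_end|>"]})

# If the tokens are genuinely new, grow the model's embedding matrix to match.
model.resize_token_embeddings(len(tokenizer))

# Mark the token that ends assistant generations as EOS, so generation knows when to stop.
tokenizer.eos_token = "<|im_end|>"
```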