Beam-search multinomial sampling
As the name implies, this decoding strategy combines beam search with multinomial sampling. To use it, set
num_beams to a value greater than 1 and do_sample=True.
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, set_seed
set_seed(0)  # For reproducibility
prompt = "translate English to German: The house is wonderful."
checkpoint = "google-t5/t5-small"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
inputs = tokenizer(prompt, return_tensors="pt")
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)
outputs = model.generate(**inputs, num_beams=5, do_sample=True)
tokenizer.decode(outputs[0], skip_special_tokens=True)
'Das Haus ist wunderbar.'
```
Diverse beam search decoding
Diverse beam search is an extension of beam search that generates a more diverse set of beam sequences to choose
from. To learn how it works, refer to Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence Models.
This approach has three main parameters: num_beams, num_beam_groups, and diversity_penalty.
The beams are split into num_beam_groups groups. Beam search runs within each group, and the diversity penalty
discourages a group from choosing tokens that other groups have already selected, so outputs differ across groups.
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
checkpoint = "google/pegasus-xsum"
prompt = (
    "The Permaculture Design Principles are a set of universal design principles "
    "that can be applied to any location, climate and culture, and they allow us to design "
    "the most efficient and sustainable human habitation and food production systems. "
    "Permaculture is a design system that encompasses a wide variety of disciplines, such "
    "as ecology, landscape design, environmental science and energy conservation, and the "
    "Permaculture design principles are drawn from these various disciplines. Each individual "
    "design principle itself embodies a complete conceptual framework based on sound "
    "scientific principles. When we bring all these separate principles together, we can "
    "create a design system that both looks at whole systems, the parts that these systems "
    "consist of, and how those parts interact with each other to create a complex, dynamic, "
    "living system. Each design principle serves as a tool that allows us to integrate all "
    "the separate parts of a design, referred to as elements, into a functional, synergistic, "
    "whole system, where the elements harmoniously interact and work together in the most "
    "efficient way possible."
)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
inputs = tokenizer(prompt, return_tensors="pt")
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)
outputs = model.generate(**inputs, num_beams=5, num_beam_groups=5, max_new_tokens=30, diversity_penalty=1.0)
tokenizer.decode(outputs[0], skip_special_tokens=True)
'The Design Principles are a set of universal design principles that can be applied to any location, climate and culture, and they allow us to design the'
```
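To make the role of diversity_penalty more concrete, here is a minimal pure-Python sketch of a Hamming-style penalty at a single decoding step. It is a toy under loud assumptions: every group holds one beam and all groups see the same token scores, whereas real groups track their own hypotheses. The helper names `penalize` and `pick_tokens_per_group` are hypothetical and are not part of the library.

```python
from collections import Counter


def penalize(scores, previous_group_tokens, diversity_penalty):
    """Lower a token's score once for every earlier group that already chose it at this step."""
    counts = Counter(previous_group_tokens)
    return [score - diversity_penalty * counts[token] for token, score in enumerate(scores)]


def pick_tokens_per_group(scores, num_groups, diversity_penalty):
    """Groups choose in order; each one sees scores penalized by the earlier groups' choices."""
    chosen = []
    for _ in range(num_groups):
        adjusted = penalize(scores, chosen, diversity_penalty)
        best_token = max(range(len(adjusted)), key=adjusted.__getitem__)
        chosen.append(best_token)
    return chosen


# Toy log-probabilities over a 4-token vocabulary (assumed values, not model output).
log_probs = [-0.5, -1.0, -1.4, -3.0]

no_penalty = pick_tokens_per_group(log_probs, num_groups=3, diversity_penalty=0.0)
with_penalty = pick_tokens_per_group(log_probs, num_groups=3, diversity_penalty=1.0)
print(no_penalty)    # every group picks the same token
print(with_penalty)  # later groups are pushed toward other tokens
```

Without a penalty, all groups collapse onto the same best token. With a penalty of 1.0, each later group is steered toward a token the earlier groups have not used, which is the effect the diversity_penalty argument is meant to produce.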
This guide illustrates the main parameters that enable various decoding strategies. The generate method accepts
more advanced parameters that give you further control over its behavior.
For the complete list of available parameters, refer to the API documentation.
Speculative Decoding
Speculative decoding (also known as assisted decoding) is a modification of the decoding strategies above that uses an
assistant model (ideally a much smaller one) with the same tokenizer to generate a few candidate tokens. The main
model then validates the candidate tokens in a single forward pass, which speeds up the decoding process. If
do_sample=True, the token validation with resampling introduced in the speculative decoding paper is used.
Currently, assisted decoding supports only greedy search and sampling, and it doesn't support batched inputs.
To learn more about assisted decoding, check this blog post.
To enable assisted decoding, pass a model to the assistant_model argument.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
prompt = "Alice and Bob"
checkpoint = "EleutherAI/pythia-1.4b-deduped"
assistant_checkpoint = "EleutherAI/pythia-160m-deduped"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
inputs = tokenizer(prompt, return_tensors="pt")
model = AutoModelForCausalLM.from_pretrained(checkpoint)
assistant_model = AutoModelForCausalLM.from_pretrained(assistant_checkpoint)
outputs = model.generate(**inputs, assistant_model=assistant_model)
tokenizer.batch_decode(outputs, skip_special_tokens=True)
['Alice and Bob are sitting in a bar. Alice is drinking a beer and Bob is drinking a']
```
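The draft-then-verify loop behind greedy assisted decoding can be sketched without any model at all. The example below is an illustration under stated assumptions, not the library's implementation: `main_next` and `draft_next` are hypothetical stand-ins for the main and assistant models, tokens are plain integers, and `num_candidates` is an invented parameter name. A real main model would score all drafted positions in one batched forward pass, which the sketch counts as a single pass.

```python
def main_next(tokens):
    """Hypothetical main model: greedy next token is (last + 1) mod 10."""
    return (tokens[-1] + 1) % 10


def draft_next(tokens):
    """Hypothetical assistant: agrees with the main model except right after token 5."""
    return 0 if tokens[-1] == 5 else (tokens[-1] + 1) % 10


def plain_greedy(prompt, max_new_tokens):
    """Reference decoding: one main-model pass per generated token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(main_next(tokens))
    return tokens


def assisted_greedy(prompt, max_new_tokens, num_candidates=3):
    tokens = list(prompt)
    target_len = len(prompt) + max_new_tokens
    main_passes = 0
    while len(tokens) < target_len:
        # 1. The assistant drafts a few candidate tokens autoregressively.
        draft = []
        for _ in range(num_candidates):
            draft.append(draft_next(tokens + draft))
        # 2. The main model predicts the next token at every drafted position.
        #    Counted as one pass, because a real model does this in a single forward call.
        main_passes += 1
        verified = [main_next(tokens + draft[:i]) for i in range(num_candidates + 1)]
        # 3. Accept the longest prefix of the draft that matches the main model.
        n_accepted = 0
        while n_accepted < num_candidates and draft[n_accepted] == verified[n_accepted]:
            n_accepted += 1
        # 4. Keep the accepted tokens plus the main model's own token at the first
        #    mismatch (or one extra token if the whole draft was accepted).
        tokens.extend(verified[: n_accepted + 1])
    return tokens[:target_len], main_passes


assisted_tokens, passes = assisted_greedy([1], max_new_tokens=8)
print(assisted_tokens, passes)
```

In this toy run the assisted output is identical to plain greedy decoding, but it needs fewer main-model passes than the eight that plain greedy would use. The saving depends entirely on how often the assistant's guesses match the main model.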
When using assisted decoding with sampling methods, you can use the temperature argument to control the randomness,
just like in multinomial sampling. In assisted decoding, reducing the temperature may also improve latency.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, set_seed
set_seed(42)  # For reproducibility
prompt = "Alice and Bob"
checkpoint = "EleutherAI/pythia-1.4b-deduped"
assistant_checkpoint = "EleutherAI/pythia-160m-deduped"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
inputs = tokenizer(prompt, return_tensors="pt")
model = AutoModelForCausalLM.from_pretrained(checkpoint)
assistant_model = AutoModelForCausalLM.from_pretrained(assistant_checkpoint)
outputs = model.generate(**inputs, assistant_model=assistant_model, do_sample=True, temperature=0.5)
tokenizer.batch_decode(outputs, skip_special_tokens=True)
['Alice and Bob, a couple of friends of mine, who are both in the same office as']
```
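One way to see why a lower temperature may help latency: in speculative sampling, a token drafted from the assistant distribution q is accepted with probability min(1, p(x) / q(x)), where p is the main model's distribution, so the chance that a drafted token survives is the sum over tokens of min(p(x), q(x)). The sketch below evaluates that quantity for toy logits at several temperatures. The logits are assumed values chosen so that both models prefer the same token, and the helper names are hypothetical.

```python
import math


def softmax(logits, temperature):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [logit / temperature for logit in logits]
    peak = max(scaled)
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


def acceptance_probability(main_logits, draft_logits, temperature):
    """Probability that one drafted token passes validation: sum of min(p, q)."""
    p = softmax(main_logits, temperature)
    q = softmax(draft_logits, temperature)
    return sum(min(p_x, q_x) for p_x, q_x in zip(p, q))


# Assumed toy logits: both models rank token 0 first but disagree on the rest.
main_logits = [2.0, 1.0, 0.0]
draft_logits = [1.5, 0.2, 0.8]

rates = {t: acceptance_probability(main_logits, draft_logits, t) for t in (1.0, 0.5, 0.1)}
for temperature, rate in rates.items():
    print(f"temperature={temperature}: acceptance probability {rate:.3f}")
```

As the temperature drops, both distributions concentrate on their shared top token, so more drafted tokens are accepted per main-model pass. This only holds when the two models agree on the most likely token; if they disagree, sharpening the distributions lowers the acceptance probability instead, which is why the effect is a "may" rather than a guarantee.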
Alternatively, you can set prompt_lookup_num_tokens to trigger n-gram based assisted decoding instead of
model-based assisted decoding. You can read more about it here.
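The idea behind n-gram based candidates can be sketched in a few lines: instead of asking an assistant model, look for an earlier occurrence of the most recent tokens in the sequence so far and propose whatever followed it. The function below is a simplified illustration, not the library's code. `prompt_lookup_candidates` and `max_ngram_size` are hypothetical names, and the exact matching rules used by the library may differ. The proposed tokens would then be validated by the main model in the same way as model-drafted candidates.

```python
def prompt_lookup_candidates(tokens, num_candidates, max_ngram_size=2):
    """Propose up to num_candidates tokens by matching the trailing n-gram earlier in the sequence."""
    # Try the longest n-gram first, then fall back to shorter ones.
    for ngram_size in range(min(max_ngram_size, len(tokens) - 1), 0, -1):
        ngram = tokens[-ngram_size:]
        # Scan from the most recent earlier position back to the start,
        # skipping the trailing n-gram itself.
        for start in range(len(tokens) - ngram_size - 1, -1, -1):
            if tokens[start : start + ngram_size] == ngram:
                follow_start = start + ngram_size
                return tokens[follow_start : follow_start + num_candidates]
    return []  # no match: nothing to propose for this step


# Words stand in for token ids to keep the example readable.
sequence = "the cat sat on the mat and the cat".split()
candidates = prompt_lookup_candidates(sequence, num_candidates=3)
print(candidates)
```

This approach tends to pay off when the output repeats spans of the input, as in summarization or code editing, because the copied continuation is often what the main model would have generated anyway.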