Beam-search multinomial sampling

As the name implies, this decoding strategy combines beam search with multinomial sampling. To use it, set num_beams to a value greater than 1 and set do_sample=True.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, set_seed

set_seed(0)  # For reproducibility

prompt = "translate English to German: The house is wonderful."
checkpoint = "google-t5/t5-small"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
inputs = tokenizer(prompt, return_tensors="pt")

model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

outputs = model.generate(**inputs, num_beams=5, do_sample=True)
tokenizer.decode(outputs[0], skip_special_tokens=True)
# 'Das Haus ist wunderbar.'
```

Diverse beam search decoding

The diverse beam search decoding strategy is an extension of the beam search strategy that generates a more diverse set of beam sequences to choose from. To learn how it works, refer to Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence Models. This approach has three main parameters: num_beams, num_beam_groups, and diversity_penalty. The diversity penalty encourages the outputs to differ across groups, while ordinary beam search is used within each group.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

checkpoint = "google/pegasus-xsum"
prompt = (
    "The Permaculture Design Principles are a set of universal design principles "
    "that can be applied to any location, climate and culture, and they allow us to design "
    "the most efficient and sustainable human habitation and food production systems. "
    "Permaculture is a design system that encompasses a wide variety of disciplines, such "
    "as ecology, landscape design, environmental science and energy conservation, and the "
    "Permaculture design principles are drawn from these various disciplines. Each individual "
    "design principle itself embodies a complete conceptual framework based on sound "
    "scientific principles. When we bring all these separate principles together, we can "
    "create a design system that both looks at whole systems, the parts that these systems "
    "consist of, and how those parts interact with each other to create a complex, dynamic, "
    "living system. Each design principle serves as a tool that allows us to integrate all "
    "the separate parts of a design, referred to as elements, into a functional, synergistic, "
    "whole system, where the elements harmoniously interact and work together in the most "
    "efficient way possible."
)

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
inputs = tokenizer(prompt, return_tensors="pt")

model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

outputs = model.generate(**inputs, num_beams=5, num_beam_groups=5, max_new_tokens=30, diversity_penalty=1.0)
tokenizer.decode(outputs[0], skip_special_tokens=True)
# 'The Design Principles are a set of universal design principles that can be applied to any location, climate and culture, and they allow us to design the'
```

This guide illustrates the main parameters that enable various decoding strategies. More advanced parameters exist for the generate method, giving you even further control over its behavior. For the complete list of the available parameters, refer to the API documentation.

Speculative Decoding

Speculative decoding (also known as assisted decoding) is a modification of the decoding strategies above that uses an assistant model (ideally a much smaller one) with the same tokenizer to generate a few candidate tokens. The main model then validates the candidate tokens in a single forward pass, which speeds up the decoding process. If do_sample=True, the token validation with resampling introduced in the speculative decoding paper is used.

Currently, only greedy search and sampling are supported with assisted decoding, and assisted decoding doesn't support batched inputs. To learn more about assisted decoding, check this blog post.
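To make the draft-then-verify idea concrete, here is a minimal sketch of the greedy case in plain Python. It does not use transformers and is not how the library implements assisted decoding: target_next, draft_next, greedy, and assisted_greedy are hypothetical toy stand-ins, and the "models" are simple arithmetic rules over integer token ids. The verification loop calls the target once per position for clarity, whereas a real model would score all candidate positions in one forward pass. The sketch also omits the extra token a real implementation can gain when every candidate is accepted.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]


def target_next(seq: List[int]) -> int:
    """Toy stand-in for the main model's greedy choice (hypothetical rule)."""
    return sum(seq) % 7


def draft_next(seq: List[int]) -> int:
    """Toy stand-in for the assistant: agrees with the target most of the time."""
    guess = sum(seq) % 7
    # Deliberately wrong whenever the sequence length is a multiple of 3.
    return guess if len(seq) % 3 else (guess + 1) % 7


def greedy(next_token: NextToken, prompt: List[int], max_new_tokens: int) -> List[int]:
    """Plain greedy decoding: one model call per generated token."""
    seq = list(prompt)
    for _ in range(max_new_tokens):
        seq.append(next_token(seq))
    return seq


def assisted_greedy(
    target: NextToken,
    draft: NextToken,
    prompt: List[int],
    max_new_tokens: int,
    num_candidates: int = 4,
) -> List[int]:
    """Greedy decoding where a cheap draft proposes tokens and the target verifies them."""
    seq = list(prompt)
    limit = len(prompt) + max_new_tokens
    while len(seq) < limit:
        # 1. The draft model proposes a short run of candidate tokens.
        candidates: List[int] = []
        for _ in range(min(num_candidates, limit - len(seq))):
            candidates.append(draft(seq + candidates))

        # 2. The target checks each candidate position in order.
        #    (A real model obtains all of these predictions from one forward pass.)
        accepted: List[int] = []
        for token in candidates:
            expected = target(seq + accepted)
            accepted.append(expected)  # always keep the target's own choice
            if expected != token:
                break  # first mismatch: discard the remaining candidates
        seq.extend(accepted)
    return seq


prompt = [1, 2, 3]
# The assisted output matches plain greedy decoding with the target alone.
assert assisted_greedy(target_next, draft_next, prompt, 10) == greedy(target_next, prompt, 10)
```

Because every kept token is the target's own greedy choice, the assistant only affects how many tokens are confirmed per verification step, not which tokens are produced.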
To enable assisted decoding, pass a model as the assistant_model argument.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

prompt = "Alice and Bob"
checkpoint = "EleutherAI/pythia-1.4b-deduped"
assistant_checkpoint = "EleutherAI/pythia-160m-deduped"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
inputs = tokenizer(prompt, return_tensors="pt")

model = AutoModelForCausalLM.from_pretrained(checkpoint)
assistant_model = AutoModelForCausalLM.from_pretrained(assistant_checkpoint)

outputs = model.generate(**inputs, assistant_model=assistant_model)
tokenizer.batch_decode(outputs, skip_special_tokens=True)
# ['Alice and Bob are sitting in a bar. Alice is drinking a beer and Bob is drinking a']
```

When using assisted decoding with sampling methods, you can use the temperature argument to control the randomness, just like in multinomial sampling. However, in assisted decoding, reducing the temperature may also help improve latency.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, set_seed

set_seed(42)  # For reproducibility

prompt = "Alice and Bob"
checkpoint = "EleutherAI/pythia-1.4b-deduped"
assistant_checkpoint = "EleutherAI/pythia-160m-deduped"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
inputs = tokenizer(prompt, return_tensors="pt")

model = AutoModelForCausalLM.from_pretrained(checkpoint)
assistant_model = AutoModelForCausalLM.from_pretrained(assistant_checkpoint)

outputs = model.generate(**inputs, assistant_model=assistant_model, do_sample=True, temperature=0.5)
tokenizer.batch_decode(outputs, skip_special_tokens=True)
# ['Alice and Bob, a couple of friends of mine, who are both in the same office as']
```

Alternatively, you can set prompt_lookup_num_tokens to trigger n-gram-based assisted decoding, as opposed to model-based assisted decoding. You can read more about it here.
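The n-gram idea can be illustrated without any model: candidate tokens are copied from an earlier spot in the sequence where the most recent tokens already appeared. The sketch below is a plain-Python illustration under that assumption, not the library's implementation; lookup_candidates, ngram_size, and num_tokens are hypothetical names chosen for this example, and the matching rule (most recent earlier occurrence wins) is a simplification.

```python
from typing import List


def lookup_candidates(tokens: List[int], ngram_size: int = 2, num_tokens: int = 3) -> List[int]:
    """Propose candidate tokens by finding where the trailing n-gram occurred before.

    Returns up to `num_tokens` tokens that followed the most recent earlier
    occurrence of the last `ngram_size` tokens, or an empty list if none exists.
    """
    if len(tokens) <= ngram_size:
        return []
    tail = tokens[-ngram_size:]
    # Scan earlier start positions from newest to oldest, skipping the tail itself.
    for start in range(len(tokens) - ngram_size - 1, -1, -1):
        if tokens[start:start + ngram_size] == tail:
            follow_from = start + ngram_size
            return tokens[follow_from:follow_from + num_tokens]
    return []


# The sequence ends in (5, 6), which also appears at the start, followed by 7, 8, 9.
history = [5, 6, 7, 8, 9, 5, 6]
candidates = lookup_candidates(history, ngram_size=2, num_tokens=3)
assert candidates == [7, 8, 9]
```

Candidates found this way would then be validated by the main model in the same way as candidates from an assistant model, which is why the approach tends to suit inputs where the output repeats spans of the prompt.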