	Beam-search multinomial sampling
	As the name implies, this decoding strategy combines beam search with multinomial sampling. To use it, set
	num_beams to a value greater than 1 and do_sample=True.
	```python
	from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, set_seed

	set_seed(0)  # For reproducibility

	prompt = "translate English to German: The house is wonderful."
	checkpoint = "google-t5/t5-small"

	tokenizer = AutoTokenizer.from_pretrained(checkpoint)
	inputs = tokenizer(prompt, return_tensors="pt")
	model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

	# num_beams > 1 together with do_sample=True selects beam-search multinomial sampling
	outputs = model.generate(**inputs, num_beams=5, do_sample=True)
	print(tokenizer.decode(outputs[0], skip_special_tokens=True))
	# Output reported in the source: 'Das Haus ist wunderbar.'
	```
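To make the combination concrete, here is a minimal, library-free sketch of a single decoding step. It is a simplified illustration, not the Transformers implementation: the function names (`log_softmax`, `beam_sample_step`) and the toy logits are assumptions made for this example. Instead of keeping the top-scoring continuations, as plain beam search would, it draws the next beams at random with probability proportional to their cumulative score.

```python
import math
import random


def log_softmax(logits):
    """Convert raw logits to log-probabilities."""
    top = max(logits)
    log_norm = top + math.log(sum(math.exp(x - top) for x in logits))
    return [x - log_norm for x in logits]


def beam_sample_step(beam_scores, beam_logits, num_beams, rng):
    """Pick the next beams by sampling instead of taking the top-scoring ones.

    beam_scores: cumulative log-probability of each current beam.
    beam_logits: next-token logits for each current beam (toy values here).
    Returns a list of (beam_index, token_id, new_score), best score first.
    """
    # Score every (beam, token) continuation by its cumulative log-probability.
    candidates = []
    for beam_idx, (score, logits) in enumerate(zip(beam_scores, beam_logits)):
        for token_id, logp in enumerate(log_softmax(logits)):
            candidates.append((beam_idx, token_id, score + logp))

    # Draw num_beams distinct continuations, each with probability
    # proportional to exp(score) among the candidates still available.
    chosen = []
    for _ in range(num_beams):
        weights = [math.exp(c[2]) for c in candidates]
        pick = rng.choices(range(len(candidates)), weights=weights, k=1)[0]
        chosen.append(candidates.pop(pick))

    return sorted(chosen, key=lambda c: c[2], reverse=True)


rng = random.Random(0)  # seeded for reproducibility
beams = beam_sample_step(
    beam_scores=[0.0, -0.5],
    beam_logits=[[2.0, 1.0, 0.1], [0.3, 0.2, 1.5]],
    num_beams=2,
    rng=rng,
)
print(beams)
```

Because the beams are sampled, different seeds can yield different continuations, which is why the example above calls set_seed.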

	Diverse beam search decoding
	The diverse beam search decoding strategy is an extension of the beam search strategy that allows for generating a more diverse
	set of beam sequences to choose from. To learn how it works, refer to Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence Models.
	This approach has three main parameters: num_beams, num_beam_groups, and diversity_penalty.
	The diversity penalty ensures the outputs are distinct across groups, and beam search is used within each group.
	```python
	from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

	checkpoint = "google/pegasus-xsum"
	prompt = (
	    "The Permaculture Design Principles are a set of universal design principles "
	    "that can be applied to any location, climate and culture, and they allow us to design "
	    "the most efficient and sustainable human habitation and food production systems. "
	    "Permaculture is a design system that encompasses a wide variety of disciplines, such "
	    "as ecology, landscape design, environmental science and energy conservation, and the "
	    "Permaculture design principles are drawn from these various disciplines. Each individual "
	    "design principle itself embodies a complete conceptual framework based on sound "
	    "scientific principles. When we bring all these separate principles together, we can "
	    "create a design system that both looks at whole systems, the parts that these systems "
	    "consist of, and how those parts interact with each other to create a complex, dynamic, "
	    "living system. Each design principle serves as a tool that allows us to integrate all "
	    "the separate parts of a design, referred to as elements, into a functional, synergistic, "
	    "whole system, where the elements harmoniously interact and work together in the most "
	    "efficient way possible."
	)

	tokenizer = AutoTokenizer.from_pretrained(checkpoint)
	inputs = tokenizer(prompt, return_tensors="pt")
	model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

	# num_beams must be divisible by num_beam_groups
	outputs = model.generate(**inputs, num_beams=5, num_beam_groups=5, max_new_tokens=30, diversity_penalty=1.0)
	print(tokenizer.decode(outputs[0], skip_special_tokens=True))
	# Output reported in the source:
	# 'The Design Principles are a set of universal design principles that can be applied to any location, climate and
	# culture, and they allow us to design the'
	```
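The effect of the diversity penalty can be sketched without any model. The snippet below is a simplified illustration under stated assumptions: each group holds a single beam, all groups see the same toy logits, and the helper names (`apply_diversity_penalty`, `diverse_greedy_step`) are made up for this example. At one decoding step, a token's score is lowered by the penalty once for every earlier group that already picked it.

```python
from collections import Counter


def apply_diversity_penalty(logits, tokens_from_earlier_groups, diversity_penalty):
    """Lower the score of tokens already chosen by earlier groups at this step."""
    counts = Counter(tokens_from_earlier_groups)
    return [
        score - diversity_penalty * counts[token_id]
        for token_id, score in enumerate(logits)
    ]


def diverse_greedy_step(logits, num_groups, diversity_penalty):
    """Pick one token per group, processing the groups one after another."""
    chosen = []
    for _ in range(num_groups):
        penalized = apply_diversity_penalty(logits, chosen, diversity_penalty)
        chosen.append(max(range(len(penalized)), key=penalized.__getitem__))
    return chosen


logits = [3.0, 2.5, 2.2, 0.1]  # toy next-token scores
no_penalty = diverse_greedy_step(logits, num_groups=3, diversity_penalty=0.0)
with_penalty = diverse_greedy_step(logits, num_groups=3, diversity_penalty=1.0)
print(no_penalty)    # [0, 0, 0]
print(with_penalty)  # [0, 1, 2]
```

With a penalty of 0.0 every group picks the same token; with 1.0 each later group is pushed toward a token the earlier groups did not choose.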

	This guide illustrates the main parameters that enable various decoding strategies. The [generate] method also accepts
	more advanced parameters that give you further control over its behavior.
	For the complete list of available parameters, refer to the API documentation.

	Speculative Decoding
	Speculative decoding (also known as assisted decoding) is a modification of the decoding strategies above that uses an
	assistant model (ideally a much smaller one) with the same tokenizer to generate a few candidate tokens. The main
	model then validates the candidate tokens in a single forward pass, which speeds up the decoding process. If
	do_sample=True, the token validation with resampling introduced in the
	speculative decoding paper is used.
	Currently, assisted decoding supports only greedy search and sampling, and it doesn't support batched inputs.
	To learn more about assisted decoding, check this blog post.
	To enable assisted decoding, pass a model as the assistant_model argument.
	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	prompt = "Alice and Bob"
	checkpoint = "EleutherAI/pythia-1.4b-deduped"
	assistant_checkpoint = "EleutherAI/pythia-160m-deduped"

	tokenizer = AutoTokenizer.from_pretrained(checkpoint)
	inputs = tokenizer(prompt, return_tensors="pt")
	model = AutoModelForCausalLM.from_pretrained(checkpoint)
	assistant_model = AutoModelForCausalLM.from_pretrained(assistant_checkpoint)

	# Passing assistant_model enables assisted decoding
	outputs = model.generate(**inputs, assistant_model=assistant_model)
	print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
	# Output reported in the source:
	# ['Alice and Bob are sitting in a bar. Alice is drinking a beer and Bob is drinking a']
	```
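The draft-then-validate loop for the greedy case can be sketched with toy stand-ins for the two models. This is an illustrative sketch, not the library's code: `main_next` and `assistant_next` are hypothetical next-token functions over integer tokens, and `num_candidates` is an assumed name for the draft length. A real main model scores all candidate positions in one forward pass; the toy version just calls the function once per position and counts the validation rounds.

```python
def main_next(tokens):
    """Toy 'main model': always predicts the previous token plus one (mod 10)."""
    return (tokens[-1] + 1) % 10


def assistant_next(tokens):
    """Toy 'assistant': agrees with the main model except right after token 4."""
    return 0 if tokens[-1] == 4 else (tokens[-1] + 1) % 10


def greedy_decode(next_token, prompt, max_new_tokens):
    """Plain greedy decoding: one model call per generated token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(next_token(tokens))
    return tokens


def assisted_greedy_decode(prompt, max_new_tokens, num_candidates=3):
    """Greedy assisted decoding; returns (tokens, number of validation rounds)."""
    tokens = list(prompt)
    target_len = len(prompt) + max_new_tokens
    rounds = 0
    while len(tokens) < target_len:
        # 1. The assistant drafts a few candidate tokens autoregressively.
        draft = list(tokens)
        for _ in range(num_candidates):
            draft.append(assistant_next(draft))
        candidates = draft[len(tokens):]

        # 2. The main model validates the candidates (one forward pass in a real model).
        rounds += 1
        for candidate in candidates:
            if len(tokens) == target_len:
                break
            expected = main_next(tokens)
            tokens.append(expected)  # the main model's own token is always kept
            if expected != candidate:
                break  # first mismatch: discard the rest of the draft
        else:
            # Every candidate matched, so the same pass yields one extra token.
            if len(tokens) < target_len:
                tokens.append(main_next(tokens))
    return tokens, rounds


reference = greedy_decode(main_next, [0], max_new_tokens=8)
assisted, rounds = assisted_greedy_decode([0], max_new_tokens=8)
print(assisted == reference, rounds)
```

In this toy run the assisted output is identical to plain greedy decoding of the main model, but it needs fewer validation rounds than there are generated tokens, which is where the speed-up comes from.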

	When using assisted decoding with sampling methods, you can use the temperature argument to control the randomness,
	just like in multinomial sampling. However, in assisted decoding, reducing the temperature may help improve the latency.
	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer, set_seed

	set_seed(42)  # For reproducibility

	prompt = "Alice and Bob"
	checkpoint = "EleutherAI/pythia-1.4b-deduped"
	assistant_checkpoint = "EleutherAI/pythia-160m-deduped"

	tokenizer = AutoTokenizer.from_pretrained(checkpoint)
	inputs = tokenizer(prompt, return_tensors="pt")
	model = AutoModelForCausalLM.from_pretrained(checkpoint)
	assistant_model = AutoModelForCausalLM.from_pretrained(assistant_checkpoint)

	# do_sample=True switches validation to sampling; temperature controls the randomness
	outputs = model.generate(**inputs, assistant_model=assistant_model, do_sample=True, temperature=0.5)
	print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
	# Output reported in the source:
	# ['Alice and Bob, a couple of friends of mine, who are both in the same office as']
	```
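The validation-with-resampling rule used with sampling can be sketched as follows. This is a toy illustration of the rule from the speculative decoding paper, not the Transformers code: the function names and logits are invented for the example, and applying the same temperature to both distributions is an assumption. A candidate drawn from the assistant is accepted with probability min(1, p_main / p_assistant); otherwise a replacement is drawn from the normalized positive part of (main - assistant).

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    top = max(scaled)
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def validate_candidate(candidate, main_probs, assistant_probs, rng):
    """Accept the assistant's sampled token or resample a replacement.

    Returns (token, accepted).
    """
    p, q = main_probs[candidate], assistant_probs[candidate]
    if rng.random() < min(1.0, p / q):
        return candidate, True
    # Rejected: resample from the positive part of (main - assistant).
    residual = [max(0.0, a - b) for a, b in zip(main_probs, assistant_probs)]
    token = rng.choices(range(len(residual)), weights=residual, k=1)[0]
    return token, False


def acceptance_probability(main_probs, assistant_probs):
    """Chance that a token sampled from the assistant is accepted."""
    return sum(min(a, b) for a, b in zip(main_probs, assistant_probs))


main_logits = [2.0, 1.0, 0.5, 0.0]       # toy values
assistant_logits = [1.8, 1.2, 0.3, 0.1]  # toy values, same top token as the main model

for temperature in (1.0, 0.1):
    rate = acceptance_probability(
        softmax(main_logits, temperature), softmax(assistant_logits, temperature)
    )
    print(f"temperature={temperature}: acceptance probability {rate:.3f}")
```

In this toy setup a very low temperature concentrates both distributions on the same token, so more candidates are accepted per validation pass. That is one plausible reading of why a lower temperature may improve latency; the effect depends on how closely the two models agree and is not guaranteed for every pair of models or temperature.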

	Alternatively, you can set prompt_lookup_num_tokens to trigger n-gram-based assisted decoding instead of
	model-based assisted decoding. You can read more about it here.
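The idea behind n-gram-based candidates can be sketched in a few lines. This is a hedged illustration rather than the library's implementation: `prompt_lookup_candidates` and `max_ngram_size` are hypothetical names, and `num_tokens` plays the role that prompt_lookup_num_tokens plays in the text above. Instead of asking an assistant model, the sketch looks for an earlier occurrence of the most recent tokens in the input and proposes whatever followed them as candidates.

```python
def prompt_lookup_candidates(input_ids, max_ngram_size=2, num_tokens=3):
    """Propose candidate tokens by matching the trailing n-gram earlier in the input."""
    # Prefer longer n-grams, falling back to shorter ones when nothing matches.
    for ngram_size in range(max_ngram_size, 0, -1):
        if ngram_size >= len(input_ids):
            continue
        tail = input_ids[-ngram_size:]
        # Scan backwards over earlier positions, skipping the tail itself.
        for start in range(len(input_ids) - ngram_size - 1, -1, -1):
            if input_ids[start:start + ngram_size] == tail:
                # The tokens that followed the earlier occurrence become candidates.
                return input_ids[start + ngram_size:start + ngram_size + num_tokens]
    return []  # no match: nothing to propose for this step


repeated = prompt_lookup_candidates([5, 6, 7, 8, 9, 5, 6])
unmatched = prompt_lookup_candidates([1, 2, 3])
print(repeated)   # [7, 8, 9]
print(unmatched)  # []
```

The candidates are then validated by the main model exactly as with a model-based assistant, so this approach tends to pay off when the output repeats spans of the input, for example in summarization or code editing.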