This model takes an instruction template in the FineTemplates format along with a document, and returns an instantiated instruction-and-answer pair.

The output will be a JSON object.

Simple Usage Example

import json
import re
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Helper to expand excerpts in the answer
def expand(document, text):
    """Replace each elided <excerpt>prefix<...>suffix</excerpt> span in
    `text` with the full passage recovered from `document`.

    Returns None if any excerpt cannot be located in the document.
    """
    excerpt_pattern = r"<excerpt>(.*?)<\.\.\.>(.*?)</excerpt>"
    for prefix, suffix in re.findall(excerpt_pattern, text, flags=re.DOTALL):
        match = re.search(
            re.escape(prefix) + r" (.*?) " + re.escape(suffix),
            document,
            flags=re.DOTALL,
        )
        if not match:
            return None
        text = text.replace(
            f"<excerpt>{prefix}<...>{suffix}</excerpt>", match.group(0)
        )
    return text
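To see what the helper does in isolation, here is a minimal, self-contained sketch. The toy document and excerpt markers are invented for demonstration; the condensed function mirrors the logic of `expand` above:

```python
import re

def expand(document, text):
    # Condensed version of the helper above: restore elided excerpts.
    pattern = r"<excerpt>(.*?)<\.\.\.>(.*?)</excerpt>"
    for prefix, suffix in re.findall(pattern, text, flags=re.DOTALL):
        m = re.search(
            re.escape(prefix) + r" (.*?) " + re.escape(suffix),
            document,
            flags=re.DOTALL,
        )
        if not m:
            return None
        text = text.replace(f"<excerpt>{prefix}<...>{suffix}</excerpt>", m.group(0))
    return text

document = "The quick brown fox jumps over the lazy dog near the river."
answer = "<excerpt>The quick<...>lazy dog</excerpt> is a pangram fragment."
print(expand(document, answer))
# -> "The quick brown fox jumps over the lazy dog is a pangram fragment."
```

The `<excerpt>prefix<...>suffix</excerpt>` markers let the model quote long document spans compactly; `expand` then reinflates them against the source document.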

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained('fineinstructions/template_instantiator', revision=None)
tokenizer.padding_side = 'left'
model = AutoModelForCausalLM.from_pretrained('fineinstructions/template_instantiator', revision=None)
pipe = pipeline('text-generation', model=model, tokenizer=tokenizer, pad_token_id=tokenizer.pad_token_id, return_full_text=False)

# Run inference to instantiate the instruction template and generate an answer
inputs = [json.dumps({
  "instruction_template": "...",
  "document": "..."
}, indent=2)]
prompts = [tokenizer.apply_chat_template([{'role': 'user', 'content': i}], tokenize=False, add_generation_prompt=True) for i in inputs]
generations = pipe(prompts, max_length=131072, truncation=True, temperature=None, top_p=None, do_sample=False)
output = generations[0][0]['generated_text']
output_json = json.loads(output)

# Expand the answer
output_json["answer"] = expand(document=json.loads(inputs[0])["document"], text=output_json["answer"])

# Print the output JSON
print(output_json)

##### Output JSON:
# {
# ..
# }
# 
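Since the model's generation is decoded as JSON, malformed output would raise an exception at the `json.loads` step. A small defensive sketch (the `parse_generation` helper name and sample strings are invented for illustration):

```python
import json

def parse_generation(generated_text):
    """Parse a model generation as JSON, returning None on malformed output
    instead of raising, so a batch run can skip bad generations."""
    try:
        return json.loads(generated_text)
    except json.JSONDecodeError:
        return None

print(parse_generation('{"instruction": "Q?", "answer": "A."}'))
print(parse_generation("not json"))
```

In a loop over many documents, checking for `None` from both `parse_generation` and `expand` lets you drop unusable pairs rather than abort the whole run.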

This model was trained on a synthetic dataset generated with DataDreamer 🤖💤. The synthetic dataset card and model card can be found here. The training arguments can be found here.

Model size: 1.24B params (Safetensors, BF16)