Sagui-7B-Instruct-v0.1
Sagui-7B-Instruct-v0.1 is a fine-tuned language model capable of understanding and generating text in both Portuguese and English. It was fine-tuned from the sabia-7b model, which builds on the Llama architecture, using the SlimOrca dataset together with a Portuguese version of SlimOrca translated with LibreTranslate.
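As an illustration of that data-preparation step, the sketch below shows how SlimOrca-style records could be translated to Portuguese through LibreTranslate's REST API. The endpoint URL, record fields, and helper function are illustrative assumptions, not the pipeline actually used to build this model.

```python
# Minimal sketch of translating English records to Portuguese with a
# LibreTranslate server. The URL and record layout below are assumptions.
import requests

LIBRETRANSLATE_URL = "http://localhost:5000/translate"  # assumed local instance

def translate_to_pt(text: str) -> str:
    """Translate an English string to Portuguese via LibreTranslate's /translate endpoint."""
    resp = requests.post(
        LIBRETRANSLATE_URL,
        json={"q": text, "source": "en", "target": "pt", "format": "text"},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["translatedText"]

# Example: translate one instruction/response pair from a SlimOrca-style record.
record = {
    "instruction": "Explain why the sky is blue.",
    "response": "The sky appears blue because of Rayleigh scattering...",
}
translated = {key: translate_to_pt(value) for key, value in record.items()}
```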
Model Details
Model Description
Sagui-7B-Instruct-v0.1 is designed to assist with natural language understanding and generation tasks. The model was trained to improve its instruction-following capabilities and can be used for a variety of applications.
- Model type: LlamaForCausalLM
- Languages (NLP): Portuguese and English
- License: Llama
- Fine-tuned from model: sabia-7b
Uses
Direct Use
Sagui-7B-Instruct-v0.1 can be used directly for general language understanding and generation tasks in Portuguese and English.
Downstream Use
- Fine-tuning for specific domain-related applications (a minimal LoRA sketch follows this list)
- Integration into multilingual applications and tools
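The sketch below shows what downstream fine-tuning could look like with the PEFT library's LoRA adapters. The hyperparameters are placeholders, not recommendations from the model authors; training data and a trainer (e.g. trl's SFTTrainer) would still need to be supplied.

```python
# Hypothetical LoRA fine-tuning setup with PEFT. All hyperparameters here
# are illustrative assumptions.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained(
    "OliveiraJLT/Sagui-7B-Instruct-v0.1",
    torch_dtype="auto",
    device_map="auto",
)

# Attach low-rank adapters to the attention projections (typical Llama targets).
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
# From here, train on your own domain data with your trainer of choice.
```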
Out-of-Scope Use
- Generating harmful, biased, or offensive content
- Unauthorized personal data extraction
- Tasks requiring real-time decision making without human oversight
Bias, Risks, and Limitations
- Potential biases inherited from training data
- Risks of misuse in generating misleading or harmful content
- Limitations in understanding context-specific nuances
Recommendations
It is recommended to have human oversight in applications involving sensitive information or high-stakes decisions.
How to Get Started with the Model
The following snippet shows how to load the tokenizer and model and how to generate a response using apply_chat_template.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # the device to move the model inputs onto

# Load the model and tokenizer from the Hugging Face Hub.
model = AutoModelForCausalLM.from_pretrained(
    "OliveiraJLT/Sagui-7B-Instruct-v0.1",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("OliveiraJLT/Sagui-7B-Instruct-v0.1")

# "Please tell me about the communication skills of marmosets."
prompt = "Por favor, conte-me sobre as habilidades de comunicação dos saguis."
messages = [
    {"role": "system", "content": "Você é Sagui-7B-Instruct-v0.1, um modelo de linguagem. Sua missão é ajudar os usuários em diversas tarefas, fornecendo informações precisas, relevantes e úteis de maneira educada, informativa, envolvente e profissional."},
    {"role": "user", "content": prompt},
]

# Render the conversation into a single prompt string via the chat template.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

# Generate a response (max_length caps prompt plus generated tokens at 2048).
generated_ids = model.generate(
    model_inputs.input_ids,
    max_length=2048,
)

# Strip the prompt tokens so only the newly generated tokens are decoded.
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
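For interactive use, the same inputs can also be streamed token by token. The variant below uses transformers' TextStreamer utility and reuses the model, tokenizer, and model_inputs from the snippet above; it is an optional addition, not part of the original example.

```python
# Optional: print the reply to stdout as it is generated instead of waiting
# for the full sequence. TextStreamer is a standard transformers utility.
from transformers import TextStreamer

streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(
    model_inputs.input_ids,
    max_length=2048,
    streamer=streamer,  # tokens are printed incrementally during generation
)
```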
Citation
```bibtex
@software{OliveiraJLT2024Sagui7BInstruct01,
  title        = {Sagui-7B-Instruct-v0.1},
  author       = {Oliveira, J. L. T.},
  year         = {2024},
  publisher    = {HuggingFace},
  journal      = {HuggingFace repository},
  howpublished = {\url{https://huggingface.co./OliveiraJLT/Sagui-7B-Instruct-v0.1}},
}
```
Open Portuguese LLM Leaderboard Evaluation Results
Detailed results can be found here and on the 🚀 Open Portuguese LLM Leaderboard
Metric | Value |
---|---|
Average | 39.87 |
ENEM Challenge (No Images) | 51.36 |
BLUEX (No Images) | 43.67 |
OAB Exams | 36.22 |
Assin2 RTE | 71.16 |
Assin2 STS | 3.16 |
FaQuAD NLI | 58.05 |
HateBR Binary | 46.46 |
PT Hate Speech Binary | 30.38 |
tweetSentBR | 18.34 |
Open LLM Leaderboard Evaluation Results
Detailed results can be found here
Metric | Value |
---|---|
Avg. | 8.39 |
IFEval (0-Shot) | 28.92 |
BBH (3-Shot) | 5.04 |
MATH Lvl 5 (4-Shot) | 0.38 |
GPQA (0-shot) | 0.00 |
MuSR (0-shot) | 10.61 |
MMLU-PRO (5-shot) | 5.39 |
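For readers who want to reproduce a subset of these numbers locally, the sketch below uses the Python API of EleutherAI's lm-evaluation-harness (pip install lm-eval). The task selection and settings here are assumptions; both leaderboards pin their own harness versions and configurations, so local scores may differ from the tables above.

```python
# Hypothetical local evaluation run with lm-evaluation-harness. Task names
# and settings are illustrative, not the leaderboard's exact configuration.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=OliveiraJLT/Sagui-7B-Instruct-v0.1,dtype=auto",
    tasks=["ifeval"],  # 0-shot instruction-following benchmark
    num_fewshot=0,
)
print(results["results"])
```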