EuroLLM-1.7B-Instruct-literary-analysis
- A 1.7B-parameter multilingual model for structured literary analysis of European-language texts.
- Further pretrained and supervised fine-tuned from utter-project/EuroLLM-1.7B-Instruct on the agentlans/literary-reasoning dataset.
- Inherits the base model's support for 35 languages, but was fine-tuned mostly on English, French, German, Spanish, Italian, and Portuguese.
Input Format
Literary analysis:
{{YOUR_EUROPEAN_LANGUAGE_TEXT_HERE}}
Example:
Literary analysis:
Als Gregor Samsa eines Morgens aus unruhigen Träumen erwachte, fand er sich in seinem Bett zu einem ungeheueren Ungeziefer verwandelt. Er lag auf seinem panzerartig harten Rücken und sah, wenn er den Kopf ein wenig hob, seinen gewölbten, braunen, von bogenförmigen Versteifungen geteilten Bauch, auf dessen Höhe sich die Bettdecke, zum gänzlichen Niedergleiten bereit, kaum noch erhalten konnte. Seine vielen, im Vergleich zu seinem sonstigen Umfang kläglich dünnen Beine flimmerten ihm hilflos vor den Augen.
Output Format
Returns a brief literary analysis in English as JSON:
{
  "summary": "Gregor Samsa wakes up one morning to find himself transformed into an enormous insect.",
  "language": "German",
  "sentiment": -0.13,
  "tone": "Descriptive, ominous",
  "enunciation": "Third-person narrative",
  "speech_standard": "Standard literary language",
  "genre": "Gothic literature",
  "literary_form": "Description of a person's transformation",
  "literary_movement": "Romanticism",
  "trope": "Metamorphosis",
  "reading_grade": 9.7,
  "narrative_arc": "Suspense",
  "active_character": "Gregor Samsa",
  "fuzzy_place": "Gregor's bedroom"
}
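Putting the two formats together, the snippet below sketches one way to query the model and parse its reply. It is a minimal sketch rather than an official recipe: it assumes the repository hosts weights loadable directly with transformers (if it ships only a PEFT adapter, peft.AutoPeftModelForCausalLM would be needed instead), and wrapping the "Literary analysis:" prompt in the base model's chat template is likewise an assumption.

# Minimal inference sketch; model id and prompt format follow this card,
# everything else (chat-template wrapping, merged weights) is assumed.
import json
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "agentlans/EuroLLM-1.7B-Instruct-literary-analysis"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

text = "Als Gregor Samsa eines Morgens aus unruhigen Träumen erwachte, ..."  # truncated for brevity
prompt = f"Literary analysis:\n{text}"

# Wrap the prompt as a single user turn using the base model's chat template.
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
)
output_ids = model.generate(input_ids, max_new_tokens=512)
completion = tokenizer.decode(output_ids[0, input_ids.shape[1]:], skip_special_tokens=True)

# The model is expected to emit JSON, but that is not guaranteed; parse defensively.
try:
    analysis = json.loads(completion)
except json.JSONDecodeError:
    analysis = None
print(analysis)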
Limitations
- Model output has not been thoroughly validated for accuracy or bias.
- No additional alignment beyond initial training and supervised fine-tuning.
- Hallucination rate is low, but errors remain possible.
- Output is sensitive to input formatting; for example, texts with many short lines may be misclassified as poetry, first-person narrative, or dialogue.
- May fail to capture all cultural or contextual nuances, especially in historical non-English source texts.
- Floating-point values may lack the precision of those produced by specialist models such as the agentlans/multilingual-e5-small-aligned-* series.
- Distinctions may lack sufficient detail or granularity for certain forms of literary scholarship, for example when focusing on a single author (such as William Shakespeare) or a specific period (such as Elizabethan theatre).
Training Details
Pretraining:
- Learning rate: 5e-5
- Train batch size: 2
- Eval batch size: 8
- Gradient accumulation: 8
- Epochs: 10
- Optimizer: AdamW (betas=(0.9,0.999), epsilon=1e-8)
- Scheduler: Cosine
Supervised fine-tuning:
- Same as pretraining except epochs: 2
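The training script itself is not published; as a rough illustration, here is how the hyperparameters above would map onto transformers.TrainingArguments if the Trainer API were used (the output path is hypothetical):

# Hypothetical mapping of the listed hyperparameters onto TrainingArguments;
# the actual training script is not part of this card.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="eurollm-literary-analysis",  # hypothetical path
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=8,  # effective train batch size of 2 * 8 = 16
    num_train_epochs=10,            # 2 for the supervised fine-tuning stage
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
)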
Framework versions:
- PEFT 0.15.0
- Transformers 4.49.0
- PyTorch 2.6.0+cu124
- Datasets 3.4.1
- Tokenizers 0.21.0
Licence
Apache 2.0
Model tree for agentlans/EuroLLM-1.7B-Instruct-literary-analysis
- Base model: utter-project/EuroLLM-1.7B
- Fine-tuned: utter-project/EuroLLM-1.7B-Instruct