llm-studio/documentation/docs/tooltips/experiments/_attention-implementation.mdx
Allows changing the attention implementation that is used (see the sketch after the list below).
- **Auto** automatically chooses an implementation based on what is available on the system.
- **Eager** uses the vanilla attention implementation written in plain PyTorch.
- **SDPA** uses PyTorch's scaled dot product attention (`torch.nn.functional.scaled_dot_product_attention`).
- **Flash Attention 2** explicitly uses FA2, which requires the `flash_attn` package to be installed.
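
As a rough illustration, the sketch below shows how such a setting is commonly passed through to Hugging Face transformers via the `attn_implementation` argument of `from_pretrained`. The `load_model` helper and the mapping of **Auto** to "leave the argument unset" are assumptions for illustration, not LLM Studio's actual code.

```python
# Minimal sketch: forwarding an attention-implementation setting to transformers.
# The helper name and the handling of "auto" are illustrative assumptions.
from transformers import AutoModelForCausalLM


def load_model(model_name: str, attention_implementation: str = "auto"):
    kwargs = {}
    if attention_implementation != "auto":
        # Accepted values in transformers: "eager", "sdpa", "flash_attention_2".
        # "flash_attention_2" requires the flash_attn package to be installed.
        kwargs["attn_implementation"] = attention_implementation
    # With "auto", the argument is left unset and transformers picks a default
    # based on availability.
    return AutoModelForCausalLM.from_pretrained(model_name, **kwargs)


# Example usage (model name is a placeholder):
# model = load_model("my-org/my-model", "sdpa")
```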