70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float Paper • 2504.11651 • Published 13 days ago • 28
Byte Latent Transformer: Patches Scale Better Than Tokens Paper • 2412.09871 • Published Dec 13, 2024 • 102
RealHarm: A Collection of Real-World Language Model Application Failures Paper • 2504.10277 • Published 14 days ago • 11
Gemma 3 QAT Collection Quantization Aware Trained (QAT) Gemma 3 checkpoints. The models preserve quality comparable to half precision while using 3x less memory • 15 items • Updated 10 days ago • 172
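Since the collection's selling point is half-precision quality at a fraction of the memory, a minimal sketch of loading one of these checkpoints with the standard transformers API follows. The checkpoint id `google/gemma-3-1b-it-qat-q4_0-unquantized` is an assumed example of the collection's naming, not confirmed here; substitute any item from the collection.

```python
# Minimal sketch: loading a Gemma 3 QAT checkpoint via transformers.
# The model id below is an assumed example from the collection, not verified.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-3-1b-it-qat-q4_0-unquantized"  # assumed checkpoint id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights load through the usual dtype path
    device_map="auto",
)

prompt = "Summarize quantization-aware training in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```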