Having some fun with long context benchmarks (watch the video!)
NoLiMa: Long-Context Evaluation Beyond Literal Matching (2502.05167)
Fiction LiveBench: https://fiction.live/stories/Fiction-liveBench-Mar-25-2025/oQdzQvKHw8JyXbN87
Michelangelo: https://deepmind.google/research/publications/117639/
LongGenBench: Spinning the Golden Thread: Benchmarking Long-Form Generation in Language Models (2409.02076)
NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window? (2407.11963)
RULER: What's the Real Context Size of Your Long-Context Language Models? (2404.06654)
For more: https://www.reddit.com/r/rajistics/comments/1jxwk29/long_context_llm_benchmarks_video/
Let me know if you like these posts.
Arctic-SnowCoder: Demystifying High-Quality Data in Code Pretraining Paper • 2409.02326 • Published Sep 3, 2024 • 19
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model Paper • 2211.05100 • Published Nov 9, 2022 • 31