EnxinSong

Enxin

AI & ML interests

None yet

Recent Activity

published a dataset 1 day ago
Enxin/lmms_video_mmlu
updated a collection 1 day ago
Video-MMLU

Organizations

LOVEU-CVPR24-Track1, attention-videos

Posts 1

🎉 Introducing Video-MMLU, a new benchmark for evaluating large multimodal models on classroom-style lectures in math, physics, and chemistry!

🧑‍🏫📚 Video-MMLU requires stronger reasoning capabilities and more world knowledge than previous benchmarks for video LMMs.

Each video comes with two tasks:
📝 Take Notes: detailed captioning of multi-discipline lectures
🧠 Do Quiz: open-ended QA to test reasoning over visuals & proofs

We evaluated 90+ models, including vision-blind baselines, open-source models and proprietary ones.
📉 We find that existing models generally perform poorly, with accuracy ranging from only 10% to 50%.
📉 We also explore how the number of visual tokens and the base LLMs influence performance, offering insights into the interplay between multimodal perception and reasoning in lecture comprehension.

For more details, please check below:
📄 Paper: https://arxiv.org/abs/2504.14693
💻 Code: https://github.com/Espere-1119-Song/Video-MMLU
🧠 Data: Enxin/Video-MMLU
🌐 Website: https://enxinsong.com/Video-MMLU-web/