Merve Noyan PRO
merve
AI & ML interests
VLMs, vision & co
Recent Activity
Updated a dataset about 8 hours ago: vlmbook/images
Replied to their post about 11 hours ago:
Don't sleep on the new AI at Meta vision-language release! 🔥
https://huggingface.co./collections/facebook/perception-encoder-67f977c9a65ca5895a7f6ba1
https://huggingface.co./collections/facebook/perception-lm-67f9783f171948c383ee7498
Meta dropped Swiss Army knives for vision with an Apache 2.0 license 👏
> image/video encoders for vision language modelling and spatial understanding (object detection etc) 👏
> The vision LM outperforms InternVL3 and Qwen2.5VL 👏
> They also release gigantic video and image datasets
The authors set out to build a single versatile vision encoder that can be aligned to a diverse set of tasks.
They trained Perception Encoder (PE) Core: a new state-of-the-art family of vision encoders that can be aligned for both vision-language and spatial tasks. For zero-shot image tasks, it outperforms the latest SOTA, SigLIP2 👏
> Among the fine-tuned ones, the first is PE-Spatial. It's a model for bounding-box detection, segmentation and depth estimation, and it outperforms all other models 😮
> The second is PLM, Perception Language Model, where they combine PE-Core with the Qwen2.5 LM 7B. It outperforms all other models (including InternVL3, which was trained with the Qwen2.5 LM too!)
The authors release the following checkpoints in sizes base, large and giant:
> 3 PE-Core checkpoints (224, 336, 448)
> 2 PE-Lang checkpoints (L, G)
> One PE-Spatial (G, 448)
> 3 PLM (1B, 3B, 8B)
> Datasets
The authors release the following datasets 📑
> PE Video: Gigantic video dataset of 1M videos with 120k expert annotations ⏯️
> PLM-Video and PLM-Image: Human- and auto-annotated image and video datasets for region-based tasks
> PLM-VideoBench: New video benchmark for MCQA
Updated a dataset about 11 hours ago: huggingfacejs/tasks
merve's activity
Add license to model metadata (#2, opened 5 days ago by merve)
License (#1, opened 5 days ago by merve)
add pipeline_tag (#3, opened 13 days ago by merve)
how do i interpret the results (#2, opened about 1 month ago by cuiyi0326)
Model Fintune (#3, opened about 1 month ago by BITDDD)
ImportError: cannot import name 'ShieldGemmaForImageClassification' from 'transformers' (#1, opened about 2 months ago by feabries)
Upload 2 files (#6, opened about 2 months ago by mervenoyan)
Update app.py (#6, opened about 2 months ago by reach-vb)
Add code snippet, change pipeline tag and library name (#4, opened about 2 months ago by merve)
Add code snippet, change pipeline tag and library name (#2, opened about 2 months ago by merve)
Add code snippet, change pipeline tag and library name (#1, opened about 2 months ago by merve)
Add code snippet, change pipeline tag and library name (#3, opened about 2 months ago by merve)
Update README.md (#2, opened about 2 months ago by Amirsalihdev)
Tweaks to the model card (#1, opened 3 months ago by pcuenq)
create model card (#1, opened 5 months ago by ariG23498)