
Merve Noyan PRO

merve


https://github.com/merveenoyan/smol-vision

AI & ML interests

VLMs, vision & co

Recent Activity

updated a dataset about 8 hours ago

vlmbook/images

replied to their post about 11 hours ago

Don't sleep on the new vision-language release from AI at Meta! 🔥
https://huggingface.co./collections/facebook/perception-encoder-67f977c9a65ca5895a7f6ba1
https://huggingface.co./collections/facebook/perception-lm-67f9783f171948c383ee7498

Meta dropped Swiss Army knives for vision under the Apache 2.0 license 👏
> Image/video encoders for vision-language modelling and spatial understanding (object detection, etc.) 👏
> The vision LM outperforms InternVL3 and Qwen2.5VL 👏
> They also release gigantic video and image datasets

The authors set out to build a single, versatile vision encoder that can be aligned to a diverse set of tasks.

They trained Perception Encoder (PE) Core, a new state-of-the-art family of vision encoders that can be aligned for both vision-language and spatial tasks. On zero-shot image tasks, it outperforms SigLIP2, the latest SOTA 👏
> Among the fine-tuned variants, the first is PE-Spatial, a model for bounding-box detection, segmentation and depth estimation. It outperforms all other models 😮
> The second is PLM (Perception Language Model), which combines PE-Core with the Qwen2.5 7B LM. It outperforms all other models (including InternVL3, which was also trained with a Qwen2.5 LM!)

The authors release the following checkpoints in base, large and giant sizes:
> 3 PE-Core checkpoints (224, 336, 448)
> 2 PE-Lang checkpoints (L, G)
> One PE-Spatial checkpoint (G, 448)
> 3 PLM checkpoints (1B, 3B, 8B)

Datasets: the authors also release the following 📑
> PE Video: a gigantic video dataset of 1M videos with 120k expert annotations ⏯️
> PLM-Video and PLM-Image: human- and auto-annotated image and video datasets for region-based tasks
> PLM-VideoBench: a new video benchmark for multiple-choice question answering (MCQA)

updated a dataset about 11 hours ago

huggingfacejs/tasks



merve's activity

New activity in nvidia/DAM-3B-Self-Contained 5 days ago

Add license to model metadata

#2 opened 5 days ago by

License

#1 opened 5 days ago by

New activity in llamaindex/vdr-2b-v1 13 days ago

add pipeline_tag

#3 opened 13 days ago by

New activity in google/shieldgemma-2-4b-it about 1 month ago

how do i interpret the results

#2 opened about 1 month ago by

Model Finetune

#3 opened about 1 month ago by

ImportError: cannot import name 'ShieldGemmaForImageClassification' from 'transformers'

#1 opened about 2 months ago by

New activity in huggingfacejs/tasks about 2 months ago

Upload 2 files

#6 opened about 2 months ago by

New activity in huggingface-projects/gemma-3-12b-it about 2 months ago

Update app.py

#6 opened about 2 months ago by

New activity in google/gemma-3-4b-it about 2 months ago

Add code snippet, change pipeline tag and library name

#4 opened about 2 months ago by

New activity in google/gemma-3-12b-it about 2 months ago

Add code snippet, change pipeline tag and library name

#2 opened about 2 months ago by

New activity in google/gemma-3-27b-it about 2 months ago

Add code snippet, change pipeline tag and library name

#1 opened about 2 months ago by

New activity in google/gemma-3-4b-it about 2 months ago

Add code snippet, change pipeline tag and library name

#3 opened about 2 months ago by

New activity in merve/UDOP about 2 months ago

Update README.md

#2 opened about 2 months ago by

New activity in google/paligemma2-3b-mix-224-jax 3 months ago

Tweaks to the model card

#1 opened 3 months ago by

New activity in google/paligemma2-3b-mix-224 3 months ago

create model card

#1 opened 5 months ago by