Smolvencoder

Enterprise

non-profit

Activity Feed

AI & ML interests

None defined yet.

Recent Activity

mfarre authored a paper 21 days ago

SmolVLM: Redefining small and efficient multimodal models

andito authored a paper 21 days ago

SmolVLM: Redefining small and efficient multimodal models

mfarre authored a paper about 1 month ago

SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion

View all activity

SmolvencoderOrg's activity

mfarre

authored a paper 21 days ago

SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published 21 days ago • 176

andito

authored a paper 21 days ago

SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published 21 days ago • 176

mfarre

authored a paper about 1 month ago

SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion

Paper • 2503.11576 • Published Mar 14 • 99

andito

authored a paper about 1 month ago

SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion

Paper • 2503.11576 • Published Mar 14 • 99

manu

authored 2 papers about 2 months ago

EuroBERT: Scaling Multilingual Encoders for European Languages

Paper • 2503.05500 • Published Mar 7 • 78

MMTEB: Massive Multilingual Text Embedding Benchmark

Paper • 2502.13595 • Published Feb 19 • 34

andito

posted an update about 2 months ago

Post

2652

Extremely bullish on @CohereForAI 's Aya Vision (8B & 32B) - new SOTA open-weight VLMs

- 8B wins up to 81% of the time in its class, better than Gemini Flash
- 32B beats Llama 3.2 90B!
- Covers 23 languages, excels in image captioning, VQA & more
- Integrated on transformers from Day 0!

Efficient multimodal models are here to stay!!🔥
Check out their blog! https://huggingface.co./blog/aya-vision

andito

authored a paper 3 months ago

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published Feb 4 • 229

andito

posted an update 3 months ago

Post

1643

𝗜𝗻𝘁𝗿𝗼𝗱𝘂𝗰𝗶𝗻𝗴 𝘁𝗵𝗲 𝘄𝗼𝗿𝗹𝗱'𝘀 𝘀𝗺𝗮𝗹𝗹𝗲𝘀𝘁 𝘃𝗶𝘀𝗶𝗼𝗻 𝗹𝗮𝗻𝗴𝘂𝗮𝗴𝗲 𝗺𝗼𝗱𝗲𝗹!

We’re thrilled to share 𝗦𝗺𝗼𝗹𝗩𝗟𝗠 (256M & 500M)—the smallest Visual Language Models ever built. Think: running on <1GB of GPU memory—you can fine-tune it on your laptop and run it on your toaster!

Why It’s Game-Changing:
- 𝗢𝘂𝘁𝗽𝗲𝗿𝗳𝗼𝗿𝗺𝘀 𝗟𝗮𝗿𝗴𝗲𝗿 𝗠𝗼𝗱𝗲𝗹𝘀: Even the 256M model surpasses our SOTA 80B-parameter model from just 17 months ago. Over 300x reduction!
𝗠𝗶𝗴𝗵𝘁𝘆 𝗘𝗳𝗳𝗶𝗰𝗶𝗲𝗻𝗰𝘆: The 256M version delivers 80% of our 2.2B model’s performance, and the 500M version hits 90%
𝗟𝗶𝗴𝗵𝘁𝗻𝗶𝗻𝗴-𝗙𝗮𝘀𝘁 𝗦𝗲𝗮𝗿𝗰𝗵: SmolVLM integrates with ColiPali for state-of-the-art retrieval speeds—on par with models 10x bigger. That means cheaper, faster indexing and real-world impact.

What’s New Under the Hood:
- 𝗡𝗲𝘄 𝗩𝗶𝘀𝗶𝗼𝗻 𝗘𝗻𝗰𝗼𝗱𝗲𝗿: Smaller overall size (400M -> 93M), but with higher resolution.
- 𝗛𝗶𝗴𝗵𝗲𝗿 𝗣𝗶𝘅𝗲𝗹𝘀/𝗧𝗼𝗸𝗲𝗻: 4096 vs. 1820—more efficient image processing.
- 𝗦𝗺𝗮𝗿𝘁 𝗧𝗼𝗸𝗲𝗻𝗶𝘇𝗮𝘁𝗶𝗼𝗻: Faster training and a performance boost.

Check our blog: https://huggingface.co./blog/smolervlm
The models: HuggingFaceTB/smolvlm-256m-and-500m-6791fafc5bb0ab8acc960fb0
The demo: HuggingFaceTB/SmolVLM-256M-Demo

1 reply

andito

posted an update 5 months ago

Post

1971

SmolVLM speeding locally on a laptop thanks to mlx-vlm and
@Gradio ! Try it with two lines:
pip install git+https://github.com/andimarafioti/mlx-vlm.git@stream-generate-fix
python -m mlx_vlm.chat_ui --model mlx-community/SmolVLM-Instruct-8bit

Gotta love the MLX community! Big thanks to @pcuenq and @prince_canuma !

andito

posted an update 5 months ago

Post

3390

Let's go! We are releasing SmolVLM, a smol 2B VLM built for on-device inference that outperforms all models at similar GPU RAM usage and tokens throughputs.

- SmolVLM generates tokens 7.5 to 16 times faster than Qwen2-VL! 🤯
- Other models at this size crash a laptop, but SmolVLM comfortably generates 17 tokens/sec on a macbook! 🚀
- SmolVLM can be fine-tuned on a Google collab! Or process millions of documents with a consumer GPU!
- SmolVLM even outperforms larger models in video benchmarks, despite not even being trained on videos!

Check out more!
Demo: HuggingFaceTB/SmolVLM
Blog: https://huggingface.co./blog/smolvlm
Model: HuggingFaceTB/SmolVLM-Instruct
Fine-tuning script: https://github.com/huggingface/smollm/blob/main/finetuning/Smol_VLM_FT.ipynb

manu

authored a paper 7 months ago

EuroLLM: Multilingual Language Models for Europe

Paper • 2409.16235 • Published Sep 24, 2024 • 26

andito

posted an update 8 months ago

Post

1095

Hugging face presents FineVideo 🎥! Unlocking the next generation of Video understanding 🚀

🤯3400 hours of annotated Creative Common videos with rich character descriptions, scene splits, mood, and content descriptions per scene as well as QA pairs.
🔥
@mfarre processed over 2M videos of Youtube-CC to make this incredibly powerful selection.

Very psyched to fine-tune idefics on this dataset. ⚡️
Explore the videos: HuggingFaceFV/FineVideo-Explorer

andito

authored 2 papers 8 months ago

GACELA -- A generative adversarial context encoder for long audio inpainting

Paper • 2005.05032 • Published May 11, 2020

Adversarial Generation of Time-Frequency Features with application in audio synthesis

Paper • 1902.04072 • Published Feb 11, 2019

andito

posted an update 8 months ago

Post

1636

🚀 Introducing Hugging Face's Multilingual Speech-to-Speech! 🎤
💬Our modular, cross-platform pipeline to run GPT4o-like experiences on device can now seamlessly switch languages mid-conversation with an imperceptible 100ms delay.

🌟 Building on an amazing early reception with 2600 stars on GitHub 🌟
🚀 We are expanding the library to support multiple languages
🔥 Try it out with a flag: --language fr
🤯 Or don't set the flag and let the system detect the language

💡 What feature should we add next?