
Victor Mustar (victor)

AI & ML interests

Building the UX of this website

Organizations

Hugging Face, Google, Competitions, Safetensors, 21 RNN, Spaces-explorers, Text Generation Inference, CVPR Demo Track, Spaces Examples, Hugging Chat, Webhooks Explorers (BETA), lora concepts library, Scanned Tokens, Huggingface Projects, hf admins, Hugging Face OSS Metrics, Stable Diffusion Dreambooth Concepts Library, Core ML Projects, temp-org, Blog-explorers, Mustarz, Open LLM Leaderboard, Enterprise Explorers, The Collectionists, ZeroGPU Explorers, Hugging Face Tools, TstOrg141, Stable Video benchmark, Social Post Explorers, Dev Mode Explorers, LLHF, SLLHF, Self-serve FTW, Inference Explorers

victor's activity

reacted to AdinaY's post with 🔥🔥🔥 about 6 hours ago
Kimi-Audio 🚀🎧 an OPEN audio foundation model released by Moonshot AI
moonshotai/Kimi-Audio-7B-Instruct
✨ 7B
✨ 13M+ hours of pretraining data
✨ Novel hybrid input architecture
✨ Universal audio capabilities (ASR, AQA, AAC, SER, SEC/ASC, end-to-end conversation)
reacted to jasoncorkill's post with 🚀 about 6 hours ago
🚀 Building Better Evaluations: 32K Image Annotations Now Available

Today, we're releasing an expanded version: 32K images annotated with 3.7M responses from over 300K individuals, collected in under two weeks using the Rapidata Python API.

Rapidata/text-2-image-Rich-Human-Feedback-32k

A few months ago, we published one of our most-liked datasets, with 13K images based on the @data-is-better-together dataset, following Google's research on "Rich Human Feedback for Text-to-Image Generation" (https://arxiv.org/abs/2312.10240). It collected over 1.5M responses from 150K+ participants.

Rapidata/text-2-image-Rich-Human-Feedback

In the examples below, users highlighted words from prompts that were not correctly depicted in the generated images. Higher word scores indicate more frequent issues. If an image captured the prompt accurately, users could select [No_mistakes].
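As a concrete illustration of that scoring scheme, here is a minimal sketch of how a per-word error score could be computed from such annotations. The function name, response format, and all sample data below are hypothetical, not the dataset's actual schema:

```python
from collections import Counter

def word_scores(prompt_words, responses):
    """Score each prompt word by the fraction of responses that
    flagged it as not correctly depicted. Higher scores indicate
    more frequent issues. A response of ["No_mistakes"] means the
    image captured the prompt accurately."""
    flags = Counter()
    for resp in responses:
        for word in resp:
            if word != "No_mistakes":
                flags[word] += 1
    n = len(responses)
    return {w: flags[w] / n for w in prompt_words}

# Hypothetical prompt and annotation responses:
prompt = ["a", "red", "cube", "on", "a", "glass", "table"]
responses = [
    ["red"],           # one user flagged "red" as wrongly depicted
    ["red", "glass"],  # another flagged two words
    ["No_mistakes"],   # image judged accurate
    ["red"],
]
scores = word_scores(prompt, responses)
# "red" was flagged in 3 of 4 responses -> score 0.75
```

The real dataset aggregates the same idea over hundreds of responses per image, so scores are far less quantized than in this toy example.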

We're continuing to work on large-scale human feedback and model evaluation. If you're working on related research and need large, high-quality annotations, feel free to get in touch: [email protected].
reacted to Aurelien-Morgan's post with 👀 about 6 hours ago
The Almighty function-caller

How would you like to build smart GenAI infrastructure?
Give your edge agentic system an extensive tool memory,
and optimize the resources it takes to run a high-performance set of agents.

We came up with a novel approach to function-calling at scale for smart companies and corporate-grade use-cases.

Read our full-fledged blog article on this here on Hugging Face:
https://huggingface.co./blog/Aurelien-Morgan/the-almighty-function-caller
replied to prithivMLmods's post 3 days ago
reacted to prithivMLmods's post with 🔥 3 days ago
Dropping domain-specific downstream image classification and content moderation models, including the anime image type classification, GeoSceneNet, indoor-outdoor scene classification, and black-and-white vs. colored image classification models, along with the datasets. 🔥

╰┈➤Models :
+ GeoSceneNet : prithivMLmods/Multilabel-GeoSceneNet
+ IndoorOutdoorNet : prithivMLmods/IndoorOutdoorNet
+ B&W vs Colored : prithivMLmods/BnW-vs-Colored-Detection
+ Anime Image Type : prithivMLmods/Anime-Classification-v1.0
+ Multilabel Portrait : prithivMLmods/Multilabel-Portrait-SigLIP2

╰┈➤Datasets :
- GeoSceneNet : prithivMLmods/Multilabel-GeoSceneNet-16K
- IndoorOutdoorNet : prithivMLmods/IndoorOutdoorNet-20K
- BnW vs Colored : prithivMLmods/BnW-vs-Colored-10K
- Multilabel Portrait : prithivMLmods/Multilabel-Portrait-18K

╰┈➤Collections :
> Multilabel Image Classification Datasets : prithivMLmods/multilabel-image-classification-datasets-6809aa64637f45d4c47fa6ca
> Model Collection : prithivMLmods/siglip2-content-filters-models-v2-68053a958c42ef17a3a3f4d1

Note: The anime scene type dataset is not mentioned in the list because it is private and only accessible to members of the DeepGHS organization.

For raw ZIP files or more information about the datasets, visit: https://www.kaggle.com/prithivsakthiur/datasets
reacted to orasul's post with 👍 3 days ago
hi, it is deki, and now I am open sourced.

𝗱𝗲𝗸𝗶, an Android AI agent powered by an open-source ML model, has been fully open-sourced.

It understands what’s on your screen and can perform tasks based on your voice or text commands.

Some examples:
* "Write my friend "some_name" in WhatsApp that I'll be 15 minutes late"
* "Open Twitter in the browser and write a post about something"
* "Read my latest notifications"
* "Write a linkedin post about something"

Currently, it works only on Android, but support for other operating systems is planned.

The ML and backend code has also been fully open-sourced.

Video prompt example:

"Open linkedin, tap post and write: hi, it is deki, and now I am open sourced. But don't send, just return"

License: GPLv3

You can find other AI agent demos and usage examples, such as code generation or object detection, on GitHub.

Github: https://github.com/RasulOs/deki
reacted to YerbaPage's post with 🔥 4 days ago
Curated list of **Repository-level Code Generation** papers & benchmarks! 🔥

Stay ahead with the latest in:
✅ Repo-level Issue Resolution (SWE-bench, Agents)
✅ Repo-level Code Completion (Repo understanding)
✅ Datasets & Benchmarks

👉 Check it out: https://github.com/YerbaPage/Awesome-Repo-Level-Code-Generation 🔥
reacted to ProCreations's post with 🔥 4 days ago
Come check out my new dataset, Mistake to Meaning, an attempt to help smaller models better understand user typos! Hope you guys enjoy it.

ProCreations/Mistake-To-Meaning
posted an update 5 days ago
DIA TTS is just amazing - please share your funniest gens (here is mine) 😂
nari-labs/Dia-1.6B
reacted to davidberenstein1957's post with 🚀 5 days ago
reacted to AdinaY's post with 🔥 5 days ago
MAGI-1 🪄 the autoregressive diffusion video model, released by Sand AI

sand-ai/MAGI-1

✨ 24B with Apache 2.0
✨ Strong temporal consistency
✨ Benchmark-topping performance
reacted to shekkizh's post with 👀 5 days ago
Think AGI is just around the corner? Not so fast.

When OpenAI released its Computer-Using Agent (CUA) API, I happened to be playing Wordle 🧩 and thought, why not see how the model handles it?
Spoiler: Wordle turned out to be a surprisingly effective benchmark.
So Romain Cosentino Ph.D. and I dug in and analyzed the results of several hundred runs.

🔑 Takeaways
1️⃣ Even the best computer-using models struggle with simple, context-dependent tasks. 
2️⃣ Visual perception and reasoning remain major hurdles for multimodal agents.
3️⃣ Real-world use cases reveal significant gaps between hype and reality. Perception accuracy drops to near zero by the last turn 📉

🔗 Read our arxiv article for more details https://www.arxiv.org/abs/2504.15434
reacted to ProCreations's post with 🔥 5 days ago
🤖 IntellIte‑Chat v1.0 (Coming Soon)

A compact chat model built for speed, efficiency, and simplicity.

IntellIte‑Chat v1.0 is the debut model in the IntellIte series—a lightweight conversational transformer crafted to be fast, memory-efficient, and easy to work with. It’s designed for devs and enthusiasts who want sharp results without huge resource demands.

No fluff. Just chats.



🎯 Target Specs
• Pretraining Tokens: 4 billion
• Context Length: 16,384 tokens



🧠 Parameters & Architecture
• Model Size: ~100M parameters
• Architecture: Modified GPT-NeoX
• Focus: Chat performance with low latency and efficient memory use
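For a sense of how a ~100M-parameter budget breaks down in a GPT-NeoX-style transformer, here is a rough back-of-the-envelope sketch. The hidden size, layer count, and vocabulary size below are hypothetical placeholders, not IntellIte's actual configuration:

```python
def gpt_neox_param_estimate(d_model, n_layers, vocab_size):
    """Rough transformer parameter count: attention projections
    (~4*d^2) plus MLP (~8*d^2) per layer, plus token embeddings
    (vocab * d). Biases and layer norms are small and ignored."""
    per_layer = 12 * d_model ** 2
    embeddings = vocab_size * d_model
    return n_layers * per_layer + embeddings

# One hypothetical configuration landing near the ~100M target:
print(gpt_neox_param_estimate(768, 10, 50304))  # -> 109412352 (~109M)
```

At this scale the embedding table is a large fraction of the total (here roughly a third), which is one reason small chat models often tie input and output embeddings.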



🧃 Support the Build
Every dollar you donate is an extra amount of VRAM I get to work with. 😅
This project is fully independent and entirely self-funded. If you want to help bring it to life:
👉 https://buymeacoffee.com/procreations



💛 Early Supporters
All early supporters will be credited here when the model launches.
Even the smallest support means the world and pushes this project forward.

Special thanks to:
Maybe you?



🛠️ Development Status
• Architecture Design: Completed ✅
• Dataset Planning: Completed ✅
• Training Code: Near Completion 🛠️
• Training Launch: Starting Soon ⏳
• Evaluation Setup: Coming soon 🔜
• Final Release: Coming soon 🔜



Built to chat. Built on a budget. Built to prove what small models can do.
reacted to clem's post with 🔥 5 days ago
Energy is a massive constraint for AI, but do you even know how much energy your ChatGPT convos are using?

We're trying to change this by releasing ChatUI-energy, the first interface where you see in real time what energy your AI conversations consume. Great work from @jdelavande powered by Spaces & TGI, available for a dozen open-source models like Llama, Mistral, Qwen, Gemma and more.

jdelavande/chat-ui-energy

Should all chat interfaces have this? Just like ingredients have to be shown on products you buy, we need more transparency in AI for users!
reacted to linoyts's post with 👍 5 days ago
reacted to clem's post with 🤗 5 days ago
Just crossed half a million public apps on Hugging Face. A new public app is created every minute these days 🤯🤯🤯

What's your favorite? http://hf.co/spaces
reacted to bhalajin's post with 🔥 5 days ago
###### CVPR2025 Workshop Challenge Alert ######

🫠 Between deadlines, rebuttals, and existential crises??? "We got you!!!!"

📢 Our new CVPR25 multi-modal challenge is online !!!

🍽️ Dishcovery: VLM MetaFood Challenge!!!! 🍽️


😋🧫 Can your groundbreaking VLM understand the difference between sushi styles, pasta types, or cooking methods from just image + caption pairs?

🌐 Our Task: Match fine-grained images to food descriptions


Challenge Highlights:

📦 400K food image-caption pairs, a little taste to get you started !!!

🔬 Got a SoTA VLM? Come test it on our challenging test sets !!!

🎯 Challenge for everyone! Easy to use SigLIP baseline is provided !!!

🔍 Real, synthetic, noisy data – just like real life - Will your VLM redefine how people track their diets??? ( 🗣️ We believe so!!! )


🔗 Join the challenge: https://www.kaggle.com/competitions/dishcovery-vlm-mtf-cvpr-2025

🗓️ Deadline: Phase I: 4th of May, 2025 - Phase II: 10th of May, 2025

👉 Workshop website: https://sites.google.com/view/cvpr-metafood-2025


#CVPR25 #ComputerVision #CV #Deeplearning #DL #VisionLanguage #VLM #multimodal #FoundationModels
reacted to luigi12345's post with 🔥 5 days ago
SkyReels-V2 INFINITE VIDEO🔥♾️🎬 UNLIMITED duration video generation model by Skywork.

> "Finally, it is here. An open-source model that achieves what we have all been waiting for: infinite-length videos." 😮

Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought (2504.05599)

Model: Skywork/SkyReels-V2-T2V-14B-720P

✨ 1.3B & 14B
✨ Generates infinite-length videos using Diffusion Forcing, combining diffusion models with autoregressive methods