
Victor Mustar (victor)

AI & ML interests

Building the UX of this website

Organizations

Hugging Face, Google, Competitions, Safetensors, 21 RNN, Spaces-explorers, Text Generation Inference, CVPR Demo Track, Spaces Examples, Hugging Chat, Webhooks Explorers (BETA), lora concepts library, Scanned Tokens, Huggingface Projects, hf admins, Hugging Face OSS Metrics, Stable Diffusion Dreambooth Concepts Library, Core ML Projects, temp-org, Blog-explorers, Mustarz, Open LLM Leaderboard, Enterprise Explorers, The Collectionists, ZeroGPU Explorers, Hugging Face Tools, TstOrg141, Stable Video benchmark, Social Post Explorers, Dev Mode Explorers, LLHF, SLLHF, Self-serve FTW, Inference Explorers

victor's activity

reacted to AdinaY's post with 🔥🔥🔥 about 6 hours ago
Kimi-Audio 🚀🎧 an OPEN audio foundation model released by Moonshot AI
moonshotai/Kimi-Audio-7B-Instruct
✨ 7B
✨ 13M+ hours of pretraining data
✨ Novel hybrid input architecture
✨ Universal audio capabilities (ASR, AQA, AAC, SER, SEC/ASC, end-to-end conversation)
reacted to jasoncorkill's post with 🚀 about 6 hours ago
🚀 Building Better Evaluations: 32K Image Annotations Now Available

Today, we're releasing an expanded version: 32K images annotated with 3.7M responses from over 300K individuals, collected in under two weeks using the Rapidata Python API.

Rapidata/text-2-image-Rich-Human-Feedback-32k

A few months ago, we published one of our most-liked datasets, with 13K images based on the @data-is-better-together dataset, following Google's research on "Rich Human Feedback for Text-to-Image Generation" (https://arxiv.org/abs/2312.10240). It collected over 1.5M responses from 150K+ participants.

Rapidata/text-2-image-Rich-Human-Feedback

In the examples below, users highlighted words from prompts that were not correctly depicted in the generated images. Higher word scores indicate more frequent issues. If an image captured the prompt accurately, users could select [No_mistakes].
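As a concrete illustration of that scoring scheme, here is a minimal sketch of how a per-word error score could be computed from such annotations. The function name, response format, and all sample data below are hypothetical, not the dataset's actual schema:

```python
from collections import Counter

def word_scores(prompt_words, responses):
    """Score each prompt word by the fraction of responses that
    flagged it as not correctly depicted. Higher scores indicate
    more frequent issues. A response of ["No_mistakes"] means the
    image captured the prompt accurately."""
    flags = Counter()
    for resp in responses:
        for word in resp:
            if word != "No_mistakes":
                flags[word] += 1
    n = len(responses)
    return {w: flags[w] / n for w in prompt_words}

# Hypothetical prompt and annotation responses:
prompt = ["a", "red", "cube", "on", "a", "glass", "table"]
responses = [
    ["red"],           # one user flagged "red" as wrongly depicted
    ["red", "glass"],  # another flagged two words
    ["No_mistakes"],   # image judged accurate
    ["red"],
]
scores = word_scores(prompt, responses)
# "red" was flagged in 3 of 4 responses -> score 0.75
```

The real dataset aggregates the same idea over hundreds of responses per image, so scores are far less quantized than in this toy example.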

We're continuing to work on large-scale human feedback and model evaluation. If you're working on related research and need large, high-quality annotations, feel free to get in touch: [email protected].
reacted to Aurelien-Morgan's post with 👀 about 6 hours ago
The Almighty function-caller

How would you like to build smart GenAI infrastructure?
Give your edge agentic system an extensive tool memory,
and optimize the resources it takes to run a high-performance set of agents.

We came up with a novel approach to function-calling at scale for smart companies and corporate-grade use-cases.

Read our full-fledged blog article on this here on Hugging Face:
https://huggingface.co./blog/Aurelien-Morgan/the-almighty-function-caller
replied to prithivMLmods's post 3 days ago
reacted to prithivMLmods's post with 🔥 3 days ago
Dropping domain-specific downstream image classification and content moderation models, including the anime image type classification, GeoSceneNet, indoor-outdoor scene classification, and black-and-white vs. colored image classification models, along with the datasets. 🔥

╰┈➤Models :
+ GeoSceneNet : prithivMLmods/Multilabel-GeoSceneNet
+ IndoorOutdoorNet : prithivMLmods/IndoorOutdoorNet
+ B&W vs Colored : prithivMLmods/BnW-vs-Colored-Detection
+ Anime Image Type : prithivMLmods/Anime-Classification-v1.0
+ Multilabel Portrait : prithivMLmods/Multilabel-Portrait-SigLIP2

╰┈➤Datasets :
- GeoSceneNet : prithivMLmods/Multilabel-GeoSceneNet-16K
- IndoorOutdoorNet : prithivMLmods/IndoorOutdoorNet-20K
- BnW vs Colored : prithivMLmods/BnW-vs-Colored-10K
- Multilabel Portrait : prithivMLmods/Multilabel-Portrait-18K

╰┈➤Collections :
> Multilabel Image Classification Datasets : prithivMLmods/multilabel-image-classification-datasets-6809aa64637f45d4c47fa6ca
> Model Collection : prithivMLmods/siglip2-content-filters-models-v2-68053a958c42ef17a3a3f4d1

Note: The anime scene type dataset is not mentioned in the list because it is private and only accessible to members of the DeepGHS organization.

For raw ZIP files or more information about the datasets, visit: https://www.kaggle.com/prithivsakthiur/datasets
reacted to orasul's post with 👍 3 days ago
hi, it is deki, and now I am open sourced.

𝗱𝗲𝗸𝗶, an Android AI agent powered by an open-source ML model, has been fully open-sourced.

It understands what’s on your screen and can perform tasks based on your voice or text commands.

Some examples:
* "Write my friend "some_name" in WhatsApp that I'll be 15 minutes late"
* "Open Twitter in the browser and write a post about something"
* "Read my latest notifications"
* "Write a linkedin post about something"

Currently, it works only on Android, but support for other operating systems is planned.

The ML and backend code has also been fully open-sourced.

Video prompt example:

"Open linkedin, tap post and write: hi, it is deki, and now I am open sourced. But don't send, just return"

License: GPLv3

You can find other AI agent demos and usage examples, such as code generation or object detection, on GitHub.

Github: https://github.com/RasulOs/deki
reacted to YerbaPage's post with 🔥 4 days ago
Curated list of **Repository-level Code Generation** papers & benchmarks! 🔥

Stay ahead with the latest in:
✅ Repo-level Issue Resolution (SWE-bench, Agents)
✅ Repo-level Code Completion (Repo understanding)
✅ Datasets & Benchmarks

👉 Check it out: https://github.com/YerbaPage/Awesome-Repo-Level-Code-Generation 🔥
reacted to ProCreations's post with 🔥 4 days ago
Come check out my new dataset, Mistake to Meaning, an attempt to help smaller models better understand user typos! Hope you guys enjoy it.

ProCreations/Mistake-To-Meaning
posted an update 5 days ago
DIA TTS is just amazing - please share your funniest gens (here is mine) 😂
nari-labs/Dia-1.6B
reacted to davidberenstein1957's post with 🚀 5 days ago
reacted to AdinaY's post with 🔥 5 days ago
MAGI-1 🪄 the autoregressive diffusion video model, released by Sand AI

sand-ai/MAGI-1

✨ 24B with Apache 2.0
✨ Strong temporal consistency
✨ Benchmark-topping performance
reacted to shekkizh's post with 👀 5 days ago
Think AGI is just around the corner? Not so fast.

When OpenAI released its Computer-Using Agent (CUA) API, I happened to be playing Wordle 🧩 and thought, why not see how the model handles it?
Spoiler: Wordle turned out to be a surprisingly effective benchmark.
So Romain Cosentino Ph.D. and I dug in and analyzed the results of several hundred runs.

🔑 Takeaways
1️⃣ Even the best computer-using models struggle with simple, context-dependent tasks. 
2️⃣ Visual perception and reasoning remain major hurdles for multimodal agents.
3️⃣ Real-world use cases reveal significant gaps between hype and reality. Perception accuracy drops to near zero by the last turn 📉

🔗 Read our arxiv article for more details https://www.arxiv.org/abs/2504.15434
reacted to ProCreations's post with 🔥 5 days ago
🤖 IntellIte‑Chat v1.0 (Coming Soon)

A compact chat model built for speed, efficiency, and simplicity.

IntellIte‑Chat v1.0 is the debut model in the IntellIte series—a lightweight conversational transformer crafted to be fast, memory-efficient, and easy to work with. It’s designed for devs and enthusiasts who want sharp results without huge resource demands.

No fluff. Just chats.



🎯 Target Specs
• Pretraining Tokens: 4 billion
• Context Length: 16,384 tokens



🧠 Parameters & Architecture
• Model Size: ~100M parameters
• Architecture: Modified GPT-NeoX
• Focus: Chat performance with low latency and efficient memory use
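For a sense of how a ~100M-parameter budget breaks down in a GPT-NeoX-style transformer, here is a rough back-of-the-envelope sketch. The hidden size, layer count, and vocabulary size below are hypothetical placeholders, not IntellIte's actual configuration:

```python
def gpt_neox_param_estimate(d_model, n_layers, vocab_size):
    """Rough transformer parameter count: attention projections
    (~4*d^2) plus MLP (~8*d^2) per layer, plus token embeddings
    (vocab * d). Biases and layer norms are small and ignored."""
    per_layer = 12 * d_model ** 2
    embeddings = vocab_size * d_model
    return n_layers * per_layer + embeddings

# One hypothetical configuration landing near the ~100M target:
print(gpt_neox_param_estimate(768, 10, 50304))  # -> 109412352 (~109M)
```

At this scale the embedding table is a large fraction of the total (here roughly a third), which is one reason small chat models often tie input and output embeddings.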



🧃 Support the Build
Every dollar you donate is an extra amount of VRAM I get to work with. 😅
This project is fully independent and entirely self-funded. If you want to help bring it to life:
👉 https://buymeacoffee.com/procreations



💛 Early Supporters
All early supporters will be credited here when the model launches.
Even the smallest support means the world and pushes this project forward.

Special thanks to:
Maybe you?



🛠️ Development Status
• Architecture Design: Completed ✅
• Dataset Planning: Completed ✅
• Training Code: Near Completion 🛠️
• Training Launch: Starting Soon ⏳
• Evaluation Setup: Coming soon 🔜
• Final Release: Coming soon 🔜



Built to chat. Built on a budget. Built to prove what small models can do.
reacted to clem's post with 🔥 5 days ago
Energy is a massive constraint for AI, but do you even know how much energy your ChatGPT convos are using?

We're trying to change this by releasing ChatUI-energy, the first interface where you see in real time what energy your AI conversations consume. Great work from @jdelavande powered by Spaces & TGI, available for a dozen open-source models like Llama, Mistral, Qwen, Gemma and more.

jdelavande/chat-ui-energy

Should all chat interfaces have this? Just like ingredients have to be shown on products you buy, we need more transparency in AI for users!
reacted to linoyts's post with 👍 5 days ago
reacted to clem's post with 🤗 5 days ago
Just crossed half a million public apps on Hugging Face. A new public app is created every minute these days 🤯🤯🤯

What's your favorite? http://hf.co/spaces
reacted to bhalajin's post with 🔥 5 days ago
###### CVPR2025 Workshop Challenge Alert ######

🫠 Between deadlines, rebuttals, and existential crises??? "We got you!!!!"

📢 Our new CVPR25 multi-modal challenge is online !!!

🍽️ Dishcovery: VLM MetaFood Challenge!!!! 🍽️


😋🧫 Can your groundbreaking VLM understand the difference between sushi styles, pasta types, or cooking methods from just image + caption pairs?

🌐 Our Task: Match fine-grained images to food descriptions


Challenge Highlights:

📦 400K food image-caption pairs, a little taste to get you started !!!

🔬 Got a SoTA VLM? Come test it on our challenging test sets !!!

🎯 Challenge for everyone! Easy to use SigLIP baseline is provided !!!

🔍 Real, synthetic, noisy data – just like real life - Will your VLM redefine how people track their diets??? ( 🗣️ We believe so!!! )


🔗 Join the challenge: https://www.kaggle.com/competitions/dishcovery-vlm-mtf-cvpr-2025

🗓️ Deadline: Phase I: 4th of May, 2025 - Phase II: 10th of May, 2025

👉 Workshop website: https://sites.google.com/view/cvpr-metafood-2025


#CVPR25 #ComputerVision #CV #Deeplearning #DL #VisionLanguage #VLM #multimodal #FoundationModels
reacted to luigi12345's post with 🔥 5 days ago
SkyReels-V2 INFINITE VIDEO🔥♾️🎬 UNLIMITED duration video generation model by Skywork.

> "Finally, it is here. An open-source model that achieves what we have all been waiting for: infinite-length videos." 😮

Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought (2504.05599)

Model: Skywork/SkyReels-V2-T2V-14B-720P

✨ 1.3B & 14B
✨ Generates infinite-length videos using Diffusion Forcing, combining diffusion models with autoregressive methods