Kimi-Audio 🚀🎧 an OPEN audio foundation model released by Moonshot AI
moonshotai/Kimi-Audio-7B-Instruct
✨ 7B parameters
✨ 13M+ hours of pretraining data
✨ Novel hybrid input architecture
✨ Universal audio capabilities (ASR, AQA, AAC, SER, SEC/ASC, end-to-end conversation)
🚀 Building Better Evaluations: 32K Image Annotations Now Available
Today, we're releasing an expanded version: 32K images annotated with 3.7M responses from over 300K individuals, completed in under two weeks using the Rapidata Python API.
A few months ago, we published one of our most-liked datasets, 13K images based on @data-is-better-together's dataset, following Google's research on "Rich Human Feedback for Text-to-Image Generation" (https://arxiv.org/abs/2312.10240). It collected over 1.5M responses from 150K+ participants.
In the examples below, users highlighted words from prompts that were not correctly depicted in the generated images. Higher word scores indicate more frequent issues. If an image captured the prompt accurately, users could select [No_mistakes].
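The word-score aggregation described above can be sketched in a few lines. This is a minimal illustration, not Rapidata's actual pipeline; the response format and field names are assumptions.

```python
from collections import Counter

def word_scores(prompt_words, responses):
    """Aggregate per-word error scores from annotator selections.

    Each response is the list of prompt words an annotator flagged as
    not depicted, or ["No_mistakes"] if the image matched the prompt.
    A word's score is the fraction of annotators who flagged it, so
    higher scores indicate more frequent issues.
    """
    counts = Counter()
    for selected in responses:
        counts.update(w for w in selected if w != "No_mistakes")
    n = len(responses)
    return {w: counts[w] / n for w in prompt_words}

# Example: 4 annotators judging the prompt "a red cat on a skateboard"
words = ["a", "red", "cat", "on", "a", "skateboard"]
votes = [["red"], ["red", "skateboard"], ["No_mistakes"], ["red"]]
print(word_scores(words, votes))  # "red" scores 0.75, "skateboard" 0.25
```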
We're continuing to work on large-scale human feedback and model evaluation. If you're working on related research and need large, high-quality annotations, feel free to get in touch: [email protected].
How would you like to build smart GenAI infrastructure? Give your edge agentic system extensive tools and memory, and optimize the resources it takes to run a high-performance set of agents?
We came up with a novel approach to function-calling at scale for smart companies and corporate-grade use cases.
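The post does not describe its method, but the general function-calling pattern it builds on can be sketched as a registry that dispatches model-emitted JSON calls to local tools. Everything here (tool names, schema shape) is a hypothetical illustration.

```python
import json

# Minimal tool registry: map function names to callables so that a
# model-emitted call like {"name": ..., "arguments": {...}} can be run.
TOOLS = {}

def tool(fn):
    """Register a function as a callable tool."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def get_weather(city: str) -> str:
    # Stub: a real tool would call an external API here.
    return f"Sunny in {city}"

def dispatch(call_json: str) -> str:
    """Execute one function call emitted by the model as JSON."""
    call = json.loads(call_json)
    return TOOLS[call["name"]](**call["arguments"])

print(dispatch('{"name": "get_weather", "arguments": {"city": "Paris"}}'))
# prints "Sunny in Paris"
```

At scale, the interesting problems are tool selection over large registries and memory for past calls, which is presumably where the novel approach comes in.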
Dropping the domain-specific downstream image-classification and content-moderation models, including the anime image-type classification, GeoSceneNet, indoor-outdoor scene classification, and black-and-white vs. colored image classification models, along with their datasets. 🔥
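Of the tasks listed, black-and-white vs. colored classification is the simplest to illustrate: a toy heuristic can check whether every pixel's channels agree. This is only a sketch for intuition, not the released model, which is a trained classifier.

```python
def is_black_and_white(pixels, tol=0):
    """Classify an image, given as a list of (r, g, b) tuples.

    A pixel counts as gray when its channels agree within `tol`;
    the image is black-and-white when every pixel is gray.
    """
    return all(max(p) - min(p) <= tol for p in pixels)

gray_img = [(10, 10, 10), (200, 200, 200), (255, 255, 255)]
color_img = [(10, 10, 10), (200, 50, 30)]
print(is_black_and_white(gray_img))   # True
print(is_black_and_white(color_img))  # False
```

A learned model is still needed in practice, e.g. for sepia scans or compression artifacts that break the exact-equality heuristic.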
An Android AI agent powered by an open-source ML model, 𝗱𝗲𝗸𝗶, is now fully open-sourced.
It understands what’s on your screen and can perform tasks based on your voice or text commands.
Some examples:
* "Write to my friend 'some_name' in WhatsApp that I'll be 15 minutes late"
* "Open Twitter in the browser and write a post about something"
* "Read my latest notifications"
* "Write a LinkedIn post about something"
Currently, it works only on Android, but support for other operating systems is planned.
The ML and backend code has also been fully open-sourced.
Video prompt example:
"Open linkedin, tap post and write: hi, it is deki, and now I am open sourced. But don't send, just return"
License: GPLv3
You can find other AI agent demos and usage examples, like code generation or object detection, on GitHub.
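The command examples above imply a mapping from free-form text to an app plus an action. As a rough sketch (not deki's actual pipeline, which uses an ML model to understand the screen), a keyword router might look like this; the app list and intent labels are hypothetical.

```python
import re

# Toy intent router: pick a target app and a coarse action from a
# voice/text command. A real agent would use the ML model instead.
APPS = ["whatsapp", "twitter", "linkedin", "browser", "notifications"]

def route(command):
    """Return (app, action) for a free-form command string."""
    lowered = command.lower()
    app = next((a for a in APPS if a in lowered), None)
    action = "write" if re.search(r"\bwrite\b", lowered) else "read"
    return app, action

print(route("Write a LinkedIn post about something"))  # ('linkedin', 'write')
print(route("Read my latest notifications"))           # ('notifications', 'read')
```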
🔥 Announcing FLUX-Juiced: The Fastest Image Generation Endpoint (2.6x faster)!
Optimisations are widely applied and can reduce inference time, but their impact on quality often remains unclear. So we decided to challenge the status quo and create our own optimised version of FLUX.1[dev], called FLUX-juiced.
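A speedup claim like "2.6x faster" comes down to comparing wall-clock latency of two endpoints on the same workload. The post doesn't publish its benchmark harness, so this is a generic sketch with stand-in workloads; swap in real generation calls to reproduce an endpoint comparison.

```python
import time

def speedup(baseline, optimized, runs=5):
    """Compare wall-clock latency of two callables; return the ratio."""
    def best_of(fn):
        times = []
        for _ in range(runs):
            start = time.perf_counter()
            fn()
            times.append(time.perf_counter() - start)
        return min(times)  # best-of-N damps scheduler noise
    return best_of(baseline) / best_of(optimized)

# Stand-ins for pipeline calls; not real image generation.
slow = lambda: sum(i * i for i in range(200_000))
fast = lambda: sum(i * i for i in range(50_000))
print(f"{speedup(slow, fast):.1f}x faster")
```

Latency alone says nothing about output quality, which is exactly the gap the post calls out.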
When OpenAI released its Computer-Using Agent (CUA) API, I happened to be playing Wordle 🧩 and thought: why not see how the model handles it? Spoiler: Wordle turned out to be a surprisingly effective benchmark. So Romain Cosentino, Ph.D., and I dug in and analyzed the results of several hundred runs.
🔑 Takeaways
1️⃣ Even the best computer-using models struggle with simple, context-dependent tasks.
2️⃣ Visual perception and reasoning remain major hurdles for multimodal agents.
3️⃣ Real-world use cases reveal significant gaps between hype and reality.
Perception accuracy drops to near zero by the last turn 📉
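What makes Wordle a clean benchmark is that the ground-truth feedback per turn is mechanical, so an agent's perception of the board can be checked exactly. The standard scoring rule (greens first, then yellows drawn from the unmatched letter pool, which handles repeated letters) is:

```python
def wordle_feedback(guess, answer):
    """Return per-letter feedback: 'g' green, 'y' yellow, '-' gray."""
    feedback = ["-"] * 5
    pool = []
    # Pass 1: exact position matches; collect unmatched answer letters.
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            feedback[i] = "g"
        else:
            pool.append(a)
    # Pass 2: remaining guess letters claim yellows from the pool.
    for i, g in enumerate(guess):
        if feedback[i] == "-" and g in pool:
            feedback[i] = "y"
            pool.remove(g)
    return "".join(feedback)

print(wordle_feedback("crane", "caper"))  # 'gyy-y'
```

Comparing this exact feedback against what the agent says it sees is one way to measure the perception accuracy mentioned above.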
A compact chat model built for speed, efficiency, and simplicity.
IntellIte‑Chat v1.0 is the debut model in the IntellIte series: a lightweight conversational transformer crafted to be fast, memory-efficient, and easy to work with. It's designed for devs and enthusiasts who want sharp results without huge resource demands.
🧠 Parameters & Architecture
• Model Size: ~100M parameters
• Architecture: Modified GPT-NeoX
• Focus: Chat performance with low latency and efficient memory use
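For intuition on where ~100M parameters go in a GPT-NeoX-style decoder, a rough count can be derived from the config. The config values below are hypothetical; IntellIte-Chat's actual shape isn't published in the post.

```python
def gptneox_param_count(vocab, d_model, n_layers, tied_embeddings=True):
    """Rough parameter count for a GPT-NeoX-style decoder.

    Per layer: QKV (3*d^2 + 3d) + attention output proj (d^2 + d)
             + MLP up/down (8*d^2 + 5d) + two LayerNorms (4d).
    Plus token embeddings and a final LayerNorm; the LM head adds
    nothing extra when embeddings are tied.
    """
    per_layer = 12 * d_model**2 + 13 * d_model
    total = vocab * d_model + n_layers * per_layer + 2 * d_model
    if not tied_embeddings:
        total += vocab * d_model
    return total

# Hypothetical ~100M config: GPT-NeoX tokenizer-sized vocab, d=640, 14 layers.
n = gptneox_param_count(vocab=50304, d_model=640, n_layers=14)
print(f"{n / 1e6:.1f}M parameters")  # ≈ 101M
```

Note how the embedding table alone (~32M here) dominates at this scale, which is why small chat models often tie input and output embeddings.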
⸻
🧃 Support the Build
Every dollar you donate is extra VRAM I get to work with. 😅
This project is fully independent and entirely self-funded. If you want to help bring it to life:
👉 https://buymeacoffee.com/procreations
⸻
💛 Early Supporters
All early supporters will be credited here when the model launches. Even the smallest support means the world and pushes this project forward.
Special thanks to: Maybe you?
⸻
🛠️ Development Status
• Architecture Design: Completed ✅
• Dataset Planning: Completed ✅
• Training Code: Near Completion 🛠️
• Training Launch: Starting Soon ⏳
• Evaluation Setup: Coming soon 🔜
• Final Release: Coming soon 🔜
⸻
Built to chat. Built on a budget. Built to prove what small models can do.
Energy is a massive constraint for AI, but do you even know how much energy your ChatGPT conversations use?
We're trying to change this by releasing ChatUI-energy, the first interface where you can see in real time how much energy your AI conversations consume. Great work from @jdelavande, powered by Spaces & TGI, and available for a dozen open-source models, including Llama, Mistral, Qwen, Gemma, and more.
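For a sense of the quantity being surfaced: energy per response is essentially average power times generation time. The figures below are hypothetical back-of-the-envelope inputs; ChatUI-energy measures this live instead of estimating.

```python
def response_energy_wh(avg_gpu_power_w, tokens, tokens_per_second):
    """Back-of-the-envelope energy for one response: power x time.

    Wh = W * s / 3600. Real numbers depend on hardware, batching,
    and model size.
    """
    seconds = tokens / tokens_per_second
    return avg_gpu_power_w * seconds / 3600

# Hypothetical figures: a 300 W GPU generating 500 tokens at 50 tok/s.
print(f"{response_energy_wh(300, 500, 50):.2f} Wh")  # 0.83 Wh
```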