RL is where the real action is right now: it's the engine behind autonomous tech, robots, and the next wave of AI that thinks, moves, and solves problems on its own. To stay up to date with what's happening in RL, here are some fresh materials on it:
1. "Reinforcement Learning from Human Feedback" by Nathan Lambert -> https://rlhfbook.com/ It's a short introduction to RLHF, explaining instruction tuning, reward modeling, alignment methods, synthetic data, evaluation, and more
2. "A Course in Reinforcement Learning (2nd Edition)" by Dimitri P. Bertsekas -> https://www.mit.edu/~dimitrib/RLbook.html Explains dynamic programming (DP) and RL, diving into rollout algorithms, neural networks, policy learning, etc. It’s packed with solved exercises and real-world examples
4. "Multi-Agent Reinforcement Learning" by Stefano V. Albrecht, Filippos Christianos, and Lukas Schäfer -> https://www.marl-book.com/ Covers models, core ideas of multi-agent RL (MARL) and modern approaches to combining it with deep learning
5. "Reinforcement Learning: A Comprehensive Overview" by Kevin P. Murphy -> https://arxiv.org/pdf/2412.05265 Explains RL and sequential decision making, covering value-based, policy-gradient, model-based, multi-agent RL methods, RL+LLMs, and RL+inference and other topics
If your Space stops working after restarting (an issue reported mainly over the last 5 days: https://discuss.huggingface.co/t/my-space-suddenly-went-offline-the-cpu-cannot-restart/151121/22), try some of the following:
1. Add pydantic==2.10.6 to requirements.txt, or upgrade Gradio to the latest version.
2. Upgrade PyTorch to 2.2.0 or later (torch>=2.2.0 for Zero GPU Spaces).
3. Pin Transformers to 4.49.0 or earlier (transformers<=4.49.0 for Spaces using Transformers or Diffusers).
4. Pin huggingface_hub to an old version (huggingface_hub==0.25.2 if an error like "cached_download is not available" occurs or inference does not work properly).
5. Note that specifying WORKDIR in a Dockerfile may cause the application to fail to start with error 137 (Docker Spaces, https://discuss.huggingface.co/t/error-code-137-cache-error/152177).
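If several of these apply to you, a minimal requirements.txt sketch combining the pins above (keep only the lines relevant to your Space and the error you are hitting):

```text
# requirements.txt - combined pins from the troubleshooting list above
pydantic==2.10.6         # Gradio startup fix if you can't upgrade Gradio itself
torch>=2.2.0             # required for Zero GPU Spaces
transformers<=4.49.0     # for Spaces using Transformers or Diffusers
huggingface_hub==0.25.2  # if "cached_download is not available" appears
```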
Kimi-Audio 🚀🎧 an OPEN audio foundation model released by Moonshot AI moonshotai/Kimi-Audio-7B-Instruct
✨ 7B
✨ 13M+ hours of pretraining data
✨ Novel hybrid input architecture
✨ Universal audio capabilities (ASR, AQA, AAC, SER, SEC/ASC, end-to-end conversation)
Collection of 3,655,810 Scalable Vector Graphics (SVG) icons featuring:
- Sourced from SVGFind across diverse categories & styles
- Includes metadata: unique ID, title, tags, data pack, and license information
- Contains minified SVG markup for direct use or processing
- Organized into splits based on license type (Creative Commons: 3,645,444 icons; Public Domain: 10,366 icons)
With over 3.6 million icons, this appears to be the largest SVG dataset on Hugging Face to date. If you're aware of a larger SVG collection, please let me know and I'll update this post with a reference to the largest dataset.
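A minimal sketch of loading one license split with the datasets library; the repository id nyuuzyou/svgfind, the split names, and the field names here are assumptions based on the description above, not confirmed identifiers:

```python
# Hedged sketch: streaming the SVG icon dataset by license split.
# The repo id "nyuuzyou/svgfind", split names, and field names are assumptions.
from datasets import load_dataset

# Stream to avoid downloading all 3.6M+ icons at once
ds = load_dataset("nyuuzyou/svgfind", split="creativecommons", streaming=True)

for icon in ds.take(3):
    # Each record should carry the metadata listed above:
    # unique ID, title, tags, data pack, license, and minified SVG markup
    print(icon["id"], icon["title"], icon["license"])
    print(icon["svg"][:80], "...")  # field name "svg" is an assumption
```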
Introducing the ONNX model explorer: Browse, search, and visualize neural networks directly in your browser. 🤯 A great tool for anyone studying Machine Learning! We're also releasing the entire dataset of graphs so you can use them in your own projects! 🤗
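If you want to poke at the released graphs programmatically rather than in the browser, here is a minimal sketch using the onnx Python package, assuming you have downloaded one of the graphs locally as model.onnx:

```python
# Hedged sketch: inspecting an ONNX graph with the official onnx package.
# Assumes a local "model.onnx" downloaded from the released dataset.
import onnx

model = onnx.load("model.onnx")
onnx.checker.check_model(model)  # sanity-check that the graph is well-formed

# Walk the computation graph node by node
for node in model.graph.node[:10]:
    print(node.op_type, list(node.input), "->", list(node.output))
```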
Finally, my first solo preprint is here :) a love letter to the field. Nothing much lol, this is just me trying to fine-tune my understanding of the research behind the recent breakthroughs in reasoning models. It's a preprint targeting beginners in the field - I'll eventually make the necessary changes later. In the meantime, have fun with it :) Download: https://github.com/Jaykef/Jaykef/blob/main/papers/The-Dawn-of-Thinking-Machines.pdf
Bringing out style-intermixing adapters for Flux.Dev, including Aura Glow, Fallen Ink Art, Cardboard Paper Arts, Black & White Expressions, and Glitter Gem Touch. For more details, visit the model cards of the LoRAs. 🥳
The best dimensions and inference settings for optimal results are as follows: a resolution of 1280 x 832 with a 3:2 aspect ratio is recommended for the best quality, while 1024 x 1024 with a 1:1 aspect ratio serves as the default option. For inference, the recommended number of steps is between 30 and 35.
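A minimal diffusers sketch applying those settings; the LoRA repo id and weight filename below are placeholders, so swap in the real ones from the model card:

```python
# Hedged sketch: Flux.1-dev inference with one of the style LoRAs,
# using the recommended 1280x832 (3:2) resolution and 30-35 steps.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Placeholder repo id / filename - take the real ones from the LoRA's model card
pipe.load_lora_weights("example-user/example-style-lora", weight_name="style.safetensors")

image = pipe(
    "a glittering gemstone portrait",  # prompt style depends on the adapter
    width=1280, height=832,            # recommended 3:2 resolution
    num_inference_steps=32,            # within the suggested 30-35 range
).images[0]
image.save("out.png")
```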
Meta dropped Swiss Army knives for vision with an Apache 2.0 license 👏
> image/video encoders for vision language modelling and spatial understanding (object detection etc.) 👏
> The vision LM outperforms InternVL3 and Qwen2.5VL 👏
> They also release gigantic video and image datasets
The authors attempt to come up with a single versatile vision encoder that aligns across a diverse set of tasks.
They trained Perception Encoder (PE) Core: a new state-of-the-art family of vision encoders that can be aligned for both vision-language and spatial tasks. On zero-shot image tasks, it outperforms the latest SOTA, SigLIP2 👏
> Among the fine-tuned ones, the first is PE-Spatial: a model for bounding box detection, segmentation, and depth estimation, and it outperforms all other models 😮
> The second is PLM, the Perception Language Model, which combines PE-Core with the Qwen2.5 7B LM. It outperforms all other models (including InternVL3, which was also trained with a Qwen2.5 LM!)
The authors release checkpoints in base, large, and giant sizes.
The authors also release the following datasets 📑
> PE Video: a gigantic video dataset of 1M videos with 120K expert annotations ⏯️
> PLM-Video and PLM-Image: human- and auto-annotated image and video datasets for region-based tasks
> PLM-VideoBench: a new video benchmark for MCQA
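To make the "aligned for vision-language" idea concrete, here is a generic CLIP-style zero-shot classification sketch of the kind of task PE-Core is evaluated on; it uses transformers' CLIPModel as a stand-in, since PE-Core's own loading API isn't covered in this post:

```python
# Hedged sketch: generic CLIP-style zero-shot image classification.
# CLIPModel is a stand-in, not the Perception Encoder's actual API.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("cat.jpg")
labels = ["a photo of a cat", "a photo of a dog"]

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)

# Image-text similarity logits -> probabilities over the candidate labels
probs = outputs.logits_per_image.softmax(dim=-1)
print(dict(zip(labels, probs[0].tolist())))
```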
Dropping domain-specific downstream image classification models for content moderation, including the anime image type classification, GeoSceneNet, indoor-outdoor scene classification, and black-and-white vs. colored image classification models, along with the datasets. 🔥
Gemini 2.5 Flash is here! We're excited to launch our first hybrid reasoning Gemini model. In 2.5 Flash, developers can turn thinking off.
**TL;DR:**
- 🧠 Controllable "thinking" with a thinking budget of up to 24K tokens
- 🌌 1 million token multimodal input context for text, image, video, audio, and PDF
- 🛠️ Function calling, structured output, Google Search & code execution
- 🏦 $0.15 per 1M input tokens; $0.60 (thinking off) or $3.50 (thinking on) per 1M output tokens (thinking tokens are billed as output tokens)
- 💡 Knowledge cutoff of January 2025
- 🚀 Rate limits: free tier is 10 RPM, 500 requests/day
- 🏅 Outperforms 2.0 Flash on every benchmark
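A minimal sketch of setting the thinking budget with the google-genai Python SDK; the preview model name below is an assumption, so check the docs for the current identifier:

```python
# Hedged sketch: controlling Gemini 2.5 Flash's thinking budget with the
# google-genai SDK. The model name is an assumption; thinking_budget=0
# turns thinking off entirely, and the budget can go up to 24K tokens.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-2.5-flash-preview-04-17",  # assumed preview id
    contents="Explain hybrid reasoning in one paragraph.",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_budget=1024)
    ),
)
print(response.text)
```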
Dropping an entire collection of Style Intermixing Adapters on StrangerZone HF — including Realism, Anime, Sketch, Texture-Rich 3D Experimentals, Automotive Concept Images, and LoRA models based on Flux.1, SD 3.5 Turbo/Large, Stable Diffusion XL 🎨
Try out the demo for Multimodal OCR, featuring implementations of models including RolmOCR and Qwen2VL OCR. The use case showcases image-text-to-text conversion, with video understanding support for the RolmOCR model! 🚀
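A minimal sketch of the image-text-to-text pattern with transformers' pipeline API; the checkpoint below is the base Qwen2-VL model, an assumption standing in for whichever OCR finetune the demo actually uses:

```python
# Hedged sketch: image-text-to-text OCR with the transformers pipeline.
# The checkpoint is the base Qwen2-VL model, standing in for the demo's
# actual OCR finetunes (RolmOCR / Qwen2VL OCR).
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="Qwen/Qwen2-VL-2B-Instruct")

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/receipt.png"},  # placeholder URL
        {"type": "text", "text": "Transcribe all text in this image."},
    ],
}]
out = pipe(text=messages, max_new_tokens=256)
print(out[0]["generated_text"])  # chat transcript including the model's answer
```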
Loaded some domain-specific downstream image classification models for content moderation (essentially the practice of monitoring and filtering user-generated content on platforms), based on SigLIP-2 Base Patch16 with newly initialized trainable parameters. 🥠
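A minimal sketch of what "newly initialized trainable parameters" looks like in practice: loading a SigLIP-2 Base Patch16 backbone with a fresh classification head via transformers. The checkpoint id and label set are assumptions for illustration:

```python
# Hedged sketch: SigLIP-2 backbone with a newly initialized classification
# head, the setup described above. Checkpoint id and labels are assumptions.
from transformers import AutoImageProcessor, AutoModelForImageClassification

ckpt = "google/siglip2-base-patch16-224"  # assumed SigLIP-2 Base Patch16 id
labels = ["safe", "unsafe"]               # hypothetical moderation labels

processor = AutoImageProcessor.from_pretrained(ckpt)
model = AutoModelForImageClassification.from_pretrained(
    ckpt,
    num_labels=len(labels),
    id2label={i: l for i, l in enumerate(labels)},
    label2id={l: i for i, l in enumerate(labels)},
    ignore_mismatched_sizes=True,  # the new head has no pretrained weights
)
# model.classifier is now randomly initialized and trainable;
# fine-tune on a labeled moderation dataset from here.
```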