nyuuzyou's picture

nyuuzyou PRO

nyuuzyou

AI & ML interests

None yet

Recent Activity

Organizations

Social Post Explorers's profile picture AI Starter Pack's profile picture

nyuuzyou's activity

posted an update about 1 hour ago
view post
Post
82
🖼️ SVGFind Icons Dataset - nyuuzyou/svgfind

Collection of 3,655,810 Scalable Vector Graphics (SVG) icons featuring:
- Sourced from SVGFind across diverse categories & styles
- Includes metadata: unique ID, title, tags, data pack, and license information
- Contains minified SVG markup for direct use or processing
- Organized into splits based on license type (Creative Commons: 3,645,444 icons, Public Domain: 10,366 icons)

With over 3.6 million icons, this appears to be the largest SVG dataset on Hugging Face to date. If you're aware of a larger SVG collection, please let me know and I'll update this post with a reference to the largest dataset.
replied to their post about 8 hours ago
replied to their post 1 day ago
replied to their post 1 day ago
view reply

Sorry, replied to the wrong comment/it was deleted. I meant to/it was to reply to another aggressive comment

reacted to DawnC's post with 🔥 1 day ago
view post
Post
3530
I'm excited to introduce VisionScout —an interactive vision tool that makes computer vision both accessible and powerful! 👀🔍

What can VisionScout do right now?
🖼️ Upload any image and detect 80 different object types using YOLOv8.
🔄 Instantly switch between Nano, Medium, and XLarge models depending on your speed vs. accuracy needs.
🎯 Filter specific classes (people, vehicles, animals, etc.) to focus only on what matters to you.
📊 View detailed statistics about detected objects, confidence levels, and spatial distribution.
🎨 Enjoy a clean, intuitive interface with responsive design and enhanced visualizations.

What's next?
I'm working on exciting updates:
- Support for more models
- Video processing and object tracking across frames
- Faster real-time detection
- Improved mobile responsiveness

The goal is to build a complete but user-friendly vision toolkit for both beginners and advanced users.

Try it yourself! 🚀
DawnC/VisionScout

I'd love to hear your feedback , what features would you find most useful? Any specific use cases you'd love to see supported?

Give it a try and let me know your thoughts in the comments! Stay tuned for future updates.

#ComputerVision #ObjectDetection #YOLO #MachineLearning #TechForLife
replied to their post 1 day ago
posted an update 2 days ago
view post
Post
3042
🖼️ SVGRepo Icons Dataset - nyuuzyou/svgrepo

Collection of 217,510 Scalable Vector Graphics (SVG) icons featuring:

- Sourced from SVGRepo.com across diverse categories & styles
- Includes metadata: title, tags, source collection, and specific license
- Contains minified SVG markup for direct use or processing
- Organized into splits based on individual icon license (e.g., MIT, CC0, Apache)
replied to their post 4 days ago
reacted to JingzeShi's post with 🚀🚀 4 days ago
view post
Post
2459
@SmallDoge SmallTalks( SmallDoge/SmallTalks) is a synthetic dataset designed for supervised fine-tuning of language models. The dataset covers a variety of conversational content, including daily conversations, tool usage, Python programming, encyclopedia Q&A, exam problem-solving, logical reasoning, and more. Each task is provided in both English and Chinese versions.
reacted to aiqtech's post with 🔥 8 days ago
view post
Post
4277
🌐 AI Token Visualization Tool with Perfect Multilingual Support

Hello! Today I'm introducing my Token Visualization Tool with comprehensive multilingual support. This web-based application allows you to see how various Large Language Models (LLMs) tokenize text.

aiqtech/LLM-Token-Visual

✨ Key Features

🤖 Multiple LLM Tokenizers: Support for Llama 4, Mistral, Gemma, Deepseek, QWQ, BERT, and more
🔄 Custom Model Support: Use any tokenizer available on HuggingFace
📊 Detailed Token Statistics: Analyze total tokens, unique tokens, compression ratio, and more
🌈 Visual Token Representation: Each token assigned a unique color for visual distinction
📂 File Analysis Support: Upload and analyze large files

🌏 Powerful Multilingual Support
The most significant advantage of this tool is its perfect support for all languages:

📝 Asian languages including Korean, Chinese, and Japanese fully supported
🔤 RTL (right-to-left) languages like Arabic and Hebrew supported
🈺 Special characters and emoji tokenization visualization
🧩 Compare tokenization differences between languages
💬 Mixed multilingual text processing analysis

🚀 How It Works

Select your desired tokenizer model (predefined or HuggingFace model ID)
Input multilingual text or upload a file for analysis
Click 'Analyze Text' to see the tokenized results
Visually understand how the model breaks down various languages with color-coded tokens

💡 Benefits of Multilingual Processing
Understanding multilingual text tokenization patterns helps you:

Optimize prompts that mix multiple languages
Compare token efficiency across languages (e.g., English vs. Korean vs. Chinese token usage)
Predict token usage for internationalization (i18n) applications
Optimize costs for multilingual AI services

🛠️ Technology Stack

Backend: Flask (Python)
Frontend: HTML, CSS, JavaScript (jQuery)
Tokenizers: 🤗 Transformers library
·
reacted to openfree's post with 🔥 9 days ago
view post
Post
4899
🧠 ThinkFlow: The Revolutionary Platform That Gives LLMs the Power to Think 🚀

Hello AI community! We're excited to introduce you to ThinkFlow, an innovative service that transforms how language models solve problems. 🎉
VIDraft/ThinkFlow-llama

✨ What is ThinkFlow?
ThinkFlow is a groundbreaking platform that automatically applies step-by-step reasoning capabilities to existing LLM models without any modifications. It makes complex problem-solving transparent, allowing you to witness the model's thought process in real-time.

🔍 Key Features

Reasoning Without Model Modifications: Add step-by-step reasoning while utilizing existing LLMs as they are ⚙️
Visualized Thinking Process: See exactly how the model analyzes and solves problems 👁️
Before & After Comparison: Compare standard responses with reasoning-enhanced outputs in real-time 📊
Improved Accuracy: Deliver more accurate solutions for complex math and logic problems 📈
Educational Value: Teach students systematic approaches to problem-solving 👨‍🏫
User-Friendly Interface: Intuitive and easy-to-use UI for seamless experience 🖥️

💡 What Problems Can It Solve?
ThinkFlow is particularly effective for various domains including:

Complex mathematical problems 🧮
Logic puzzles 🧩
Questions requiring multi-step reasoning 🤔
Scientific analysis challenges 🔬
Complex decision-making processes 📝

👨‍💻 Technical Details
ThinkFlow is built on the meta-llama/Llama-3.1-8B-Instruct model and uses carefully designed prompt chains to guide the model through step-by-step thinking. Each reasoning step builds upon the results of previous steps, culminating in a comprehensive final answer.

💬 Join Our Community!
If you have questions or suggestions about ThinkFlow, join our Discord community: https://discord.gg/openfreeai
Let's build better AI reasoning experiences together! 💪

#AI #LLM #ReasoningAI #ThinkFlow #HuggingFace #OpenSource #AIEducation
·
posted an update 10 days ago
view post
Post
3546
🦅 SmolLM2-Eagle Collection - nyuuzyou/smollm2-eagle-680263bf97f0c7e6bbe4936b

Collection of fine-tuned bilingual language models featuring:
- Models in three parameter sizes: 135M, 360M, and 1.7B based on HuggingFaceTB's SmolLM2 models
- Both standard and GGUF formats for flexible deployment in llama.cpp and Ollama
- Fine-tuned on nyuuzyou/EagleSFT dataset (536,231 Russian-English QA pairs derived from 739k+ real user queries)
- Experimental Russian language capabilities while maintaining English performance
- Limited Russian capabilities due to SFT-only approach without Russian pre-training
- Environmental impact: ~19.75 kg CO2eq

This collection provides compact models for research on bilingual language capabilities, resource-constrained environments, and educational applications. Not recommended for production use due to experimental nature and inherent limitations. Available under Apache 2.0 license.
  • 1 reply
·
reacted to philschmid's post with 🔥 10 days ago
view post
Post
2221
Gemini 2.5 Flash is here! We excited launch our first hybrid reasoning Gemini model. In Flash 2.5 developer can turn thinking off.

**TL;DR:**
- 🧠 Controllable "Thinking" with thinking budget with up to 24k token
- 🌌 1 Million multimodal input context for text, image, video, audio, and pdf
- 🛠️ Function calling, structured output, google search & code execution.
- 🏦 $0.15 1M input tokens; $0.6 or $3.5 (thinking on) per million output tokens (thinking tokens are billed as output tokens)
- 💡 Knowledge cut of January 2025
- 🚀 Rate limits - Free 10 RPM 500 req/day
- 🏅Outperforms 2.0 Flash on every benchmark

Try it ⬇️
https://aistudio.google.com/prompts/new_chat?model=gemini-2.5-flash-preview-04-17
  • 1 reply
·
posted an update 12 days ago
view post
Post
2927
🦅 EagleSFT Dataset - nyuuzyou/EagleSFT

Collection of 536,231 question-answer pairs featuring:

- Human-posed questions and machine-generated responses for SFT
- Bilingual content in Russian and English with linked IDs
- Derived from 739k+ real user queries, primarily educational topics
- Includes unique IDs and machine-generated category labels

This dataset provides a resource for supervised fine-tuning (SFT) of large language models, cross-lingual research, and understanding model responses to diverse user prompts. Released to the public domain under CC0 1.0 license.
reacted to neph1's post with 👍 12 days ago
replied to merve's post 13 days ago
view reply

Multimodal reasoning is the way to go, especially with open source, congrats to the Moonshot team

posted an update 15 days ago
view post
Post
5583
🇷🇺 Russian Forum Messages Dataset - nyuuzyou/ruforum

Collection of approximately 58 million Russian forum messages featuring:

- Complete message content from Russian online forums spanning 2010-2025
- Comprehensive metadata including unique message IDs and timestamps
- Full text content preserving original user discussions and interactions
- Monolingual dataset focused exclusively on Russian language content

This dataset offers a unique textual archive of Russian online conversations suitable for text generation, sentiment analysis, and language modeling research. Released to the public domain under CC0 1.0 license.
replied to piper2024's post 16 days ago
view reply

Hello, Hugging Face uses git (or backwards-compatible xet) for storing files. When you upload a new version of a file to Git, it doesn't overwrite the old file. Instead, Git stores both versions, with the new version becoming the current one while the old version remains accessible in your history, which is why repositories grow over time.