Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion Paper • 2409.11406 • Published Sep 17, 2024 • 28
MeshFormer: High-Quality Mesh Generation with 3D-Guided Reconstruction Model Paper • 2408.10198 • Published Aug 19, 2024 • 36
Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications Paper • 2408.11878 • Published Aug 20, 2024 • 60
ControlNeXt: Powerful and Efficient Control for Image and Video Generation Paper • 2408.06070 • Published Aug 12, 2024 • 54
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers Paper • 2408.06195 • Published Aug 12, 2024 • 73
Learning Task Decomposition to Assist Humans in Competitive Programming Paper • 2406.04604 • Published Jun 7, 2024 • 4
Sketch2Scene: Automatic Generation of Interactive 3D Game Scenes from User's Casual Sketches Paper • 2408.04567 • Published Aug 8, 2024 • 27
CLAY: A Controllable Large-scale Generative Model for Creating High-quality 3D Assets Paper • 2406.13897 • Published May 30, 2024 • 12
Streetscapes: Large-scale Consistent Street View Generation Using Autoregressive Video Diffusion Paper • 2407.13759 • Published Jul 18, 2024 • 18
POGEMA: A Benchmark Platform for Cooperative Multi-Agent Navigation Paper • 2407.14931 • Published Jul 20, 2024 • 22
OutfitAnyone: Ultra-high Quality Virtual Try-On for Any Clothing and Any Person Paper • 2407.16224 • Published Jul 23, 2024 • 29
DistilDIRE: A Small, Fast, Cheap and Lightweight Diffusion Synthesized Deepfake Detection Paper • 2406.00856 • Published Jun 2, 2024 • 12
OpenDevin: An Open Platform for AI Software Developers as Generalist Agents Paper • 2407.16741 • Published Jul 23, 2024 • 72
Improving 2D Feature Representations by 3D-Aware Fine-Tuning Paper • 2407.20229 • Published Jul 29, 2024 • 7
RelBench: A Benchmark for Deep Learning on Relational Databases Paper • 2407.20060 • Published Jul 29, 2024 • 10
RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation Paper • 2408.02545 • Published Aug 5, 2024 • 38
MeshAnything V2: Artist-Created Mesh Generation With Adjacent Mesh Tokenization Paper • 2408.02555 • Published Aug 5, 2024 • 33
Synthesizing Text-to-SQL Data from Weak and Strong LLMs Paper • 2408.03256 • Published Aug 6, 2024 • 11
Transformer Explainer: Interactive Learning of Text-Generative Models Paper • 2408.04619 • Published Aug 8, 2024 • 163
FruitNeRF: A Unified Neural Radiance Field based Fruit Counting Framework Paper • 2408.06190 • Published Aug 12, 2024 • 18
Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents Paper • 2408.07060 • Published Aug 13, 2024 • 43
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering Paper • 2408.09174 • Published Aug 17, 2024 • 53
LayerPano3D: Layered 3D Panorama for Hyper-Immersive Scene Generation Paper • 2408.13252 • Published Aug 23, 2024 • 27
Training Language Models to Self-Correct via Reinforcement Learning Paper • 2409.12917 • Published Sep 19, 2024 • 140
Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution Paper • 2409.12961 • Published Sep 19, 2024 • 26
FlexiTex: Enhancing Texture Generation with Visual Guidance Paper • 2409.12431 • Published Sep 19, 2024 • 13
3DGS-LM: Faster Gaussian-Splatting Optimization with Levenberg-Marquardt Paper • 2409.12892 • Published Sep 19, 2024 • 5
SpaceBlender: Creating Context-Rich Collaborative Spaces Through Generative 3D Scene Blending Paper • 2409.13926 • Published Sep 20, 2024 • 6
PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions Paper • 2409.15278 • Published Sep 23, 2024 • 26
Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale Paper • 2409.17115 • Published Sep 25, 2024 • 63
LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness Paper • 2409.18125 • Published Sep 26, 2024 • 35
Game4Loc: A UAV Geo-Localization Benchmark from Game Data Paper • 2409.16925 • Published Sep 25, 2024 • 8
DressRecon: Freeform 4D Human Reconstruction from Monocular Video Paper • 2409.20563 • Published Sep 30, 2024 • 9
Posterior-Mean Rectified Flow: Towards Minimum MSE Photo-Realistic Image Restoration Paper • 2410.00418 • Published Oct 1, 2024 • 10
SyntheOcc: Synthesize Geometric-Controlled Street View Images through 3D Semantic MPIs Paper • 2410.00337 • Published Oct 1, 2024 • 11
Flex3D: Feed-Forward 3D Generation With Flexible Reconstruction Model And Input View Curation Paper • 2410.00890 • Published Oct 1, 2024 • 20
Law of the Weakest Link: Cross Capabilities of Large Language Models Paper • 2409.19951 • Published Sep 30, 2024 • 55
From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging Paper • 2410.01215 • Published Oct 2, 2024 • 33
3DGS-DET: Empower 3D Gaussian Splatting with Boundary Guidance and Box-Focused Sampling for 3D Object Detection Paper • 2410.01647 • Published Oct 2, 2024 • 31
Addition is All You Need for Energy-efficient Language Models Paper • 2410.00907 • Published Oct 1, 2024 • 151
MIGA: Mixture-of-Experts with Group Aggregation for Stock Market Prediction Paper • 2410.02241 • Published Oct 3, 2024 • 8
CANVAS: Commonsense-Aware Navigation System for Intuitive Human-Robot Interaction Paper • 2410.01273 • Published Oct 2, 2024 • 10
A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegrained Image Generation Paper • 2410.01912 • Published Oct 2, 2024 • 14
DOTS: Learning to Reason Dynamically in LLMs via Optimal Reasoning Trajectories Search Paper • 2410.03864 • Published Oct 4, 2024 • 12
Seeker: Enhancing Exception Handling in Code with LLM-based Multi-Agent Approach Paper • 2410.06949 • Published Oct 9, 2024 • 5
Data Selection via Optimal Control for Language Models Paper • 2410.07064 • Published Oct 9, 2024 • 9
IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation Paper • 2410.07171 • Published Oct 9, 2024 • 44
MLLM as Retriever: Interactively Learning Multimodal Retrieval for Embodied Agents Paper • 2410.03450 • Published Oct 4, 2024 • 37
Agent S: An Open Agentic Framework that Uses Computers Like a Human Paper • 2410.08164 • Published Oct 10, 2024 • 24
PrefixQuant: Static Quantization Beats Dynamic through Prefixed Outliers in LLMs Paper • 2410.05265 • Published Oct 7, 2024 • 31
LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models Paper • 2410.09732 • Published Oct 13, 2024 • 56
Toward General Instruction-Following Alignment for Retrieval-Augmented Generation Paper • 2410.09584 • Published Oct 12, 2024 • 49
Animate-X: Universal Character Image Animation with Enhanced Motion Representation Paper • 2410.10306 • Published Oct 14, 2024 • 57
MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks Paper • 2410.10563 • Published Oct 14, 2024 • 39
Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations Paper • 2410.10792 • Published Oct 14, 2024 • 31
VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents Paper • 2410.10594 • Published Oct 14, 2024 • 27
Rethinking Data Selection at Scale: Random Selection is Almost All You Need Paper • 2410.09335 • Published Oct 12, 2024 • 17
Mentor-KD: Making Small Language Models Better Multi-step Reasoners Paper • 2410.09037 • Published Oct 11, 2024 • 4
SuperCorrect: Supervising and Correcting Language Models with Error-Driven Insights Paper • 2410.09008 • Published Oct 11, 2024 • 17
Multi-Agent Collaborative Data Selection for Efficient LLM Pretraining Paper • 2410.08102 • Published Oct 10, 2024 • 20
StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization Paper • 2410.08815 • Published Oct 11, 2024 • 50
From Generalist to Specialist: Adapting Vision Language Models via Task-Specific Visual Instruction Tuning Paper • 2410.06456 • Published Oct 9, 2024 • 38
Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis Paper • 2410.08261 • Published Oct 10, 2024 • 52
Harnessing Webpage UIs for Text-Rich Visual Understanding Paper • 2410.13824 • Published Oct 17, 2024 • 32
MobA: A Two-Level Agent System for Efficient Mobile Task Automation Paper • 2410.13757 • Published Oct 17, 2024 • 33
Remember, Retrieve and Generate: Understanding Infinite Visual Concepts as Your Personalized Assistant Paper • 2410.13360 • Published Oct 17, 2024 • 9
AERO: Softmax-Only LLMs for Efficient Private Inference Paper • 2410.13060 • Published Oct 16, 2024 • 4
MixEval-X: Any-to-Any Evaluations from Real-World Data Mixtures Paper • 2410.13754 • Published Oct 17, 2024 • 76
MagicTailor: Component-Controllable Personalization in Text-to-Image Diffusion Models Paper • 2410.13370 • Published Oct 17, 2024 • 38
Diffusion Curriculum: Synthetic-to-Real Generative Curriculum Learning via Image-Guided Diffusion Paper • 2410.13674 • Published Oct 17, 2024 • 17
DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking Head Video Generation Paper • 2410.13726 • Published Oct 17, 2024 • 12
HART: Efficient Visual Generation with Hybrid Autoregressive Transformer Paper • 2410.10812 • Published Oct 14, 2024 • 18
AutoTrain: No-code training for state-of-the-art models Paper • 2410.15735 • Published Oct 21, 2024 • 60
PUMA: Empowering Unified MLLM with Multi-granular Visual Generation Paper • 2410.13861 • Published Oct 17, 2024 • 57
SemiEvol: Semi-supervised Fine-tuning for LLM Adaptation Paper • 2410.14745 • Published Oct 17, 2024 • 48
Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages Paper • 2410.16153 • Published Oct 21, 2024 • 45
Meta-Chunking: Learning Efficient Text Segmentation via Logical Perception Paper • 2410.12788 • Published Oct 16, 2024 • 24
DynamicCity: Large-Scale LiDAR Generation from Dynamic Scenes Paper • 2410.18084 • Published Oct 23, 2024 • 14
ARKit LabelMaker: A New Scale for Indoor 3D Scene Understanding Paper • 2410.13924 • Published Oct 17, 2024 • 7
LOGO -- Long cOntext aliGnment via efficient preference Optimization Paper • 2410.18533 • Published Oct 24, 2024 • 44
Unleashing Reasoning Capability of LLMs via Scalable Question Synthesis from Scratch Paper • 2410.18693 • Published Oct 24, 2024 • 43
Unbounded: A Generative Infinite Game of Character Life Simulation Paper • 2410.18975 • Published Oct 24, 2024 • 38
Distill Visual Chart Reasoning Ability from LLMs to MLLMs Paper • 2410.18798 • Published Oct 24, 2024 • 21
WAFFLE: Multi-Modal Model for Automated Front-End Development Paper • 2410.18362 • Published Oct 24, 2024 • 13
Fictitious Synthetic Data Can Improve LLM Factuality via Prerequisite Learning Paper • 2410.19290 • Published Oct 25, 2024 • 10
Continuous Speech Synthesis using per-token Latent Diffusion Paper • 2410.16048 • Published Oct 21, 2024 • 30
MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark Paper • 2410.19168 • Published Oct 24, 2024 • 20
LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding Paper • 2404.16710 • Published Apr 25, 2024 • 80
Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines Paper • 2410.21220 • Published Oct 28, 2024 • 10
AgentStore: Scalable Integration of Heterogeneous Agents As Specialized Generalist Computer Assistant Paper • 2410.18603 • Published Oct 24, 2024 • 33
DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation Paper • 2410.18666 • Published Oct 24, 2024 • 19
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference Paper • 2410.21465 • Published Oct 28, 2024 • 11
CLEAR: Character Unlearning in Textual and Visual Modalities Paper • 2410.18057 • Published Oct 23, 2024 • 211
AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions Paper • 2410.20424 • Published Oct 27, 2024 • 41
TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters Paper • 2410.23168 • Published Oct 30, 2024 • 24
NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks Paper • 2410.20650 • Published Oct 28, 2024 • 17
Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders Paper • 2410.22366 • Published Oct 28, 2024 • 83
Language Models can Self-Lengthen to Generate Long Texts Paper • 2410.23933 • Published Oct 31, 2024 • 18
Navigating the Unknown: A Chat-Based Collaborative Interface for Personalized Exploratory Tasks Paper • 2410.24032 • Published Oct 31, 2024 • 10
M2rc-Eval: Massively Multilingual Repository-level Code Completion Evaluation Paper • 2410.21157 • Published Oct 28, 2024 • 6
HelloMeme: Integrating Spatial Knitting Attentions to Embed High-Level and Fidelity-Rich Conditions in Diffusion Models Paper • 2410.22901 • Published Oct 30, 2024 • 8
CityGaussianV2: Efficient and Geometrically Accurate Reconstruction for Large-Scale Scenes Paper • 2411.00771 • Published Nov 1, 2024 • 9
AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents Paper • 2410.24024 • Published Oct 31, 2024 • 51
Training-free Regional Prompting for Diffusion Transformers Paper • 2411.02395 • Published Nov 4, 2024 • 26
"Give Me BF16 or Give Me Death"? Accuracy-Performance Trade-Offs in LLM Quantization Paper • 2411.02355 • Published Nov 4, 2024 • 51
MVPaint: Synchronized Multi-View Diffusion for Painting Anything 3D Paper • 2411.02336 • Published Nov 4, 2024 • 25
GarVerseLOD: High-Fidelity 3D Garment Reconstruction from a Single In-the-Wild Image using a Dataset with Levels of Details Paper • 2411.03047 • Published Nov 5, 2024 • 9
HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems Paper • 2411.02959 • Published Nov 5, 2024 • 71
TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation Paper • 2411.04709 • Published Nov 5, 2024 • 27
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models Paper • 2411.04905 • Published Nov 7, 2024 • 125
RetrieveGPT: Merging Prompts and Mathematical Models for Enhanced Code-Mixed Information Retrieval Paper • 2411.04752 • Published Nov 7, 2024 • 17
SVDQunat: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models Paper • 2411.05007 • Published Nov 7, 2024 • 19
M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding Paper • 2411.04952 • Published Nov 7, 2024 • 30
M-Longdoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning Framework Paper • 2411.06176 • Published Nov 9, 2024 • 46
Edify Image: High-Quality Image Generation with Pixel Space Laplacian Diffusion Models Paper • 2411.07126 • Published Nov 11, 2024 • 31
OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision Paper • 2411.07199 • Published Nov 11, 2024 • 50
CAD-MLLM: Unifying Multimodality-Conditioned CAD Generation With MLLM Paper • 2411.04954 • Published Nov 7, 2024 • 10
LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models Paper • 2411.09595 • Published Nov 14, 2024 • 76
MagicQuill: An Intelligent Interactive Image Editing System Paper • 2411.09703 • Published Nov 14, 2024 • 77
Large Language Models Can Self-Improve in Long-context Reasoning Paper • 2411.08147 • Published Nov 12, 2024 • 67
LLaVA-o1: Let Vision Language Models Reason Step-by-Step Paper • 2411.10440 • Published Nov 15, 2024 • 125
SmoothCache: A Universal Inference Acceleration Technique for Diffusion Transformers Paper • 2411.10510 • Published Nov 15, 2024 • 8
Awaker2.5-VL: Stably Scaling MLLMs with Parameter-Efficient Mixture of Experts Paper • 2411.10669 • Published Nov 16, 2024 • 10
SlimLM: An Efficient Small Language Model for On-Device Document Assistance Paper • 2411.09944 • Published Nov 15, 2024 • 12
BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices Paper • 2411.10640 • Published Nov 16, 2024 • 47
Continuous Speculative Decoding for Autoregressive Image Generation Paper • 2411.11925 • Published Nov 18, 2024 • 16
RedPajama: an Open Dataset for Training Large Language Models Paper • 2411.12372 • Published Nov 19, 2024 • 56
ITACLIP: Boosting Training-Free Semantic Segmentation with Image, Text, and Architectural Enhancements Paper • 2411.12044 • Published Nov 18, 2024 • 14
Building Trust: Foundations of Security, Safety and Transparency in AI Paper • 2411.12275 • Published Nov 19, 2024 • 11
SEAGULL: No-reference Image Quality Assessment for Regions of Interest via Vision-Language Instruction Tuning Paper • 2411.10161 • Published Nov 15, 2024 • 9
SageAttention2 Technical Report: Accurate 4 Bit Attention for Plug-and-play Inference Acceleration Paper • 2411.10958 • Published Nov 17, 2024 • 56
SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory Paper • 2411.11922 • Published Nov 18, 2024 • 19
OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs Paper • 2411.14199 • Published Nov 21, 2024 • 32
Patience Is The Key to Large Language Model Reasoning Paper • 2411.13082 • Published Nov 20, 2024 • 7
DINO-X: A Unified Vision Model for Open-World Object Detection and Understanding Paper • 2411.14347 • Published Nov 21, 2024 • 13
MagicDriveDiT: High-Resolution Long Video Generation for Autonomous Driving with Adaptive Control Paper • 2411.13807 • Published Nov 21, 2024 • 11
Hymba: A Hybrid-head Architecture for Small Language Models Paper • 2411.13676 • Published Nov 20, 2024 • 45
Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions Paper • 2411.14405 • Published Nov 21, 2024 • 62
TÜLU 3: Pushing Frontiers in Open Language Model Post-Training Paper • 2411.15124 • Published Nov 22, 2024 • 63
Large Multi-modal Models Can Interpret Features in Large Multi-modal Models Paper • 2411.14982 • Published Nov 22, 2024 • 17
VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection Paper • 2411.14794 • Published Nov 22, 2024 • 13
MyTimeMachine: Personalized Facial Age Transformation Paper • 2411.14521 • Published Nov 21, 2024 • 22
Adapting Vision Foundation Models for Robust Cloud Segmentation in Remote Sensing Images Paper • 2411.13127 • Published Nov 20, 2024 • 4
Material Anything: Generating Materials for Any 3D Object via Diffusion Paper • 2411.15138 • Published Nov 22, 2024 • 51
ShowUI: One Vision-Language-Action Model for GUI Visual Agent Paper • 2411.17465 • Published Nov 26, 2024 • 87
Learning 3D Representations from Procedural 3D Programs Paper • 2411.17467 • Published Nov 25, 2024 • 9
TEXGen: a Generative Diffusion Model for Mesh Textures Paper • 2411.14740 • Published Nov 22, 2024 • 17
ROICtrl: Boosting Instance Control for Visual Generation Paper • 2411.17949 • Published Nov 27, 2024 • 88
DreamCache: Finetuning-Free Lightweight Personalized Image Generation via Feature Caching Paper • 2411.17786 • Published Nov 26, 2024 • 12
Beyond Examples: High-level Automated Reasoning Paradigm in In-Context Learning via MCTS Paper • 2411.18478 • Published Nov 27, 2024 • 38
GRAPE: Generalizing Robot Policy via Preference Alignment Paper • 2411.19309 • Published Nov 28, 2024 • 48
FAM Diffusion: Frequency and Attention Modulation for High-Resolution Image Generation with Stable Diffusion Paper • 2411.18552 • Published Nov 27, 2024 • 18
Puzzle: Distillation-Based NAS for Inference-Optimized LLMs Paper • 2411.19146 • Published Nov 28, 2024 • 18
MATATA: a weak-supervised MAthematical Tool-Assisted reasoning for Tabular Applications Paper • 2411.18915 • Published Nov 28, 2024 • 8
LLM Teacher-Student Framework for Text Classification With No Manually Annotated Data: A Case Study in IPTC News Topic Classification Paper • 2411.19638 • Published Nov 29, 2024 • 6
Scaling Transformers for Low-Bitrate High-Quality Speech Coding Paper • 2411.19842 • Published Nov 29, 2024 • 12
SOLAMI: Social Vision-Language-Action Modeling for Immersive Interaction with 3D Autonomous Characters Paper • 2412.00174 • Published Nov 29, 2024 • 23
The Well: a Large-Scale Collection of Diverse Physics Simulations for Machine Learning Paper • 2412.00568 • Published Nov 30, 2024 • 15
VLsI: Verbalized Layers-to-Interactions from Large to Small Vision Language Models Paper • 2412.01822 • Published Dec 2, 2024 • 15
Art-Free Generative Models: Art Creation Without Graphic Art Knowledge Paper • 2412.00176 • Published Nov 29, 2024 • 9
HUGSIM: A Real-Time, Photo-Realistic and Closed-Loop Simulator for Autonomous Driving Paper • 2412.01718 • Published Dec 2, 2024 • 3
LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences Paper • 2412.01292 • Published Dec 2, 2024 • 13
SNOOPI: Supercharged One-step Diffusion Distillation with Proper Guidance Paper • 2412.02687 • Published Dec 3, 2024 • 114
PaliGemma 2: A Family of Versatile VLMs for Transfer Paper • 2412.03555 • Published Dec 4, 2024 • 135
Distilling Diffusion Models to Efficient 3D LiDAR Scene Completion Paper • 2412.03515 • Published Dec 4, 2024 • 28
NitroFusion: High-Fidelity Single-Step Diffusion through Dynamic Adversarial Training Paper • 2412.02030 • Published Dec 2, 2024 • 19
MIDI: Multi-Instance Diffusion for Single Image to 3D Scene Generation Paper • 2412.03558 • Published Dec 4, 2024 • 19
Mimir: Improving Video Diffusion Models for Precise Text Understanding Paper • 2412.03085 • Published Dec 4, 2024 • 12
Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection Paper • 2412.04455 • Published Dec 5, 2024 • 39
Around the World in 80 Timesteps: A Generative Approach to Global Visual Geolocation Paper • 2412.06781 • Published Dec 9, 2024 • 21
StyleMaster: Stylize Your Video with Artistic Generation and Translation Paper • 2412.07744 • Published Dec 10, 2024 • 19
GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training Paper • 2412.11863 • Published Dec 16, 2024 • 4
TidyBot++: An Open-Source Holonomic Mobile Manipulator for Robot Learning Paper • 2412.10447 • Published Dec 11, 2024 • 5
The Open Source Advantage in Large Language Models (LLMs) Paper • 2412.12004 • Published Dec 16, 2024 • 10
Smaller Language Models Are Better Instruction Evolvers Paper • 2412.11231 • Published Dec 15, 2024 • 29
RetroLLM: Empowering Large Language Models to Retrieve Fine-grained Evidence within Generation Paper • 2412.11919 • Published Dec 16, 2024 • 37
Byte Latent Transformer: Patches Scale Better Than Tokens Paper • 2412.09871 • Published Dec 13, 2024 • 102
Multi-Dimensional Insights: Benchmarking Real-World Personalization in Large Multimodal Models Paper • 2412.12606 • Published Dec 17, 2024 • 42
Compressed Chain of Thought: Efficient Reasoning Through Dense Representations Paper • 2412.13171 • Published Dec 17, 2024 • 36
VisDoM: Multi-Document QA with Visually Rich Elements Using Multimodal Retrieval-Augmented Generation Paper • 2412.10704 • Published Dec 14, 2024 • 15
Seeker: Towards Exception Safety Code Generation with Intermediate Language Agents Framework Paper • 2412.11713 • Published Dec 16, 2024 • 5
RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment Paper • 2412.13746 • Published Dec 18, 2024 • 9
Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition Paper • 2412.09501 • Published Dec 12, 2024 • 49
Euclid: Supercharging Multimodal LLMs with Synthetic High-Fidelity Visual Descriptions Paper • 2412.08737 • Published Dec 11, 2024 • 54
FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion Paper • 2412.09626 • Published Dec 12, 2024 • 20
Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse Paper • 2409.11242 • Published Sep 17, 2024 • 7
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference Paper • 2412.13663 • Published Dec 18, 2024 • 149
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks Paper • 2412.14161 • Published Dec 18, 2024 • 52
Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces Paper • 2412.14171 • Published Dec 18, 2024 • 24
Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN Paper • 2412.13795 • Published Dec 18, 2024 • 20
LongBench v2: Towards Deeper Understanding and Reasoning on Realistic Long-context Multitasks Paper • 2412.15204 • Published Dec 19, 2024 • 38
MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval Paper • 2412.14475 • Published Dec 19, 2024 • 55
Progressive Multimodal Reasoning via Active Retrieval Paper • 2412.14835 • Published Dec 19, 2024 • 74
CAD-Recode: Reverse Engineering CAD Code from Point Clouds Paper • 2412.14042 • Published Dec 18, 2024 • 6
Predicting the Original Appearance of Damaged Historical Documents Paper • 2412.11634 • Published Dec 16, 2024 • 4
AnySat: An Earth Observation Model for Any Resolutions, Scales, and Modalities Paper • 2412.14123 • Published Dec 18, 2024 • 11
SCOPE: Optimizing Key-Value Cache Compression in Long-context Generation Paper • 2412.13649 • Published Dec 18, 2024 • 20
MixLLM: LLM Quantization with Global Mixed-precision between Output-features and Highly-efficient System Design Paper • 2412.14590 • Published Dec 19, 2024 • 14
IDOL: Instant Photorealistic 3D Human Creation from a Single Image Paper • 2412.14963 • Published Dec 19, 2024 • 6
SKETCH: Structured Knowledge Enhanced Text Comprehension for Holistic Retrieval Paper • 2412.15443 • Published Dec 19, 2024 • 10
RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response Paper • 2412.14922 • Published Dec 19, 2024 • 89
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search Paper • 2412.18319 • Published Dec 24, 2024 • 40
MMFactory: A Universal Solution Search Engine for Vision-Language Tasks Paper • 2412.18072 • Published Dec 24, 2024 • 19
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling Paper • 2412.05271 • Published Dec 6, 2024 • 155
Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models Paper • 2412.18605 • Published Dec 24, 2024 • 21
Explanatory Instructions: Towards Unified Vision Tasks Understanding and Zero-shot Generalization Paper • 2412.18525 • Published Dec 24, 2024 • 76
Bringing Objects to Life: 4D generation from 3D objects Paper • 2412.20422 • Published Dec 29, 2024 • 42
Slow Perception: Let's Perceive Geometric Figures Step-by-step Paper • 2412.20631 • Published Dec 30, 2024 • 15
OneKE: A Dockerized Schema-Guided LLM Agent-based Knowledge Extraction System Paper • 2412.20005 • Published Dec 28, 2024 • 18
EXAONE 3.5: Series of Large Language Models for Real-world Use Cases Paper • 2412.04862 • Published Dec 6, 2024 • 51
ProgCo: Program Helps Self-Correction of Large Language Models Paper • 2501.01264 • Published Jan 2 • 27
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining Paper • 2501.00958 • Published Jan 1 • 108
MapEval: A Map-Based Evaluation of Geo-Spatial Reasoning in Foundation Models Paper • 2501.00316 • Published Dec 31, 2024 • 22
MapQaTor: A System for Efficient Annotation of Map Query Datasets Paper • 2412.21015 • Published Dec 30, 2024 • 10
OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis Paper • 2412.19723 • Published Dec 27, 2024 • 89
BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning Paper • 2501.03226 • Published Jan 6 • 45
Test-time Computing: from System-1 Thinking to System-2 Thinking Paper • 2501.02497 • Published Jan 5 • 46
Personalized Graph-Based Retrieval for Large Language Models Paper • 2501.02157 • Published Jan 4 • 32
Segmenting Text and Learning Their Rewards for Improved RLHF in Language Model Paper • 2501.02790 • Published Jan 6 • 9
Dolphin: Closed-loop Open-ended Auto-research through Thinking, Practice, and Feedback Paper • 2501.03916 • Published Jan 7 • 15
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models Paper • 2501.03262 • Published Jan 4 • 99
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos Paper • 2501.04001 • Published Jan 7 • 46
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token Paper • 2501.03895 • Published Jan 7 • 53
SPAR3D: Stable Point-Aware Reconstruction of 3D Objects from Single Images Paper • 2501.04689 • Published Jan 8 • 17
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking Paper • 2501.04519 • Published Jan 8 • 277
LLM4SR: A Survey on Large Language Models for Scientific Research Paper • 2501.04306 • Published Jan 8 • 37
Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Though Paper • 2501.04682 • Published Jan 8 • 98
Multi-task retriever fine-tuning for domain-specific and efficient RAG Paper • 2501.04652 • Published Jan 8 • 10
LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs Paper • 2501.06186 • Published Jan 10 • 66
Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains Paper • 2501.05707 • Published Jan 10 • 20
Demystifying Domain-adaptive Post-training for Financial LLMs Paper • 2501.04961 • Published Jan 9 • 11
MinMo: A Multimodal Large Language Model for Seamless Voice Interaction Paper • 2501.06282 • Published Jan 10 • 51
MiniMax-01: Scaling Foundation Models with Lightning Attention Paper • 2501.08313 • Published Jan 14 • 289
MMDocIR: Benchmarking Multi-Modal Retrieval for Long Documents Paper • 2501.08828 • Published Jan 15 • 32
CityDreamer4D: Compositional Generative Model of Unbounded 4D Cities Paper • 2501.08983 • Published Jan 15 • 20
SynthLight: Portrait Relighting with Diffusion Model by Learning to Re-render Synthetic Faces Paper • 2501.09756 • Published Jan 16 • 19
OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking Paper • 2501.09751 • Published Jan 16 • 49
PaSa: An LLM Agent for Comprehensive Academic Paper Search Paper • 2501.10120 • Published Jan 17 • 49
Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks Paper • 2501.11733 • Published Jan 20 • 29
Condor: Enhance LLM Alignment with Knowledge-Driven Data Synthesis and Refinement Paper • 2501.12273 • Published Jan 21 • 14
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published Jan 22 • 391
Learn-by-interact: A Data-Centric Framework for Self-Adaptive Agents in Realistic Environments Paper • 2501.10893 • Published Jan 18 • 26
Multiview Equivariance Improves 3D Correspondence Understanding with Minimal Feature Finetuning Paper • 2411.19458 • Published Nov 29, 2024 • 6
Streaming DiLoCo with overlapping communication: Towards a Distributed Free Lunch Paper • 2501.18512 • Published Jan 30 • 30
Zero-Shot Novel View and Depth Synthesis with Multi-View Geometric Diffusion Paper • 2501.18804 • Published Jan 30 • 5
Gold-medalist Performance in Solving Olympiad Geometry with AlphaGeometry2 Paper • 2502.03544 • Published Feb 5 • 44
PlotGen: Multi-Agent LLM-based Scientific Data Visualization via Multimodal Feedback Paper • 2502.00988 • Published Feb 3 • 6
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published Feb 4 • 229
LayerTracer: Cognitive-Aligned Layered SVG Synthesis via Diffusion Transformer Paper • 2502.01105 • Published Feb 3 • 20
TwinMarket: A Scalable Behavioral and Social Simulation for Financial Markets Paper • 2502.01506 • Published Feb 3 • 38
Riddle Me This! Stealthy Membership Inference for Retrieval-Augmented Generation Paper • 2502.00306 • Published Feb 1 • 5
COCONut-PanCap: Joint Panoptic Segmentation and Grounded Captions for Fine-Grained Understanding and Generation Paper • 2502.02589 • Published Feb 4 • 10
AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding Paper • 2502.01341 • Published Feb 3 • 39
ChunkKV: Semantic-Preserving KV Cache Compression for Efficient Long-Context LLM Inference Paper • 2502.00299 • Published Feb 1 • 2
FastKV: KV Cache Compression for Fast Long-Context Processing with Token-Selective Propagation Paper • 2502.01068 • Published Feb 3 • 16
Text-to-CAD Generation Through Infusing Visual Feedback in Large Language Models Paper • 2501.19054 • Published Jan 31 • 10
SafeRAG: Benchmarking Security in Retrieval-Augmented Generation of Large Language Model Paper • 2501.18636 • Published Jan 28 • 29
DeepRAG: Thinking to Retrieval Step by Step for Large Language Models Paper • 2502.01142 • Published Feb 3 • 24
QuEST: Stable Training of LLMs with 1-Bit Weights and Activations Paper • 2502.05003 • Published Feb 7 • 44
CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference Paper • 2502.04416 • Published Feb 6 • 12
CodeSteer: Symbolic-Augmented Language Models via Code/Text Guidance Paper • 2502.04350 • Published Feb 4 • 11
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling Paper • 2502.06703 • Published Feb 10 • 151
CAD-Editor: A Locate-then-Infill Framework with Automated Training Data Synthesis for Text-Based CAD Editing Paper • 2502.03997 • Published Feb 6 • 9
LLMs Can Easily Learn to Reason from Demonstrations Structure, not content, is what matters! Paper • 2502.07374 • Published Feb 11 • 39
Towards Trustworthy Retrieval Augmented Generation for Large Language Models: A Survey Paper • 2502.06872 • Published Feb 8 • 8
WorldGUI: Dynamic Testing for Comprehensive Desktop GUI Automation Paper • 2502.08047 • Published Feb 12 • 27
BenchMAX: A Comprehensive Multilingual Evaluation Suite for Large Language Models Paper • 2502.07346 • Published Feb 11 • 54
Fino1: On the Transferability of Reasoning Enhanced LLMs to Finance Paper • 2502.08127 • Published Feb 12 • 56
NatureLM: Deciphering the Language of Nature for Scientific Discovery Paper • 2502.07527 • Published Feb 11 • 20
Scaling Pre-training to One Hundred Billion Data for Vision Language Models Paper • 2502.07617 • Published Feb 11 • 29
Retrieval-augmented Large Language Models for Financial Time Series Forecasting Paper • 2502.05878 • Published Feb 9 • 41
3CAD: A Large-Scale Real-World 3C Product Dataset for Unsupervised Anomaly Paper • 2502.05761 • Published Feb 9 • 6
The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept Understanding Paper • 2502.08946 • Published Feb 13 • 195
TripoSG: High-Fidelity 3D Shape Synthesis using Large-Scale Rectified Flow Models Paper • 2502.06608 • Published Feb 10 • 41
InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU Paper • 2502.08910 • Published Feb 13 • 149
RealSyn: An Effective and Scalable Multimodal Interleaved Document Transformation Paradigm Paper • 2502.12513 • Published Feb 18 • 16
MLGym: A New Framework and Benchmark for Advancing AI Research Agents Paper • 2502.14499 • Published Feb 20 • 192
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published Feb 20 • 143
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines Paper • 2502.14739 • Published Feb 20 • 103
SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation Paper • 2502.13128 • Published Feb 18 • 42
NExT-Mol: 3D Diffusion Meets 1D Language Modeling for 3D Molecule Generation Paper • 2502.12638 • Published Feb 18 • 8
Reducing Hallucinations in Language Model-based SPARQL Query Generation Using Post-Generation Memory Retrieval Paper • 2502.13369 • Published Feb 19 • 2
Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning Paper • 2502.14768 • Published Feb 20 • 48
LongWriter-V: Enabling Ultra-Long and High-Fidelity Generation in Vision-Language Models Paper • 2502.14834 • Published Feb 20 • 24
S^2R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning Paper • 2502.12853 • Published Feb 18 • 29
Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation Paper • 2502.14846 • Published Feb 20 • 13
NAVIG: Natural Language-guided Analysis with Vision Language Models for Image Geo-localization Paper • 2502.14638 • Published Feb 20 • 11
Geolocation with Real Human Gameplay Data: A Large-Scale Dataset and Human-Like Reasoning Framework Paper • 2502.13759 • Published Feb 19 • 4
InfiR : Crafting Effective Small Language Models and Multimodal Small Language Models in Reasoning Paper • 2502.11573 • Published Feb 17 • 8
Slamming: Training a Speech Language Model on One GPU in a Day Paper • 2502.15814 • Published Feb 19 • 69
LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers Paper • 2502.15007 • Published Feb 20 • 175
Distill Any Depth: Distillation Creates a Stronger Monocular Depth Estimator Paper • 2502.19204 • Published Feb 26 • 11
AISafetyLab: A Comprehensive Framework for AI Safety Evaluation and Improvement Paper • 2502.16776 • Published Feb 24 • 6
SpargeAttn: Accurate Sparse Attention Accelerating Any Model Inference Paper • 2502.18137 • Published Feb 25 • 57
Stable-SPAM: How to Train in 4-Bit More Stably than 16-Bit Adam Paper • 2502.17055 • Published Feb 24 • 18
Towards Fully-Automated Materials Discovery via Large-Scale Synthesis Dataset and Expert-Level LLM-as-a-Judge Paper • 2502.16457 • Published Feb 23 • 11
Unveiling Downstream Performance Scaling of LLMs: A Clustering-Based Perspective Paper • 2502.17262 • Published Feb 24 • 20
ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents Paper • 2502.18017 • Published Feb 25 • 20
TeleRAG: Efficient Retrieval-Augmented Generation Inference with Lookahead Retrieval Paper • 2502.20969 • Published Feb 28 • 11
Babel: Open Multilingual Large Language Models Serving Over 90% of Global Speakers Paper • 2503.00865 • Published Mar 2 • 64
Unified Reward Model for Multimodal Understanding and Generation Paper • 2503.05236 • Published Mar 7 • 123
Feature-Level Insights into Artificial Text Detection with Sparse Autoencoders Paper • 2503.03601 • Published Mar 5 • 232
Beyond RAG: Task-Aware KV Cache Compression for Comprehensive Knowledge Reasoning Paper • 2503.04973 • Published Mar 6 • 24
EuroBERT: Scaling Multilingual Encoders for European Languages Paper • 2503.05500 • Published Mar 7 • 78
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models Paper • 2503.09573 • Published Mar 12 • 71
Monte Carlo Diffusion for Generalizable Learning-Based RANSAC Paper • 2503.09410 • Published Mar 12 • 1
Benchmarking AI Models in Software Engineering: A Review, Search Tool, and Enhancement Protocol Paper • 2503.05860 • Published Mar 7 • 10
LightGen: Efficient Image Generation through Knowledge Distillation and Direct Preference Optimization Paper • 2503.08619 • Published Mar 11 • 20
More Documents, Same Length: Isolating the Challenge of Multiple Documents in RAG Paper • 2503.04388 • Published Mar 6 • 16
SEAP: Training-free Sparse Expert Activation Pruning Unlock the Brainpower of Large Language Models Paper • 2503.07605 • Published Mar 10 • 68
TinyR1-32B-Preview: Boosting Accuracy with Branch-Merge Distillation Paper • 2503.04872 • Published Mar 6 • 15
Infinite Mobility: Scalable High-Fidelity Synthesis of Articulated Objects via Procedural Generation Paper • 2503.13424 • Published Mar 17 • 30
FlexWorld: Progressively Expanding 3D Scenes for Flexiable-View Synthesis Paper • 2503.13265 • Published Mar 17 • 15
MM-Spatial: Exploring 3D Spatial Understanding in Multimodal LLMs Paper • 2503.13111 • Published Mar 17 • 7
BlobCtrl: A Unified and Flexible Framework for Element-level Image Generation and Editing Paper • 2503.13434 • Published Mar 17 • 26
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization Paper • 2503.12937 • Published Mar 17 • 28
DreamRenderer: Taming Multi-Instance Attribute Control in Large-Scale Text-to-Image Models Paper • 2503.12885 • Published Mar 17 • 44
ReCamMaster: Camera-Controlled Generative Rendering from A Single Video Paper • 2503.11647 • Published Mar 14 • 140
PLADIS: Pushing the Limits of Attention in Diffusion Models at Inference Time by Leveraging Sparsity Paper • 2503.07677 • Published Mar 10 • 84
SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion Paper • 2503.11576 • Published Mar 14 • 99
TreeMeshGPT: Artistic Mesh Generation with Autoregressive Tree Sequencing Paper • 2503.11629 • Published Mar 14 • 6
Fin-R1: A Large Language Model for Financial Reasoning through Reinforcement Learning Paper • 2503.16252 • Published Mar 20 • 27
NuiScene: Exploring Efficient Generation of Unbounded Outdoor Scenes Paper • 2503.16375 • Published Mar 20 • 9
Sonata: Self-Supervised Learning of Reliable Point Representations Paper • 2503.16429 • Published Mar 20 • 11
V-Seek: Accelerating LLM Reasoning on Open-hardware Server-class RISC-V Platforms Paper • 2503.17422 • Published Mar 21 • 6
Typed-RAG: Type-aware Multi-Aspect Decomposition for Non-Factoid Question Answering Paper • 2503.15879 • Published Mar 20 • 6
Large Language Model Agent: A Survey on Methodology, Applications and Challenges Paper • 2503.21460 • Published Mar 27 • 77
LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis Paper • 2503.21749 • Published Mar 27 • 26
ReaRAG: Knowledge-guided Reasoning Enhances Factuality of Large Reasoning Models with Iterative Retrieval Augmented Generation Paper • 2503.21729 • Published Mar 27 • 28
Lumina-Image 2.0: A Unified and Efficient Image Generative Framework Paper • 2503.21758 • Published Mar 27 • 21
BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation Paper • 2503.20672 • Published Mar 26 • 14
MCTS-RAG: Enhancing Retrieval-Augmented Generation with Monte Carlo Tree Search Paper • 2503.20757 • Published Mar 26 • 10
OpenCity3D: What do Vision-Language Models know about Urban Environments? Paper • 2503.16776 • Published Mar 21 • 3
Feather-SQL: A Lightweight NL2SQL Framework with Dual-Model Collaboration Paradigm for Small Language Models Paper • 2503.17811 • Published Mar 22 • 13
SparseFlex: High-Resolution and Arbitrary-Topology 3D Shape Modeling Paper • 2503.21732 • Published Mar 27 • 8
Harnessing the Reasoning Economy: A Survey of Efficient Reasoning for Large Language Models Paper • 2503.24377 • Published 28 days ago • 17
TextCrafter: Accurately Rendering Multiple Texts in Complex Visual Scenes Paper • 2503.23461 • Published 29 days ago • 94
Hi3DGen: High-fidelity 3D Geometry Generation from Images via Normal Bridging Paper • 2503.22236 • Published Mar 28 • 11
Free4D: Tuning-free 4D Scene Generation with Spatial-Temporal Consistency Paper • 2503.20785 • Published Mar 26 • 21
AdaptiVocab: Enhancing LLM Efficiency in Focused Domains through Lightweight Vocabulary Adaptation Paper • 2503.19693 • Published Mar 25 • 75
MeshCraft: Exploring Efficient and Controllable Mesh Generation with Flow-based DiTs Paper • 2503.23022 • Published about 1 month ago • 7
Reasoning-SQL: Reinforcement Learning with SQL Tailored Partial Rewards for Reasoning-Enhanced Text-to-SQL Paper • 2503.23157 • Published about 1 month ago • 8
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems Paper • 2504.01990 • Published 28 days ago • 268
OmniSVG: A Unified Scalable Vector Graphics Generation Model Paper • 2504.06263 • Published 20 days ago • 155
Hogwild! Inference: Parallel LLM Generation via Concurrent Attention Paper • 2504.06261 • Published 20 days ago • 104
HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference Paper • 2504.05897 • Published 20 days ago • 14
Mamba as a Bridge: Where Vision Foundation Models Meet Vision Language Models for Domain-Generalized Semantic Segmentation Paper • 2504.03193 • Published 25 days ago • 5
SynWorld: Virtual Scenario Synthesis for Agentic Action Knowledge Refinement Paper • 2504.03561 • Published 24 days ago • 18
SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning Paper • 2504.08600 • Published 17 days ago • 26
Do PhD-level LLMs Truly Grasp Elementary Addition? Probing Rule Learning vs. Memorization in Large Language Models Paper • 2504.05262 • Published 21 days ago • 11
Visual Chronicles: Using Multimodal LLMs to Analyze Massive Collections of Images Paper • 2504.08727 • Published 17 days ago • 11
VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model Paper • 2504.07615 • Published 18 days ago • 31
BlockGaussian: Efficient Large-Scale Scene Novel View Synthesis via Adaptive Block-Based Gaussian Splatting Paper • 2504.09048 • Published 17 days ago • 7
DataDecide: How to Predict Best Pretraining Data with Small Experiments Paper • 2504.11393 • Published 13 days ago • 17
Diffusion Distillation With Direct Preference Optimization For Efficient 3D LiDAR Scene Completion Paper • 2504.11447 • Published 13 days ago • 5
AlayaDB: The Data Foundation for Efficient and Effective Long-context LLM Inference Paper • 2504.10326 • Published 14 days ago • 25
A Strategic Coordination Framework of Small LLMs Matches Large LLMs in Data Synthesis Paper • 2504.12322 • Published 18 days ago • 27
PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding Paper • 2504.13180 • Published 11 days ago • 17
70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float Paper • 2504.11651 • Published 13 days ago • 28
Retrieval-Augmented Generation with Conflicting Evidence Paper • 2504.13079 • Published 11 days ago • 7
MIG: Automatic Data Selection for Instruction Tuning by Maximizing Information Gain in Semantic Space Paper • 2504.13835 • Published 10 days ago • 36
NodeRAG: Structuring Graph-based RAG with Heterogeneous Nodes Paper • 2504.11544 • Published 13 days ago • 41
AerialMegaDepth: Learning Aerial-Ground Reconstruction and View Synthesis Paper • 2504.13157 • Published 11 days ago • 20
Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning Paper • 2504.17192 • Published 5 days ago • 94
Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs Paper • 2504.17432 • Published 4 days ago • 37
Large Language Models can accomplish Business Process Management Tasks Paper • 2307.09923 • Published Jul 19, 2023
DataLab: A Unifed Platform for LLM-Powered Business Intelligence Paper • 2412.02205 • Published Dec 3, 2024
Mask2Map: Vectorized HD Map Construction Using Bird's Eye View Segmentation Masks Paper • 2407.13517 • Published Jul 18, 2024
Layered Image Vectorization via Semantic Simplification Paper • 2406.05404 • Published Jun 8, 2024 • 3
MARVEL: Raster Manga Vectorization via Primitive-wise Deep Reinforcement Learning Paper • 2110.04830 • Published Oct 10, 2021
Retrieval Augmented Structured Generation: Business Document Information Extraction As Tool Use Paper • 2405.20245 • Published May 30, 2024 • 1