VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning Paper • 2504.07960 • Published 18 days ago • 46
SmolVLM: Redefining small and efficient multimodal models Paper • 2504.05299 • Published 21 days ago • 176
LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis Paper • 2503.21749 • Published Mar 27 • 26
Lumina-Image 2.0: A Unified and Efficient Image Generative Framework Paper • 2503.21758 • Published Mar 27 • 21
Lumina-Video: Efficient and Flexible Video Generation with Multi-scale Next-DiT Paper • 2502.06782 • Published Feb 10 • 14
K-LoRA: Unlocking Training-Free Fusion of Any Subject and Style LoRAs Paper • 2502.18461 • Published Feb 25 • 15
Hyperstroke: A Novel High-quality Stroke Representation for Assistive Artistic Drawing Paper • 2408.09348 • Published Aug 18, 2024 • 1
PhenDiff: Revealing Invisible Phenotypes with Conditional Diffusion Models Paper • 2312.08290 • Published Dec 13, 2023 • 2
IMAGINE-E: Image Generation Intelligence Evaluation of State-of-the-art Text-to-Image Models Paper • 2501.13920 • Published Jan 23 • 17
Self-Rectifying Diffusion Sampling with Perturbed-Attention Guidance Paper • 2403.17377 • Published Mar 26, 2024 • 2
World-consistent Video Diffusion with Explicit 3D Modeling Paper • 2412.01821 • Published Dec 2, 2024 • 4
Pathways on the Image Manifold: Image Editing via Video Generation Paper • 2411.16819 • Published Nov 25, 2024 • 37
Fake it to make it: Using synthetic data to remedy the data shortage in joint multimodal speech-and-gesture synthesis Paper • 2404.19622 • Published Apr 30, 2024 • 2
MM-Conv: A Multi-modal Conversational Dataset for Virtual Humans Paper • 2410.00253 • Published Sep 30, 2024
Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models Paper • 2408.15518 • Published Aug 28, 2024 • 43
Smoothed Energy Guidance: Guiding Diffusion Models with Reduced Energy Curvature of Attention Paper • 2408.00760 • Published Aug 1, 2024 • 7