Marigold Intrinsic Image Decomposition (IID) Lighting v1-1 Model Card


This is a model card for the marigold-iid-lighting-v1-1 model for single-image Intrinsic Image Decomposition (IID). The model is fine-tuned from the stable-diffusion-2 model, as described in a follow-up of our CVPR 2024 paper titled "Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation".

This model type (lighting) is trained to perform HyperSim decomposition into Albedo, Diffuse shading, and Non-diffuse residual. This decomposition follows the intrinsic residual model I = A*S + R, where the image I is composed of a three-channel albedo A, a three-channel diffuse shading component S (representing illumination color), and an additive three-channel residual term R capturing non-diffuse effects. The input is in the sRGB color space, while all outputs are in linear space. For an alternative model type (appearance) that performs decomposition into Albedo, Roughness, and Metallicity, click here.
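The residual model above can be illustrated with a small NumPy sketch. The toy arrays below are hypothetical stand-ins for the model's three outputs (all values in [0, 1], linear space); the sRGB conversion uses the standard sRGB transfer function and is included only to show the linear-vs-sRGB distinction mentioned above.

```python
import numpy as np

# Hypothetical toy arrays standing in for the three predicted modalities.
rng = np.random.default_rng(0)
albedo = rng.uniform(0.0, 1.0, size=(4, 4, 3))    # A: three-channel albedo
shading = rng.uniform(0.0, 1.0, size=(4, 4, 3))   # S: diffuse shading (illumination color)
residual = rng.uniform(0.0, 0.2, size=(4, 4, 3))  # R: additive non-diffuse residual

# Recompose the linear-space image per the intrinsic residual model: I = A*S + R
image_linear = albedo * shading + residual

def linear_to_srgb(x):
    """Standard sRGB transfer function, mapping linear values to sRGB."""
    x = np.clip(x, 0.0, 1.0)
    return np.where(x <= 0.0031308, 12.92 * x, 1.055 * x ** (1 / 2.4) - 0.055)

# The model's *input* lives in sRGB; its outputs stay in linear space.
image_srgb = linear_to_srgb(image_linear)
```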

  • Play with the interactive Hugging Face Spaces demo: check out how the model works with example images or upload your own.
  • Use it with diffusers to compute the results with a few lines of code.
  • Get to the bottom of things with our official codebase.
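Following the diffusers route above, a minimal sketch is shown below. It assumes a recent diffusers release that ships the Marigold intrinsics pipeline, a CUDA-capable GPU for the fp16 variant, and uses an illustrative example image URL; consult the diffusers Marigold documentation for the authoritative API.

```python
import diffusers
import torch

# Load the lighting IID checkpoint (fp16 variant assumes a CUDA device).
pipe = diffusers.MarigoldIntrinsicsPipeline.from_pretrained(
    "prs-eth/marigold-iid-lighting-v1-1", variant="fp16", torch_dtype=torch.float16
).to("cuda")

# Illustrative input image; any sRGB image works.
image = diffusers.utils.load_image(
    "https://marigoldmonodepth.github.io/images/einstein.jpg"
)

# Run the decomposition; larger inputs are resized so the longer side is ~768 px.
intrinsics = pipe(image)

# Visualize and save the predicted modalities (albedo, shading, residual).
vis = pipe.image_processor.visualize_intrinsics(
    intrinsics.prediction, pipe.target_properties
)
vis[0]["albedo"].save("einstein_albedo.png")
```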

Model Details

  • Developed by: Bingxin Ke, Kevin Qu, Tianfu Wang, Nando Metzger, Shengyu Huang, Bo Li, Anton Obukhov, Konrad Schindler.
  • Model type: Generative latent diffusion-based intrinsic image decomposition (lighting: albedo, diffuse shading, and non-diffuse residual) from a single image.
  • Language: English.
  • License: CreativeML Open RAIL++-M License.
  • Model Description: This model can be used to generate an estimated intrinsic image decomposition of an input image.
    • Resolution: Even though any resolution can be processed, the model inherits the base diffusion model's effective resolution of roughly 768 pixels. This means that for optimal predictions, any larger input image should be resized to make the longer side 768 pixels before feeding it into the model.
  • Steps and scheduler: This model was designed for use with the DDIM scheduler and between 1 and 50 denoising steps.
    • Outputs:
      • Albedo: The predicted values are between 0 and 1, linear space.
      • Diffuse shading: The predicted values are between 0 and 1, linear space.
      • Non-diffuse residual: The predicted values are between 0 and 1, linear space.
      • Uncertainty maps: Produced for each modality only when multiple predictions are ensembled (ensemble size greater than 2).
  • Resources for more information: Project Website, Paper, Code.
  • Cite as:

Placeholder for the citation block of the follow-up paper

@InProceedings{ke2023repurposing,
  title={Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation},
  author={Bingxin Ke and Anton Obukhov and Shengyu Huang and Nando Metzger and Rodrigo Caye Daudt and Konrad Schindler},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2024}
}