controlnet-DharunSN/model_out
These are ControlNet weights trained on stabilityai/stable-diffusion-2-1-base with a new type of conditioning. You can find some example images below. NOTE: These weights were trained at reduced precision, so image quality and characteristics may lag behind a full-precision run.
prompt: a white hoodie shirt on a size four model in a beach setting
DensePose Condition:
Image Generated:
prompt: a green jumper shirt and white pants with a green overcoat on top
Image Generated:
Intended uses & limitations
How to use
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from PIL import Image
import torch

# Load the ControlNet weights and the Stable Diffusion base they were trained on.
# The pipeline loads its matching tokenizer and text encoder itself; SD 2.1 uses
# OpenCLIP, so passing openai/clip-vit-large-patch14 components would mismatch.
controlnet = ControlNetModel.from_pretrained(
    "DharunSN/model_out", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1-base",  # base model these weights target
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# Example: generate an image of a red jacket conditioned on a pose map
pose_image = Image.open("path/to/pose.png").convert("RGB").resize((512, 512))
prompt = "a red leather jacket with silver zippers, worn on a casual street-style model"
image = pipe(prompt, image=pose_image, num_inference_steps=30).images[0]
image.save("output.png")
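The snippet above assumes you already have a pose conditioning image. As a minimal sketch (the card does not specify which preprocessor was used, and a true DensePose map would need a separate detectron2-based pipeline), an OpenPose skeleton map can be extracted with the controlnet_aux annotators:

from controlnet_aux import OpenposeDetector
from PIL import Image

# Assumed preprocessor: the OpenPose keypoint annotator from controlnet_aux
openpose = OpenposeDetector.from_pretrained("lllyasviel/Annotators")
reference = Image.open("path/to/reference_photo.jpg").convert("RGB")  # hypothetical path
pose_image = openpose(reference).resize((512, 512))
pose_image.save("path/to/pose.png")  # matches the path used in the example above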
Limitations and bias
Pose Alignment Errors: The model may fail to accurately align garments with extremely dynamic or occluded body poses.
Fabric Simulation: Lacks realistic physical behavior of fabrics like wrinkles, folds, or flowing movement.
Resolution Constraints: Default generation is 512×512. Upscaling may lose fidelity unless further post-processing is applied (see the sketch after this list).
Model Drift in Edge Cases: Struggles with rare combinations of garment types and unconventional descriptions.
Dataset Bias: DeepFashion and related datasets often overrepresent certain body types, genders, and skin tones, which can skew model generalization.
Style Bias: High fashion or Western clothing styles are more common in training data, leading to poorer performance for traditional or niche designs.
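As a hedged mitigation for the resolution constraint noted above, the 512×512 output can be passed through a latent upscaler; the choice of stabilityai/stable-diffusion-x4-upscaler here is an assumption, not something this card prescribes:

import torch
from diffusers import StableDiffusionUpscalePipeline

upscaler = StableDiffusionUpscalePipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler", torch_dtype=torch.float16
).to("cuda")
upscaler.enable_attention_slicing()  # x4 on a 512x512 input yields 2048x2048; memory-heavy
low_res = image  # the 512x512 output from the "How to use" snippet
upscaled = upscaler(prompt=prompt, image=low_res).images[0]
upscaled.save("output_upscaled.png")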
Recommendations:
Augment training data with underrepresented demographics and clothing styles.
Use reinforcement or adversarial training to improve physics realism and fairness.
Apply domain adaptation for traditional clothing categories.
Training details
Training Data:
DeepFashion: 400k images with clothing types, poses, and attributes (https://mmlab.ie.cuhk.edu.hk/projects/DeepFashion.html)
DensePose / OpenPose: For extracting skeletal keypoints and human pose conditioning maps
Text Descriptions: Generated or curated captions describing clothing, structure, and materials
Fashion-Design-10K, Fabric-Texture-2K, Fashion-Model-5K: Supporting datasets for garment diversity, material realism, and body types
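For concreteness, a sketch of the record layout such a training run consumes, pairing a garment photo with its pose conditioning map and caption; the column names follow the diffusers ControlNet training example and the paths are hypothetical:

from datasets import Dataset

records = [{
    "image": "deepfashion/img/00001.jpg",                # garment photo (hypothetical path)
    "conditioning_image": "deepfashion/pose/00001.png",  # DensePose/OpenPose map
    "text": "a white hoodie shirt on a size four model in a beach setting",
}]
ds = Dataset.from_list(records)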
Training Setup:
Hardware: NVIDIA L4 GPU (32GB VRAM)
Precision: Mixed precision (torch.float16)
Optimizer: 8-bit AdamW with learning rate 1e-5
Gradient Accumulation: 8–16 steps to support large-batch emulation
Epochs: 1–3 depending on overfitting trends
Batch Size: Effective size of 32 (with accumulation)
Training Duration: Approx. 12–15 hours for 1 full epoch on 400k images
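Putting the setup above together, a minimal sketch of the optimization loop, assuming a per-device batch of 2 with 16 accumulation steps to reach the effective batch size of 32 (train_dataloader and diffusion_loss are hypothetical stand-ins):

import bitsandbytes as bnb
from accelerate import Accelerator
from diffusers import ControlNetModel

controlnet = ControlNetModel.from_pretrained("DharunSN/model_out")
accelerator = Accelerator(mixed_precision="fp16", gradient_accumulation_steps=16)
optimizer = bnb.optim.AdamW8bit(controlnet.parameters(), lr=1e-5)  # 8-bit AdamW, lr 1e-5
controlnet, optimizer = accelerator.prepare(controlnet, optimizer)

for batch in train_dataloader:  # hypothetical dataloader of (image, pose, caption) batches
    with accelerator.accumulate(controlnet):  # 2 per device x 16 steps = 32 effective
        loss = diffusion_loss(controlnet, batch)  # hypothetical noise-prediction loss
        accelerator.backward(loss)
        optimizer.step()
        optimizer.zero_grad()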