    A Coding Guide to High-Quality Image Generation, Control, and Editing Using HuggingFace Diffusers

    By Naveed Ahmad | 21/02/2026 | 5 Mins Read


    In this tutorial, we design a practical image-generation workflow using the Diffusers library. We start by stabilizing the environment, then generate high-quality images from text prompts using Stable Diffusion with an optimized scheduler. We accelerate inference with a LoRA-based latent consistency approach, guide composition with ControlNet under edge conditioning, and finally perform localized edits via inpainting. We also focus on real-world techniques that balance image quality, speed, and controllability.

    !pip -q uninstall -y pillow Pillow || true
    !pip -q install --upgrade --force-reinstall "pillow<12.0"
    !pip -q install --upgrade diffusers transformers accelerate safetensors huggingface_hub opencv-python
    
    
    import os, math, random
    import torch
    import numpy as np
    import cv2
    from PIL import Image, ImageDraw, ImageFilter
    from diffusers import (
       StableDiffusionPipeline,
       StableDiffusionInpaintPipeline,
       ControlNetModel,
       StableDiffusionControlNetPipeline,
       UniPCMultistepScheduler,
    )
    

    We prepare a clean and compatible runtime by resolving dependency conflicts and installing all required libraries. We ensure image processing works reliably by pinning the correct Pillow version and loading the Diffusers ecosystem. We also import all core modules needed for the generation, control, and inpainting workflows.
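
    As a quick, optional sanity check (not part of the original script), we can confirm the installed versions and GPU availability before loading any models:

    # Optional sanity check (illustrative only): confirm library versions
    # and GPU availability before any models are downloaded.
    import torch, diffusers, transformers, PIL
    print("diffusers:", diffusers.__version__)
    print("transformers:", transformers.__version__)
    print("Pillow:", PIL.__version__)
    print("CUDA available:", torch.cuda.is_available())
    if torch.cuda.is_available():
        print("GPU:", torch.cuda.get_device_name(0))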

    def seed_everything(seed=42):
       random.seed(seed)
       np.random.seed(seed)
       torch.manual_seed(seed)
       torch.cuda.manual_seed_all(seed)
    
    
    def to_grid(images, cols=2, bg=255):
        if isinstance(images, Image.Image):
            images = [images]
        w, h = images[0].size
        rows = math.ceil(len(images) / cols)
        grid = Image.new("RGB", (cols*w, rows*h), (bg, bg, bg))
        for i, im in enumerate(images):
            grid.paste(im, ((i % cols)*w, (i // cols)*h))
        return grid
    
    
    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32
    print("device:", device, "| dtype:", dtype)

    We define utility functions to ensure reproducibility and to arrange visual outputs efficiently. We set global random seeds so our generations remain consistent across runs. We also detect the available hardware and configure precision to optimize performance on the GPU or CPU.
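
    As a quick illustration (placeholder tiles that are not part of the original workflow), the grid helper can be exercised before any diffusion model is loaded:

    # Minimal illustration of the helper above (hypothetical placeholder tiles):
    # build a 2x2 grid from solid-color images and check the resulting size.
    placeholders = [Image.new("RGB", (128, 96), c) for c in ("red", "green", "blue", "gray")]
    demo_grid = to_grid(placeholders, cols=2)
    print(demo_grid.size)  # expected: (256, 192)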

    seed_everything(7)
    BASE_MODEL = "runwayml/stable-diffusion-v1-5"
    
    
    pipe = StableDiffusionPipeline.from_pretrained(
       BASE_MODEL,
       torch_dtype=dtype,
       safety_checker=None,
    ).to(device)
    
    
    pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
    
    
    if device == "cuda":
       pipe.enable_attention_slicing()
       pipe.enable_vae_slicing()
    
    
    prompt = "a cinematic photo of a futuristic street market at dusk, ultra-detailed, 35mm, volumetric lighting"
    negative_prompt = "blurry, low quality, deformed, watermark, text"
    
    
    img_text = pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        num_inference_steps=25,
        guidance_scale=6.5,
        width=768,
        height=512,
    ).images[0]

    We initialize the base Stable Diffusion pipeline and switch to a more efficient UniPC scheduler. We generate a high-quality image directly from a text prompt using carefully chosen guidance and resolution settings. This establishes a strong baseline for the subsequent improvements in speed and control.
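
    If we want to see how the guidance setting trades prompt adherence against variety, a small sweep (illustrative values, not part of the original script) can reuse the same pipeline with a fixed generator so that only guidance_scale changes:

    # Illustrative guidance-scale sweep (assumed values): fix the generator seed
    # so the only difference between the images is the guidance strength.
    cfg_images = []
    for cfg in [3.0, 6.5, 10.0]:
        gen = torch.Generator(device).manual_seed(7)
        cfg_images.append(
            pipe(
                prompt=prompt,
                negative_prompt=negative_prompt,
                num_inference_steps=25,
                guidance_scale=cfg,
                width=768,
                height=512,
                generator=gen,
            ).images[0]
        )
    cfg_grid = to_grid(cfg_images, cols=3)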

    LCM_LORA = "latent-consistency/lcm-lora-sdv1-5"
    pipe.load_lora_weights(LCM_LORA)
    
    
    try:
        pipe.fuse_lora()
        lora_fused = True
    except Exception as e:
        lora_fused = False
        print("LoRA fuse skipped:", e)
    
    
    fast_prompt = "a clean product photo of a minimal smartwatch on a reflective surface, studio lighting"
    fast_images = []
    for steps in [4, 6, 8]:
        fast_images.append(
            pipe(
                prompt=fast_prompt,
                negative_prompt=negative_prompt,
                num_inference_steps=steps,
                guidance_scale=1.5,
                width=768,
                height=512,
            ).images[0]
        )
    
    
    grid_fast = to_grid(fast_images, cols=3)
    print("LoRA fused:", lora_fused)
    
    
    W, H = 768, 512
    layout = Image.new("RGB", (W, H), "white")
    draw = ImageDraw.Draw(layout)
    draw.rectangle([40, 80, 340, 460], outline="black", width=6)
    draw.ellipse([430, 110, 720, 400], outline="black", width=6)
    draw.line([0, 420, W, 420], fill="black", width=5)
    
    
    edges = cv2.Canny(np.array(layout), 80, 160)
    edges = np.stack([edges]*3, axis=-1)
    canny_image = Image.fromarray(edges)
    
    
    CONTROLNET = "lllyasviel/sd-controlnet-canny"
    controlnet = ControlNetModel.from_pretrained(
       CONTROLNET,
       torch_dtype=dtype,
    ).to(device)
    
    
    cn_pipe = StableDiffusionControlNetPipeline.from_pretrained(
       BASE_MODEL,
       controlnet=controlnet,
       torch_dtype=dtype,
       safety_checker=None,
    ).to(device)
    
    
    cn_pipe.scheduler = UniPCMultistepScheduler.from_config(cn_pipe.scheduler.config)
    
    
    if device == "cuda":
       cn_pipe.enable_attention_slicing()
       cn_pipe.enable_vae_slicing()
    
    
    cn_prompt = "a modern cafe interior, architectural render, soft daylight, high detail"
    img_controlnet = cn_pipe(
        prompt=cn_prompt,
        negative_prompt=negative_prompt,
        image=canny_image,
        num_inference_steps=25,
        guidance_scale=6.5,
        controlnet_conditioning_scale=1.0,
    ).images[0]

    We accelerate inference by loading and fusing a LoRA adapter, and we demonstrate fast sampling with only a few diffusion steps. We then construct a structural conditioning image and apply ControlNet to guide the layout of the generated scene. This allows us to preserve composition while still benefiting from creative text guidance.
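
    Note that the fused LCM-LoRA weights remain attached to the text-to-image pipeline; the ControlNet and inpainting pipelines below are loaded fresh, so they are unaffected. If we want the base pipeline back in its original state, a recent Diffusers release that exposes unfuse_lora and unload_lora_weights lets us undo the fusion, as sketched here:

    # Optional cleanup sketch (not in the original script, assumes a recent
    # Diffusers release): undo the LoRA fusion and detach the adapter weights.
    if lora_fused:
        pipe.unfuse_lora()
    pipe.unload_lora_weights()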

    mask = Image.new("L", img_controlnet.size, 0)
    mask_draw = ImageDraw.Draw(mask)
    mask_draw.rectangle([60, 90, 320, 170], fill=255)
    mask = mask.filter(ImageFilter.GaussianBlur(2))
    
    
    inpaint_pipe = StableDiffusionInpaintPipeline.from_pretrained(
       BASE_MODEL,
       torch_dtype=dtype,
       safety_checker=None,
    ).to(device)
    
    
    inpaint_pipe.scheduler = UniPCMultistepScheduler.from_config(inpaint_pipe.scheduler.config)
    
    
    if device == "cuda":
       inpaint_pipe.enable_attention_slicing()
       inpaint_pipe.enable_vae_slicing()
    
    
    inpaint_prompt = "a glowing neon sign that says 'CAFÉ', cyberpunk style, realistic lighting"
    
    
    img_inpaint = inpaint_pipe(
        prompt=inpaint_prompt,
        negative_prompt=negative_prompt,
        image=img_controlnet,
        mask_image=mask,
        num_inference_steps=30,
        guidance_scale=7.0,
    ).images[0]
    
    
    os.makedirs("outputs", exist_ok=True)
    img_text.save("outputs/text2img.png")
    grid_fast.save("outputs/lora_fast_grid.png")
    layout.save("outputs/layout.png")
    canny_image.save("outputs/canny.png")
    img_controlnet.save("outputs/controlnet.png")
    mask.save("outputs/mask.png")
    img_inpaint.save("outputs/inpaint.png")
    
    
    print("Saved outputs:", sorted(os.listdir("outputs")))
    print("Accomplished.")

    We create a mask to isolate a specific region and apply inpainting to modify only that part of the image. We refine the selected area with a targeted prompt while keeping the rest intact. Finally, we save all intermediate and final outputs to disk for inspection and reuse.
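
    Since the !pip cells above imply a notebook environment, a small display cell (illustrative, not in the original script) makes it easy to inspect the results inline:

    # Illustrative notebook-only step (assumes IPython/Jupyter, as implied
    # by the !pip cells): preview each result inline.
    from IPython.display import display
    for im in (img_text, grid_fast, img_controlnet, img_inpaint):
        display(im)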

    In conclusion, we demonstrated how a single Diffusers pipeline can evolve into a flexible, production-ready image generation system. We showed how to move from pure text-to-image generation to fast sampling, structural control, and targeted image editing without changing frameworks or tooling. This tutorial highlights how we can combine schedulers, LoRA adapters, ControlNet, and inpainting to create controllable and efficient generative pipelines that are easy to extend for more advanced creative or applied use cases.

