Introduction to Diffusers by Hugging Face

Author

Ravi Kalia

Published

April 5, 2025

What is diffusers?

diffusers is a Hugging Face library for working with diffusion models, especially for generative tasks like:

  • Text-to-image generation
  • Inpainting (image repair)
  • Conditional generation (e.g., guided by a sketch or pose)
  • Audio generation

It wraps pretrained diffusion models like Stable Diffusion, Kandinsky, and more into easy-to-use pipelines, and provides tools to train or fine-tune these models with efficient techniques like LoRA.

What are Diffusion Models?

Diffusion models are trained by gradually adding noise to real data and learning to undo it. To generate, they start from pure random noise and remove it step by step until a clean signal, such as an image or audio waveform, remains.

In other words, they run a “reverse-noise” process:
Start with noise → gradually denoise → generate output.
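The step-by-step loop can be sketched in plain Python on a single number standing in for one pixel. This is a minimal illustration under stated assumptions, not diffusers code: it assumes a DDPM-style linear noise schedule (β from 1e-4 to 0.02 over 1000 steps), a deterministic DDIM-style update, and a hypothetical `oracle` that returns the true noise in place of a trained UNet. With a perfect noise predictor the loop recovers the clean value exactly; a real model only approximates the noise, which is why the result is a new sample rather than a copy.

```python
import math
import random

def linear_alpha_bars(num_train_timesteps=1000, beta_start=1e-4, beta_end=0.02):
    """alpha_bar[t] = product of (1 - beta_s) for s <= t, with a linear beta schedule."""
    alpha_bars, prod = [], 1.0
    for i in range(num_train_timesteps):
        beta = beta_start + (beta_end - beta_start) * i / (num_train_timesteps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars

def add_noise(x0, eps, alpha_bar):
    """Forward process: jump straight from the clean value to noise level t."""
    return math.sqrt(alpha_bar) * x0 + math.sqrt(1.0 - alpha_bar) * eps

def ddim_sample(x_T, alpha_bars, predict_noise, num_inference_steps=50):
    """Deterministic DDIM-style reverse loop over an evenly strided subset of timesteps."""
    stride = len(alpha_bars) // num_inference_steps
    timesteps = list(range(len(alpha_bars) - 1, -1, -stride))  # 999, 979, ..., 19
    x = x_T
    for i, t in enumerate(timesteps):
        eps_hat = predict_noise(x, t)  # in a real model, this is the UNet's job
        # Estimate the clean value implied by the current noisy value and noise estimate.
        x0_hat = (x - math.sqrt(1.0 - alpha_bars[t]) * eps_hat) / math.sqrt(alpha_bars[t])
        # Re-noise that estimate to the next (lower) noise level; the last step uses alpha_bar = 1.
        ab_prev = alpha_bars[timesteps[i + 1]] if i + 1 < len(timesteps) else 1.0
        x = math.sqrt(ab_prev) * x0_hat + math.sqrt(1.0 - ab_prev) * eps_hat
    return x

random.seed(0)
alpha_bars = linear_alpha_bars()
x0 = 0.8                                  # one "pixel" of a clean signal
eps = random.gauss(0.0, 1.0)
x_T = add_noise(x0, eps, alpha_bars[-1])  # almost pure noise: alpha_bar_T is tiny
oracle = lambda x, t: eps                 # hypothetical perfect noise predictor
print(alpha_bars[-1] < 1e-3, abs(ddim_sample(x_T, alpha_bars, oracle) - x0) < 1e-9)  # True True
```

In diffusers, the schedule and the update rule live in the scheduler object, and the noise prediction comes from the UNet; the 50-step default mirrors the `50/50` progress bars in the examples that follow.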

Key Components

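Component      Role
UNet           Core denoiser: learns how to remove noise at each step
Scheduler      Determines how noise is added during training and removed during generation
VAE            Encodes images into latent space and decodes latents back into images
Text Encoder   Converts text prompts into embeddings that condition the UNet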

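Each component is a separate object on the pipeline, so it can be swapped or customized on its own; for example, you can replace the scheduler without touching the UNet.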
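To make the pluggable design concrete, here is a toy pipeline in plain Python. It is a hypothetical sketch, not the diffusers API: every component is just a function, the "UNet" and "scheduler" are dummies that pull the latents toward a target, and the "VAE decoder" merely repeats each value. What it does mirror is the data flow: encode the prompt once, loop UNet → scheduler over the timesteps, then decode the latents.

```python
from dataclasses import dataclass
from typing import Callable, List

Vector = List[float]

@dataclass
class ToyPipeline:
    """Hypothetical stand-in for a diffusion pipeline: four swappable callables."""
    text_encoder: Callable[[str], Vector]               # prompt -> embedding
    unet: Callable[[Vector, int, Vector], Vector]       # (latents, t, embedding) -> noise estimate
    scheduler: Callable[[Vector, int, Vector], Vector]  # (noise estimate, t, latents) -> less-noisy latents
    vae_decode: Callable[[Vector], Vector]              # latents -> "pixels"

    def __call__(self, prompt: str, latents: Vector, timesteps: List[int]) -> Vector:
        emb = self.text_encoder(prompt)                 # computed once, reused every step
        for t in timesteps:
            noise_pred = self.unet(latents, t, emb)
            latents = self.scheduler(noise_pred, t, latents)
        return self.vae_decode(latents)

# Dummy components, only to show how data moves between them.
def encode(prompt: str) -> Vector:
    return [float(len(prompt)), 0.0]                    # fake embedding, doubles as the "target"

def unet(latents: Vector, t: int, emb: Vector) -> Vector:
    return [x - e for x, e in zip(latents, emb)]        # "noise" = distance from the target

def half_step(noise_pred: Vector, t: int, latents: Vector) -> Vector:
    return [x - 0.5 * n for x, n in zip(latents, noise_pred)]  # removes half the noise per step

def full_step(noise_pred: Vector, t: int, latents: Vector) -> Vector:
    return [x - n for x, n in zip(latents, noise_pred)]        # removes all of it in one step

def decode(latents: Vector) -> Vector:
    return [v for v in latents for _ in range(2)]       # "upsample" each latent value 2x

pipe = ToyPipeline(encode, unet, half_step, decode)
out = pipe("a cat", latents=[0.0, 3.0], timesteps=list(range(20, 0, -1)))

pipe.scheduler = full_step                              # swap only the scheduler
out_fast = pipe("a cat", latents=[0.0, 3.0], timesteps=[1])
print(out_fast)  # [5.0, 5.0, 0.0, 0.0]
```

The swap at the end is the toy analogue of replacing `pipe.scheduler` on a real diffusers pipeline: the other components are reused unchanged, only the update rule and the number of steps differ.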

Examples

1. Text-to-Image with Stable Diffusion

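from diffusers import StableDiffusionPipeline
from IPython.display import display  # display() is built into Jupyter; import it explicitly elsewhere

# The runwayml/* repos may no longer be available on the Hub; if this ID fails to resolve,
# the community mirror "stable-diffusion-v1-5/stable-diffusion-v1-5" should work instead.
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe = pipe.to("mps")  # Apple Silicon GPU; use "cuda" on NVIDIA GPUs or "cpu" otherwise

image = pipe("a futuristic city at night, cyberpunk style").images[0]
image.save("cyberpunk_city.png")
display(image)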
Loading pipeline components...: 100%
 7/7 [00:00<00:00, 21.88it/s]
100%
 50/50 [00:53<00:00,  1.05it/s]
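The 50 denoising steps in the progress bar run on a compressed latent rather than on pixels, which is what the VAE in the component table is for. A small worked example of the shapes, assuming Stable Diffusion v1.5 defaults: 512×512 output, a VAE that downsamples by a factor of 8, and 4 latent channels (in diffusers these should correspond to `pipe.vae_scale_factor` and `pipe.vae.config.latent_channels`). `latent_shape` is a hypothetical helper, not a library function.

```python
def latent_shape(height, width, vae_scale_factor=8, latent_channels=4):
    """Shape (C, H, W) of the latent the UNet denoises for a given output size."""
    if height % vae_scale_factor or width % vae_scale_factor:
        raise ValueError("height and width must be multiples of the VAE scale factor")
    return (latent_channels, height // vae_scale_factor, width // vae_scale_factor)

c, h, w = latent_shape(512, 512)
pixel_values = 3 * 512 * 512       # RGB image: 786,432 values
latent_values = c * h * w          # latent: 16,384 values
print((c, h, w), pixel_values // latent_values)  # (4, 64, 64) 48
```

Each UNet call therefore works on 48 times fewer values than a pixel-space model at the same resolution, which is a large part of why a 50-step run fits on a laptop GPU.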

2. Image Inpainting

from diffusers import StableDiffusionInpaintPipeline
from PIL import Image
import torch

# float16 roughly halves memory use compared with the default float32.
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
)
pipe = pipe.to("mps")

# Image and mask should have the same size (this model was trained at 512x512).
# In the mask, white pixels are repainted and black pixels are kept.
image = Image.open("base.png").convert("RGB")
mask = Image.open("mask.png").convert("RGB")

prompt = "a black cat with glowing eyes, cute, adorable, disney, pixar, highly detailed, 8k"
result = pipe(prompt=prompt, image=image, mask_image=mask).images[0]
result.save("inpainted.png")
Loading pipeline components...: 100%
 7/7 [00:04<00:00,  1.70it/s]
An error occurred while trying to fetch /Users/ravikalia/.cache/huggingface/hub/models--runwayml--stable-diffusion-inpainting/snapshots/8a4288a76071f7280aedbdb3253bdb9e9d5d84bb/unet: Error no file named diffusion_pytorch_model.safetensors found in directory /Users/ravikalia/.cache/huggingface/hub/models--runwayml--stable-diffusion-inpainting/snapshots/8a4288a76071f7280aedbdb3253bdb9e9d5d84bb/unet.
Defaulting to unsafe serialization. Pass `allow_pickle=False` to raise an error instead.
An error occurred while trying to fetch /Users/ravikalia/.cache/huggingface/hub/models--runwayml--stable-diffusion-inpainting/snapshots/8a4288a76071f7280aedbdb3253bdb9e9d5d84bb/vae: Error no file named diffusion_pytorch_model.safetensors found in directory /Users/ravikalia/.cache/huggingface/hub/models--runwayml--stable-diffusion-inpainting/snapshots/8a4288a76071f7280aedbdb3253bdb9e9d5d84bb/vae.
Defaulting to unsafe serialization. Pass `allow_pickle=False` to raise an error instead.
100%
 50/50 [00:52<00:00,  1.00it/s]
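The inpainting example assumes `base.png` and `mask.png` already exist. A minimal sketch for making a rectangular mask with Pillow, following the diffusers convention that white pixels are repainted and black pixels are kept; `make_box_mask` and the box coordinates are hypothetical, chosen only for illustration.

```python
from PIL import Image, ImageDraw

def make_box_mask(size, box):
    """Inpainting mask: white (255) inside `box` is repainted, black (0) elsewhere is kept."""
    mask = Image.new("L", size, 0)                 # start all black = keep everything
    ImageDraw.Draw(mask).rectangle(box, fill=255)  # white box = region to regenerate
    return mask.convert("RGB")                     # same mode the loading code converts to

# Hypothetical coordinates: repaint a central square of a 512x512 image.
mask = make_box_mask((512, 512), (160, 160, 351, 351))
mask.save("mask.png")
```

For irregular regions, painting the mask by hand in an image editor works just as well, as long as the result is the same size as `base.png`.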

3. Conditional Generation with ControlNet

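from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from diffusers.utils import load_image

# ControlNet trained on Canny edge maps: it steers the base model to follow the input edges.
controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny")
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet
).to("mps")

# Conditioning image: an edge map (white edges on a black background), not a regular photo.
input_image = load_image("canny_edges.png")
image = pipe("A robot dog", image=input_image).images[0]
image.save("robot_dog.png")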
Loading pipeline components...: 100%
 7/7 [00:00<00:00, 20.36it/s]
100%
 50/50 [00:29<00:00,  3.58it/s]
Potential NSFW content was detected in one or more images. A black image will be returned instead. Try again with a different prompt and/or seed.
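The ControlNet example expects `canny_edges.png` to be an edge map. A true Canny detector such as OpenCV's `cv2.Canny` is the usual choice; to avoid the OpenCV dependency, this sketch swaps in Pillow's `FIND_EDGES` filter plus a threshold, which gives a rougher, Laplacian-style edge map rather than real Canny output. `edge_map`, the threshold of 64, and the synthetic test image are all assumptions for illustration.

```python
from PIL import Image, ImageDraw, ImageFilter, ImageOps

def edge_map(img, threshold=64):
    """Rough edge map (white edges on black). Not true Canny: a 3x3 edge filter plus a threshold."""
    gray = img.convert("L")
    edges = gray.filter(ImageFilter.FIND_EDGES)
    edges = edges.point(lambda v: 255 if v > threshold else 0)  # binarize
    # Blank the 1-pixel border, where the 3x3 filter has no full neighbourhood.
    w, h = edges.size
    edges = ImageOps.expand(edges.crop((1, 1, w - 1, h - 1)), border=1, fill=0)
    return edges.convert("RGB")  # ControlNet conditioning images are 3-channel

# Synthetic input: a black square on a white background.
src = Image.new("RGB", (64, 64), "white")
ImageDraw.Draw(src).rectangle((16, 16, 47, 47), fill="black")
edges = edge_map(src)
edges.save("canny_edges.png")
```

With a real photo, you would pass the loaded image to `edge_map` instead of `src`; expect thicker and noisier edges than Canny produces, so results from the ControlNet may differ.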

4. Training and Fine-tuning

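diffusers supports LoRA fine-tuning, mainly through its example training scripts, which build on:

  • accelerate (device placement, mixed precision, multi-GPU training)
  • PEFT (parameter-efficient fine-tuning), which provides the LoRA layers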
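LoRA keeps a pretrained weight matrix W frozen and trains only a low-rank update, so the effective weight is W + (alpha / r) · B·A, with B of shape d_out×r and A of shape r×d_in. The sketch below uses plain Python to show that arithmetic and the parameter savings; it is not how PEFT implements LoRA, and the matrix sizes and rank are hypothetical. Following the LoRA paper, B is initialized to zero so the adapter starts as a no-op.

```python
import random

def matmul(X, Y):
    """Plain-Python matrix product of nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def lora_merge(W, A, B, alpha, r):
    """Effective weight W + (alpha / r) * B @ A; W stays frozen, only A and B are trained."""
    delta = matmul(B, A)  # (d_out x r) @ (r x d_in) -> d_out x d_in
    scale = alpha / r
    return [[w + scale * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]

def trainable_params(d_out, d_in, r):
    """(full fine-tuning, LoRA) trainable parameter counts for one weight matrix."""
    return d_out * d_in, r * (d_in + d_out)

random.seed(0)
d_out, d_in, r, alpha = 4, 6, 2, 2
W = [[random.uniform(-1, 1) for _ in range(d_in)] for _ in range(d_out)]
A = [[random.gauss(0, 0.1) for _ in range(d_in)] for _ in range(r)]  # random init
B = [[0.0] * r for _ in range(d_out)]                                  # zero init
print(lora_merge(W, A, B, alpha, r) == W)  # True: the adapter starts as a no-op

full, lora = trainable_params(768, 768, 4)  # an illustrative 768x768 matrix, rank 4
print(full, lora, f"{lora / full:.2%}")     # 589824 6144 1.04%
```

This ratio is why LoRA fits on modest hardware: only about 1% of the parameters in each adapted matrix receive gradients and optimizer state, and the small A and B matrices are all that needs to be saved and shared.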

Summary

Hugging Face diffusers makes it easy to:

  • Run cutting-edge diffusion models
  • Work with text, image, and audio inputs
  • Fine-tune models on modest hardware (for example, with LoRA)
  • Build custom generative AI apps

Want to learn more? Explore the docs: https://huggingface.co/docs/diffusers