CS180: Project5

Fun With Diffusion Models!

by Shenghan Zhou

Part A: The Power of Diffusion Models!

Part 0: Setup

In this section, I used three text prompts (“an oil painting of a snowy mountain village,” “a man wearing a hat,” and “a rocket ship”) to generate images at two different numbers of inference steps (20 and 30).
The random seed I used is 180.

[Figure: samples at 20 inference steps]
[Figure: samples at 30 inference steps]

Part 1: Sampling Loops

1.1 Implementing the Forward Process

I use equation (A.2), x_t = √(ᾱ_t) x_0 + √(1 − ᾱ_t) ε with ε ∼ N(0, I), to create noisy versions of the test image at increasing noise levels.
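
A minimal sketch of this forward process in PyTorch, assuming alphas_cumprod is the scheduler's precomputed ᾱ table (the tensor names are illustrative, not the spec's exact API):

```python
import torch

def forward_process(im, t, alphas_cumprod):
    """Noise a clean image to timestep t:
    x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps, eps ~ N(0, I)."""
    abar_t = alphas_cumprod[t]
    eps = torch.randn_like(im)
    return abar_t.sqrt() * im + (1 - abar_t).sqrt() * eps, eps
```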

[Figure: Berkeley Campanile (original)]
[Figure: noise level 250]
[Figure: noise level 500]
[Figure: noise level 750]

1.2 Classical Denoising

I denoise classically by Gaussian blurring the noisy images. For noise level 250, the best settings I found were kernel size 5 with alpha 2; for noise level 500, kernel size 5 with alpha 3; and for noise level 750, kernel size 5 with alpha 4.
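
A minimal sketch of this classical baseline; I assume the alpha above acts as the blur strength (the sigma in torchvision's gaussian_blur), which may differ from the original code:

```python
import torchvision.transforms.functional as TF

def classical_denoise(noisy_im, kernel_size=5, alpha=2.0):
    # ASSUMPTION: alpha is used as the Gaussian sigma.
    return TF.gaussian_blur(noisy_im, kernel_size=kernel_size, sigma=alpha)
```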

[Figure: blur-denoised result, noise level 250]
[Figure: blur-denoised result, noise level 500]
[Figure: blur-denoised result, noise level 750]

1.3 One-Step Denoising

In this part, I use the pretrained UNet to denoise the image in a single step: the UNet predicts the noise, and I solve equation (A.2) for the clean image.
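
Concretely, solving equation (A.2) for x_0 gives x_0 = (x_t − √(1 − ᾱ_t) ε̂) / √ᾱ_t. A sketch, assuming a diffusers-style UNet whose output carries the noise prediction in .sample:

```python
def one_step_denoise(unet, x_t, t, alphas_cumprod, prompt_embeds):
    """Estimate the clean image from x_t in a single step."""
    abar_t = alphas_cumprod[t]
    eps_hat = unet(x_t, t, encoder_hidden_states=prompt_embeds).sample
    return (x_t - (1 - abar_t).sqrt() * eps_hat) / abar_t.sqrt()
```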

[Figure: One-Step Denoised Campanile at t=250]
[Figure: One-Step Denoised Campanile at t=500]
[Figure: One-Step Denoised Campanile at t=750]

1.4 Iterative Denoising

With one-step denoising, we usually cannot recover a completely clean image. Instead, we can use the diffusion model to denoise iteratively, stepping through a strided list of timesteps.
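
Each stride from timestep t down to t' < t blends the current clean estimate x̂_0 with the noisy image x_t. A sketch of one stride, omitting the added variance term v_σ for brevity (names are illustrative):

```python
def denoise_stride(x_t, x0_hat, t, t_prime, alphas_cumprod):
    """One stride of iterative denoising from t to t' < t."""
    abar_t, abar_tp = alphas_cumprod[t], alphas_cumprod[t_prime]
    alpha = abar_t / abar_tp       # effective alpha over this stride
    beta = 1 - alpha
    return (abar_tp.sqrt() * beta / (1 - abar_t)) * x0_hat \
         + (alpha.sqrt() * (1 - abar_tp) / (1 - abar_t)) * x_t
```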

[Figure: the noisy image at every 5th loop of denoising]

Final results

[Figure: final results]

1.5 Diffusion Model Sampling

I use the diffusion model to generate images from pure random noise by setting i_start = 0.

[Figure: final results]

1.6 Classifier-Free Guidance (CFG)

With CFG, we run the UNet twice to get an unconditional noise estimate ε_u and a conditional estimate ε_c, and combine them as ε = ε_u + γ(ε_c − ε_u); larger γ makes the generated images adhere more closely to the text prompt.
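
A minimal sketch of the CFG combination, given the two noise estimates from the two UNet passes:

```python
def cfg_noise(eps_uncond, eps_cond, gamma=7.0):
    # gamma = 0 is unconditional, gamma = 1 is plain conditional,
    # gamma > 1 pushes the sample toward the prompt.
    return eps_uncond + gamma * (eps_cond - eps_uncond)
```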

[Figure: final results (γ = 7)]

1.7 Image-to-image Translation

By following the SDEdit algorithm, we can create a new image that is similar to the original: we add noise to the original image and then run the iterative denoising loop back to a clean image. I use 'a high quality photo' as the text prompt in the following three results.
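
A sketch of the SDEdit procedure, assuming denoise_loop is the iterative CFG denoising loop from section 1.4 (passed in here so the snippet stays self-contained):

```python
import torch

def sdedit(im, i_start, timesteps, alphas_cumprod, denoise_loop):
    """Noise the input to timesteps[i_start], then denoise it back.
    A smaller i_start means more noise and a freer reinterpretation."""
    t = timesteps[i_start]
    abar = alphas_cumprod[t]
    x_t = abar.sqrt() * im + (1 - abar).sqrt() * torch.randn_like(im)
    return denoise_loop(x_t, i_start)
```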

[Figure: Campanile]
[Figure: cartoon capybara]
[Figure: landscape]

1.7.1 Editing Hand-Drawn and Web Images

The algorithm is effective at projecting non-realistic images onto the natural image manifold. In the following experiment, I will use one image sourced from the web and two hand-drawn images.

The web image of an avocado

[Figure: avocado]

Hand-drawn images

[Figure: a house and the sun]
[Figure: a chair]

1.7.2 Inpainting

We can use a similar approach to implement inpainting, as described in the RePaint paper: after every denoising step, we force the pixels outside the mask back to the (appropriately noised) original image, so the original content is preserved while new content is generated only inside the mask.
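
A sketch of the per-step projection, run after every denoising stride (mask is 1 where new content should be generated; names are illustrative):

```python
import torch

def inpaint_project(x_t, x_orig, mask, t, alphas_cumprod):
    """Keep generated content inside the mask; outside it, reset to the
    original image noised to the current timestep."""
    abar = alphas_cumprod[t]
    noised = abar.sqrt() * x_orig + (1 - abar).sqrt() * torch.randn_like(x_orig)
    return mask * x_t + (1 - mask) * noised
```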

[Figure: Campanile (text prompt: a high quality photo)]

I use the text prompt 'a dog is running on the grass' in the example below.

[Figure: a dog is running on the grass]

I use the text prompt 'a bear is sitting on the ground' in the example below.

[Figure: a bear is sitting on the ground]

1.7.3 Text-Conditional Image-to-image Translation

This method lets us use a text prompt to guide the projection.

Text prompt: a rocket ship

[Figure: Campanile]
[Figure: my cup]

Text prompt: a lion is running on the grass

[Figure: a running dog]

1.8 Visual Anagrams

By following the steps in Visual Anagrams, we can create optical illusions with diffusion models: images that show one scene upright and a different scene when flipped upside down.
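
At every denoising step, two noise estimates are averaged: one conditioned on the first prompt for the upright image, and one conditioned on the second prompt for the vertically flipped image (flipped back before averaging). A sketch, again assuming a diffusers-style UNet:

```python
import torch

def anagram_noise(unet, x_t, t, emb1, emb2):
    eps1 = unet(x_t, t, encoder_hidden_states=emb1).sample
    flipped = torch.flip(x_t, dims=[-2])               # flip height axis
    eps2 = unet(flipped, t, encoder_hidden_states=emb2).sample
    return (eps1 + torch.flip(eps2, dims=[-2])) / 2
```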

Example 1: an oil painting of an old man & an oil painting of people around a campfire

[Figure: Example 1]

Example 2: an oil painting of a fruit bowl & an oil painting of a monkey

[Figure: Example 2]

Example 3: a watercolor of a kitten & a watercolor of a puppy

[Figure: Example 3]

1.9 Hybrid Images

Following Factorized Diffusion, we can create a hybrid image by applying a low-pass filter to one prompt's noise estimate and a high-pass filter to the other's, then combining the filtered results into the final noise used to generate the image.
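
A sketch of the frequency split; the Gaussian kernel size and sigma here are assumptions, not values from the report:

```python
import torchvision.transforms.functional as TF

def hybrid_noise(unet, x_t, t, emb_low, emb_high, ksize=33, sigma=2.0):
    """Low-pass one prompt's noise estimate, high-pass the other's,
    and sum them into the final noise estimate."""
    eps_low = unet(x_t, t, encoder_hidden_states=emb_low).sample
    eps_high = unet(x_t, t, encoder_hidden_states=emb_high).sample
    lp = TF.gaussian_blur(eps_low, kernel_size=ksize, sigma=sigma)
    hp = eps_high - TF.gaussian_blur(eps_high, kernel_size=ksize, sigma=sigma)
    return lp + hp
```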

Example 1: a lithograph of a skull (low) & a lithograph of waterfalls (high)

[Figures: Example 1 hybrid result]

Example 2: oil painting style of a tiger (low) & oil painting style of mountains (high)

[Figures: Example 2 hybrid result]

Example 3: a watercolor of a pig (low) & a watercolor of a landscape (high)

[Figures: Example 3 hybrid result]

Part B: Diffusion Models from Scratch!

Part 1: Training a Single-Step Denoising UNet

In this part, I follow the structure in the picture to build a U-Net and train it to denoise images corrupted with Gaussian noise of level σ = 0.5. I set the batch size to 256 and train for 5 epochs, using the Adam optimizer with a learning rate of 1 × 10⁻⁴. Finally, I sample results on the test set after the 1st and the 5th epoch, and also sample results on the test set with out-of-distribution noise levels.
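
A sketch of one training step under these settings; the UNet here maps a noisy image directly to a clean estimate, and the names are illustrative:

```python
import torch
import torch.nn.functional as F

def train_step(unet, x, optimizer, sigma=0.5):
    """Corrupt a clean batch with sigma-level Gaussian noise, then
    regress the UNet's output to the clean batch with L2 loss."""
    z = x + sigma * torch.randn_like(x)
    loss = F.mse_loss(unet(z), x)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```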

Visualization of the noising process

[Figure: noising process]

Training loss curve

[Figure: training loss curve]

Results from the 1st-epoch and 5th-epoch models

[Figure: results after epoch 1]
[Figure: results after epoch 5]

Sample results on the test set with out-of-distribution noise levels

[Figure: out-of-distribution denoising results]

Part 2: Training a Diffusion Model

In this part, I implemented DDPM and added a time-conditioning module to the U-Net. For training, I set the batch size to 128 and train for 20 epochs, using the Adam optimizer with a learning rate of 1 × 10⁻³, along with an exponential learning-rate decay scheduler (gamma = 0.1^(1/num_epochs)).
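
A sketch of one DDPM training step with time conditioning; the total timestep count T and the t/T normalization fed to the UNet are assumptions about the setup:

```python
import torch
import torch.nn.functional as F

def ddpm_train_step(unet, x0, optimizer, alphas_cumprod, T=300):
    """Sample a random timestep per image, noise the batch to x_t, and
    train the time-conditioned UNet to predict the injected noise."""
    t = torch.randint(0, T, (x0.shape[0],), device=x0.device)
    abar = alphas_cumprod[t].view(-1, 1, 1, 1)
    eps = torch.randn_like(x0)
    x_t = abar.sqrt() * x0 + (1 - abar).sqrt() * eps
    loss = F.mse_loss(unet(x_t, t.float() / T), eps)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The scheduler itself is just torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.1 ** (1 / 20)), stepped once per epoch, so the learning rate decays by a factor of 10 over the 20 epochs.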

Training loss curve

[Figure: training loss curve]

Sampling results after 5 and 20 epochs

[Animated figure: samples after 5 epochs]
[Animated figure: samples after 20 epochs]

Then, I implemented a class-conditioned U-Net and trained it with the same configuration as the time-conditioned U-Net.
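
A sketch of the class-conditioned variant; the one-hot encoding and the random label dropout (so the model also learns an unconditional mode for CFG sampling) are my assumptions about the standard recipe:

```python
import torch
import torch.nn.functional as F

def class_cond_train_step(unet, x0, labels, optimizer, alphas_cumprod,
                          T=300, num_classes=10, p_drop=0.1):
    """Same DDPM step, with a one-hot class vector that is zeroed out
    with probability p_drop (ASSUMPTION: 10% label dropout)."""
    c = F.one_hot(labels, num_classes).float()
    keep = (torch.rand(c.shape[0], 1, device=c.device) > p_drop).float()
    c = c * keep
    t = torch.randint(0, T, (x0.shape[0],), device=x0.device)
    abar = alphas_cumprod[t].view(-1, 1, 1, 1)
    eps = torch.randn_like(x0)
    x_t = abar.sqrt() * x0 + (1 - abar).sqrt() * eps
    loss = F.mse_loss(unet(x_t, t.float() / T, c), eps)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```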

Training loss curve

[Figure: training loss curve]

Sampling results after 5 and 20 epochs

[Animated figure: samples after 5 epochs]
[Animated figure: samples after 20 epochs]