Fine-tuning Stable Diffusion with Dreambooth in AWS SageMaker Studio


In school, I went through engineering (read: CAD) drawing. I can draw an isometric view of something I want to build or a part I need to fabricate. By hand it’s not great quality, but it gets the job done; in CAD software (like SketchUp), I can do a pretty good job. I am absolutely abysmal, though, when it comes to free-hand drawing or any real kind of artistic drawing, even on a computer.

With Stable Diffusion and other generative AI of the same ilk, the ability to create some truly amazing content is well within reach for those of us whose artistic skill peaked somewhere around age ten. Being an engineer, I’m not satisfied with just going to https://www.craiyon.com/ or https://stablediffusionweb.com/ and entering my prompts. I want to code something. More specifically, I want to code something that will generate super cool, exciting headshots of myself to post on the internet instead of my dull, real-world face. (I want to look like an awesome, techno-wizard Dumbledore from some noir-esque world that is a cross between Blade Runner/Sin City and Harry Potter!)

I had a tough time finding a Jupyter notebook that I could run in something like SageMaker Studio and fine-tune my own model. So, I threw one together. You can find a sample notebook at my GitHub repo.

Let’s dive right in and take a look at the notebook.

I’ve endeavored to comment the notebook pretty thoroughly, so I will skip the uninteresting stuff like imports and pip installs.

Initial configuration of the fine-tuning

We need to set some information about how we fine-tune. By supplying this list of concepts, we tell Dreambooth about the additional items we want to teach it. In my sample I am teaching it one extra concept; however, this is an array, so you can teach multiple new concepts in one training run.

concepts_list = [
    {
        "instance_prompt": "photo of cc person",
        "class_prompt": "photo of a person",
        "instance_data_dir": "./content/data/cc",
        "class_data_dir": "./content/data/person"
    },
]

Here is a quick breakdown of what each of those parameters means:

- instance_prompt - the prompt we would type to generate the image we are attempting to fine-tune

- class_prompt - a prompt without the unique identifier/instance. This prompt is used for generating "class images" for prior preservation. For our example, this is "a photo of a person" versus a photo of a specific person.

- instance_data_dir - the location where the training images for fine-tuning are stored

- class_data_dir - sample images for the general class of prompt we are fine-tuning on. If there are no images here, samples will be generated; otherwise, you can provide ~20 images of the general concept you want to generate (but not the actual instance images that we fine-tune on).

In your notebook, feel free to change ‘photo of cc person’ to ‘photo of YOUR_NAME person’. Make sure to keep the rest of the prompt the same and only switch out ‘cc’ for YOUR_NAME. This prompt snippet is what tells Stable Diffusion to use your new data when generating images.

The only other thing you may want to change here is instance_data_dir, a relative path to the folder containing the training images you want to use. Whether you change this directory or not, now is the time to upload the images you want to train with. Go ahead and do that; I’ll wait here.
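One step that’s easy to miss: the training command in the next section reads the concepts from a JSON file (concepts_list_cc.json) rather than from the Python variable directly. If your copy of the notebook doesn’t already do this, here is a minimal sketch that writes that file and creates the image directories (the filename and paths match the ones used in this notebook):

import json
import os

# create the instance/class image folders if they don't exist yet
for concept in concepts_list:
    os.makedirs(concept["instance_data_dir"], exist_ok=True)
    os.makedirs(concept["class_data_dir"], exist_ok=True)

# the training script reads the concepts from this JSON file
with open("concepts_list_cc.json", "w") as f:
    json.dump(concepts_list, f, indent=4)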

HuggingFace Accelerate training

Welcome back. Now you have some images uploaded and are ready for a training run. I want to walk through this next section of code since it is doing all the heavy lifting for us.

MODEL_NAME = "runwayml/stable-diffusion-v1-5"
PRECISION = "fp16"
MAX_TRAIN_STEPS = 1200
#OUTPUT_DIR is where the fine-tuned weights land; "./output" is an example path
OUTPUT_DIR = "./output"

#!accelerate launch --help
!accelerate launch --mixed_precision=$PRECISION --num_processes=1 --num_machines=1 --num_cpu_threads_per_process=2 \
train_dreambooth_ShivamShrirao.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--pretrained_vae_name_or_path="stabilityai/sd-vae-ft-mse" \
--output_dir=$OUTPUT_DIR \
--revision=$PRECISION \
--with_prior_preservation --prior_loss_weight=1.0 \
--seed=1337 \
--resolution=512 \
--train_batch_size=1 \
--train_text_encoder \
--mixed_precision=$PRECISION \
--use_8bit_adam \
--gradient_accumulation_steps=1 \
--learning_rate=1e-6 \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--num_class_images=50 \
--sample_batch_size=4 \
--max_train_steps=$MAX_TRAIN_STEPS \
--save_interval=100000 \
--save_sample_prompt="photo of cc person" \
--concepts_list="concepts_list_cc.json"

We are using the HuggingFace Accelerate Python package to do the training for us: we supply the training script and some hyperparameters, and it does the magic behind the scenes.

The training file, train_dreambooth_ShivamShrirao.py, is a fork of the HuggingFace Dreambooth example script; the fork focuses on performance improvements and memory optimization.

Parameters worth calling out:

- MODEL_NAME - which model you want to pull from HuggingFace. There are a few providers of SD models you can choose from; for instance, you may want to try out v2 (v1-5 gives better outputs, in my opinion).

- PRECISION - leave this as fp16, especially if you are running in a low-memory (i.e., <16 GB) environment.

- MAX_TRAIN_STEPS - 800-1200 usually gives good results, based on my testing.

- seed - you can make this randomly generated, but it is normal to see it hardcoded once you find a training run that generates a good set of initial images.

- learning_rate - can be altered, but I tend to stay between 1e-6 and 5e-6.

- save_interval - keep this higher than MAX_TRAIN_STEPS, or you may hit a CUDA out-of-memory error during the save step.

- save_sample_prompt - if you updated this prompt in the concepts list, update it here as well.

That will take several minutes to run, and you should see some output from the training itself. A quick word of warning: I originally put this notebook together in Nov/Dec 2022 and had no issues with the training. However, while working on the code for this post, I noticed that something has changed in the underlying libraries, and I ran into numerous CUDA out-of-memory errors. I have rolled back packages and found settings that still work, but your mileage may vary. I will revisit the notebook in a few months to see whether future updates fix the issue; until then, you should still have a functional notebook to play with.
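If you hit the same out-of-memory errors, it helps to confirm how much GPU memory is actually free before launching a run. A minimal sketch using PyTorch’s CUDA utilities (mem_get_info is available in recent PyTorch versions):

import torch

# sanity-check the GPU before kicking off a training run
print(torch.cuda.get_device_name(0))
free, total = torch.cuda.mem_get_info()
print(f"free: {free / 1e9:.1f} GB / total: {total / 1e9:.1f} GB")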

Generate a grid of preview images

Now that the training has completed, running the next couple of cells in the notebook will bring you to a grid of the preview images. WARNING: these will not be flattering images. They will look weird; you may have too many fingers, or noses, or eyes. That’s ok, but if they look too surreal, change the seed value we discussed above and re-run until you get something that looks better.
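For reference, the grid itself is typically stitched together with a small PIL helper along these lines (a sketch; the image_grid name is illustrative, not necessarily what the notebook uses):

from PIL import Image

def image_grid(imgs, rows, cols):
    # stitch equally sized PIL images into a single rows x cols grid
    w, h = imgs[0].size
    grid = Image.new("RGB", size=(cols * w, rows * h))
    for i, img in enumerate(imgs):
        grid.paste(img, box=((i % cols) * w, (i // cols) * h))
    return grid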

[Image: grid of sample headshots generated during training]

See what I mean? They look weird right now, but generally OK. In the two right-most pictures, there is something wrong with the hands it generated for me. The second image in the grid has the wrong eye color. Image number 7 decided to grow a baby out of my shoulder. (Honest, I didn’t add any training images of me holding a baby; I have no idea why, but even after running multiple iterations, it will randomly do this.)

Inference

Finally, it’s time for the fun part. Run the rest of the cells until you get to this point.
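Those intermediate cells load your fine-tuned weights into a diffusers pipeline and set up a seeded generator. Roughly, they do something like this (a sketch assuming the diffusers StableDiffusionPipeline API; the checkpoint path is an assumption, since the fork saves weights under a step-numbered subfolder of OUTPUT_DIR):

import torch
from torch import autocast
from diffusers import StableDiffusionPipeline

# point this at the folder the training step saved weights into;
# e.g. ./output/1200 for a 1200-step run - adjust to match your run
model_path = f"{OUTPUT_DIR}/{MAX_TRAIN_STEPS}"
pipe = StableDiffusionPipeline.from_pretrained(
    model_path, torch_dtype=torch.float16
).to("cuda")

# a seeded generator so a good result can be reproduced later
g_cuda = torch.Generator(device="cuda").manual_seed(1337)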

#imports for this cell (skipped earlier with the rest of the setup)
from datetime import datetime
from IPython.display import display

#this is the normal prompt you are used to when generating images from text
#make sure to include the phrase 'photo of XX person'
#to force the model to use your finetuned results as a starting point
prompt = ("hyper-maximalist overdetailed comic book illustration headshot "
          "photo of cc person as hero. Give him a long, luxurious beard like Dumbledore. "
          "Make the image dark and gritty")

#negative prompts allow for removing/limiting what will be included
#commonly you would use 'duplicate' to ensure you don't get multiple copies
#of the instance image in a single output
negative_prompt = "duplicate"

with autocast("cuda"), torch.inference_mode():
    images = pipe(
        prompt,
        height=512,
        width=512,
        negative_prompt=negative_prompt,
        num_images_per_prompt=1,
        num_inference_steps=100,
        guidance_scale=8.5,
        generator=g_cuda
    ).images

for img in images:
    #timestamp the filename so repeated runs don't overwrite each other
    dt = datetime.now()
    ts = datetime.timestamp(dt)

    display(img)
    img.save('./content/ccOutputs/' + str(ts) + ".jpg", "JPEG")

Again, parameters worth noting:

- height/width - leave these at 512, since the underlying model was trained at this resolution.

- num_images_per_prompt - you can generate one or multiple images at a time.

- guidance_scale - how closely the algorithm follows your prompt versus how much "creativity" it gets during generation. I tend to keep this between 7 and 8.5, but feel free to experiment here; it only affects the newly generated images.

- num_inference_steps - how many denoising steps the algorithm uses when generating your image. Stable Diffusion should do well with as few as 50 steps, but for more detailed images you can go higher. I've had good luck with 100.

That’s it! When you run that cell, your model will generate a new image of you, display it on the screen, and save a copy, so you don’t lose it when you inevitably get caught up in the moment and power-run this cell a few dozen times looking at all the fantastic (and sometimes incredibly horrific) results you get.
