Training Models using Dreambooth (Google Colab) (2024)

Imagine you don't have enough GPU power to train and fine-tune a model yourself. A quick solution is Dreambooth, a technique introduced by Google researchers for fine-tuning a model on your own subject with a small set of relevant images. It can be trained on almost any subject, such as objects, human faces, or animals. More detailed information can be found in the research paper.

Here, we are going to fine-tune the pre-trained Stable Diffusion model with a new image data set. There are multiple ways to do this, such as LoRA and hypernetworks, which we have already covered.

Now, we will see what we can do using Dreambooth in Google Colab. You can also run this locally, but make sure you have at least 8 GB of VRAM (recommended).
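For a local run, the 8 GB recommendation can be checked before starting. The following is a minimal sketch, assuming an NVIDIA GPU with the `nvidia-smi` tool installed; the helper names (`parse_vram_mib`, `has_enough_vram`, `query_vram_mib`) are hypothetical and not part of the notebook.

```python
import shutil
import subprocess

MIN_VRAM_GB = 8  # the recommendation quoted in the text


def parse_vram_mib(smi_output: str) -> list[int]:
    """Parse one integer (MiB) per line; lines that are not plain numbers are skipped."""
    return [int(line.strip()) for line in smi_output.splitlines() if line.strip().isdigit()]


def has_enough_vram(vram_mib: list[int], min_gb: float = MIN_VRAM_GB) -> bool:
    """True if at least one GPU reports min_gb or more of total memory."""
    return any(mib / 1024 >= min_gb for mib in vram_mib)


def query_vram_mib() -> list[int]:
    """Ask nvidia-smi for total memory per GPU; returns [] when the tool is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_vram_mib(result.stdout) if result.returncode == 0 else []
```

Calling `has_enough_vram(query_vram_mib())` on the target machine returns `False` when no suitable GPU is found, in which case the Colab route described next is the safer choice.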

Training Steps:

Google Colab lets you test your project on the free tier. If you observe performance issues while training, you can switch to a paid plan. For this tutorial, we are using the free plan.

1. First, open the Dreambooth notebook linked below in Google Colab.

https://colab.research.google.com/github/TheLastBen/fast-stable-diffusion/blob/main/fast-DreamBooth.ipynb

2. Click the "Connect" button to connect to a T4 GPU. This is required in order to use its VRAM. Once connected, a green check mark is shown.

3. Next, click the "Play" button to connect your Google Drive. The notebook uses your Google Drive as temporary storage for the models and prerequisites.

You can gauge the repository by the number of stars and forks it has received. These indicate how popular the GitHub repository is, and suggest that the community uses it heavily and considers it trustworthy.

Now, click "Run anyway" and again select "Connect to Google Drive".
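For readers curious what connecting Google Drive looks like in code, here is a minimal sketch using Colab's own `google.colab.drive` module. The `mount_drive` wrapper and the `/content/gdrive` mount point are assumptions for illustration; the notebook's cell may do more than this.

```python
def mount_drive(mount_point: str = "/content/gdrive"):
    """Mount Google Drive when running inside Colab; return the mount point, or None elsewhere."""
    try:
        from google.colab import drive  # only importable inside a Colab runtime
    except ImportError:
        return None  # not running in Colab, nothing to mount
    drive.mount(mount_point)  # triggers the Google authorization prompt
    return mount_point
```

Outside Colab the function simply returns `None`, so the same script can be reused for a local run.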

4. Select the model you want to train with. The default is Stable Diffusion 1.5; you can alternatively use other models from Hugging Face, such as Stable Diffusion 2.1 or Stable Diffusion XL 1.0.

After uploading, a confirmation message will be shown "Done, proceed to the next cell".

11. Caption- The Caption section is used to add a description to each of your uploaded images. Run the cell by pressing the "Play" button. After a few seconds you will see all the uploaded images.

Select the images one by one, rewrite each caption so that it matches what is shown in the image, and save it. Keep in mind to describe the image clearly: the more relevant information you add, the better the results your model will generate.

12. Concept Images (Regularization)- This is optional. In some cases you need to train your model on a specific type or art style, such as Picasso, da Vinci, or anime. You then need to upload 100-300 images in that style. For example, if you want to train on cats, use only images of cats in different positions.

As explained earlier, "IMAGES_FOLDER_OPTIONAL" lets you point to a Google Drive folder path for the regularization images.

13. Training section- According to the developers of Dreambooth, Stable Diffusion overfits very easily, so take care to balance the learning rate against the number of training steps. Their research paper advises using a lower learning rate, which yields better results.

In our experience, the best combinations for objects and human faces are:

	Object	Human
Learning Rate	2e-6	1e-6 or 2e-6
Training steps	450	1200 or 1500
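The table can be kept as a small lookup so the starting values are not retyped for every run. This is a minimal sketch: `TrainingPreset`, `PRESETS`, and `preset_for` are hypothetical names, and where the table offers two options the first one is picked arbitrarily.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingPreset:
    learning_rate: float
    training_steps: int


# Values copied from the table; for "human" the first of the two listed
# options (1e-6, 1200 steps) is used as the starting point.
PRESETS = {
    "object": TrainingPreset(learning_rate=2e-6, training_steps=450),
    "human": TrainingPreset(learning_rate=1e-6, training_steps=1200),
}


def preset_for(subject: str) -> TrainingPreset:
    """Look up the starting values for 'object' or 'human' (case-insensitive)."""
    key = subject.strip().lower()
    if key not in PRESETS:
        raise ValueError(f"unknown subject type: {subject!r}")
    return PRESETS[key]
```

For example, `preset_for("Human")` yields the 1e-6 learning rate with 1200 steps, which can then be typed into the notebook's training cell.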

Here, the "Resume Training" option is used to resume the training from where you left off.

If, after training, the generated images are not up to the mark (for example blurry or noisy), you need to adjust the learning rate and training steps values.

If we don't get satisfactory results, we first raise the training steps from 1500 to about 2000. If the results are still not right, we go back to 1500 training steps and use a learning rate of 1e-6.
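The fallback order can be written down as a short list of attempts. This sketch reads the two fallbacks as "raise the training steps to about 2000" and then "return to 1500 steps with a learning rate of 1e-6"; the 2e-6 starting rate is an assumption taken from the "Human" column of the table, and `ATTEMPTS` and `next_attempt` are hypothetical names.

```python
# Each entry is (learning_rate, training_steps), tried in order until the
# sample images look acceptable. The 2e-6 starting rate is an assumption.
ATTEMPTS = [
    (2e-6, 1500),  # starting point
    (2e-6, 2000),  # same rate, more training steps
    (1e-6, 1500),  # original step count, lower learning rate
]


def next_attempt(tries_so_far: int):
    """Return the settings for the next run, or None once every entry has been tried."""
    if tries_so_far < 0:
        raise ValueError("tries_so_far must not be negative")
    return ATTEMPTS[tries_so_far] if tries_so_far < len(ATTEMPTS) else None
```

Judging whether a run is "acceptable" stays a manual step: inspect the sample images, then call `next_attempt` with the number of runs already made.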

Sometimes the first run already gives good results and you don't need to change anything.

14. Text_Encoder_Training_Steps- This option sets the number of training steps for the text encoder. Usually, 200-450 is used for smaller data sets. In our case the number of images is very small, so 200 is a good value.

15. Text_Encoder_Concept_Training_Steps- This is used when you are training an object in different positions. This is set to 0 if you are using human faces.

16. Text_Encoder_Learning_Rate- It is advised to keep this low to avoid overfitting (1e-6 is higher than 4e-7). In this notation, 1e-6 means 1 × 10^-6 (0.000001) and 4e-7 means 4 × 10^-7 (0.0000004).
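The comparison between the two rates is easy to verify. The short check below uses Python's `decimal` module so the arithmetic is exact; the variable names are illustrative only.

```python
from decimal import Decimal

encoder_lr_high = Decimal("1e-6")  # 1 x 10^-6 = 0.000001
encoder_lr_low = Decimal("4e-7")   # 4 x 10^-7 = 0.0000004

# Scientific notation and the written-out decimals denote the same numbers.
assert encoder_lr_high == Decimal("0.000001")
assert encoder_lr_low == Decimal("0.0000004")

# 1e-6 is the larger of the two learning rates.
assert encoder_lr_high > encoder_lr_low

ratio = encoder_lr_high / encoder_lr_low
print(ratio)  # 2.5
```

In other words, 1e-6 is two and a half times as large as 4e-7, so 4e-7 is the more conservative choice.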

17. Offset Noise- Use this option if you are training your model on a particular style. We have left it unchecked because we are only training on human faces.

18. External Captions- This is for external text files with captioning. It can be left as it is.

19. Resolution- Use a value lower than the resolution you set when cropping the images. For instance, if you uploaded images of 1024 pixels, set it between 512 and 960.
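The resolution rule can be sketched as a small helper that lists the values below the size of the uploaded images. The 512 lower bound comes from the example in the text; the step of 64 pixels and the name `candidate_resolutions` are assumptions made for this illustration.

```python
def candidate_resolutions(image_size: int, minimum: int = 512, step: int = 64) -> list[int]:
    """List training resolutions from `minimum` up to, but not including, `image_size`."""
    return list(range(minimum, image_size, step))


print(candidate_resolutions(1024))
# [512, 576, 640, 704, 768, 832, 896, 960]
```

For 1024-pixel images the list runs from 512 to 960, matching the range given above. An empty list means the images are too small for the chosen minimum.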

20. Save_Checkpoint_Every_n_Steps- Because training takes about 20-30 minutes, an option is provided to save a checkpoint every n steps.
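To see how many checkpoints a given interval produces, the following sketch lists the steps at which a save would occur. It assumes a save at every multiple of n up to the total step count; `checkpoint_steps` is a hypothetical helper, and the notebook's exact behavior may differ.

```python
def checkpoint_steps(total_steps: int, every_n: int) -> list[int]:
    """Return the step numbers at which a checkpoint would be written."""
    if every_n <= 0:
        raise ValueError("every_n must be a positive number of steps")
    return list(range(every_n, total_steps + 1, every_n))


print(checkpoint_steps(1500, 500))  # [500, 1000, 1500]
```

Each checkpoint is a full model file, so a small interval can fill the available Google Drive space quickly.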

21. Test the trained model- This section is used to test the trained model. It opens the Stable Diffusion WebUI through a local tunnel. To use this, check the "Use_localtunnel" option.

The alternative is to use the Gradio app with your credentials. Now, run the cell by clicking the play button.

After a few seconds, you will see a live URL. Click on it to open a WebUI.

22. Save your trained model to Hugging Face- After training your model, you can sign in to your Hugging Face account and save the model as a repository. If you later want to load the saved model, simply copy its Hugging Face ID, paste it into Path_to_HuggingFace, and use it in your workflow.

Important Tips:

According to the research paper, if you are not getting good results in image generation, you should try tweaking the parameters mentioned below:

Try different sampling methods such as Euler, DDIM, or DPM++.
Focus on detail when writing your image prompt.
Use negative prompts to refine the results.
Symbols can also be used to give certain words more importance.
Keep the CFG scale toward the low end of the 7-15 range for more creativity.
Sampling Steps values of 20-25 give better results, but the generation time will be comparatively longer.
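These tips can be collected into one settings dictionary. The sketch below is illustrative only: the key names follow the txt2img fields of the Stable Diffusion WebUI API as commonly documented, the `(term:weight)` form is the WebUI's emphasis syntax, and `emphasize` and `build_settings` are hypothetical helpers. Treat all of them as assumptions to verify against your WebUI version.

```python
import json


def emphasize(term: str, weight: float) -> str:
    """Wrap a term in the (term:weight) emphasis syntax."""
    return f"({term}:{weight})"


def build_settings(subject: str, details: list[str], negatives: list[str]) -> dict:
    """Combine a detailed prompt, negative prompts, and sampler settings in one dict."""
    prompt = ", ".join([emphasize(subject, 1.2), *details])
    return {
        "prompt": prompt,
        "negative_prompt": ", ".join(negatives),
        "sampler_name": "Euler",  # other options to try: DDIM, DPM++ variants
        "cfg_scale": 7,           # low end of the 7-15 range
        "steps": 25,              # within the suggested 20-25
    }


settings = build_settings("cat", ["studio lighting", "sharp focus"], ["blurry", "noisy"])
print(json.dumps(settings, indent=2))
```

The same values can be entered by hand in the WebUI; the dictionary is simply a convenient way to record which combination produced a given image.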

FAQs

Can I train models on Google Colab?

Colaboratory by Google (Google Colab for short) is a Jupyter notebook based runtime environment that allows you to run code entirely in the cloud. This is useful because it means you can train large-scale ML and DL models even if you don't have access to a powerful machine or high-speed internet access.

Is Google Colab enough for machine learning?

Colab allows anyone to write and run arbitrary Python code from the browser, making it especially suitable for machine learning, data analysis, and education. Technically, Colab is a hosted Jupyter notebook service that provides free access to computing resources, including GPUs, while requiring no setup to use.

How long does Stable Diffusion Dreambooth take to train?

Training with Dreambooth requires only a limited amount of training data. For example, 20 images are enough to create an embedding, and the data doesn't have to be high quality. It will still work fine with poorly lit or incomplete images. The training time is typically around an hour on an RTX 3090 with 24 GB of GPU RAM.

How long can I train a model on Google Colab?

A Colab instance has a maximum lifetime of 12 hours. This makes some sense, since Google wants its GPUs and TPUs to be fully utilized rather than idling, but it does not help if you want to train your models overnight or over many days.

Which is better, Google Colab or Jupyter Notebook?

Use Jupyter Notebooks if you prefer local development or if your organization has its own JupyterHub setup.
Choose Google Colab if you require free access to GPU/TPU resources, seamless collaboration, or if you prefer a cloud-based environment integrated with Google Drive.

What is the disadvantage of Google Colab?

Disadvantages: Limited runtime, dependency on an internet connection, and potential challenges and risks in relying on Colab as an educational platform. Advantages: Provides computational resources for running CNN training and avoids software configuration.

Why not use Google Colab?

No live editing: Writing code and sharing it with a partner or a team allows you to collaborate. However, live editing is missing in Google Colab, which prevents two people from writing or editing code at the same time. This leads to a lot of back-and-forth re-sharing.

How many hours can Colab run?

In the free-of-charge version of Colab, notebooks can run for at most 12 hours, depending on availability and your usage patterns. Colab Pro and Pay As You Go offer you increased compute availability based on your compute unit balance.

Can you train your own Stable Diffusion model?

You have the flexibility to train your Stable Diffusion model using a range of tools and platforms, including Jupyter Notebooks or TensorFlow. These platforms allow you to conduct experiments, handle model management tasks, and generate images seamlessly.

What is the difference between DreamBooth and LoRA?

Dreambooth produces larger models, Textual Inversion results in very small and easily shareable embeddings, and LoRA offers a faster training time with smaller, portable layers.

How many images do you need for Stable Diffusion training?

You'll want at least 20 pictures or so for your AI model to analyze in order to avoid creating a bunch of generic person art or nightmare fuel. So bust out your phone and take some selfies!

Is Google Colab good for training?

Google Colab is particularly popular with machine learning researchers and practitioners. It provides access to free GPUs and TPUs, making it much easier to train machine learning models that require a lot of computing power.

Vertex AI quota	Colab Enterprise interaction	Limit
Resource management requests	Runtime and runtime template requests	600/minute
Job or long-running operation requests per minute	Runtime and runtime template requests	60/minute