Creating a dataset for Dreambooth

Formatting

Your dataset must be a directory. For Dreambooth, the directory structure must be:

- my_dog_images 
    - object 
        - charlie_image_one.jpeg
        - pic_of_dog.jpeg
        - another_picture.png 
    - prior_preservation
        - a_dog_1.jpeg
        - another_dog.png

Dataset contents:

A top-level folder, which can have any name.
A subfolder named object that contains photos of the object you're interested in training on (e.g. pictures of your dog).
Optional: a subfolder named prior_preservation that contains photos of other objects of the same type (e.g. pictures of different dogs) to help Stable Diffusion learn what is unique about your object.
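
The layout described above can be checked locally before uploading. The following is a minimal sketch using only the Python standard library. The helper name check_dataset_layout is hypothetical and is not part of Blueprint; it only verifies the folder structure, not the image contents.

```python
from pathlib import Path


def check_dataset_layout(root):
    """Return a list of problems found in a Dreambooth dataset directory.

    Hypothetical helper, not part of Blueprint. An empty list means the
    layout matches the structure described above.
    """
    root = Path(root)
    problems = []

    if not root.is_dir():
        return [f"{root} is not a directory"]

    # The 'object' subfolder is required and must contain at least one file.
    object_dir = root / "object"
    if not object_dir.is_dir():
        problems.append("missing required subfolder: object")
    elif not any(p.is_file() for p in object_dir.iterdir()):
        problems.append("subfolder 'object' contains no files")

    # The 'prior_preservation' subfolder is optional, but if the name is
    # taken it has to be a directory.
    prior_dir = root / "prior_preservation"
    if prior_dir.exists() and not prior_dir.is_dir():
        problems.append("prior_preservation exists but is not a directory")

    return problems
```

For example, calling check_dataset_layout("my_dog_images") on the directory shown earlier would return an empty list.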

To upload and use your dataset in a FinetuningRun, you'll need to create a DatasetIdentifier object.

Tips

Get as many photos of the object as possible; more images generally lead to better training results. You need at least five.
Make sure the object is not blurry and can clearly be seen.
Angles are important: capturing the object from as many angles as possible helps Stable Diffusion learn it better.
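
The five-image minimum can be checked with a short script before uploading. This is a sketch under stated assumptions: the helper names are hypothetical, and the extension set is only an illustrative subset used to recognize image files, not the full list of formats Blueprint accepts.

```python
from pathlib import Path

# Illustrative subset of extensions (an assumption for this sketch).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}
MIN_OBJECT_IMAGES = 5  # minimum stated in the tips above


def count_object_images(root):
    """Count files in <root>/object whose extension is in IMAGE_SUFFIXES."""
    object_dir = Path(root) / "object"
    return sum(
        1
        for p in object_dir.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def has_enough_images(root):
    """Return True if the object folder holds at least MIN_OBJECT_IMAGES images."""
    return count_object_images(root) >= MIN_OBJECT_IMAGES
```

The count says nothing about sharpness or variety of angles, so the photos still need a visual check.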

Effective images:

The subject is clearly visible in the foreground and is photographed from different angles.

Example effective images

Ineffective images:

In these photos, the adorable subject is obscured behind blankets and other obstacles.

Example ineffective images

FAQ

Can I provide photos in any format?

Blueprint supports JPEG, PNG, and, more generally, any image format accepted by Pillow.

Does the name of the image matter?

The name of the image doesn't matter. As long as each image is placed in the correct subdirectory, the Blueprint fine-tuning API can use your dataset.

Does the size or resolution of the image matter?

Blueprint automatically scales images down to the target resolution, which is set by the resolution parameter in the DreamboothConfig.
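
To get a feel for what downscaling does to a photo's dimensions, the arithmetic can be sketched as below. This is an illustration only: it assumes an aspect-preserving resize in which the shorter side is reduced to the resolution value and smaller images are left unchanged. The function name is hypothetical, and Blueprint's actual resizing rule may differ.

```python
def downscaled_size(width, height, resolution):
    """Return (new_width, new_height) after an aspect-preserving downscale.

    Assumption for illustration only: the shorter side is reduced to
    `resolution`, and images already at or below it are left unchanged.
    Blueprint's actual resizing rule may differ.
    """
    shorter = min(width, height)
    if shorter <= resolution:
        return width, height
    scale = resolution / shorter
    return round(width * scale), round(height * scale)
```

Under these assumptions, a 4032 x 3024 photo with a resolution of 512 comes out as (683, 512), while a 400 x 300 photo is returned unchanged.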

What's next?

Use your dataset to create a fine-tuning run