Creating a dataset for Full Stable Diffusion

Formatting

Your dataset must be a directory. For Full Stable Diffusion, the directory structure must be:

- van_gogh_artwork
    - a.jpeg
    - a.txt
    - b.png
    - b.txt

Dataset contents:

- A top-level folder, named anything you want.
- For every image, a corresponding .txt file that contains the caption.

Image and text file names must match

If a dataset has an image van_gogh_artwork_1.jpeg, Blueprint expects a corresponding text file named van_gogh_artwork_1.txt. If there are images without corresponding text files, the fine-tuning run will fail.
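Because a single unmatched image makes the run fail, it can be worth checking the pairing locally before uploading. The following is a minimal sketch, not part of Blueprint: the function name find_unpaired_files is hypothetical, and it assumes that every non-hidden file in the folder that is not a .txt file is an image.

```python
from pathlib import Path


def find_unpaired_files(dataset_dir):
    """Return (images_without_captions, captions_without_images) as sorted file names.

    Assumption: every non-hidden file that does not end in .txt is an image.
    """
    files = [
        p for p in Path(dataset_dir).iterdir()
        if p.is_file() and not p.name.startswith(".")
    ]
    captions = [p for p in files if p.suffix.lower() == ".txt"]
    images = [p for p in files if p.suffix.lower() != ".txt"]

    caption_stems = {p.stem for p in captions}
    image_stems = {p.stem for p in images}

    # An image is unpaired when no caption shares its base name, and vice versa.
    missing_captions = sorted(p.name for p in images if p.stem not in caption_stems)
    orphan_captions = sorted(p.name for p in captions if p.stem not in image_stems)
    return missing_captions, orphan_captions
```

Calling find_unpaired_files("van_gogh_artwork") on a correctly formatted dataset should return two empty lists.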

To upload and use your dataset in a FinetuningRun, you'll need to create a DatasetIdentifier object.

FAQ

What should my captions be?

When you invoke your fine-tuned model, the captions in your dataset will determine how you will prompt the model. If you'd like to be able to tell Stable Diffusion to generate a certain color, for example, mention the color in the caption!

Captions should be descriptive and detailed, and they should follow the style of prompting you want to use when invoking the model. However, Stable Diffusion limits caption length: captions should be no longer than 75 tokens (one token is roughly 4 characters of English text).
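To spot captions that are likely over the limit, you can apply the rule of thumb above to every .txt file in the dataset. This is a rough sketch only: it estimates tokens as characters divided by 4 rather than running the model's real tokenizer, so treat the result as a warning, not an exact count. The names estimate_tokens and find_long_captions are hypothetical.

```python
from pathlib import Path

MAX_CAPTION_TOKENS = 75  # limit stated in this guide
CHARS_PER_TOKEN = 4      # rule-of-thumb estimate, not the real tokenizer


def estimate_tokens(caption):
    """Estimate the token count of a caption, rounding up."""
    return -(-len(caption.strip()) // CHARS_PER_TOKEN)  # ceiling division


def find_long_captions(dataset_dir):
    """Map caption file name -> estimated token count, for captions over the limit."""
    too_long = {}
    for path in sorted(Path(dataset_dir).glob("*.txt")):
        tokens = estimate_tokens(path.read_text(encoding="utf-8"))
        if tokens > MAX_CAPTION_TOKENS:
            too_long[path.name] = tokens
    return too_long
```

Under this estimate, a caption of up to about 300 characters passes, and anything longer is flagged for a manual look.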

Example images with captions:

a drawing of a green pokemon with red eyes (left)
a green and yellow toy with a red nose (middle)
a red and white ball with an angry look on its face (right)

[Image: example Stable Diffusion images]

Can I provide photos in any format?

Blueprint supports photos in JPEG, PNG, and more generally, any image format accepted by Pillow.
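If you are unsure whether a file is in a format Pillow accepts, you can try opening it with Pillow yourself. The sketch below assumes Pillow is installed locally and that every file other than the .txt captions is meant to be an image. The function name find_unreadable_images is hypothetical, and passing this check does not guarantee that Blueprint handles the file identically.

```python
from pathlib import Path

from PIL import Image


def find_unreadable_images(dataset_dir):
    """Return sorted names of non-.txt files that Pillow cannot open and verify."""
    unreadable = []
    for path in sorted(Path(dataset_dir).iterdir()):
        if not path.is_file() or path.suffix.lower() == ".txt":
            continue
        try:
            # Image.open identifies the format; verify() checks file integrity.
            with Image.open(path) as img:
                img.verify()
        except (OSError, SyntaxError):
            unreadable.append(path.name)
    return unreadable
```

An empty list means Pillow could identify and verify every image file in the folder.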

Does the name of the image matter?

The name of the image file must match the name of the associated caption text file (name1.png matches name1.txt). Beyond that, the name can be anything.

Does the size or resolution of the image matter?

Blueprint automatically scales images down to the target resolution, which is set by the resolution parameter in the DreamboothConfig.

What's next?

Use your dataset to create a fine-tuning run