Uploading a dataset
To fine-tune a model, you need a dataset. There are three ways to upload your dataset to Blueprint.
Before uploading a dataset, make sure your local setup is complete.
Uploading a dataset from your computer
Uses LocalPath.
If your dataset lives on your local machine, pass a LocalPath object to your FinetuningConfig, and your dataset will be uploaded automatically when your FinetuningRun is created.
If your dataset is a directory (e.g. images for Dreambooth or Stable Diffusion):
from baseten.training import LocalPath
dataset = LocalPath(path="./my-dataset-directory", name="my-cool-dataset")
If your dataset is a single file (e.g. a CSV file for FLAN-T5):
from baseten.training import LocalPath
dataset = LocalPath(path="./my-dataset.csv", name="my-cool-dataset")
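Before constructing a LocalPath, it can help to confirm that the path exists and whether it is a directory or a single file. The helper below is a hypothetical sketch that uses only Python's standard library; describe_local_dataset is not part of baseten.training, and it inspects the local filesystem without contacting Blueprint.

```python
from pathlib import Path


def describe_local_dataset(path: str) -> dict:
    """Hypothetical pre-flight check for a local dataset path.

    Only inspects the local filesystem; it does not call Blueprint.
    """
    p = Path(path)
    if not p.exists():
        raise FileNotFoundError(f"Dataset path does not exist: {path}")
    if p.is_dir():
        # Count regular files at any depth, e.g. training images for Dreambooth.
        files = [f for f in p.rglob("*") if f.is_file()]
        if not files:
            raise ValueError(f"Dataset directory is empty: {path}")
        return {"kind": "directory", "files": len(files)}
    # Otherwise treat it as a single file, e.g. a CSV for FLAN-T5.
    return {"kind": "file", "bytes": p.stat().st_size}
```

If the check passes, the same path can be handed to LocalPath as in the examples above.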
Uploading a dataset from a URL
Uses PublicUrl.
If your dataset is hosted at a publicly accessible URL, you can point to it by creating a PublicUrl object.
from baseten.training import PublicUrl
dataset = PublicUrl(url="https://cdn.baseten.co/docs/production/DreamboothSampleDataset.zip")
If your dataset is a single file, it must still be zipped, e.g. https://cdn.baseten.co/docs/production/DatasetRecipes.csv.zip.
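If you start from a single file such as a CSV, Python's standard zipfile module can produce the archive before you host it. This is a minimal sketch: zip_single_file is a hypothetical helper name, and placing the file at the root of the archive is an assumption about what Blueprint expects.

```python
import zipfile
from pathlib import Path
from typing import Optional


def zip_single_file(src: str, dest: Optional[str] = None) -> Path:
    """Wrap a single dataset file in a .zip archive (hypothetical helper)."""
    src_path = Path(src)
    if not src_path.is_file():
        raise FileNotFoundError(f"Not a file: {src}")
    # By default append ".zip" to the full name, e.g. recipes.csv -> recipes.csv.zip,
    # mirroring the DatasetRecipes.csv.zip example URL.
    dest_path = Path(dest) if dest else src_path.with_name(src_path.name + ".zip")
    with zipfile.ZipFile(dest_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # ASSUMPTION: the file sits at the archive root under its original name.
        zf.write(src_path, arcname=src_path.name)
    return dest_path
```

Upload the resulting .zip to any host that serves it at a public URL, then pass that URL to PublicUrl.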
Using a dataset already uploaded to Blueprint
Uses Dataset.
If you have already uploaded a dataset to Blueprint, you can reuse it by instantiating a Dataset object with your dataset's ID.
Uploading datasets manually
If you want to upload a dataset and get a Dataset ID, run the baseten dataset upload CLI command in a terminal window.
Notes:
- If the name parameter is not provided, Blueprint will name your dataset based on the directory name.
- If you're doing a Full Stable Diffusion run, use --training-type CLASSIC_STABLE_DIFFUSION instead.
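The notes above describe two options for baseten dataset upload: a dataset name and a training type. The sketch below only illustrates how those options might be assembled into an argument list. It is hypothetical: build_upload_command and default_dataset_name are illustrative names, and passing the dataset path as a positional argument and the name as --name are assumptions; only --training-type CLASSIC_STABLE_DIFFUSION comes from this page. Check the CLI's own help output before relying on any of it.

```python
import os
import shlex
from typing import List, Optional


def build_upload_command(
    dataset_path: str,
    name: Optional[str] = None,
    training_type: Optional[str] = None,
) -> List[str]:
    """Assemble an argv list for `baseten dataset upload`.

    ASSUMPTION: the dataset path is positional and the name flag is `--name`;
    only `--training-type` is confirmed by the notes above.
    """
    cmd = ["baseten", "dataset", "upload"]
    if name is not None:
        cmd += ["--name", name]  # assumed flag spelling
    if training_type is not None:
        # e.g. "CLASSIC_STABLE_DIFFUSION" for a Full Stable Diffusion run
        cmd += ["--training-type", training_type]
    cmd.append(dataset_path)  # assumed positional argument
    return cmd


def default_dataset_name(dataset_path: str) -> str:
    """Approximate the documented default name: the dataset directory's name.

    ASSUMPTION: Blueprint's exact derivation rule is not documented here.
    """
    # normpath strips a trailing slash, so "./my-dir/" yields "my-dir".
    return os.path.basename(os.path.normpath(dataset_path))


if __name__ == "__main__":
    cmd = build_upload_command(
        "./my-dataset-directory", training_type="CLASSIC_STABLE_DIFFUSION"
    )
    # Print a shell-quoted version for inspection instead of executing it.
    print(shlex.join(cmd))
    print(default_dataset_name("./my-dataset-directory"))
```

Once the flags are confirmed, the list could be passed to subprocess.run(cmd, check=True) to perform the upload; keeping it as a list avoids shell-quoting problems with paths that contain spaces.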