Uploading a dataset
To fine-tune a model, you need a dataset. There are three ways to upload your dataset to Blueprint.
Before uploading a dataset, make sure your local setup is complete.
Uploading a dataset from your computer
If your dataset lives on your local machine, pass a LocalPath object to your FinetuningConfig, and your dataset will be uploaded automatically when your FinetuningRun is created.
If your dataset is a directory (e.g. images for Dreambooth, Stable Diffusion):
from baseten.training import LocalPath
dataset = LocalPath(path="./my-dataset-directory", name="my-cool-dataset")
If your dataset is a single file (e.g. csv for FLAN-T5):
from baseten.training import LocalPath
dataset = LocalPath(path="./my-dataset.csv", name="my-cool-dataset")
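Because the upload only happens when the FinetuningRun is created, a mistyped path may not surface until then. The following is a minimal sketch of a local pre-check that uses only the Python standard library. The helper name describe_dataset_path is hypothetical and is not part of the baseten package; it only reports whether a path is a directory or a single file, matching the two cases above.

```python
from pathlib import Path


def describe_dataset_path(path: str) -> str:
    """Return "directory" or "file" for an existing dataset path.

    Hypothetical helper for illustration; not part of the baseten package.
    """
    p = Path(path)
    if not p.exists():
        raise FileNotFoundError(f"Dataset path does not exist: {p}")
    if p.is_dir():
        # An empty directory would give the fine-tuning run nothing to train on.
        if not any(p.iterdir()):
            raise ValueError(f"Dataset directory is empty: {p}")
        return "directory"
    return "file"
```

If the check passes, the same path string can then be handed to LocalPath as shown above.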
Uploading a dataset from a URL
If your dataset is hosted at a publicly accessible URL, you can point to it by creating a PublicUrl object:
from baseten.training import PublicUrl
dataset = PublicUrl(url="https://cdn.baseten.co/docs/production/DreamboothSampleDataset.zip")
If your dataset is a single file, it must still be zipped.
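One way to produce such an archive is with Python's standard zipfile module. This is a sketch only: the helper name zip_single_file and the choice to write the archive next to the original file are assumptions for illustration, and hosting the resulting .zip at a public URL is left to you.

```python
import zipfile
from pathlib import Path


def zip_single_file(file_path: str) -> Path:
    """Write <file name>.zip next to the file and return the archive path.

    Hypothetical helper for illustration; not part of the baseten package.
    """
    src = Path(file_path)
    if not src.is_file():
        raise FileNotFoundError(f"Not a file: {src}")
    archive = src.with_name(src.name + ".zip")
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # arcname keeps only the file name, so the archive has no parent folders.
        zf.write(src, arcname=src.name)
    return archive
```

For example, zip_single_file("./my-dataset.csv") writes my-dataset.csv.zip alongside the CSV.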
Using a dataset already uploaded to Blueprint
If you have already uploaded a dataset to Blueprint, you can use it by instantiating a Dataset object and accessing your dataset by its ID.
Uploading datasets manually
If you want to upload a dataset and get a dataset ID, use the baseten dataset upload CLI command. It is a bash command, so open a terminal window to run it.
- If the name parameter is not provided, Blueprint will name your dataset based on the directory name.
- If you're doing a Full Stable Diffusion run, instead use