How to Fine-Tune FLAN-T5

FLAN-T5 is an open-source LLM that specializes in answering questions and doing step-by-step reasoning.

By fine-tuning FLAN-T5, we can turn it into an expert LLM in a specific domain and use it for a variety of generative AI tasks.

In this tutorial, we'll fine-tune FLAN-T5 to generate a recipe when given the name of a dish.

Step 0: Prerequisites

After signing up for Blueprint by Baseten, you'll need to do three things to complete this tutorial:

  • Install the latest version of the Baseten Python client with pip install --upgrade baseten
  • Create an API key
  • In your terminal, run baseten login and paste your API key when prompted.

Following this tutorial will consume credits/billable resources

The FLAN-T5 fine-tuning run in this tutorial will consume credits (if available on your account) or billable resources.

Step 1: Create your dataset

Fine-tuning FLAN-T5 means giving the model paired examples of a problem and its solution so that it can solve similar problems or answer similar questions. This data is provided as a CSV file with at least two columns: one for input sequences and one for target output sequences.

Dataset tips:

  • Instruction Finetuning: FLAN-T5 is designed to interpret instructions. To benefit from this feature, prepend an instruction to your input sequences. For example, instead of passing a news article directly, prepend "Summarize:" or "Summarize the following text:". (See the templating sketch after this list.)
  • Quality and Quantity: The better your dataset, the better the fine-tuned model. Gather as many high-quality input/output sequence examples as possible. Manually inspect your dataset to remove bad examples or formatting errors, as these can negatively affect your model's performance.
  • Balanced Data: Ensure your dataset represents diverse examples and avoids biases. This helps create a more versatile and accurate fine-tuned model.
  • Data Preprocessing: Clean and preprocess your data to remove irrelevant information, HTML tags, excessive white spaces, or any other noisy elements. This helps the model focus on the task at hand and improves its performance.

This file will be zipped during the upload process.
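
To make the templating concrete, here's a minimal sketch that builds the two-column CSV with pandas. The raw file name and its columns (name, ingredients, directions) are assumptions for illustration; adapt them to your own data.

import pandas as pd

# Hypothetical raw data; the file and column names are assumptions
raw = pd.read_csv("recipes_raw.csv")

df = pd.DataFrame({
    # Prepend an instruction prefix ("COOK:") so FLAN-T5 knows the task
    "input_templated": "COOK: " + raw["name"],
    # Combine ingredients and directions into one target sequence
    "output_templated": "INGREDIENTS:\n" + raw["ingredients"]
        + "\n\nDIRECTIONS:\n" + raw["directions"],
})
df.to_csv("my-dataset.csv", index=False)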

Our example dataset for this tutorial contains approximately 50,000 recipes. Here's an example input-output pair:

input_templated:
COOK: No-Bake Nut Cookies

output_templated:
INGREDIENTS:
1 c. firmly packed brown sugar,
1/2 c. evaporated milk,
1/2 tsp. vanilla,
1/2 c. broken nuts (pecans),
2 Tbsp. butter or margarine,
3 1/2 c. bite size shredded rice biscuits,

DIRECTIONS:
-In a heavy 2-quart saucepan, mix brown sugar, nuts, evaporated milk and butter or margarine.
-Stir over medium heat until mixture bubbles all over top.
-Boil and stir 5 minutes more. Take off heat.
-Stir in vanilla and cereal; mix well.
-Using 2 teaspoons, drop and shape into 30 clusters on wax paper.
-Let stand until firm, about 30 minutes.

Step 2: Upload dataset

There are three ways to provide a dataset to a fine-tuning run: a public URL, a local file path, or a previously uploaded dataset. Each option is covered below.

A "public" URL means a link that you can access without logging in or providing an API key.

The dataset must be a zip of your CSV file.

If you want to follow the tutorial using a pre-built dataset, use the code below as-is. Otherwise, replace the link with a link to your hosted dataset zip file, or check the other tabs for different dataset upload options.

from baseten.training import PublicUrl
# A URL to a publicly accessible dataset as a zip file
dataset = PublicUrl("https://cdn.baseten.co/docs/production/DatasetRecipes.csv.zip")
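
If you're unsure whether your link is actually public, a quick sanity check is to fetch it without any credentials. This is just a sketch using the requests library against the tutorial's example URL; some hosts reject HEAD requests, in which case try a regular GET.

import requests

# A truly public URL should return 200 with no auth headers or cookies
resp = requests.head(
    "https://cdn.baseten.co/docs/production/DatasetRecipes.csv.zip",
    allow_redirects=True,
)
print(resp.status_code)  # expect 200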

If you have your dataset on the local machine that you're running your Python code on, you can use the LocalPath option to upload it as part of your fine-tuning script.

from baseten.training import LocalPath
# A path to your local dataset CSV file, plus a name for the dataset
dataset = LocalPath("/path/to/my-dataset.csv", "recipes")

If your fine-tuning script is running on one machine and your dataset lives on another, or you want to upload a dataset once and use it in multiple fine-tuning runs, you'll want to upload the dataset separately.

baseten dataset upload is a bash command

Open a terminal window and run:

baseten dataset upload --name my-cool-dataset --training-type FLAN_T5 ./my-dataset.csv

You should see:

Upload Progress: 100% |█████████████████████████████████████████████████████████
INFO 🔮 Upload successful!🔮

Dataset ID:
DATASET_ID

Then, for your fine-tuning config (your Python code), you'll use:

from baseten.training import Dataset
# The ID of a dataset already uploaded to Blueprint
dataset = Dataset("DATASET_ID")

Step 3: Assemble fine-tuning config

For the rest of this tutorial, we'll be using Python to configure, create, and deploy a fine-tuned model. Open up a Jupyter notebook or Python file in your local development environment to follow along.

Assembling the config is an opportunity to truly customize the fine-tuning run to meet our exact needs. For a complete reference of every configurable parameter, see the FlanT5BaseConfig docs.

Here's an example config:

from baseten.training import FlanT5BaseConfig

config = FlanT5BaseConfig(
    input_dataset=dataset,
    # These column names must match the headers in your CSV
    source_col_name="input_templated",
    target_col_name="output_templated",
    # Training hyperparameters
    epochs=1,
    train_batch_size=32,
    learning_rate=0.00003,
)
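
For a sense of scale: with the tutorial's ~50,000-example dataset and train_batch_size=32, one epoch works out to roughly 50,000 / 32 ≈ 1,563 optimizer steps.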

Step 4: Run fine-tuning

Once your config is set, it's time to kick off the fine-tuning run. This process is straightforward; just use:

from baseten.training import FinetuningRun

my_run = FinetuningRun.create(
    trained_model_name="Recipe generator",
    fine_tuning_config=config
)

The trained_model_name will be assigned to the deployed model.

Fine-tuning a model takes some time. As always, it's a tradeoff between run duration and resulting model quality. Exactly how long depends on:

  • The type of FLAN-T5 you're fine-tuning (the base size takes the shortest time)
  • The size of your dataset (more data = longer run)
  • The configured epochs (more epochs = longer run)
  • The configured learning_rate (lower learning rate = longer run)

While you wait, you can monitor the run's progress with:

my_run.stream_logs()

Your model will be automatically deployed

Once the fine-tuning run is complete, your model will be automatically deployed. You'll receive an email when the deployment is finished and the model is ready to invoke.

You can turn off this behavior by setting auto_deploy=False in FinetuningRun.create() and instead deploy your model manually.

Step 5: Use fine-tuned model

It's time! You can finally invoke the model. Use:

from baseten.models import FlanT5

# Replace MODEL_ID with the ID of your fine-tuned model
model = FlanT5("MODEL_ID")
recipe = model("COOK: Blueberry pancakes", max_length=512, early_stopping=True)

Example output:

INGREDIENTS:
1/2 c. blueberries,1 c. flour,1 tsp. soda,1 tsp. salt,1 Tbsp. sugar,1 egg,3 Tbsp. margarine, melted,1 c. buttermilk,
DIRECTIONS:
-Mix dry ingredients.
-Add egg, margarine and buttermilk. Stir by hand until well blended.
-Pour onto non-stick pan or griddle with a little oil added.
-Sprinkle blueberries into batter
-Cook until bubbly and turn over.
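
The model handle can be reused across calls. As a closing example, here's a sketch that generates recipes for several dishes using the same call signature as above (the dish names are just illustrations):

from baseten.models import FlanT5

model = FlanT5("MODEL_ID")
for dish in ["COOK: Banana bread", "COOK: Tomato soup"]:
    recipe = model(dish, max_length=512, early_stopping=True)
    print(f"--- {dish} ---\n{recipe}\n")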