
How to Fine-Tune LLaMA

LLaMA is an open-source LLM that is similar in its basic operation to GPT-3.

By fine-tuning LLaMA, we can turn it into an expert LLM in a specific domain and use it for a variety of generative AI tasks.

Fine-tuning can radically transform a model's output, especially with structured and semi-structured data. In this tutorial, we'll fine-tune LLaMA to predict an outcome for a given play in American football. Given the name of a player and a playcall (pass or run), our fine-tuned model will generate possible outcomes.

LLaMA is not licensed for commercial use

The LLaMA model is not currently licensed for commercial use. LLaMA fine-tuning is offered for research purposes.

Step 0: Prerequisites

After signing up for Blueprint by Baseten, you'll need to do three things to complete this tutorial:

  • Install the latest version of the Baseten Python client with pip install --upgrade baseten
  • Create an API key
  • In your terminal, run:
baseten login

And paste your API key when prompted.

Following this tutorial will consume credits/billable resources

The LLaMA fine-tuning run in this tutorial guide will consume credits (if available on your account) or billable resources.

Step 1: Create your dataset

Fine-tuning LLaMA is the process of giving the model examples of a problem and its solution so that it can solve similar problems or answer similar questions. This data is provided as a CSV file with at least two columns; in this tutorial, they are named source and target.

Dataset tips:

  • Prompt Structure: Fine-tuning LLaMA can be significantly improved by adding some structure to your prompt. For example, if your dataset consists of questions and answers about your favorite book:
source target
"The following is a question from a user about a book. Please answer the question as succinctly as possible. Input: {question} Response:" "{answer}"

Prompt structure

When imposing a prompt structure, it's important you use this structure when prompting your fine-tuned model. For example, with a new question, the prompt from above would look like this:

The following is a question from a user about a book. Please answer the question as succinctly as possible. Input: {question} Response:
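One way to keep the training-time and inference-time prompts identical is a plain Python format string. A minimal sketch (the template text mirrors the example above; the helper name is illustrative):

```python
# The prompt template used to build the dataset, reused at inference
# time. Only the {question} slot is filled; the fine-tuned model
# completes the text after "Response:".
TEMPLATE = (
    "The following is a question from a user about a book. "
    "Please answer the question as succinctly as possible. "
    "Input: {question} Response:"
)

def build_prompt(question: str) -> str:
    """Format a new question with the fine-tuning prompt structure."""
    return TEMPLATE.format(question=question)

prompt = build_prompt("Who is the narrator?")
```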

  • Quality and Quantity: The better your dataset, the better the fine-tuned model. Gather as many high-quality input/output sequence examples as possible. Manually inspect your dataset to remove bad examples or formatting errors, as these can negatively affect your model's performance.
  • Balanced Data: Ensure your dataset represents diverse examples and avoids biases. This helps create a more versatile and accurate fine-tuned model.
  • Data Preprocessing: Clean and pre-process your data to remove irrelevant information, HTML tags, excessive white spaces, or any other noisy elements. This helps the model focus on the task at hand and improves its performance.

This file will be zipped during the upload process.

Our example dataset for this tutorial contains approximately 35,500 plays from the 2021 NFL season. Here are three example input-output pairs:

source target
"{pass} (Shotgun) 12-T.Brady" " pass deep right to 87-R.Gronkowski for 20 yards, TOUCHDOWN."
"{run} 22-D.Henry" " right tackle to TEN 29 for 19 yards (34-J.Thompson)."
"{run} (Shotgun) 30-A.Ekeler" " left tackle for 3 yards, TOUCHDOWN."
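For reference, a dataset in this shape can be written with Python's built-in csv module; the filename plays.csv is just an example:

```python
import csv

# The three example plays above, in the source/target format.
rows = [
    ("{pass} (Shotgun) 12-T.Brady",
     " pass deep right to 87-R.Gronkowski for 20 yards, TOUCHDOWN."),
    ("{run} 22-D.Henry",
     " right tackle to TEN 29 for 19 yards (34-J.Thompson)."),
    ("{run} (Shotgun) 30-A.Ekeler",
     " left tackle for 3 yards, TOUCHDOWN."),
]

with open("plays.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["source", "target"])  # header row
    writer.writerows(rows)
```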

Step 2: Upload dataset

There are three ways to provide a dataset to a fine-tuning run; each option is covered below.

A "public" URL means a link that you can access without logging in or providing an API key.

The dataset must be a zip of your CSV file.
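One way to produce that zip, using only the standard library (plays.csv is a placeholder name for your dataset file):

```python
import zipfile
from pathlib import Path

csv_path = Path("plays.csv")
# Placeholder so this sketch runs standalone; point at your real dataset CSV.
if not csv_path.exists():
    csv_path.write_text("source,target\n")

# Zip the CSV so it can be hosted at a public URL or uploaded.
with zipfile.ZipFile("plays.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    zf.write(csv_path)
```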

If you want to follow the tutorial using a pre-built dataset, use the code below as-is. Otherwise, replace the link with a link to your hosted dataset zip file, or check the other tabs for different dataset upload options.

from baseten.training import PublicUrl
# A URL to a publicly accessible dataset as a zip file
dataset = PublicUrl("")

If you have your dataset on the local machine that you're running your Python code on, you can use the LocalPath option to upload it as part of your fine-tuning script.

from baseten.training import LocalPath
# A path to a local folder with your dataset
dataset = LocalPath("/path/to/my-dataset.csv", "plays")

If your fine-tuning script is running on one machine and your dataset lives on another, or you want to upload a dataset once and use it in multiple fine-tuning runs, you'll want to upload the dataset separately.

baseten dataset upload is a bash command

Open a terminal window and run:

baseten dataset upload --name my-cool-dataset --training-type LLAMA ./my-dataset.csv

You should see:

Upload Progress: 100% |█████████████████████████████████████████████████████████
INFO 🔮 Upload successful!🔮

Dataset ID:

Then, for your fine-tuning config (your Python code), you'll use:

from baseten.training import Dataset
# The ID of a dataset already uploaded to Blueprint
dataset = Dataset("DATASET_ID")

Step 3: Assemble fine-tuning config

For the rest of this tutorial, we'll be using Python to configure, create, and deploy a fine-tuned model. Open up a Jupyter notebook or Python file in your local development environment to follow along.

Assembling the config is an opportunity to truly customize the fine-tuning run to meet our exact needs. For a complete reference of every configurable parameter, see the LlamaConfig docs.

Here's an example config:

from baseten.training import LlamaConfig

config = LlamaConfig(
    epochs=1,            # illustrative values; see the LlamaConfig
    learning_rate=1e-4,  # docs for every configurable parameter
)

Step 4: Run fine-tuning

Once your config is set, it's time to kick off the fine-tuning run. The process is straightforward:

from baseten.training import FinetuningRun

my_run = FinetuningRun.create(
    trained_model_name="Football play generator",
    fine_tuning_config=config,  # the LlamaConfig from Step 3
)

The trained_model_name will be assigned to the deployed model.

Fine-tuning a model takes some time. As always, it's a tradeoff between run duration and resulting model quality. Exactly how long depends on:

  • The size of your dataset (more data = longer run)
  • The configured epochs (more epochs = longer run)
  • The configured learning_rate (lower learning rate = longer run)
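As a rough illustration of how dataset size and epochs drive run duration, under the common assumption of one optimizer step per batch per epoch (the batch size here is purely illustrative):

```python
import math

def training_steps(num_examples: int, epochs: int, batch_size: int) -> int:
    """Rough step count: one optimizer step per batch, per epoch."""
    return epochs * math.ceil(num_examples / batch_size)

# e.g. the ~35,500-play dataset, 3 epochs, batch size 8
steps = training_steps(35_500, epochs=3, batch_size=8)
```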

While you wait, you can monitor the run's progress.

Your model will be automatically deployed

Once the fine-tuning run is complete, your model will be automatically deployed. You'll receive an email when the deployment is finished and the model is ready to invoke.

You can turn off this behavior by setting auto_deploy=False in FinetuningRun.create() and instead deploy your model manually.

Step 5: Use fine-tuned model

It's time! You can finally invoke the model. Use:

from baseten.models import Llama

model = Llama(model_id="MODEL_ID")

gen_kwargs = {
  "temperature": 1.2,
  "repetition_penalty": 0.9,
  "do_sample": True,
}

completion = model(
  "{pass} 15-P.Mahomes",
  **gen_kwargs,
)

Example output:

{pass} 15-P.Mahomes pass short right to 11-M.Sweeney to KC 35 for 5 yards (33-J.White).

One critical parameter in the model invocation is do_sample. By default, Transformers uses greedy decoding, which produces the same output for a given prompt on every invocation. Setting do_sample=True tells the model to use multinomial sampling, which produces different outputs on each invocation.
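The difference can be illustrated with a toy next-token distribution, independent of any model library (the vocabulary and probabilities here are made up):

```python
import random

# A toy next-token distribution over a tiny vocabulary.
probs = {"short": 0.5, "deep": 0.3, "incomplete": 0.2}

def greedy(dist):
    """Greedy decoding: always pick the most likely token."""
    return max(dist, key=dist.get)

def sample(dist, rng):
    """Multinomial sampling: draw a token in proportion to its probability."""
    tokens, weights = zip(*dist.items())
    return rng.choices(tokens, weights=weights, k=1)[0]

rng = random.Random(0)
chosen = greedy(probs)                              # identical on every call
draws = {sample(probs, rng) for _ in range(100)}    # varies call to call
```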