
FlanT5BaseConfig dataclass

Bases: FinetuningConfig

Fine-tuning config for the Base version of the Flan-T5 model.

Examples:

from baseten.training import Dataset, FlanT5BaseConfig

config = FlanT5BaseConfig(
    input_dataset=Dataset("DATASET_ID"),
    epochs=1,
    learning_rate=0.00003
)

Parameters:

input_dataset (DatasetIdentifier, required)
    An identifier, either an ID or a public URL, for the Dataset that Flan-T5 should use.

wandb_api_key (Optional[str], default: None)
    API key for Weights & Biases to monitor your model training.

model_id (str, default: 'google/flan-t5-base')
    Base model to train.

epochs (int, default: 1)
    Number of epochs to run.

train_batch_size (int, default: 16)
    Batch size to use for training.

sample_batch_size (int, default: 16)
    Batch size to use for sampling.

generation_max_length (int, default: 140)
    Maximum length of the generated text during evaluation.

generation_num_beams (int, default: 1)
    Number of beams to use for generation.

learning_rate (float, default: 3e-05)
    The learning rate to use for training.

gradient_checkpointing (bool, default: True)
    Whether to use gradient checkpointing, which can reduce memory usage.

logging_steps (int, default: 10)
    Number of steps between logs.

source_col_name (str, default: 'source')
    Name of the source column in the input CSV (see the CSV sketch after this list).

target_col_name (str, default: 'target')
    Name of the target column in the input CSV.

weight_decay (float, default: 0.0)
    Weight decay to use for training.

adam_beta1 (float, default: 0.9)
    Beta1 parameter for the AdamW optimizer.

adam_beta2 (float, default: 0.999)
    Beta2 parameter for the AdamW optimizer.

adam_epsilon (float, default: 1e-08)
    Epsilon parameter for the AdamW optimizer.

max_grad_norm (float, default: 1.0)
    Maximum gradient norm to use for training.

lr_scheduler_type (str, default: 'linear')
    Type of learning rate scheduler to use (a combined scheduler/warmup sketch follows this list). Options:

      • "linear"
      • "cosine"
      • "cosine_with_restarts"
      • "polynomial"
      • "constant"
      • "constant_with_warmup"
      • "inverse_sqrt"

warmup_steps (int, default: 0)
    Number of steps used for a linear warmup from 0 to learning_rate. Overrides any effect of warmup_ratio.

warmup_ratio (float, default: 0.0)
    Ratio of total training steps used for a linear warmup from 0 to learning_rate.

optimizer (str, default: 'adamw_hf')
    Optimizer to use for training. Options:

      • "adamw_hf"
      • "adamw_torch"
      • "adamw_apex_fused"
      • "adamw_anyprecision"
      • "adafactor"
label_smoothing_factor (float, default: 0.0)
    The label smoothing factor to use. Zero means no label smoothing; otherwise the underlying one-hot-encoded labels are changed from 0s and 1s to label_smoothing_factor/num_labels and 1 - label_smoothing_factor + label_smoothing_factor/num_labels respectively (a worked example follows this list).
metric (str, default: 'rouge')
    Metric to use for evaluation. Options include all metrics supported by the Hugging Face Evaluate library (see the loading sketch after this list).
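
The parameter descriptions above imply the input Dataset is a CSV with one source/target pair per row. A minimal sketch of one possible layout, assuming the default column names (the rows themselves are hypothetical):

source,target
"Translate English to German: Hello, world.","Hallo, Welt."
"Summarize: The meeting covered budget and hiring.","Budget and hiring were discussed."

If your CSV uses different column names, set source_col_name and target_col_name to match.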
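
The scheduler and warmup settings plug into the same config as the basic example above. A hedged sketch with illustrative (not recommended) values; only one of warmup_steps/warmup_ratio is set, since warmup_steps overrides warmup_ratio:

from baseten.training import Dataset, FlanT5BaseConfig

config = FlanT5BaseConfig(
    input_dataset=Dataset("DATASET_ID"),
    epochs=3,
    learning_rate=3e-05,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,  # ramp from 0 to learning_rate over the first 10% of steps
    optimizer="adamw_torch",
)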
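
To make the label_smoothing_factor formula concrete, here is a small worked example in plain Python; num_labels and the one-hot row are hypothetical, and the trainer applies this transformation internally:

label_smoothing_factor = 0.1
num_labels = 4

off_value = label_smoothing_factor / num_labels  # former 0s -> 0.025
on_value = 1 - label_smoothing_factor + label_smoothing_factor / num_labels  # former 1s -> 0.925

one_hot = [0, 0, 1, 0]
smoothed = [on_value if y else off_value for y in one_hot]
# smoothed == [0.025, 0.025, 0.925, 0.025], which still sums to 1.0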
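
Since metric accepts any metric name supported by the Hugging Face Evaluate library, one way to check a name before training is to load it locally. A hedged sketch using the evaluate package (assumed to be installed separately; not part of baseten):

import evaluate

metric = evaluate.load("rouge")  # raises if "rouge" is not a known Evaluate metric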