
FlanT5BaseConfig dataclass

Bases: FinetuningConfig

Fine-tuning config for the Base version of the Flan-T5 model.

Examples:

from baseten.training import Dataset, FlanT5BaseConfig

config = FlanT5BaseConfig(
    input_dataset=Dataset("DATASET_ID"),
    epochs=1,
    learning_rate=0.00003
)

Parameters:

input_dataset (DatasetIdentifier, required)
    An identifier, either an ID or a public URL, for the Dataset that Flan-T5 should use.

wandb_api_key (Optional[str], default: None)
    API key for Weights & Biases to monitor your model training.

model_id (str, default: 'google/flan-t5-base')
    Base model to train.

epochs (int, default: 1)
    Number of epochs to run.

train_batch_size (int, default: 16)
    Batch size to use for training.

sample_batch_size (int, default: 16)
    Batch size to use for sampling.

generation_max_length (int, default: 140)
    Maximum length of the generated text during evaluation.

generation_num_beams (int, default: 1)
    Number of beams to use for generation.

learning_rate (float, default: 3e-05)
    The learning rate to use for training.

gradient_checkpointing (bool, default: True)
    Whether to use gradient checkpointing, which can reduce memory usage.

logging_steps (int, default: 10)
    Number of steps between logs.

source_col_name (str, default: 'source')
    Name of the source column in the input CSV (see the CSV sketch after this list).

target_col_name (str, default: 'target')
    Name of the target column in the input CSV.

weight_decay (float, default: 0.0)
    Weight decay to use for training.

adam_beta1 (float, default: 0.9)
    Beta1 parameter for the AdamW optimizer.

adam_beta2 (float, default: 0.999)
    Beta2 parameter for the AdamW optimizer.

adam_epsilon (float, default: 1e-08)
    Epsilon parameter for the AdamW optimizer.

max_grad_norm (float, default: 1.0)
    Maximum gradient norm to use for training.

lr_scheduler_type (str, default: 'linear')
    Type of learning rate scheduler to use (a combined scheduler/warmup sketch follows this list). Options:

      • "linear"
      • "cosine"
      • "cosine_with_restarts"
      • "polynomial"
      • "constant"
      • "constant_with_warmup"
      • "inverse_sqrt"

warmup_steps (int, default: 0)
    Number of steps used for a linear warmup from 0 to learning_rate. Overrides any effect of warmup_ratio.

warmup_ratio (float, default: 0.0)
    Ratio of total training steps used for a linear warmup from 0 to learning_rate.

optimizer (str, default: 'adamw_hf')
    Optimizer to use for training. Options:

      • "adamw_hf"
      • "adamw_torch"
      • "adamw_apex_fused"
      • "adamw_anyprecision"
      • "adafactor"
label_smoothing_factor (float, default: 0.0)
    The label smoothing factor to use. Zero means no label smoothing; otherwise the underlying one-hot-encoded labels are changed from 0s and 1s to label_smoothing_factor/num_labels and 1 - label_smoothing_factor + label_smoothing_factor/num_labels respectively (a worked example follows this list).
metric (str, default: 'rouge')
    Metric to use for evaluation. Options include all metrics supported by the Hugging Face Evaluate library (see the loading sketch after this list).
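
The parameter descriptions above imply the input Dataset is a CSV with one source/target pair per row. A minimal sketch of one possible layout, assuming the default column names (the rows themselves are hypothetical):

source,target
"Translate English to German: Hello, world.","Hallo, Welt."
"Summarize: The meeting covered budget and hiring.","Budget and hiring were discussed."

If your CSV uses different column names, set source_col_name and target_col_name to match.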
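
The scheduler and warmup settings plug into the same config as the basic example above. A hedged sketch with illustrative (not recommended) values; only one of warmup_steps/warmup_ratio is set, since warmup_steps overrides warmup_ratio:

from baseten.training import Dataset, FlanT5BaseConfig

config = FlanT5BaseConfig(
    input_dataset=Dataset("DATASET_ID"),
    epochs=3,
    learning_rate=3e-05,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,  # ramp from 0 to learning_rate over the first 10% of steps
    optimizer="adamw_torch",
)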
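
To make the label_smoothing_factor formula concrete, here is a small worked example in plain Python; num_labels and the one-hot row are hypothetical, and the trainer applies this transformation internally:

label_smoothing_factor = 0.1
num_labels = 4

off_value = label_smoothing_factor / num_labels  # former 0s -> 0.025
on_value = 1 - label_smoothing_factor + label_smoothing_factor / num_labels  # former 1s -> 0.925

one_hot = [0, 0, 1, 0]
smoothed = [on_value if y else off_value for y in one_hot]
# smoothed == [0.025, 0.025, 0.925, 0.025], which still sums to 1.0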
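
Since metric accepts any metric name supported by the Hugging Face Evaluate library, one way to check a name before training is to load it locally. A hedged sketch using the evaluate package (assumed to be installed separately; not part of baseten):

import evaluate

metric = evaluate.load("rouge")  # raises if "rouge" is not a known Evaluate metric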