FlanT5BaseConfig
dataclass
Bases: FinetuningConfig
Fine-tuning config for the Base version of the Flan-T5 model.
Examples:

```python
from baseten.training import Dataset, FlanT5BaseConfig

config = FlanT5BaseConfig(
    input_dataset=Dataset("DATASET_ID"),
    epochs=1,
    learning_rate=0.00003,
)
```
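On its own, constructing the config does nothing; it is typically handed to a fine-tuning run. A minimal sketch, assuming a `FinetuningRun.create(trained_model_name=..., fine_tuning_config=...)` entry point (the class name and argument names here are assumptions for illustration; check the `baseten.training` reference for the actual API):

```python
from baseten.training import Dataset, FlanT5BaseConfig
# NOTE: FinetuningRun and its create() signature are assumed here for
# illustration; they are not confirmed by this reference page.
from baseten.training import FinetuningRun

config = FlanT5BaseConfig(
    input_dataset=Dataset("DATASET_ID"),
    epochs=1,
    learning_rate=3e-5,
)

# Hypothetical: start the fine-tuning job with this config.
run = FinetuningRun.create(
    trained_model_name="my-flan-t5-base",
    fine_tuning_config=config,
)
```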
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| input_dataset | DatasetIdentifier | An identifier, either an ID or a public URL, for the Dataset that Flan-T5 should use | required |
| wandb_api_key | Optional[str] | API key for Weights & Biases to monitor your model training | None |
| model_id | str | Base model to train | 'google/flan-t5-base' |
| epochs | int | Number of epochs to run | 1 |
| train_batch_size | int | Batch size to use for training | 16 |
| sample_batch_size | int | Batch size to use for sampling | 16 |
| generation_max_length | int | Maximum length of the generated text during evaluation | 140 |
| generation_num_beams | int | Number of beams to use for generation | 1 |
| learning_rate | float | The learning rate to use for training | 3e-05 |
| gradient_checkpointing | bool | Whether to use gradient checkpointing, which can reduce memory usage | True |
| logging_steps | int | Number of steps between logs | 10 |
| source_col_name | str | Name of the source column in the input CSV | 'source' |
| target_col_name | str | Name of the target column in the input CSV | 'target' |
| weight_decay | float | Weight decay to use for training | 0.0 |
| adam_beta1 | float | Beta1 parameter for the AdamW optimizer | 0.9 |
| adam_beta2 | float | Beta2 parameter for the AdamW optimizer | 0.999 |
| adam_epsilon | float | Epsilon parameter for the AdamW optimizer | 1e-08 |
| max_grad_norm | float | Maximum gradient norm to use for training | 1.0 |
| lr_scheduler_type | str | Type of learning rate scheduler to use | 'linear' |
| warmup_steps | int | Number of steps used for a linear warmup from 0 to learning_rate; overrides any effect of warmup_ratio | 0 |
| warmup_ratio | float | Ratio of total training steps used for a linear warmup from 0 to learning_rate | 0.0 |
| optimizer | str | Optimizer to use for training | 'adamw_hf' |
| label_smoothing_factor | float | The label smoothing factor to use. Zero means no label smoothing; otherwise the underlying one-hot-encoded labels are changed from 0s and 1s to label_smoothing_factor/num_labels and 1 - label_smoothing_factor + label_smoothing_factor/num_labels respectively | 0.0 |
| metric | str | Metric to use for evaluation (options include all metrics supported by the Hugging Face Evaluate library) | 'rouge' |
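The `source_col_name` and `target_col_name` parameters imply that the input Dataset is a CSV with one input column and one label column. A minimal sketch of a matching file using the default column names (the file name and rows below are purely illustrative):

```python
import csv

# Illustrative rows; the column headers must match source_col_name and
# target_col_name ('source' and 'target' by default).
rows = [
    {"source": "summarize: The quick brown fox jumps over the lazy dog.",
     "target": "A fox jumps over a dog."},
    {"source": "summarize: Flan-T5 is an instruction-tuned T5 model.",
     "target": "Flan-T5 is instruction-tuned."},
]

with open("train.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["source", "target"])
    writer.writeheader()
    writer.writerows(rows)
```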
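The table states that `warmup_steps` overrides any effect of `warmup_ratio`. A small sketch of that resolution rule (this mirrors the Hugging Face Trainer's behavior, which these parameters appear to follow, but treat the helper itself as illustrative rather than the library's actual implementation):

```python
def resolve_warmup(total_steps: int, warmup_steps: int = 0,
                   warmup_ratio: float = 0.0) -> int:
    # warmup_steps, if nonzero, wins; otherwise the warmup length is
    # derived from warmup_ratio (per the table above).
    if warmup_steps > 0:
        return warmup_steps
    return int(total_steps * warmup_ratio)

print(resolve_warmup(total_steps=1000, warmup_ratio=0.1))                   # 100
print(resolve_warmup(total_steps=1000, warmup_steps=50, warmup_ratio=0.1))  # 50
```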
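As a worked check of the `label_smoothing_factor` formula: with a factor of 0.1 and 4 labels, each 0 entry becomes 0.1/4 = 0.025 and the 1 entry becomes 1 - 0.1 + 0.025 = 0.925, so the smoothed distribution still sums to 1. A minimal sketch:

```python
def smooth_labels(one_hot: list[int], factor: float) -> list[float]:
    """Apply the label smoothing formula from the table above."""
    num_labels = len(one_hot)
    off = factor / num_labels                 # value for the 0 entries
    on = 1.0 - factor + factor / num_labels   # value for the 1 entry
    return [on if y == 1 else off for y in one_hot]

print(smooth_labels([0, 0, 1, 0], factor=0.1))
# [0.025, 0.025, 0.925, 0.025]
```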