kiwi.training.optimizers
OptimizerConfig
Base class for all pydantic configs. Used to configure base behaviour of configs.
get_noam_decay_schedule(optimizer: Optimizer, num_warmup_steps: int, model_size: int)
Create a schedule with the learning rate decay strategy from the AIAYN ("Attention Is All You Need") paper.
optimizer_class(name)
optimizer_name(cls)
from_config(config: OptimizerConfig, parameters: Iterator[Parameter], model_size: int = None, training_data_size: int = None) → Union[Optimizer, List[Optimizer], Tuple[List[Optimizer], List[Any]]]
Build an optimizer, and optionally a learning-rate scheduler, from the common options shared by most optimizers.
kiwi.training.optimizers.logger

class kiwi.training.optimizers.OptimizerConfig
Bases: kiwi.utils.io.BaseConfig
class_name
learning_rate
Starting learning rate. Recommended settings: sgd = 1, adagrad = 0.1, adadelta = 1, adam = 0.001
encoder_learning_rate
Different learning rate for the encoder. If set, the encoder will use a different learning rate from the rest of the parameters.
warmup_steps
Increase the learning rate for this number of steps. Takes an integer number of steps for the noam and adamw optimizers; if a float is given, it is used as a fraction of training_steps.
training_steps
Total number of training steps. Used by the adamw optimizer. If not specified, the training dataset size is used.
learning_rate_decay
Factor by which the learning rate will be reduced: new_lr = lr * factor. The decay scheduler is only used if this is greater than 0.
learning_rate_decay_start
Number of epochs with no improvement after which the learning rate will be reduced. Only applicable if learning_rate_decay is greater than 0.
load
cast_steps
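Taken together, these fields describe the optimizer section of a training configuration. Below is a minimal sketch of building the config directly in Python, assuming OptimizerConfig can be instantiated like any other pydantic model and that 'adam' is one of the names registered in OPTIMIZERS_MAPPING; the required fields and defaults in the installed version may differ, and the values shown only follow the recommendations above.

from kiwi.training.optimizers import OptimizerConfig

# Hypothetical values for illustration; not the library's defaults.
config = OptimizerConfig(
    class_name='adam',            # assumed to be a key of OPTIMIZERS_MAPPING
    learning_rate=0.001,          # recommended starting LR for adam
    warmup_steps=0.1,             # a float is treated as a fraction of training_steps
    training_steps=10_000,        # total number of training steps
    learning_rate_decay=0.5,      # new_lr = lr * factor; scheduler enabled because > 0
    learning_rate_decay_start=2,  # epochs with no improvement before the LR is reduced
)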
kiwi.training.optimizers.get_noam_decay_schedule(optimizer: Optimizer, num_warmup_steps: int, model_size: int)
Create a schedule with the learning rate decay strategy from the AIAYN paper.
Parameters:
optimizer – wrapped optimizer.
num_warmup_steps – the number of steps to linearly increase the learning rate.
model_size – the hidden size parameter which dominates the number of parameters in your model.
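The AIAYN schedule scales the base learning rate by model_size ** -0.5 * min(step ** -0.5, step * num_warmup_steps ** -1.5): a linear warm-up followed by inverse-square-root decay. The following is a minimal sketch of an equivalent schedule built on PyTorch's LambdaLR; the library's own implementation may differ in details such as how step 0 is handled.

import torch
from torch.optim.lr_scheduler import LambdaLR

def noam_decay_schedule(optimizer, num_warmup_steps, model_size):
    """Noam-style schedule: linear warm-up, then inverse-square-root decay."""
    def lr_lambda(step):
        step = max(step, 1)  # avoid dividing by zero on the first call
        return (model_size ** -0.5) * min(step ** -0.5, step * num_warmup_steps ** -1.5)
    return LambdaLR(optimizer, lr_lambda)

# Usage: the optimizer's base lr acts as a multiplier on top of the schedule.
model = torch.nn.Linear(512, 512)
optimizer = torch.optim.Adam(model.parameters(), lr=1.0, betas=(0.9, 0.98), eps=1e-9)
scheduler = noam_decay_schedule(optimizer, num_warmup_steps=4000, model_size=512)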
kiwi.training.optimizers.OPTIMIZERS_MAPPING
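OPTIMIZERS_MAPPING is not documented on this page; presumably it maps config names (such as the ones listed in the learning_rate recommendations above) to torch optimizer classes, with optimizer_class and optimizer_name as the two lookup directions. The sketch below is a hypothetical illustration of that arrangement, not the library's actual table.

import torch

# Hypothetical contents only: the real OPTIMIZERS_MAPPING may use different
# names, classes, or additional entries.
OPTIMIZERS_MAPPING = {
    'sgd': torch.optim.SGD,
    'adagrad': torch.optim.Adagrad,
    'adadelta': torch.optim.Adadelta,
    'adam': torch.optim.Adam,
}

def optimizer_class(name):
    """Look up an optimizer class by its config name."""
    return OPTIMIZERS_MAPPING[name]

def optimizer_name(cls):
    """Reverse lookup: the config name for an optimizer class."""
    return {v: k for k, v in OPTIMIZERS_MAPPING.items()}[cls]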
kiwi.training.optimizers.from_config(config: OptimizerConfig, parameters: Iterator[Parameter], model_size: int = None, training_data_size: int = None)
Parameters:
config – common options shared by most optimizers
parameters – model parameters
model_size – required for the Noam LR schedule; if not provided, the mode of all parameters’ last dimension is used
Returns:
1. A single optimizer.
2. A list or tuple of optimizers.
3. Two lists – the first with optimizers, the second with learning-rate schedulers.
Notes
We currently never return multiple optimizers or schedulers, so option 2 above does not occur in practice yet. Option 3 returns a single optimizer and a single scheduler, each wrapped in a list.
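A usage sketch of how a training loop might call from_config follows; the model, the minimal config values, and the comment on training_data_size are assumptions for illustration, and only return forms 1 and 3 are handled because, per the note above, form 2 never occurs.

import torch
from kiwi.training import optimizers
from kiwi.training.optimizers import OptimizerConfig

model = torch.nn.Linear(512, 512)  # stand-in for a real QE model
config = OptimizerConfig(class_name='adam', learning_rate=0.001)  # hypothetical minimal config

result = optimizers.from_config(
    config,
    model.parameters(),
    model_size=512,              # only needed for the Noam schedule
    training_data_size=100_000,  # presumably used to resolve float step fractions
)

if isinstance(result, tuple):
    # Form 3: ([optimizer], [scheduler]), each list holding a single item
    (optimizer,), (scheduler,) = result
else:
    # Form 1: a single optimizer and no scheduler
    optimizer, scheduler = result, None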