kiwi.training.optimizers

Module Contents

Classes

OptimizerConfig

Base class for all pydantic configs. Used to configure base behaviour of configs.

Functions

get_noam_decay_schedule(optimizer: Optimizer, num_warmup_steps: int, model_size: int)

Create a schedule with the learning rate decay strategy from the AIAYN ("Attention Is All You Need") paper.

optimizer_class(name)

optimizer_name(cls)

from_config(config: OptimizerConfig, parameters: Iterator[Parameter], model_size: int = None, training_data_size: int = None) → Union[Optimizer, List[Optimizer], Tuple[List[Optimizer], List[Any]]]

param config – common options shared by most optimizers

kiwi.training.optimizers.logger
class kiwi.training.optimizers.OptimizerConfig

Bases: kiwi.utils.io.BaseConfig

Base class for all pydantic configs. Used to configure base behaviour of configs.

class_name :str
learning_rate :float

Starting learning rate. Recommended settings: sgd = 1, adagrad = 0.1, adadelta = 1, adam = 0.001

encoder_learning_rate :float

Separate learning rate for the encoder. If set, the encoder parameters use this value while all other parameters use learning_rate.

warmup_steps :Union[float, int]

Number of steps over which to linearly increase the learning rate. An integer is interpreted as an absolute step count (used by the noam optimizer and adamw); a float is interpreted as a fraction of training_steps.

training_steps :int

Total number of training steps. Used by the adamw optimizer. If not specified, the training dataset size is used.

learning_rate_decay :float = 1.0

Factor by which the learning rate will be reduced: new_lr = lr * factor. The scheduler is only used if this value is greater than 0.

learning_rate_decay_start :int = 2

Number of epochs with no improvement after which learning rate will be reduced. Only applicable if learning_rate_decay is greater than 0.

load :Path
cast_steps(cls, v)
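
A minimal construction sketch, assuming the attribute names above and that fields not shown here (e.g. encoder_learning_rate, load) are optional; the value "adam" for class_name is illustrative and should be checked against OPTIMIZERS_MAPPING:

    from kiwi.training.optimizers import OptimizerConfig

    # Hypothetical values; field names follow the attributes documented above.
    config = OptimizerConfig(
        class_name="adam",            # assumed key in OPTIMIZERS_MAPPING
        learning_rate=0.001,          # recommended starting LR for adam
        warmup_steps=0.1,             # float: treated as a fraction of training_steps
        training_steps=20000,         # so warmup covers roughly 2000 steps
        learning_rate_decay=0.5,      # new_lr = lr * 0.5 when the decay triggers
        learning_rate_decay_start=2,  # epochs without improvement before decaying
    )
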
kiwi.training.optimizers.get_noam_decay_schedule(optimizer: Optimizer, num_warmup_steps: int, model_size: int)

Create a schedule with the learning rate decay strategy from the AIAYN ("Attention Is All You Need") paper.

Parameters
  • optimizer – wrapped optimizer.

  • num_warmup_steps – the number of steps to linearly increase the learning rate.

  • model_size – the hidden size parameter which dominates the number of parameters in your model.
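
The Noam strategy linearly increases the learning rate during warmup and then decays it proportionally to the inverse square root of the step count, scaled by model_size ** -0.5. A minimal sketch of an equivalent schedule built on torch.optim.lr_scheduler.LambdaLR (this follows the formula from the paper; the actual kiwi implementation may differ in details such as the handling of step 0):

    import torch
    from torch.optim.lr_scheduler import LambdaLR

    def noam_decay_schedule(optimizer, num_warmup_steps: int, model_size: int):
        """Linear warmup followed by inverse square-root decay (Noam schedule)."""
        def lr_lambda(step: int) -> float:
            step = max(step, 1)  # avoid division by zero at step 0
            return model_size ** -0.5 * min(step ** -0.5, step * num_warmup_steps ** -1.5)
        return LambdaLR(optimizer, lr_lambda)

    # Usage sketch: a base LR of 1.0 lets the lambda fully determine the scale.
    model = torch.nn.Linear(512, 512)
    optimizer = torch.optim.Adam(model.parameters(), lr=1.0)
    scheduler = noam_decay_schedule(optimizer, num_warmup_steps=4000, model_size=512)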

kiwi.training.optimizers.OPTIMIZERS_MAPPING
kiwi.training.optimizers.optimizer_class(name)
kiwi.training.optimizers.optimizer_name(cls)
kiwi.training.optimizers.from_config(config: OptimizerConfig, parameters: Iterator[Parameter], model_size: int = None, training_data_size: int = None) → Union[Optimizer, List[Optimizer], Tuple[List[Optimizer], List[Any]]]
Parameters
  • config – common options shared by most optimizers

  • parameters – model parameters

  • model_size – required for the Noam LR schedule; if not provided, the mode of all parameters’ last dimension is used

Return: for compatibility with PyTorch-Lightning, any of these 3 options:
  • Single optimizer

  • List or Tuple - List of optimizers

  • Tuple of two lists - the first with multiple optimizers, the second with learning-rate schedulers

Notes

We currently never return multiple optimizers or schedulers, so option 2 above is never used in practice. Option 3 is used to return a single optimizer and a single scheduler, each wrapped in a list.
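
A hedged usage sketch showing how the return value can be consumed from a LightningModule's configure_optimizers hook; QEModule and its constructor arguments are placeholders, and only OptimizerConfig and from_config come from this module:

    import pytorch_lightning as pl
    from kiwi.training.optimizers import OptimizerConfig, from_config

    class QEModule(pl.LightningModule):  # hypothetical wrapper around a kiwi model
        def __init__(self, optimizer_config: OptimizerConfig, model_size: int):
            super().__init__()
            self.optimizer_config = optimizer_config
            self.model_size = model_size

        def configure_optimizers(self):
            # Any of the three return shapes described above is accepted by Lightning.
            return from_config(
                self.optimizer_config,
                self.parameters(),
                model_size=self.model_size,
            )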