kiwi.lib.train

Module Contents

Classes

TrainRunInfo

Encapsulate relevant information on training runs.

RunConfig

Options for each run.

CheckpointsConfig

Options for validation frequency, checkpoint saving, and early stopping during training.

GPUConfig

Options for GPU selection, floating-point precision, and automatic mixed precision.

TrainerConfig

Options controlling the training loop (epochs, gradient handling, logging, and checkpointing).

Configuration

Full configuration for the training pipeline, aggregating run, trainer, data, and system options.

Functions

train_from_file(filename) → TrainRunInfo

Load options from a config file and call the training procedure.

train_from_configuration(configuration_dict) → TrainRunInfo

Run the entire training pipeline using the configuration options received.

setup_run(config: RunConfig, debug=False, quiet=False, anchor_dir: Path = None) → Tuple[Path, Optional[MLFlowTrackingLogger]]

Prepare for running the training pipeline.

run(config: Configuration, system_type: Union[Type[TLMSystem], Type[QESystem]] = QESystem) → TrainRunInfo

Instantiate the system according to the configuration and train it.

kiwi.lib.train.logger
class kiwi.lib.train.TrainRunInfo

Encapsulate relevant information on training runs.

model :QESystem

The last model when training finished.

best_metrics :Dict[str, Any]

Mapping of metrics of the best model.

best_model_path :Optional[Path]

Path of the best model, if it was saved to disk.

class kiwi.lib.train.RunConfig

Bases: kiwi.utils.io.BaseConfig

Options for each run.

seed :int = 42

Random seed

experiment_name :str = default

If using MLflow, it will log this run under this experiment name, which appears as a separate section in the UI. It will also be used in some messages and files.

output_dir :Path

Output several files for this run under this directory. If not specified, a directory under “./runs/” is created or reused based on the run_id. Files might also be sent to MLflow depending on the mlflow_always_log_artifacts option.

run_id :str

If specified, MLflow/Default Logger will log metrics and params under this ID. If it exists, the run status will change to running. This ID is also used for creating this run’s output directory if output_dir is not specified (Run ID must be a 32-character hex string).

use_mlflow :bool = False

Whether to use MLflow for tracking this run. If MLflow is not installed, a message is shown.

mlflow_tracking_uri :str = mlruns/

If using MLflow, log model parameters, training metrics, and artifacts (files) to this MLflow server. Uses localhost by default.

mlflow_always_log_artifacts :bool = False

If using MLflow, always log (send) artifacts (files) to the MLflow artifacts URI. By default (false), artifacts are only logged if MLflow is a remote server (as specified by the mlflow_tracking_uri option). All generated files are always saved in output_dir, so it might be considered redundant to copy them to a local MLflow server. If this is not the case, set this option to true.
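As an illustration only, the run options above might be collected into a plain dictionary whose keys mirror the attribute names documented here (the output directory is hypothetical):

    # Sketch of a run section; keys mirror the RunConfig fields above.
    run_options = {
        "seed": 42,
        "experiment_name": "default",
        "output_dir": "runs/my-run",  # hypothetical; derived from run_id if omitted
        "use_mlflow": False,
        # Only relevant when use_mlflow is true:
        # "mlflow_tracking_uri": "mlruns/",
        # "mlflow_always_log_artifacts": False,
    }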

class kiwi.lib.train.CheckpointsConfig

Bases: kiwi.utils.io.BaseConfig

Options for validation frequency, checkpoint saving, and early stopping during training.

validation_steps :Union[confloat(gt=0.0, le=1.0), PositiveInt] = 1.0

How often within one training epoch to check the validation set. If a float, fraction of the training epoch between checks; if an int, check every n batches.

save_top_k :int = 1

Save and keep only k best models according to main metric; -1 will keep all; 0 will never save a model.

early_stop_patience :conint(ge=0) = 0

Stop training if evaluation metrics do not improve after X validations; 0 disables this.
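For example, a checkpoint section could look like the following sketch, which illustrates the float versus int semantics of validation_steps described above:

    checkpoint_options = {
        "validation_steps": 0.5,    # float: validate twice per training epoch
        # "validation_steps": 500,  # int alternative: validate every 500 batches
        "save_top_k": 1,            # keep only the single best model
        "early_stop_patience": 10,  # stop after 10 validations without improvement
    }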

class kiwi.lib.train.GPUConfig

Bases: kiwi.utils.io.BaseConfig

Options for GPU selection, floating-point precision, and automatic mixed precision.

gpus :Union[int, List[int]] = 0

If an int, use that many GPUs (0 means no GPU, -1 means all available GPUs). If a list, use the GPUs with the specified ids (e.g., [0, 2]).

precision :Literal[16, 32] = 32

The floating point precision to be used while training the model. Available options are 32 or 16 bits.

amp_level :Literal['O0', 'O1', 'O2', 'O3'] = O0

The automatic-mixed-precision level to use. O0 is FP32 training. O1 is mixed-precision training as popularized by NVIDIA Apex. O2 casts the model weights to FP16 but keeps certain master weights and batch norm in FP32, without patching Torch functions. O3 is full FP16 training.

setup_gpu_ids(cls, v)

If asking to use CPU, let it be, outputting a warning if GPUs are available. If asking to use any GPU but none are available, fall back to CPU and warn user.

setup_amp_level(cls, v, values)

If precision is set to 16, amp_level needs to be higher than O0. Conversely, if amp_level is set higher than O0, precision needs to be set to 16.
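A small sketch of consistent (and inconsistent) combinations of these fields, following the validators described above:

    # Consistent: full precision on CPU.
    gpu_fp32 = {"gpus": 0, "precision": 32, "amp_level": "O0"}
    # Consistent: mixed precision on GPUs 0 and 2.
    gpu_fp16 = {"gpus": [0, 2], "precision": 16, "amp_level": "O1"}
    # Inconsistent: precision 16 with amp_level "O0" (or an amp_level above "O0"
    # with precision 32) should be rejected by setup_amp_level.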

class kiwi.lib.train.TrainerConfig

Bases: kiwi.lib.train.GPUConfig

Options controlling the training loop (epochs, gradient handling, logging, and checkpointing).

resume :bool = False

Resume training a previous run. The run.run_id (and possibly run.experiment_name) option must be specified. Files are then searched under the “runs” directory. If not found, they are downloaded from the MLflow server (check the mlflow_tracking_uri option).

epochs :int = 50

Number of epochs for training.

gradient_accumulation_steps :int = 1

Accumulate gradients for the given number of steps (batches) before back-propagating.

gradient_max_norm :float = 0.0

Clip gradients with norm above this value; by default (0.0), do not clip.

main_metric :Union[str, List[str]]

The main metric (or list of metrics) used to select the best model for this run.

log_interval :int = 100

Log every k batches.

log_save_interval :int = 100

Save accumulated log every k batches (does not seem to matter to MLflow logging).

checkpoint :CheckpointsConfig
deterministic :bool = True

If true, enables cudnn.deterministic. Might make training slower, but ensures reproducibility.
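Putting the trainer options together, a hedged sketch of a trainer section (the metric name is hypothetical and depends on the system being trained; the nested checkpoint section follows CheckpointsConfig, and the GPU fields are inherited from GPUConfig):

    trainer_options = {
        "epochs": 10,
        "gradient_accumulation_steps": 4,  # back-propagate every 4 batches
        "gradient_max_norm": 1.0,          # clip gradient norms above 1.0
        "main_metric": "PEARSON",          # hypothetical metric name
        "log_interval": 100,
        "deterministic": True,
        "gpus": 1,                         # inherited from GPUConfig
        "precision": 32,
        "checkpoint": {"validation_steps": 0.5, "save_top_k": 1, "early_stop_patience": 10},
    }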

class kiwi.lib.train.Configuration

Bases: kiwi.utils.io.BaseConfig

Full configuration for the training pipeline, aggregating run, trainer, data, and system options.

run :RunConfig

Options specific to each run

trainer :TrainerConfig
data :WMTQEDataset.Config
system :QESystem.Config
debug :bool = False

Run training in fast_dev mode; only one batch is used for training and validation. This is useful to test out new models.

verbose :bool = False
quiet :bool = False
kiwi.lib.train.train_from_file(filename) → TrainRunInfo

Load options from a config file and call the training procedure.

Parameters

filename – path to the configuration file.

Returns

an object with training information.
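A usage sketch (the file path is hypothetical; the file holds the options described by the Configuration class above):

    from kiwi.lib.train import train_from_file

    run_info = train_from_file("config/train.yaml")  # hypothetical path
    print(run_info.best_metrics)                      # metrics of the best model
    if run_info.best_model_path is not None:
        print("Best checkpoint:", run_info.best_model_path)
    model = run_info.model                            # last model when training finished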

kiwi.lib.train.train_from_configuration(configuration_dict) → TrainRunInfo

Run the entire training pipeline using the configuration options received.

Parameters

configuration_dict – dictionary with options.

Returns

an object with training information.
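As a sketch, the same pipeline can be driven from a nested dictionary whose top-level keys follow the Configuration class above; the data and system sections are left as placeholders here and must be filled in according to WMTQEDataset.Config and QESystem.Config:

    from kiwi.lib.train import train_from_configuration

    configuration_dict = {
        "run": {"seed": 42, "experiment_name": "default"},
        "trainer": {"epochs": 10, "gpus": 0, "checkpoint": {"save_top_k": 1}},
        "data": {},    # fill in per WMTQEDataset.Config
        "system": {},  # fill in per QESystem.Config
        "debug": False,
    }
    run_info = train_from_configuration(configuration_dict)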

kiwi.lib.train.setup_run(config: RunConfig, debug=False, quiet=False, anchor_dir: Path = None) → Tuple[Path, Optional[MLFlowTrackingLogger]]

Prepare for running the training pipeline.

This includes setting up the output directory, random seeds, and loggers.

Parameters
  • config – configuration options.

  • quiet – whether to suppress info log messages.

  • debug – whether to additionally log debug messages (quiet has precedence).

  • anchor_dir – directory to use as root for paths.

Returns

a tuple with the resolved path to the output directory and the experiment logger (None if not configured).
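A hedged sketch of calling this helper directly; building RunConfig with keyword arguments (and omitting output_dir and run_id so they are derived automatically) is assumed to work as for other pydantic configs:

    from pathlib import Path
    from kiwi.lib.train import RunConfig, setup_run

    run_config = RunConfig(seed=42, experiment_name="default", use_mlflow=False)
    output_dir, tracking_logger = setup_run(run_config, quiet=True, anchor_dir=Path("."))
    # tracking_logger is expected to be None here, since MLflow tracking is not configured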

kiwi.lib.train.run(config: Configuration, system_type: Union[Type[TLMSystem], Type[QESystem]] = QESystem) → TrainRunInfo

Instantiate the system according to the configuration and train it.

A trainer is loaded or created to do so.

Parameters
  • config – generic training options.

  • system_type – class of system being used.

Returns

an object with training information.