kiwi.lib.train
TrainRunInfo
Encapsulate relevant information on training runs.
RunConfig
Options for each run.
CheckpointsConfig
Base class for all pydantic configs. Used to configure base behaviour of configs.
GPUConfig
TrainerConfig
Configuration
train_from_file(filename) → TrainRunInfo
Load options from a config file and call the training procedure.
train_from_configuration(configuration_dict) → TrainRunInfo
Run the entire training pipeline using the configuration options received.
setup_run(config: RunConfig, debug=False, quiet=False, anchor_dir: Path = None) → Tuple[Path, Optional[MLFlowTrackingLogger]]
Prepare for running the training pipeline.
run(config: Configuration, system_type: Union[Type[TLMSystem], Type[QESystem]] = QESystem) → TrainRunInfo
Instantiate the system according to the configuration and train it.
kiwi.lib.train.logger
model
The last model when training finished.
best_metrics
Mapping of metrics of the best model.
best_model_path
Path of the best model, if it was saved to disk.
Bases: kiwi.utils.io.BaseConfig
seed
Random seed
experiment_name
If using MLflow, it will log this run under this experiment name, which appears as a separate section in the UI. It will also be used in some messages and files.
output_dir
Output several files for this run under this directory. If not specified, a directory under “./runs/” is created or reused based on the run_id. Files might also be sent to MLflow depending on the mlflow_always_log_artifacts option.
run_id
If specified, MLflow/Default Logger will log metrics and params under this ID. If it exists, the run status will change to running. This ID is also used for creating this run's output directory if output_dir is not specified (the run ID must be a 32-character hex string).
use_mlflow
Whether to use MLflow for tracking this run. If MLflow is not installed, a message is shown.
mlflow_tracking_uri
If using MLflow, logs model parameters, training metrics, and artifacts (files) to this MLflow server. Uses the localhost by default.
mlflow_always_log_artifacts
If using MLflow, always log (send) artifacts (files) to the MLflow artifacts URI. By default (false), artifacts are only logged if MLflow is a remote server (as specified by the --mlflow-tracking-uri option). All generated files are always saved in --output-dir, so it might be considered redundant to copy them to a local MLflow server. If this is not the case, set this option to true.
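The run_id description requires a 32-character hex string. A minimal sketch of generating and checking such an ID with the standard library follows; the helper name `is_valid_run_id` is hypothetical and not part of kiwi, and the library may validate IDs differently.

```python
import string
import uuid


def is_valid_run_id(run_id: str) -> bool:
    """Return True if run_id is a 32-character hexadecimal string."""
    return len(run_id) == 32 and all(c in string.hexdigits for c in run_id)


# uuid4().hex yields 32 hexadecimal characters, so it is one convenient
# way to produce an ID of the documented shape.
new_run_id = uuid.uuid4().hex
```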
validation_steps
How often within one training epoch to check the validation set. If float, % of training epoch. If int, check every n batches.
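As an illustration of the int/float distinction for validation_steps, the sketch below converts either form into a number of batches between validation checks. It assumes a float is a fraction of the epoch (0.25 meaning 25%); the function name is hypothetical and the actual trainer may round differently.

```python
from typing import Union


def batches_between_validations(
    validation_steps: Union[int, float], batches_per_epoch: int
) -> int:
    """Translate validation_steps into a batch interval (illustrative only)."""
    if isinstance(validation_steps, float):
        # Assumption: a float is a fraction of one training epoch.
        return max(1, int(batches_per_epoch * validation_steps))
    # An int is taken literally: validate every n batches.
    return validation_steps
```

With 1000 batches per epoch, a value of 0.25 gives a check every 250 batches, while a value of 50 gives a check every 50 batches.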
save_top_k
Save and keep only k best models according to main metric; -1 will keep all; 0 will never save a model.
early_stop_patience
Stop training if evaluation metrics do not improve after X validations; 0 disables this.
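The patience rule for early_stop_patience can be sketched as a counter of consecutive validations without improvement. This is not the library's implementation: the function name is hypothetical, and it assumes a higher metric value is better.

```python
from typing import Optional, Sequence


def stopping_validation(scores: Sequence[float], patience: int) -> Optional[int]:
    """Return the 0-based index of the validation at which training would
    stop, or None if it never stops (illustrative only)."""
    if patience <= 0:
        # 0 disables early stopping.
        return None
    best = float("-inf")
    stale = 0  # consecutive validations without improvement
    for index, score in enumerate(scores):
        if score > best:
            best = score
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return index
    return None
```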
gpus
Use the number of GPUs specified if int, where 0 is no GPU. -1 is all GPUs. Alternatively, if a list, uses the GPU-ids specified (e.g., [0, 2]).
precision
The floating point precision to be used while training the model. Available options are 32 or 16 bits.
amp_level
The automatic-mixed-precision level to use. O0 is FP32 training. O1 is mixed precision training as popularized by NVIDIA Apex. O2 casts the model weights to FP16 but keeps certain master weights and batch norm in FP32 without patching Torch functions. O3 is full FP16 training.
setup_gpu_ids
If asking to use CPU, let it be, outputting a warning if GPUs are available. If asking to use any GPU but none are available, fall back to CPU and warn user.
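The fallback behaviour described for setup_gpu_ids might look roughly like the sketch below. The function name, the `available` parameter (the number of GPUs detected), and the return type are assumptions for illustration; the real validator may normalize the value differently.

```python
import warnings
from typing import List, Union


def resolve_gpu_ids(gpus: Union[int, List[int]], available: int) -> List[int]:
    """Return the GPU ids to use; an empty list means CPU (illustrative only)."""
    wants_gpu = gpus != 0 and gpus != []
    if not wants_gpu:
        # CPU was requested: keep it, but mention unused GPUs.
        if available > 0:
            warnings.warn("GPUs are available but training will run on CPU.")
        return []
    if available == 0:
        # A GPU was requested but none exists: fall back to CPU.
        warnings.warn("No GPU available; falling back to CPU.")
        return []
    if isinstance(gpus, list):
        return gpus  # explicit GPU ids, e.g. [0, 2]
    if gpus == -1:
        return list(range(available))  # all GPUs
    return list(range(min(gpus, available)))  # the first n GPUs
```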
setup_amp_level
If precision is set to 16, amp_level needs to be greater than O0. Following the same logic, if amp_level is set to greater than O0, precision needs to be set to 16.
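The consistency rule between precision and amp_level can be expressed as a small check. The sketch below is an assumption about how such a rule could be enforced (by raising ValueError); the function name is hypothetical and the library's validator may instead adjust values or report errors differently.

```python
AMP_LEVELS = ("O0", "O1", "O2", "O3")


def check_precision_and_amp(precision: int, amp_level: str) -> None:
    """Raise ValueError if precision and amp_level disagree (illustrative only)."""
    if precision not in (16, 32):
        raise ValueError(f"precision must be 16 or 32, got {precision}")
    if amp_level not in AMP_LEVELS:
        raise ValueError(f"amp_level must be one of {AMP_LEVELS}, got {amp_level}")
    uses_mixed_precision = amp_level != "O0"
    if precision == 16 and not uses_mixed_precision:
        raise ValueError("precision=16 requires amp_level greater than O0")
    if precision == 32 and uses_mixed_precision:
        raise ValueError("amp_level greater than O0 requires precision=16")
```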
Bases: kiwi.lib.train.GPUConfig
resume
Resume training a previous run. The run.run_id (and possibly run.experiment_name) option must be specified. Files are then searched under the “runs” directory. If not found, they are downloaded from the MLflow server (check the mlflow_tracking_uri option).
epochs
Number of epochs for training.
gradient_accumulation_steps
Accumulate gradients over the given number of steps (batches) before updating the model weights.
gradient_max_norm
Clip gradients with norm above this value; by default (0.0), do not clip.
main_metric
Choose the primary metric for this run.
log_interval
Log every k batches.
log_save_interval
Save accumulated log every k batches (does not seem to matter to MLflow logging).
checkpoint
deterministic
If true, enables cudnn.deterministic. Might make training slower, but ensures reproducibility.
Options specific to each run
trainer
data
system
debug
Run training in fast_dev mode; only one batch is used for training and validation. This is useful to test out new models.
verbose
quiet
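To show how the options documented above nest, here is a sketch of a configuration dictionary of the kind that train_from_configuration receives. All values are illustrative, not defaults. The data and system sections are left out because their fields are not described here, so this dictionary on its own is not a complete training configuration.

```python
# Illustrative values only; field names follow the options documented above.
configuration_dict = {
    "run": {
        "seed": 42,
        "experiment_name": "my-experiment",
        "use_mlflow": False,
    },
    "trainer": {
        # GPUConfig options (TrainerConfig derives from GPUConfig)
        "gpus": 0,
        "precision": 32,
        "amp_level": "O0",
        # TrainerConfig options
        "epochs": 10,
        "gradient_accumulation_steps": 1,
        "gradient_max_norm": 0.0,
        "log_interval": 100,
        # CheckpointsConfig options
        "checkpoint": {
            "validation_steps": 0.5,
            "save_top_k": 1,
            "early_stop_patience": 5,
        },
    },
    # "data" and "system" must be added before training; omitted here.
    "debug": False,
    "quiet": False,
}
```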
filename – of the configuration file.
an object with training information.
configuration_dict – dictionary with options.
Return: object with training information.
This includes setting up the output directory, random seeds, and loggers.
config – configuration options.
quiet – whether to suppress info log messages.
debug – whether to additionally log debug messages (quiet has precedence).
anchor_dir – directory to use as root for paths.
a tuple with the resolved path to the output directory and the experiment logger (None if not configured).
Load or create a trainer for doing it.
config – generic training options.
system_type – class of system being used.