kiwi.lib package¶
Submodules¶
kiwi.lib.evaluate module¶
-
kiwi.lib.evaluate.
evaluate_from_options
(options)[source]¶ Evaluates a model’s predictions based on the flags received from the configuration files. Refer to configuration for a list of available configuration flags for the evaluate pipeline.
Parameters: options (Namespace) – Namespace containing all pipeline options
kiwi.lib.jackknife module¶
kiwi.lib.predict module¶
-
kiwi.lib.predict.
load_model
(model_path)[source]¶ Loads a pretrained model into a Predicter object.
Parameters: model_path (str) – Path to the saved model file.
Raises: Exception – If the path does not exist or is not a valid model file.
-
kiwi.lib.predict.
predict_from_options
(options)[source]¶ Uses the configuration options to run the prediction pipeline. Calls setup, run and teardown in sequence.
Parameters: options (Namespace) – Namespace containing all parsed options.
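The setup, run and teardown sequence described above can be sketched as follows. This is a minimal illustration under stated assumptions, not kiwi's implementation: the three `*_stub` functions, the `inputs` and `output_dir` option names, and the dummy predictions are all hypothetical stand-ins.

```python
from argparse import Namespace

calls = []  # records the order in which the stages run


def setup_stub(options):
    # Hypothetical stand-in for setup: would prepare the output directory.
    calls.append("setup")
    return options.output_dir


def run_stub(output_dir, options):
    # Hypothetical stand-in for run: returns dummy predictions per input.
    calls.append("run")
    return {"target": [0.0] * len(options.inputs)}


def teardown_stub(options):
    # Hypothetical stand-in for teardown: would release resources.
    calls.append("teardown")


def predict_from_options_sketch(options):
    # Call the three stages in order and hand back the predictions.
    output_dir = setup_stub(options)
    predictions = run_stub(output_dir, options)
    teardown_stub(options)
    return predictions


options = Namespace(output_dir="predictions", inputs=["a", "b"])
predictions = predict_from_options_sketch(options)
```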
-
kiwi.lib.predict.
run
(ModelClass, output_dir, pipeline_opts, model_opts)[source]¶ Runs the prediction pipeline. Loads the model and the necessary files, then creates the model’s predictions for all data received.
Parameters: - ModelClass (type) – Python type of the model to use for prediction
- output_dir – Directory in which to save predictions
- pipeline_opts (Namespace) – Generic predict options:
  - batch_size: Maximum batch size for predicting
- model_opts (Namespace) – Model-specific options
Returns: Dictionary with format {‘target’: predictions}
Return type: Predictions (dict)
-
kiwi.lib.predict.
setup
(options)[source]¶ Analyzes pipeline options and sets up the requirements for running the prediction pipeline. This includes setting up the output directory, random seeds and the device where predictions are run.
Parameters: options (Namespace) – Pipeline-specific options
Returns: Path to output directory
Return type: output_dir(str)
kiwi.lib.search module¶
kiwi.lib.train module¶
-
class
kiwi.lib.train.
TrainRunInfo
(trainer)[source]¶ Bases:
object
Encapsulates relevant information on training runs.
Can be instantiated with a trainer object.
-
stats
¶ Stats of the best model so far
-
model_path
¶ Path of the best model so far
-
run_uuid
¶ Unique identifier of the current run
-
-
kiwi.lib.train.
log
(output_dir, config_options, config_file_name='train_config.yml', save_config=None)[source]¶ Logs configuration options for the current training run.
Parameters:
-
kiwi.lib.train.
retrieve_datasets
(fieldset, pipeline_options, model_options, output_dir)[source]¶ Creates Dataset objects for the training and validation sets.
Parses files according to pipeline and model options.
Parameters: - fieldset –
- pipeline_options (Namespace) – Generic training options:
  - load_data (str): Input directory for loading preprocessed data files.
  - load_model (str): Directory containing model.torch for loading a pre-created model.
  - resume (boolean): Indicates whether training should resume from a previous run.
  - load_vocab (str): Directory containing the vocab.torch file to be loaded.
- model_options (Namespace) – Model specific options.
- output_dir (str) – Path to directory where experiment files should be saved.
Returns: Training and validation datasets
Return type: datasets (Dataset)
-
kiwi.lib.train.
retrieve_trainer
(ModelClass, pipeline_options, model_options, vocabs, output_dir, device_id)[source]¶ Creates a Trainer object with an associated model.
This object encapsulates the logic behind training the model and checkpointing. This method uses the received pipeline options to instantiate a Trainer object with the requested model and hyperparameters.
Parameters: - ModelClass –
- pipeline_options (Namespace) – Generic training options:
  - resume (bool): Set to true if resuming an existing run.
  - load_model (str): Directory containing model.torch for loading a pre-created model.
  - checkpoint_save (bool): Whether snapshots should be saved after validation runs. Warning: if false, the model is never saved.
  - checkpoint_keep_only_best (int): Tells kiwi to keep only the best n models.
  - checkpoint_early_stop_patience (int): Stops training if metrics don’t improve after n validation runs.
  - checkpoint_validation_steps (int): Performs validation every n training steps.
  - optimizer (string): The optimizer to be used in training.
  - learning_rate (float): Starting learning rate.
  - learning_rate_decay (float): Factor of learning rate decay.
  - learning_rate_decay_start (int): Start decay after epoch x.
  - log_interval (int): Log after k batches.
- model_options (Namespace) – Model specific options.
- vocabs (dict) – Vocab dictionary.
- output_dir (str or Path) – Output directory for models and stats concerning training.
- device_id (int) – The gpu id to be used in training. Set to negative to use cpu.
Returns: Trainer
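To make the checkpoint options above concrete, here is a small sketch of how `checkpoint_keep_only_best` and `checkpoint_early_stop_patience` could interact. It is not kiwi's Trainer logic: the class `CheckpointPolicySketch` is hypothetical, and it assumes that a higher metric is better and that a patience of 0 disables early stopping.

```python
class CheckpointPolicySketch:
    """Hypothetical helper illustrating the checkpoint options; not part of kiwi."""

    def __init__(self, keep_only_best=1, early_stop_patience=0):
        self.keep_only_best = keep_only_best
        self.early_stop_patience = early_stop_patience
        self.best = []  # (metric, step) pairs, best first
        self.runs_without_improvement = 0

    def update(self, step, metric):
        """Record one validation run; return True if training should stop."""
        # Assumption: a higher metric is better.
        improved = not self.best or metric > self.best[0][0]
        self.best.append((metric, step))
        self.best.sort(reverse=True)
        # Keep only the n best snapshots.
        del self.best[self.keep_only_best:]
        if improved:
            self.runs_without_improvement = 0
        else:
            self.runs_without_improvement += 1
        # Assumption: a patience of 0 disables early stopping.
        return (
            self.early_stop_patience > 0
            and self.runs_without_improvement >= self.early_stop_patience
        )


policy = CheckpointPolicySketch(keep_only_best=2, early_stop_patience=2)
decisions = [
    policy.update(100, 0.5),
    policy.update(200, 0.7),
    policy.update(300, 0.6),
    policy.update(400, 0.4),
]
kept_steps = [step for _, step in policy.best]
```

In this run the metric stops improving after step 200, so the fourth validation run triggers the stop, and only the snapshots from steps 200 and 300 are retained.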
-
kiwi.lib.train.
run
(ModelClass, output_dir, pipeline_options, model_options)[source]¶ Implements the main logic of the training module.
Instantiates the dataset, model class and sets their attributes according to the pipeline options received. Loads or creates a trainer and runs it.
Parameters: - ModelClass (Model) – Python Type of the Model to train
- output_dir – Directory to save models
- pipeline_options (Namespace) – Generic train options:
  - load_model: Load a pre-trained predictor model
  - resume: Load trainer state and resume training
  - gpu_id: Set to a non-negative integer to train on GPU
  - train_batch_size: Batch size for training
  - valid_batch_size: Batch size for validation
- model_options (Namespace) – Model Specific options
Returns: The trainer object
-
kiwi.lib.train.
setup
(output_dir, seed=42, gpu_id=None, debug=False, quiet=False)[source]¶ Analyzes pipeline options and sets up requirements for running the training pipeline.
This includes setting up the output directory, random seeds and the device(s) where training is run.
Parameters: - output_dir – Path to directory to use or None, in which case one is created automatically.
- seed (int) – Random seed for all random engines (Python, PyTorch, NumPy).
- gpu_id (int) – GPU number to use or None to use the CPU.
- debug (bool) – Whether to increase the verbosity of output messages.
- quiet (bool) – Whether to decrease the verbosity of output messages. Takes precedence over debug.
Returns: Path to output directory
Return type: output_dir(str)
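The behaviour described for setup can be approximated with the standard library alone. The sketch below is an assumption-laden illustration rather than kiwi's code: `setup_sketch` and `resolve_log_level` are hypothetical names, the concrete logging levels are guesses, a temporary directory stands in for the automatically created output directory, and only Python's own random engine is seeded.

```python
import logging
import random
import tempfile
from pathlib import Path


def resolve_log_level(debug=False, quiet=False):
    # quiet takes precedence over debug, as the parameter list states.
    # The concrete levels chosen here are assumptions.
    if quiet:
        return logging.WARNING
    if debug:
        return logging.DEBUG
    return logging.INFO


def setup_sketch(output_dir=None, seed=42, debug=False, quiet=False):
    # Create a directory automatically when none is given (assumption:
    # a temporary directory stands in for kiwi's own naming scheme).
    if output_dir is None:
        output_dir = tempfile.mkdtemp(prefix="run_")
    path = Path(output_dir)
    path.mkdir(parents=True, exist_ok=True)
    # Seed Python's random engine only; PyTorch and NumPy are left out here.
    random.seed(seed)
    logging.getLogger("setup_sketch").setLevel(resolve_log_level(debug, quiet))
    return str(path)


output_dir = setup_sketch(debug=True, quiet=True)
```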
-
kiwi.lib.train.
teardown
(options)[source]¶ Tears down after executing the training pipeline.
Parameters: options (Namespace) – Pipeline specific options
-
kiwi.lib.train.
train_from_file
(filename)[source]¶ Loads options from a config file and calls the training procedure.
Parameters: filename (str) – filename of the configuration file
-
kiwi.lib.train.
train_from_options
(options)[source]¶ Runs the entire training pipeline using the configuration options received.
These options include the pipeline and model options plus the model’s API.
Parameters: options (Namespace) – All the configuration options retrieved from either a config file or input flags and the model being used.
kiwi.lib.utils module¶
-
kiwi.lib.utils.
configure_device
(gpu_id)[source]¶ Configure gpu to be used in computation.
Parameters: gpu_id (int) – The id of the gpu to be used
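Elsewhere in this package, a negative `device_id` or a `gpu_id` of None selects the CPU. A minimal sketch of that mapping, assuming PyTorch-style device strings, is shown below. The function `device_name_sketch` is hypothetical and does not touch any GPU.

```python
def device_name_sketch(gpu_id=None):
    # None or a negative id selects the CPU, matching the conventions
    # described for setup and retrieve_trainer.
    if gpu_id is None or gpu_id < 0:
        return "cpu"
    # PyTorch-style device string for the chosen GPU.
    return f"cuda:{gpu_id}"
```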
-
kiwi.lib.utils.
configure_logging
(output_dir=None, debug=False, quiet=False)[source]¶ Configure the logger. Sets up the log format, logging level and output directory of logging.
Parameters:
-
kiwi.lib.utils.
configure_seed
(seed)[source]¶ Configure the random seed for all relevant packages. These include: random, numpy, torch and torch.cuda
Parameters: seed (int) – the random seed to be set
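Seeding all four engines could look like the sketch below. It is an illustration, not kiwi's source: `configure_seed_sketch` is a hypothetical name, and NumPy and PyTorch are treated as optional so the snippet still runs when they are not installed.

```python
import random


def configure_seed_sketch(seed):
    # Python's built-in random engine.
    random.seed(seed)
    # NumPy, if available.
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    # PyTorch on CPU and, when present, on every CUDA device.
    try:
        import torch
        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass


configure_seed_sketch(42)
first = random.random()
configure_seed_sketch(42)
second = random.random()  # same value as first, because the seed was reset
```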
-
kiwi.lib.utils.
merge_namespaces
(*args)[source]¶ Utility function used to merge Namespaces. Useful for merging Argparse options.
Parameters: *args – Variable length list of Namespaces
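A merge of this kind can be sketched with `argparse.Namespace` alone. The function name `merge_namespaces_sketch` is hypothetical, and the rule that later namespaces overwrite earlier ones on a key clash is an assumption; the source does not say how clashes are resolved.

```python
from argparse import Namespace


def merge_namespaces_sketch(*args):
    merged = Namespace()
    for namespace in args:
        # Assumption: later namespaces win when the same key appears twice.
        vars(merged).update(vars(namespace))
    return merged


pipeline_options = Namespace(seed=42, batch_size=32)
model_options = Namespace(batch_size=64, hidden_size=100)
merged = merge_namespaces_sketch(pipeline_options, model_options)
```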
-
kiwi.lib.utils.
parse_integer_with_positive_infinity
(string)[source]¶ Workaround that allows both integers and infinity to be passed as command-line arguments.
Parameters: string – A string representation of an integer, or infinity
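The workaround is needed because `int("inf")` raises a ValueError, while `float("inf")` is the only built-in representation of infinity. One possible shape for such a parser, usable as an argparse `type`, is sketched below. The function name, the accepted spellings of infinity and the `--epochs` flag are all assumptions made for illustration.

```python
import argparse


def parse_int_or_inf_sketch(string):
    # Assumption: these spellings of infinity are accepted.
    if string.strip().lower() in {"inf", "+inf", "infinity", "+infinity"}:
        return float("inf")
    # Anything else must be a plain integer; int() raises ValueError otherwise,
    # which argparse reports as an invalid argument.
    return int(string)


parser = argparse.ArgumentParser()
# Hypothetical flag, used only to show the parser as an argparse type.
parser.add_argument("--epochs", type=parse_int_or_inf_sketch, default=float("inf"))

finite = parser.parse_args(["--epochs", "10"])
unbounded = parser.parse_args(["--epochs", "inf"])
```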
-
kiwi.lib.utils.
save_args_to_file
(file_name, **kwargs)[source]¶ Saves **kwargs to a file.
Parameters: file_name (str) – Name of the file in which the arguments are saved.
-
kiwi.lib.utils.
save_config_file
(options, file_name)[source]¶ Saves a configuration file with OpenKiwi configuration options. Calls save_args_to_file.
Parameters: - options (Namespace) – Namespace with all configuration options that should be saved.
- file_name (str) – Name of the output configuration file.
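The relationship between the two saving helpers can be sketched as follows. This is not kiwi's implementation: both `*_sketch` functions are hypothetical, and JSON is used only to keep the example within the standard library. The default name train_config.yml used elsewhere in this package suggests the real files are YAML.

```python
import json
import os
import tempfile
from argparse import Namespace


def save_args_to_file_sketch(file_name, **kwargs):
    # Assumption: JSON stands in for the real on-disk format.
    with open(file_name, "w", encoding="utf-8") as handle:
        json.dump(kwargs, handle, indent=2, sort_keys=True)


def save_config_file_sketch(options, file_name):
    # Unpack the Namespace into keyword arguments and delegate.
    save_args_to_file_sketch(file_name, **vars(options))


path = os.path.join(tempfile.mkdtemp(), "config.json")
save_config_file_sketch(Namespace(seed=42, quiet=False), path)

with open(path, encoding="utf-8") as handle:
    loaded = json.load(handle)
```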