kiwi.lib package


kiwi.lib.evaluate module

kiwi.lib.evaluate.eval_sentence_level(sent_gold, sent_preds)[source]
kiwi.lib.evaluate.eval_word_level(golds, pred_files, tag_name)[source]
Evaluates a model’s predictions based on the flags received from the configuration files. Refer to configuration for a list of available configuration flags for the evaluate pipeline.

Parameters: options (Namespace) – Namespace containing all pipeline options
kiwi.lib.evaluate.print_scores_table(scores, prefix='TARGET')[source]
kiwi.lib.evaluate.retrieve_gold_standard(pipeline_options, is_wmt18_format)[source]
kiwi.lib.evaluate.retrieve_predictions(pipeline_options, is_wmt18_pred_format)[source]
kiwi.lib.evaluate.retrieve_sentence_predictions(pipeline_options, pred_files)[source]
kiwi.lib.evaluate.score_sentence_level(gold, pred)[source]
kiwi.lib.evaluate.score_word_level(gold, prediction)[source]

kiwi.lib.jackknife module


Average an ensemble of predictions.
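
As an illustration of what averaging an ensemble can look like, here is a minimal sketch that takes several equally long lists of scores and averages them position by position. The function name `average_ensemble` and the list-of-lists input are assumptions made for this example; they are not taken from the library.

```python
from statistics import fmean


def average_ensemble(predictions):
    """Average several equally long score lists, position by position.

    Hypothetical helper: the name and input layout are assumptions.
    """
    if not predictions:
        raise ValueError("need at least one prediction list")
    length = len(predictions[0])
    if any(len(scores) != length for scores in predictions):
        raise ValueError("all prediction lists must have the same length")
    # zip(*predictions) groups the i-th score of every ensemble member.
    return [fmean(column) for column in zip(*predictions)]


ensemble = [[0.2, 0.8], [0.4, 0.6], [0.6, 0.4]]
averaged = average_ensemble(ensemble)
```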

kiwi.lib.jackknife.reshape_by_lengths(sequence, lengths)[source]

, output_dir, pipeline_options, model_options, splits)[source]
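
No description accompanies `reshape_by_lengths`, so the following is only a guess based on its name and signature: a flat sequence is cut into consecutive chunks whose sizes are given by `lengths`. The function below is a stand-in named `reshape_by_lengths_sketch`, and the library's actual behaviour may differ.

```python
def reshape_by_lengths_sketch(sequence, lengths):
    """Split a flat sequence into consecutive chunks of the given lengths.

    Assumed behaviour, inferred from the name and signature only.
    """
    if sum(lengths) != len(sequence):
        raise ValueError("lengths must add up to len(sequence)")
    chunks, start = [], 0
    for length in lengths:
        chunks.append(list(sequence[start:start + length]))
        start += length
    return chunks


# Example: word-level tags for two sentences of 2 and 3 tokens.
flat_tags = ["OK", "BAD", "OK", "OK", "BAD"]
per_sentence = reshape_by_lengths_sketch(flat_tags, [2, 3])
```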

kiwi.lib.predict module


Load a pretrained model into a Predicter object.

Parameters: load_model (str) – A path to the saved model file.
Exception: If the path does not exist, or is not a valid model file.

Uses the configuration options to run the prediction pipeline. Calls setup, run and teardown in sequence.

Parameters: options (Namespace) – Namespace containing all parsed options.

, output_dir, pipeline_opts, model_opts)[source]

Runs the prediction pipeline. Loads the model and necessary files and creates the model’s predictions for all data received.

Parameters:
  • ModelClass (type) – Python type of the model used for prediction
  • output_dir – Directory to save predictions
  • pipeline_options (Namespace) – Generic predict options
    batch_size: Maximum batch size for predicting
  • model_options (Namespace) – Model-specific options

Returns: Dictionary with format {'target': predictions}

Return type: Predictions (dict)


Analyzes pipeline options and sets up requirements for running the prediction pipeline. This includes setting up the output directory, random seeds and the device where predictions are run.

Parameters: options (Namespace) – Pipeline-specific options
Returns: Path to output directory
Return type: output_dir (str)

Tears down after executing the prediction pipeline.

Parameters: options (Namespace) – Pipeline-specific options

kiwi.lib.train module

class kiwi.lib.train.TrainRunInfo(trainer)[source]

Bases: object

Encapsulates relevant information on training runs.

Can be instantiated with a trainer object.


Stats of the best model so far


Path of the best model so far


Unique identifier of the current run

kiwi.lib.train.log(output_dir, config_options, config_file_name='train_config.yml', save_config=None)[source]

Logs configuration options for the current training run.

Parameters:
  • output_dir (str) – Path to directory where experiment files should be saved.
  • config_options (Namespace) – Namespace representing all configuration options.
  • config_file_name (str) – Filename of the config file.
  • save_config (str or Path) – Boolean stating whether to save a configuration file.
kiwi.lib.train.retrieve_datasets(fieldset, pipeline_options, model_options, output_dir)[source]

Creates Dataset objects for the training and validation sets.

Parses files according to pipeline and model options.

Parameters:
  • fieldset
  • pipeline_options (Namespace) – Generic training options:
    load_data (str): Input directory for loading preprocessed data.
    load_model (str): Directory containing model.torch for loading a pre-created model.
    resume (boolean): Indicates whether to resume training from a previous run.
    load_vocab (str): Directory containing vocab.torch file to be loaded.
  • model_options (Namespace) – Model-specific options.
  • output_dir (str) – Path to directory where experiment files should be saved.

Returns: Training and validation datasets

Return type: datasets (Dataset)

kiwi.lib.train.retrieve_trainer(ModelClass, pipeline_options, model_options, vocabs, output_dir, device_id)[source]

Creates a Trainer object with an associated model.

This object encapsulates the logic behind training the model and checkpointing. This method uses the received pipeline options to instantiate a Trainer object with the requested model and hyperparameters.

Parameters:
  • ModelClass
  • pipeline_options (Namespace) – Generic training options:
    resume (bool): Set to true if resuming an existing run.
    load_model (str): Directory containing model.torch for loading a pre-created model.
    checkpoint_save (bool): Whether snapshots should be saved after validation runs. Warning: if false, the model is never saved.
    checkpoint_keep_only_best (int): Tells Kiwi to keep only the best n models.
    checkpoint_early_stop_patience (int): Stops training if metrics don’t improve after n validation runs.
    checkpoint_validation_steps (int): Perform validation every n training steps.
    optimizer (string): The optimizer to be used in training.
    learning_rate (float): Starting learning rate.
    learning_rate_decay (float): Factor of learning rate decay.
    learning_rate_decay_start (int): Start decay after epoch x.
    log_interval (int): Log after k batches.
  • model_options (Namespace) – Model-specific options.
  • vocabs (dict) – Vocab dictionary.
  • output_dir (str or Path) – Output directory for models and stats concerning training.
  • device_id (int) – The GPU id to be used in training. Set to negative to use the CPU.
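
To make the early-stopping option concrete, here is a minimal sketch of patience-based stopping over a list of validation scores. It assumes that higher scores are better and that "don't improve after n validation runs" means n consecutive runs without a new best score. Both are assumptions, and the function `stop_with_patience` is hypothetical rather than part of the Trainer.

```python
def stop_with_patience(validation_scores, patience):
    """Return (best_score, index_of_last_validation_run_performed).

    Hypothetical helper; assumes higher scores are better.
    """
    best_score = float("-inf")
    runs_without_improvement = 0
    for index, score in enumerate(validation_scores):
        if score > best_score:
            best_score = score
            runs_without_improvement = 0
        else:
            runs_without_improvement += 1
            if runs_without_improvement >= patience:
                # Patience exhausted: training would stop here.
                return best_score, index
    return best_score, len(validation_scores) - 1


# The last score is never reached because training stops at index 3.
result = stop_with_patience([0.50, 0.55, 0.54, 0.53, 0.60], patience=2)
```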

Returns: Trainer

, output_dir, pipeline_options, model_options)[source]

Implements the main logic of the training module.

Instantiates the dataset, model class and sets their attributes according to the pipeline options received. Loads or creates a trainer and runs it.

Parameters:
  • ModelClass (Model) – Python type of the model to train
  • output_dir – Directory to save models
  • pipeline_options (Namespace) – Generic train options:
    load_model: Load pre-trained predictor model.
    resume: Load trainer state and resume training.
    gpu_id: Set to a non-negative integer to train on GPU.
    train_batch_size: Batch size for training.
    valid_batch_size: Batch size for validation.
  • model_options (Namespace) – Model-specific options

Returns: The trainer object

kiwi.lib.train.setup(output_dir, seed=42, gpu_id=None, debug=False, quiet=False)[source]

Analyzes pipeline options and sets up requirements for running the training pipeline.

This includes setting up the output directory, random seeds and the device(s) where training is run.

Parameters:
  • output_dir – Path to directory to use or None, in which case one is created automatically.
  • seed (int) – Random seed for all random engines (Python, PyTorch, NumPy).
  • gpu_id (int) – GPU number to use or None to use the CPU.
  • debug (bool) – Whether to increase the verbosity of output messages.
  • quiet (bool) – Whether to decrease the verbosity of output messages. Takes precedence over debug.

Returns: Path to output directory

Tears down after executing the training pipeline.

Parameters: options (Namespace) – Pipeline-specific options

Loads options from a config file and calls the training procedure.

Parameters: filename (str) – Filename of the configuration file

Runs the entire training pipeline using the configuration options received.

These options include the pipeline and model options plus the model’s API.

Parameters: options (Namespace) – All the configuration options retrieved from either a config file or input flags and the model being used.

kiwi.lib.utils module


Configures the GPU to be used in computation.

Parameters: gpu_id (int) – The id of the GPU to be used
kiwi.lib.utils.configure_logging(output_dir=None, debug=False, quiet=False)[source]

Configures the logger. Sets up the log format, logging level and output directory for logging.

Parameters:
  • output_dir – The directory where log output will be stored. Defaults to None.
  • debug (bool) – Change logging level to debug.
  • quiet (bool) – Change logging level to warning to suppress info logs.
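
A minimal sketch of how such a configuration could be built on the standard `logging` module follows. It borrows the rule that quiet takes precedence over debug from the training setup parameters. The logger name `sketch`, the file name `output.log` and the format string are assumptions made for the example.

```python
import logging
from pathlib import Path


def pick_log_level(debug=False, quiet=False):
    """quiet wins over debug; INFO is the assumed default."""
    if quiet:
        return logging.WARNING
    if debug:
        return logging.DEBUG
    return logging.INFO


def configure_logging_sketch(output_dir=None, debug=False, quiet=False):
    logger = logging.getLogger("sketch")  # hypothetical logger name
    logger.setLevel(pick_log_level(debug, quiet))
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    formatter = logging.Formatter(
        "%(asctime)s %(levelname)s %(name)s: %(message)s"
    )
    handlers = [logging.StreamHandler()]
    if output_dir is not None:
        Path(output_dir).mkdir(parents=True, exist_ok=True)
        # Hypothetical file name; the real one is not documented here.
        handlers.append(logging.FileHandler(Path(output_dir) / "output.log"))
    for handler in handlers:
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger


logger = configure_logging_sketch(debug=True)
```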

Configures the random seed for all relevant packages. These include: random, numpy, torch and torch.cuda.

Parameters: seed (int) – The random seed to be set
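
As a rough sketch of seeding the packages listed above, the function below seeds `random` unconditionally and seeds NumPy and PyTorch only if they are installed. The name `configure_seed_sketch` and the guarded imports are choices made for this example, not the library's implementation.

```python
import random


def configure_seed_sketch(seed):
    """Seed every random engine that is available in this environment."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing to seed
    try:
        import torch
        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass  # PyTorch not installed; nothing to seed


# Re-seeding with the same value reproduces the same random draws.
configure_seed_sketch(42)
first = [random.random() for _ in range(3)]
configure_seed_sketch(42)
second = [random.random() for _ in range(3)]
```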

Utility function used to merge Namespaces. Useful for merging Argparse options.

Parameters: *args – Variable length list of Namespaces
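
One way to merge `argparse.Namespace` objects is sketched below. How the library resolves keys that appear in more than one Namespace is not stated. This sketch assumes that later arguments override earlier ones, and the name `merge_namespaces_sketch` is hypothetical.

```python
from argparse import Namespace


def merge_namespaces_sketch(*args):
    """Combine several Namespaces into one; later ones win on conflicts."""
    merged = {}
    for namespace in args:
        merged.update(vars(namespace))
    return Namespace(**merged)


pipeline = Namespace(seed=42, gpu_id=None)
model = Namespace(hidden_size=100, seed=7)
options = merge_namespaces_sketch(pipeline, model)
```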

Workaround to be able to pass both integers and infinity as command-line arguments.

Parameters: string – A string representation of an integer, or infinity
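
Such a converter is typically passed to `argparse` through the `type=` argument. The sketch below shows the idea. The function name `parse_integer_or_inf`, the accepted spellings of infinity and the `--max-length` flag are all assumptions made for illustration.

```python
import argparse
import math


def parse_integer_or_inf(string):
    """Return an int, or math.inf for 'inf' / 'infinity' (case-insensitive)."""
    text = string.strip().lower()
    if text in {"inf", "+inf", "infinity", "+infinity"}:
        return math.inf
    try:
        return int(text)
    except ValueError:
        raise argparse.ArgumentTypeError(
            f"expected an integer or 'inf', got {string!r}"
        ) from None


parser = argparse.ArgumentParser()
# Hypothetical flag, used only to demonstrate the converter.
parser.add_argument("--max-length", type=parse_integer_or_inf, default=math.inf)
args = parser.parse_args(["--max-length", "inf"])
```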
kiwi.lib.utils.save_args_to_file(file_name, **kwargs)[source]

Saves **kwargs to a file.

Parameters: file_name (str) – The name of the file where the args should be saved.
kiwi.lib.utils.save_config_file(options, file_name)[source]

Saves a configuration file with OpenKiwi configuration options. Calls save_args_to_file.

Parameters:
  • options (Namespace) – Namespace with all configuration options that should be saved.
  • file_name (str) – Name of the output configuration file.
kiwi.lib.utils.setup_output_directory(output_dir, run_uuid=None, experiment_id=None, create=True)[source]

Sets up the output directory. This means either creating one, or verifying that the provided directory exists. Output directories are created using the run and experiment ids.

Parameters:
  • output_dir (str) – The target output directory.
  • run_uuid – The hash of the current run.
  • experiment_id – The id of the current experiment.
  • create (bool) – Boolean indicating whether to create a new folder.
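
The description leaves the exact directory layout open, so the following is only one possible reading. With `create=True` a subdirectory named after the experiment and run ids is created under `output_dir`. Otherwise the directory is merely checked for existence. The layout `output_dir/experiment_id/run_uuid`, the generated fallback id and the function name are all assumptions.

```python
import tempfile
import uuid
from pathlib import Path


def setup_output_directory_sketch(output_dir, run_uuid=None,
                                  experiment_id=None, create=True):
    path = Path(output_dir)
    if create:
        # Assumed layout: <output_dir>/<experiment_id>/<run_uuid>
        if experiment_id is not None:
            path = path / str(experiment_id)
        path = path / (run_uuid or uuid.uuid4().hex)
        path.mkdir(parents=True, exist_ok=True)
    elif not path.is_dir():
        raise FileNotFoundError(f"output directory does not exist: {path}")
    return str(path)


with tempfile.TemporaryDirectory() as base:
    created = setup_output_directory_sketch(
        base, run_uuid="abc123", experiment_id=0
    )
    created_exists = Path(created).is_dir()
    created_tail = Path(created).parts[-2:]
```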

Module contents