kiwi.lib package

Submodules

kiwi.lib.evaluate module

kiwi.lib.evaluate.eval_sentence_level(sent_gold, sent_preds)
kiwi.lib.evaluate.eval_word_level(golds, pred_files, tag_name)
kiwi.lib.evaluate.evaluate_from_options(options)

Evaluates a model’s predictions based on the flags received from the configuration files. Refer to configuration for a list of available configuration flags for the evaluate pipeline.

Parameters:options (Namespace) – Namespace containing all pipeline options
kiwi.lib.evaluate.print_scores_table(scores, prefix='TARGET')
kiwi.lib.evaluate.print_sentences_ranking_table(scores)
kiwi.lib.evaluate.print_sentences_scoring_table(scores)
kiwi.lib.evaluate.retrieve_gold_standard(pipeline_options, is_wmt18_format)
kiwi.lib.evaluate.retrieve_predictions(pipeline_options, is_wmt18_pred_format)
kiwi.lib.evaluate.retrieve_sentence_predictions(pipeline_options, pred_files)
kiwi.lib.evaluate.score_sentence_level(gold, pred)
kiwi.lib.evaluate.score_word_level(gold, prediction)
kiwi.lib.evaluate.setup()
kiwi.lib.evaluate.teardown()

kiwi.lib.jackknife module

kiwi.lib.jackknife.average_all(predictions)
kiwi.lib.jackknife.average_predictions(ensemble)

Average an ensemble of predictions.
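
As an illustration of what averaging an ensemble can look like, the sketch below averages score lists position by position. The name average_ensemble and the list-of-lists input layout are assumptions made for this example; they are not taken from the kiwi.lib.jackknife source.

```python
from statistics import fmean


def average_ensemble(ensemble):
    """Average predictions position by position.

    `ensemble` is assumed to hold one list of float scores per model,
    all of the same length (hypothetical layout).
    """
    if not ensemble:
        raise ValueError("ensemble must contain at least one prediction list")
    length = len(ensemble[0])
    if any(len(preds) != length for preds in ensemble):
        raise ValueError("all prediction lists must have the same length")
    # zip(*ensemble) groups the i-th score of every model together.
    return [fmean(scores) for scores in zip(*ensemble)]


averaged = average_ensemble([[0.25, 0.5], [0.75, 1.0]])
```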

kiwi.lib.jackknife.reshape_by_lengths(sequence, lengths)
kiwi.lib.jackknife.run(ModelClass, output_dir, pipeline_options, model_options, splits)
kiwi.lib.jackknife.run_from_options(options)
kiwi.lib.jackknife.teardown(options)

kiwi.lib.predict module

kiwi.lib.predict.load_model(model_path)

Load a pretrained model into a Predicter object.

Parameters:model_path (str) – A path to the saved model file.
Raises:
Exception: If the path does not exist, or is not a valid model file.
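
Because load_model raises when the path is missing, callers may prefer to check the path first and fail with a clearer message. The wrapper below is a hypothetical sketch: load_checked is not part of kiwi, and the loader argument stands in for any callable that accepts a path string, such as kiwi.lib.predict.load_model. The demonstration uses a stand-in loader so that it runs without kiwi installed.

```python
import tempfile
from pathlib import Path


def load_checked(model_path, loader):
    """Check that `model_path` is a file, then hand it to `loader`.

    `loader` is an assumption: any callable taking a path string,
    for example kiwi.lib.predict.load_model.
    """
    path = Path(model_path)
    if not path.is_file():
        raise FileNotFoundError(f"No model file at {path}")
    return loader(str(path))


# Demonstration with a stand-in loader and a temporary file.
with tempfile.NamedTemporaryFile(suffix=".torch") as handle:
    loaded = load_checked(handle.name, loader=lambda p: ("stand-in model", p))

# A missing path fails before the loader is ever called.
try:
    load_checked("no/such/model.torch", loader=str)
except FileNotFoundError as error:
    missing_message = str(error)
```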
kiwi.lib.predict.predict_from_options(options)

Uses the configuration options to run the prediction pipeline. Calls setup, run and teardown in sequence.

Parameters:options (Namespace) – Namespace containing all parsed options.
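
The setup, run and teardown sequence can be sketched generically as follows. Everything here is illustrative: run_pipeline and the three callables are hypothetical stand-ins, and the try/finally guard is a design assumption rather than documented OpenKiwi behaviour.

```python
from argparse import Namespace


def run_pipeline(options, setup, run, teardown):
    """Call setup, run and teardown in order; teardown always runs."""
    output_dir = setup(options)
    try:
        return run(output_dir, options)
    finally:
        teardown(options)


calls = []
result = run_pipeline(
    Namespace(output_dir="runs/demo"),
    # list.append returns None, so `or` yields the right-hand value.
    setup=lambda opts: calls.append("setup") or opts.output_dir,
    run=lambda out, opts: calls.append("run") or {"target": []},
    teardown=lambda opts: calls.append("teardown"),
)
```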
kiwi.lib.predict.run(ModelClass, output_dir, pipeline_opts, model_opts)

Runs the prediction pipeline. Loads the model and necessary files and creates the model’s predictions for all data received.

Parameters:
  • ModelClass (type) – Python type of the model used for prediction
  • output_dir – Directory to save predictions
  • pipeline_opts (Namespace) – Generic prediction options:
    batch_size: Maximum batch size for predicting
  • model_opts (Namespace) – Model-specific options
Returns:

Dictionary with format {‘target’:predictions}

Return type:

Predictions (dict)

kiwi.lib.predict.setup(options)

Analyzes pipeline options and sets up the requirements for running the prediction pipeline. This includes setting up the output directory, random seeds and the device where predictions are run.

Parameters:options (Namespace) – Pipeline specific options
Returns:Path to output directory
Return type:output_dir(str)
kiwi.lib.predict.teardown(options)

Tears down after executing the prediction pipeline.

Parameters:options (Namespace) – Pipeline specific options

kiwi.lib.search module

kiwi.lib.search.get_action(option)
kiwi.lib.search.main(argv=None, external_options=None)
kiwi.lib.search.run(options, extra_options)
kiwi.lib.search.split_options(options)

kiwi.lib.train module

class kiwi.lib.train.TrainRunInfo(trainer)

Bases: object

Encapsulates relevant information on training runs.

Can be instantiated with a trainer object.

stats

Stats of the best model so far

model_path

Path of the best model so far

run_uuid

Unique identifier of the current run

kiwi.lib.train.log(output_dir, config_options, config_file_name='train_config.yml', save_config=None)

Logs configuration options for the current training run.

Parameters:
  • output_dir (str) – Path to directory where experiment files should be saved.
  • config_options (Namespace) – Namespace representing all configuration options.
  • config_file_name (str) – Filename of the config file
  • save_config (str or Path) – Boolean stating if you should save a configuration file.
kiwi.lib.train.retrieve_datasets(fieldset, pipeline_options, model_options, output_dir)

Creates Dataset objects for the training and validation sets.

Parses files according to pipeline and model options.

Parameters:
  • fieldset
  • pipeline_options (Namespace) – Generic training options:

    load_data (str): Input directory for loading preprocessed data files.
    load_model (str): Directory containing model.torch for loading pre-created model.
    resume (boolean): Indicates if you should resume training from a previous run.
    load_vocab (str): Directory containing vocab.torch file to be loaded.
  • model_options (Namespace) – Model specific options.
  • output_dir (str) – Path to directory where experiment files should be saved.
Returns:

Training and validation datasets

Return type:

datasets (Dataset)

kiwi.lib.train.retrieve_trainer(ModelClass, pipeline_options, model_options, vocabs, output_dir, device_id)

Creates a Trainer object with an associated model.

This object encapsulates the logic behind training the model and checkpointing. This method uses the received pipeline options to instantiate a Trainer object with the requested model and hyperparameters.

Parameters:
  • ModelClass
  • pipeline_options (Namespace) – Generic training options:

    resume (bool): Set to true if resuming an existing run.
    load_model (str): Directory containing model.torch for loading a pre-created model.
    checkpoint_save (bool): Whether snapshots should be saved after validation runs. Warning: if false, the model is never saved.
    checkpoint_keep_only_best (int): Tells Kiwi to keep only the best n models.
    checkpoint_early_stop_patience (int): Stops training if metrics don’t improve after n validation runs.
    checkpoint_validation_steps (int): Perform validation every n training steps.
    optimizer (string): The optimizer to be used in training.
    learning_rate (float): Starting learning rate.
    learning_rate_decay (float): Factor of learning rate decay.
    learning_rate_decay_start (int): Start decay after epoch x.
    log_interval (int): Log after k batches.

  • model_options (Namespace) – Model specific options.
  • vocabs (dict) – Vocab dictionary.
  • output_dir (str or Path) – Output directory for models and stats concerning training.
  • device_id (int) – The gpu id to be used in training. Set to negative to use cpu.
Returns:

Trainer
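
For orientation, the pipeline options listed for retrieve_trainer could be collected in a Namespace like the one below. The field names follow the parameter list; every value is an illustrative placeholder and not a recommended or default setting.

```python
from argparse import Namespace

# Field names come from the parameter list; values are placeholders only.
pipeline_options = Namespace(
    resume=False,                      # not resuming an existing run
    load_model=None,                   # no pre-created model directory
    checkpoint_save=True,              # if False, the model is never saved
    checkpoint_keep_only_best=1,       # keep the best n models
    checkpoint_early_stop_patience=5,  # stop after n validations without improvement
    checkpoint_validation_steps=500,   # validate every n training steps
    optimizer="adam",                  # assumed to be an accepted optimizer name
    learning_rate=1e-3,
    learning_rate_decay=1.0,
    learning_rate_decay_start=0,
    log_interval=100,                  # log after k batches
)
```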

kiwi.lib.train.run(ModelClass, output_dir, pipeline_options, model_options)

Implements the main logic of the training module.

Instantiates the dataset, model class and sets their attributes according to the pipeline options received. Loads or creates a trainer and runs it.

Parameters:
  • ModelClass (Model) – Python Type of the Model to train
  • output_dir – Directory to save models
  • pipeline_options (Namespace) – Generic training options:

    load_model: Load a pre-trained predictor model.
    resume: Load trainer state and resume training.
    gpu_id: Set to a non-negative integer to train on GPU.
    train_batch_size: Batch size for training.
    valid_batch_size: Batch size for validation.
  • model_options (Namespace) – Model Specific options
Returns:

The trainer object

kiwi.lib.train.setup(output_dir, seed=42, gpu_id=None, debug=False, quiet=False)

Analyzes pipeline options and sets up requirements for running the training pipeline.

This includes setting up the output directory, random seeds and the device(s) where training is run.

Parameters:
  • output_dir – Path to directory to use or None, in which case one is created automatically.
  • seed (int) – Random seed for all random engines (Python, PyTorch, NumPy).
  • gpu_id (int) – GPU number to use or None to use the CPU.
  • debug (bool) – Whether to increase the verbosity of output messages.
  • quiet (bool) – Whether to decrease the verbosity of output messages. Takes precedence over debug.
Returns:

Path to output directory

Return type:

output_dir(str)

kiwi.lib.train.teardown(options)

Tears down after executing the training pipeline.

Parameters:options (Namespace) – Pipeline specific options
kiwi.lib.train.train_from_file(filename)

Loads options from a config file and calls the training procedure.

Parameters:filename (str) – filename of the configuration file
kiwi.lib.train.train_from_options(options)

Runs the entire training pipeline using the configuration options received.

These options include the pipeline and model options plus the model’s API.

Parameters:options (Namespace) – All the configuration options retrieved from either a config file or input flags and the model being used.

kiwi.lib.utils module

kiwi.lib.utils.configure_device(gpu_id)

Configure gpu to be used in computation.

Parameters:gpu_id (int) – The id of the gpu to be used
kiwi.lib.utils.configure_logging(output_dir=None, debug=False, quiet=False)

Configure the logger. Sets up the log format, logging level and output directory of logging.

Parameters:
  • output_dir – The directory where log output will be stored. Defaults to None.
  • debug (bool) – Change logging level to debug.
  • quiet (bool) – Change logging level to warning to suppress info logs.
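
The way the debug and quiet flags map to logging levels can be sketched with the standard logging module. The helper pick_log_level is hypothetical; giving quiet precedence follows the description of kiwi.lib.train.setup, while the INFO default is an assumption.

```python
import logging


def pick_log_level(debug=False, quiet=False):
    """Return a logging level; quiet wins over debug."""
    if quiet:
        return logging.WARNING
    if debug:
        return logging.DEBUG
    return logging.INFO  # assumed default when neither flag is set


logger = logging.getLogger("kiwi_example")  # hypothetical logger name
logger.setLevel(pick_log_level(debug=True, quiet=True))
```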
kiwi.lib.utils.configure_seed(seed)

Configure the random seed for all relevant packages. These include: random, numpy, torch and torch.cuda

Parameters:seed (int) – the random seed to be set
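
Seeding the packages named above might look like the following sketch. The function seed_everything is a hypothetical name, and NumPy and PyTorch are imported lazily so the example also runs where they are not installed; it is not the kiwi implementation.

```python
import random


def seed_everything(seed):
    """Seed Python's RNG, plus NumPy and PyTorch when they are installed."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass


# Re-seeding with the same value reproduces the same random draw.
seed_everything(42)
first = random.random()
seed_everything(42)
second = random.random()
```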
kiwi.lib.utils.merge_namespaces(*args)

Utility function used to merge Namespaces. Useful for merging Argparse options.

Parameters:*args – Variable length list of Namespaces
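
A minimal sketch of merging argparse Namespaces is shown below. The helper merge is hypothetical, and letting later arguments override earlier ones on key clashes is an assumption; the source does not state how merge_namespaces resolves conflicts.

```python
from argparse import Namespace


def merge(*namespaces):
    """Combine Namespaces into a new one; later arguments win on key clashes."""
    merged = {}
    for namespace in namespaces:
        merged.update(vars(namespace))
    return Namespace(**merged)


combined = merge(Namespace(seed=42, gpu_id=None), Namespace(gpu_id=0))
```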
kiwi.lib.utils.parse_integer_with_positive_infinity(string)

Workaround to be able to pass both integers and infinity as command-line arguments.

Parameters:string – A string representation of an integer, or infinity
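
One way to accept either an integer or infinity on the command line is a custom argparse type function, sketched below. The name int_or_inf and the --max-steps flag are hypothetical and only illustrate the idea; the actual kiwi parsing rules may differ.

```python
import argparse
import math


def int_or_inf(text):
    """Parse `text` as an int, or as positive infinity ("inf", "infinity")."""
    try:
        return int(text)
    except ValueError:
        value = float(text)  # raises ValueError for non-numeric input
        if value == math.inf:
            return value
        raise ValueError(f"{text!r} is neither an integer nor +infinity")


parser = argparse.ArgumentParser()
# --max-steps is a made-up flag used only for this demonstration.
parser.add_argument("--max-steps", type=int_or_inf, default=math.inf)
args = parser.parse_args(["--max-steps", "inf"])
```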
kiwi.lib.utils.save_args_to_file(file_name, **kwargs)

Saves **kwargs to a file.

Parameters:file_name (str) – The name of the file where the args should be saved in.
kiwi.lib.utils.save_config_file(options, file_name)

Saves a configuration file with OpenKiwi configuration options. Calls save_args_to_file.

Parameters:
  • options (Namespace) – Namespace with all configuration options that should be saved.
  • file_name (str) – Name of the output configuration file.
kiwi.lib.utils.setup_output_directory(output_dir, run_uuid=None, experiment_id=None, create=True)

Sets up the output directory. This means either creating one, or verifying that the provided directory exists. Output directories are created using the run and experiment ids.

Parameters:
  • output_dir (str) – The target output directory
  • run_uuid – The hash of the current run.
  • experiment_id – The id of the current experiment
  • create (bool) – Boolean indicating whether to create a new folder.
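
The create-or-verify behaviour described for the output directory can be sketched with pathlib. The helper prepare_output_dir is hypothetical, and the runs/<experiment_id>/<run_uuid> layout used when no directory is given is an assumption; the source only says that the run and experiment ids are used.

```python
import tempfile
import uuid
from pathlib import Path


def prepare_output_dir(output_dir=None, run_uuid=None, experiment_id=None, create=True):
    """Return an output directory path, creating it when `create` is true."""
    if output_dir is None:
        # Assumed layout: runs/<experiment_id>/<run_uuid>
        run_uuid = run_uuid or uuid.uuid4().hex
        output_dir = Path("runs", str(experiment_id or 0), run_uuid)
    path = Path(output_dir)
    if create:
        path.mkdir(parents=True, exist_ok=True)
    elif not path.is_dir():
        raise FileNotFoundError(f"Output directory does not exist: {path}")
    return str(path)


# Demonstration inside a temporary directory, removed again afterwards.
with tempfile.TemporaryDirectory() as tmp:
    created = prepare_output_dir(Path(tmp, "exp", "run-1"))
    created_exists = Path(created).is_dir()
```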

Module contents