kiwi.lib package¶

Submodules¶

kiwi.lib.evaluate module¶

kiwi.lib.evaluate.eval_sentence_level(sent_gold, sent_preds)[source]¶

kiwi.lib.evaluate.eval_word_level(golds, pred_files, tag_name)[source]¶

kiwi.lib.evaluate.evaluate_from_options(options)[source]¶

Evaluates a model’s predictions based on the flags received from the configuration files. Refer to configuration for a list of available configuration flags for the evaluate pipeline.

Parameters:	options (Namespace) – Namespace containing all pipeline options

kiwi.lib.evaluate.print_scores_table(scores, prefix='TARGET')[source]¶

kiwi.lib.evaluate.print_sentences_ranking_table(scores)[source]¶

kiwi.lib.evaluate.print_sentences_scoring_table(scores)[source]¶

kiwi.lib.evaluate.retrieve_gold_standard(pipeline_options, is_wmt18_format)[source]¶

kiwi.lib.evaluate.retrieve_predictions(pipeline_options, is_wmt18_pred_format)[source]¶

kiwi.lib.evaluate.retrieve_sentence_predictions(pipeline_options, pred_files)[source]¶

kiwi.lib.evaluate.score_sentence_level(gold, pred)[source]¶

kiwi.lib.evaluate.score_word_level(gold, prediction)[source]¶

kiwi.lib.evaluate.setup()[source]¶

kiwi.lib.evaluate.teardown()[source]¶

kiwi.lib.jackknife module¶

kiwi.lib.jackknife.average_all(predictions)[source]¶

kiwi.lib.jackknife.average_predictions(ensemble)[source]¶

Average an ensemble of predictions.
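The following is a minimal sketch of what averaging an ensemble can look like. It assumes each ensemble member is a flat list of scores and that all members have the same length; the helper name `average_ensemble` and this input layout are assumptions made for illustration, not OpenKiwi's implementation.

```python
def average_ensemble(ensemble):
    """Average element-wise across ensemble members (hypothetical helper).

    `ensemble` is assumed to be a list of equally long lists of floats,
    one inner list per model.
    """
    if not ensemble:
        raise ValueError("ensemble must contain at least one prediction list")
    length = len(ensemble[0])
    if any(len(member) != length for member in ensemble):
        raise ValueError("all ensemble members must have the same length")
    # zip(*ensemble) groups the i-th score of every member together.
    return [sum(scores) / len(scores) for scores in zip(*ensemble)]


averaged = average_ensemble([[0.25, 0.5], [0.75, 1.0]])
print(averaged)  # [0.5, 0.75]
```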

kiwi.lib.jackknife.reshape_by_lengths(sequence, lengths)[source]¶

kiwi.lib.jackknife.run(ModelClass, output_dir, pipeline_options, model_options, splits)[source]¶

kiwi.lib.jackknife.run_from_options(options)[source]¶

kiwi.lib.jackknife.teardown(options)[source]¶

kiwi.lib.predict module¶

kiwi.lib.predict.load_model(model_path)[source]¶

Load a pretrained model into a Predicter object.

Parameters:	model_path (str) – A path to the saved model file.

Raises:	Exception – If the path does not exist, or is not a valid model file.

kiwi.lib.predict.predict_from_options(options)[source]¶

Uses the configuration options to run the prediction pipeline. Calls setup, run and teardown in sequence.

Parameters:	options (Namespace) – Namespace containing all parsed options.
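The setup, run and teardown sequence can be sketched as below. The helper `run_pipeline` and the stand-in callables are hypothetical; placing teardown in a `finally` block is a design choice of this sketch, not something the documentation states about OpenKiwi.

```python
from argparse import Namespace


def run_pipeline(options, setup, run, teardown):
    """Call setup, run and teardown in order (hypothetical helper).

    teardown sits in a finally block so it also runs when run() raises;
    whether OpenKiwi does this is an assumption of the sketch.
    """
    output_dir = setup(options)
    try:
        return run(output_dir, options)
    finally:
        teardown(options)


calls = []
result = run_pipeline(
    Namespace(output_dir="runs/demo"),
    # list.append returns None, so `or` passes through the real return value.
    setup=lambda opts: calls.append("setup") or opts.output_dir,
    run=lambda out, opts: calls.append("run") or {"target": []},
    teardown=lambda opts: calls.append("teardown"),
)
print(calls)  # ['setup', 'run', 'teardown']
```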

kiwi.lib.predict.run(ModelClass, output_dir, pipeline_opts, model_opts)[source]¶

Runs the prediction pipeline. Loads the model and necessary files and creates the model’s predictions for all data received.

Parameters:

ModelClass (type) – Python type of the model to predict with.
output_dir – Directory to save predictions.
pipeline_opts (Namespace) – Generic predict options.
    batch_size: Max batch size for predicting.
model_opts (Namespace) – Model-specific options.

Returns:	Dictionary with format {'target': predictions}
Return type:	dict

kiwi.lib.predict.setup(options)[source]¶

Analyzes pipeline options and sets up requirements for running the prediction pipeline. This includes setting up the output directory, random seeds and the device where predictions are run.

Parameters:	options (Namespace) – Pipeline specific options
Returns:	Path to output directory
Return type:	str

kiwi.lib.predict.teardown(options)[source]¶

Tears down after executing the prediction pipeline.

Parameters:	options (Namespace) – Pipeline specific options

kiwi.lib.search module¶

kiwi.lib.search.get_action(option)[source]¶

kiwi.lib.search.main(argv=None, external_options=None)[source]¶

kiwi.lib.search.run(options, extra_options)[source]¶

kiwi.lib.search.split_options(options)[source]¶

kiwi.lib.train module¶

class kiwi.lib.train.TrainRunInfo(trainer)[source]¶

Bases: object

Encapsulates relevant information on training runs.

Can be instantiated with a trainer object.

stats¶

Stats of the best model so far

model_path¶

Path of the best model so far

run_uuid¶

Unique identifier of the current run

kiwi.lib.train.log(output_dir, config_options, config_file_name='train_config.yml', save_config=None)[source]¶

Logs configuration options for the current training run.

Parameters:

output_dir (str) – Path to directory where experiment files should be saved.
config_options (Namespace) – Namespace representing all configuration options.
config_file_name (str) – Filename of the config file.
save_config (str or Path) – Boolean stating whether to save a configuration file.

kiwi.lib.train.retrieve_datasets(fieldset, pipeline_options, model_options, output_dir)[source]¶

Creates Dataset objects for the training and validation sets.

Parses files according to pipeline and model options.

Parameters:

fieldset –
pipeline_options (Namespace) – Generic training options.
    load_data (str): Input directory for loading preprocessed data files.
    load_model (str): Directory containing model.torch for loading a pre-created model.
    resume (boolean): Indicates whether to resume training from a previous run.
    load_vocab (str): Directory containing the vocab.torch file to be loaded.
model_options (Namespace) – Model specific options.
output_dir (str) – Path to directory where experiment files should be saved.

Returns:	Training and validation datasets
Return type:	Dataset

kiwi.lib.train.retrieve_trainer(ModelClass, pipeline_options, model_options, vocabs, output_dir, device_id)[source]¶

Creates a Trainer object with an associated model.

This object encapsulates the logic behind training the model and checkpointing. This method uses the received pipeline options to instantiate a Trainer object with the requested model and hyperparameters.

Parameters:

ModelClass –
pipeline_options (Namespace) – Generic training options.
    resume (bool): Set to true if resuming an existing run.
    load_model (str): Directory containing model.torch for loading a pre-created model.
    checkpoint_save (bool): Whether snapshots should be saved after validation runs. Warning: if false, the model is never saved.
    checkpoint_keep_only_best (int): Tells Kiwi to keep the best n models.
    checkpoint_early_stop_patience (int): Stops training if metrics don’t improve after n validation runs.
    checkpoint_validation_steps (int): Perform validation every n training steps.
    optimizer (string): The optimizer to be used in training.
    learning_rate (float): Starting learning rate.
    learning_rate_decay (float): Factor of learning rate decay.
    learning_rate_decay_start (int): Start decay after epoch x.
    log_interval (int): Log after k batches.
model_options (Namespace) – Model specific options.
vocabs (dict) – Vocab dictionary.
output_dir (str or Path) – Output directory for models and stats concerning training.
device_id (int) – The GPU id to be used in training. Set to negative to use the CPU.

Returns:

Trainer
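To illustrate how checkpoint_keep_only_best and checkpoint_early_stop_patience might interact, here is a toy sketch. The `CheckpointTracker` class is hypothetical, it assumes that a higher metric is better, and it is not the Trainer's actual checkpointing code.

```python
class CheckpointTracker:
    """Toy tracker for early stopping and best-n retention (hypothetical)."""

    def __init__(self, keep_only_best=1, early_stop_patience=2):
        self.keep_only_best = keep_only_best
        self.early_stop_patience = early_stop_patience
        self.best = []  # (metric, step) pairs, best first
        self.runs_without_improvement = 0

    def update(self, step, metric):
        """Record one validation result; return True if training should stop."""
        # Assumption: higher metric values are better.
        improved = not self.best or metric > self.best[0][0]
        if improved:
            self.runs_without_improvement = 0
        else:
            self.runs_without_improvement += 1
        # Keep only the n best snapshots, highest metric first.
        ranked = sorted(self.best + [(metric, step)], reverse=True)
        self.best = ranked[: self.keep_only_best]
        return self.runs_without_improvement >= self.early_stop_patience


tracker = CheckpointTracker(keep_only_best=2, early_stop_patience=2)
stopped_at = None
validation_runs = [(100, 0.50), (200, 0.62), (300, 0.61), (400, 0.60), (500, 0.70)]
for step, metric in validation_runs:
    if tracker.update(step, metric):
        stopped_at = step
        break
print(stopped_at, tracker.best)  # 400 [(0.62, 200), (0.61, 300)]
```

In this run the metric fails to improve at steps 300 and 400, so with a patience of 2 the loop stops at step 400 and never reaches step 500.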

kiwi.lib.train.run(ModelClass, output_dir, pipeline_options, model_options)[source]¶

Implements the main logic of the training module.

Instantiates the dataset, model class and sets their attributes according to the pipeline options received. Loads or creates a trainer and runs it.

Parameters:

ModelClass (Model) – Python Type of the Model to train
output_dir – Directory to save models
pipeline_options (Namespace) – Generic train options.
    load_model: Load pre-trained predictor model.
    resume: Load trainer state and resume training.
    gpu_id: Set to a non-negative integer to train on GPU.
    train_batch_size: Batch size for training.
    valid_batch_size: Batch size for validation.
model_options (Namespace) – Model Specific options

Returns:

The trainer object

kiwi.lib.train.setup(output_dir, seed=42, gpu_id=None, debug=False, quiet=False)[source]¶

Analyzes pipeline options and sets up requirements for running the training pipeline.

This includes setting up the output directory, random seeds and the device(s) where training is run.

Parameters:

output_dir – Path to directory to use or None, in which case one is created automatically.
seed (int) – Random seed for all random engines (Python, PyTorch, NumPy).
gpu_id (int) – GPU number to use or None to use the CPU.
debug (bool) – Whether to increase the verbosity of output messages.
quiet (bool) – Whether to decrease the verbosity of output messages. Takes precedence over debug.

Returns:	Path to output directory
Return type:	str

kiwi.lib.train.teardown(options)[source]¶

Tears down after executing the training pipeline.

Parameters:	options (Namespace) – Pipeline specific options

kiwi.lib.train.train_from_file(filename)[source]¶

Loads options from a config file and calls the training procedure.

Parameters:	filename (str) – filename of the configuration file

kiwi.lib.train.train_from_options(options)[source]¶

Runs the entire training pipeline using the configuration options received.

These options include the pipeline and model options plus the model’s API.

Parameters:	options (Namespace) – All the configuration options retrieved from either a config file or input flags and the model being used.

kiwi.lib.utils module¶

kiwi.lib.utils.configure_device(gpu_id)[source]¶

Configures the GPU to be used in computation.

Parameters:	gpu_id (int) – The id of the gpu to be used

kiwi.lib.utils.configure_logging(output_dir=None, debug=False, quiet=False)[source]¶

Configure the logger. Sets up the log format, logging level and output directory of logging.

Parameters:

output_dir – The directory where log output will be stored. Defaults to None.
debug (bool) – Change logging level to debug.
quiet (bool) – Change logging level to warning to suppress info logs.
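A minimal sketch of how the debug and quiet flags could map to logging levels. The helper `pick_log_level` and the logger name are hypothetical; letting quiet win when both flags are set follows the description of kiwi.lib.train.setup and is assumed to carry over here.

```python
import logging


def pick_log_level(debug=False, quiet=False):
    """Map the two flags to a logging level (hypothetical helper)."""
    if quiet:  # assumed to win when both flags are set
        return logging.WARNING
    if debug:
        return logging.DEBUG
    return logging.INFO


logger = logging.getLogger("kiwi_sketch")  # hypothetical logger name
logger.setLevel(pick_log_level(debug=True, quiet=True))
print(logging.getLevelName(logger.level))  # WARNING
```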

kiwi.lib.utils.configure_seed(seed)[source]¶

Configures the random seed for all relevant packages: random, numpy, torch and torch.cuda.

Parameters:	seed (int) – the random seed to be set
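Seeding several random engines at once can be sketched as follows. The helper `seed_everything` is hypothetical; it guards the NumPy and PyTorch imports so the sketch also runs where those packages are not installed, which is a convenience of this example rather than a claim about OpenKiwi's code.

```python
import random


def seed_everything(seed):
    """Seed every random engine that is importable (hypothetical helper)."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing to seed
    try:
        import torch
        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass  # PyTorch not installed; nothing to seed


seed_everything(42)
first = [random.random() for _ in range(3)]
seed_everything(42)
second = [random.random() for _ in range(3)]
print(first == second)  # True: the same seed reproduces the same draws
```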

kiwi.lib.utils.merge_namespaces(*args)[source]¶

Utility function used to merge Namespaces. Useful for merging Argparse options.

Parameters:	*args – Variable length list of Namespaces
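Merging Namespaces can be sketched as below. The helper `merge` is hypothetical, and two behaviours are assumptions of the sketch: later namespaces override earlier ones on key clashes, and None entries are skipped.

```python
from argparse import Namespace


def merge(*namespaces):
    """Merge Namespaces left to right (hypothetical helper).

    Assumption: on key clashes the later namespace wins, and None
    entries are skipped.
    """
    merged = {}
    for namespace in namespaces:
        if namespace is not None:
            merged.update(vars(namespace))
    return Namespace(**merged)


pipeline = Namespace(seed=42, gpu_id=None)
model = Namespace(hidden_size=100, seed=7)
options = merge(pipeline, None, model)
print(options.seed, options.gpu_id, options.hidden_size)  # 7 None 100
```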

kiwi.lib.utils.parse_integer_with_positive_infinity(string)[source]¶

Workaround that allows both integers and infinity to be passed as command-line arguments.

Parameters:	string – A string representation of an integer, or infinity
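The workaround is needed because `int("inf")` raises ValueError, so an argparse option declared with `type=int` cannot accept infinity. A minimal sketch of such a parser follows; the function name `int_or_inf` and the `--epochs` flag are hypothetical and chosen only for illustration.

```python
import argparse
import math


def int_or_inf(string):
    """Parse an integer, or positive infinity (hypothetical helper)."""
    try:
        return int(string)
    except ValueError:
        value = float(string)  # raises ValueError for non-numeric text
        if value == math.inf:
            return value
        raise ValueError(f"could not parse {string!r} as an integer") from None


parser = argparse.ArgumentParser()
# `--epochs` is a made-up flag for this sketch.
parser.add_argument("--epochs", type=int_or_inf, default=math.inf)
print(parser.parse_args(["--epochs", "inf"]).epochs)  # inf
print(parser.parse_args(["--epochs", "10"]).epochs)   # 10
```

Because argparse treats a ValueError raised by a `type` callable as a usage error, a value such as `1.5` is rejected with the usual command-line error message.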

kiwi.lib.utils.save_args_to_file(file_name, **kwargs)[source]¶

Saves **kwargs to a file.

Parameters:	file_name (str) – Name of the file in which the args are saved.

kiwi.lib.utils.save_config_file(options, file_name)[source]¶

Saves a configuration file with OpenKiwi configuration options. Calls save_args_to_file.

Parameters:

options (Namespace) – Namespace with all configuration options that should be saved.
file_name (str) – Name of the output configuration file.

kiwi.lib.utils.setup_output_directory(output_dir, run_uuid=None, experiment_id=None, create=True)[source]¶

Sets up the output directory. This means either creating one, or verifying that the provided directory exists. Output directories are created using the run and experiment ids.

Parameters:

output_dir (str) – The target output directory.
run_uuid – The hash of the current run.
experiment_id – The id of the current experiment.
create (bool) – Whether to create a new folder.
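The create-or-verify behaviour can be sketched with pathlib as below. The helper `ensure_output_dir` is hypothetical, and the directory layout `<output_dir>/<experiment_id>/<run_uuid>` is an assumption of this sketch; the documentation only says that the run and experiment ids are used.

```python
import tempfile
from pathlib import Path


def ensure_output_dir(output_dir, run_uuid=None, experiment_id=None, create=True):
    """Create or verify an output directory (hypothetical helper).

    Assumption: when ids are given, the directory is nested as
    <output_dir>/<experiment_id>/<run_uuid>.
    """
    path = Path(output_dir)
    for part in (experiment_id, run_uuid):
        if part is not None:
            path = path / str(part)
    if create:
        path.mkdir(parents=True, exist_ok=True)
    elif not path.is_dir():
        raise FileNotFoundError(f"output directory does not exist: {path}")
    return str(path)


with tempfile.TemporaryDirectory() as root:
    created = ensure_output_dir(root, run_uuid="abc123", experiment_id="0")
    exists = Path(created).is_dir()
    print(created, exists)
```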