kiwi.lib.predict

Module Contents

Classes

RunConfig

Base class for all pydantic configs. Used to configure base behaviour of configs.

Configuration

Base class for all pydantic configs. Used to configure base behaviour of configs.

Functions

load_system(system_path: Union[str, Path], gpu_id: Optional[int] = None)

Load a pretrained system (model) into a Runner object.

predict_from_configuration(configuration_dict: Dict[str, Any])

Run the entire prediction pipeline using the configuration options received.

run(config: Configuration, output_dir: Path) → Tuple[Dict[str, List], Optional[MetricsReport]]

Run the prediction pipeline.

make_predictions(output_dir: Path, best_model_path: Path, data_partition: Literal['train', 'valid', 'test'], data_config: WMTQEDataset.Config, outputs_config: QEOutputs.Config = None, batch_size: Union[int, BatchSizeConfig] = None, num_workers: int = 0, gpu_id: int = None)

Make predictions over the validation set using the best model created during training.

setup_run(config: RunConfig, quiet=False, debug=False, anchor_dir: Path = None) → Path

Prepare for running the prediction pipeline.

kiwi.lib.predict.logger
class kiwi.lib.predict.RunConfig

Bases: kiwi.utils.io.BaseConfig

Base class for all pydantic configs. Used to configure base behaviour of configs.

seed :int = 42

Random seed

run_id :str

If specified, MLflow/Default Logger will log metrics and params under this ID. If a run with this ID already exists, its status will change to running. This ID is also used for creating this run’s output directory. The run ID must be a 32-character hex string.

output_dir :Path

Output several files for this run under this directory. If not specified, a directory under “runs” is created or reused based on the Run UUID.

predict_on_data_partition :Literal['train', 'valid', 'test'] = 'test'

Name of the data partition to predict upon. File names are read from the corresponding data configuration field.

check_consistency(cls, v, values)
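The run_id constraint documented above (a 32-character hex string) can be checked before a configuration is submitted. The following is a minimal sketch that assumes only that documented rule; the helpers is_valid_run_id and new_run_id are hypothetical and are not part of OpenKiwi. Python's uuid.uuid4().hex is one convenient way to produce such an ID.

```python
import re
import uuid

# Hypothetical check mirroring the documented rule for RunConfig.run_id;
# this is not OpenKiwi's own validator.
_RUN_ID_PATTERN = re.compile(r"[0-9a-fA-F]{32}")


def is_valid_run_id(run_id: str) -> bool:
    """Return True if run_id consists of exactly 32 hex characters."""
    return _RUN_ID_PATTERN.fullmatch(run_id) is not None


def new_run_id() -> str:
    """Generate a fresh run ID; uuid4().hex is always 32 lowercase hex chars."""
    return uuid.uuid4().hex


# A RunConfig-shaped dictionary using the fields documented above.
run_config = {
    "seed": 42,
    "run_id": new_run_id(),
    "output_dir": "predictions",
    "predict_on_data_partition": "test",
}
```

Generating the ID up front, rather than relying on the default, makes it easy to locate the run's output directory and logged metrics afterwards.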
class kiwi.lib.predict.Configuration

Bases: kiwi.utils.io.BaseConfig

Base class for all pydantic configs. Used to configure base behaviour of configs.

run :RunConfig
data :WMTQEDataset.Config
system :QESystem.Config
use_gpu :bool = False

If true, and only if CUDA is available, use the CUDA device specified in gpu_id, or the first CUDA device if gpu_id is not set. Otherwise, use the CPU.

gpu_id :Optional[int]

Use CUDA on the listed device, only if use_gpu is true.

verbose :bool = False
quiet :bool = False
enforce_loading(cls, v)
setup_gpu(cls, v)
setup_gpu_id(cls, v, values)
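The interaction between use_gpu and gpu_id described above can be restated as a small decision function. This is a hypothetical sketch of the documented rule, not OpenKiwi's setup_gpu/setup_gpu_id validators; resolve_device is an illustrative name, and CUDA availability is passed in explicitly (in a PyTorch environment it would typically come from torch.cuda.is_available()).

```python
from typing import Optional


def resolve_device(use_gpu: bool, gpu_id: Optional[int], cuda_available: bool) -> str:
    """Hypothetical restatement of the documented use_gpu/gpu_id rule."""
    # A GPU is used only when it is requested *and* CUDA is actually available.
    if use_gpu and cuda_available:
        # Without an explicit gpu_id, fall back to the first CUDA device.
        return f"cuda:{gpu_id if gpu_id is not None else 0}"
    # gpu_id has no effect unless use_gpu is true; everything else uses the CPU.
    return "cpu"
```

Keeping availability as a parameter makes the rule easy to test without a GPU present.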
kiwi.lib.predict.load_system(system_path: Union[str, Path], gpu_id: Optional[int] = None)

Load a pretrained system (model) into a Runner object.

Parameters
  • system_path – A path to the saved checkpoint file produced by a training run.

  • gpu_id – ID of the GPU to load the model into (-1 or None to use the CPU).

Raises
  • Exception – If the path does not exist or is not a valid system file.
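The gpu_id convention of load_system (-1 or None selects the CPU) maps naturally onto PyTorch-style device strings. The sketch below is hypothetical: map_location_for is an illustrative name, not an OpenKiwi function, and treating every negative ID as the CPU is an assumption of this sketch. The commented call uses only the signature documented above; the checkpoint path is a placeholder.

```python
from typing import Optional


def map_location_for(gpu_id: Optional[int]) -> str:
    """Hypothetical: translate load_system's gpu_id convention to a device string."""
    # None and -1 select the CPU, as documented; this sketch treats any
    # negative ID the same way.
    if gpu_id is None or gpu_id < 0:
        return "cpu"
    return f"cuda:{gpu_id}"


# Typical call (requires OpenKiwi and a checkpoint from a training run):
# from kiwi.lib.predict import load_system
# runner = load_system("path/to/model.ckpt", gpu_id=None)
```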

kiwi.lib.predict.predict_from_configuration(configuration_dict: Dict[str, Any])

Run the entire prediction pipeline using the configuration options received.
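predict_from_configuration takes a plain dictionary whose top-level keys match the Configuration fields (run, data, system, use_gpu, gpu_id, verbose, quiet). The sketch below only assembles such a dictionary. Placing the checkpoint under system.load is an assumption suggested by the enforce_loading validator, the empty data section is a placeholder for WMTQEDataset.Config options, and build_predict_config is a hypothetical helper.

```python
from typing import Any, Dict, Optional


def build_predict_config(
    checkpoint: str, output_dir: str, gpu_id: Optional[int] = None
) -> Dict[str, Any]:
    """Hypothetical helper: assemble a dict shaped like Configuration."""
    return {
        "run": {
            "seed": 42,
            "output_dir": output_dir,
            "predict_on_data_partition": "test",
        },
        # Assumption: the trained checkpoint is referenced via system.load.
        "system": {"load": checkpoint},
        # Placeholder: file paths for the chosen partition go here,
        # following WMTQEDataset.Config.
        "data": {},
        "use_gpu": gpu_id is not None,
        "gpu_id": gpu_id,
        "verbose": False,
        "quiet": False,
    }


config_dict = build_predict_config("path/to/model.ckpt", "predictions")
# With OpenKiwi installed and the data section filled in:
# from kiwi.lib.predict import predict_from_configuration
# predict_from_configuration(config_dict)
```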

kiwi.lib.predict.run(config: Configuration, output_dir: Path) → Tuple[Dict[str, List], Optional[MetricsReport]]

Run the prediction pipeline.

Load the model and necessary files and create the model’s predictions for the configured data partition.

Parameters
  • config – validated configuration values for the (predict) pipeline.

  • output_dir – directory where to save predictions.

Returns

A tuple of the predictions dictionary, with format {'target': predictions}, and an optional MetricsReport.

Return type

Tuple[Dict[str, List], Optional[MetricsReport]]
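Because run returns a tuple whose first element maps output names to lists, callers typically unpack it and persist each list. The sketch below assumes one file per key with one item per line; save_predictions and that file layout are hypothetical and not OpenKiwi's own output format. The sample dictionary is made up for illustration, and since the metrics element is Optional[MetricsReport], it should be checked for None before use.

```python
from pathlib import Path
from typing import Dict, List


def save_predictions(predictions: Dict[str, List], output_dir: Path) -> List[Path]:
    """Hypothetical helper: write each prediction list to <output_dir>/<key>."""
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for key, values in predictions.items():
        path = output_dir / key
        # Nested lists (e.g. per-word scores) are joined with spaces, one item per line.
        lines = [
            " ".join(map(str, v)) if isinstance(v, list) else str(v) for v in values
        ]
        path.write_text("\n".join(lines) + "\n", encoding="utf-8")
        written.append(path)
    return written


# With OpenKiwi the input would come from:
#   predictions, metrics = run(config, output_dir)
# Made-up stand-in result:
predictions = {"target": [[0.1, 0.9], [0.5]]}
```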

kiwi.lib.predict.make_predictions(output_dir: Path, best_model_path: Path, data_partition: Literal['train', 'valid', 'test'], data_config: WMTQEDataset.Config, outputs_config: QEOutputs.Config = None, batch_size: Union[int, BatchSizeConfig] = None, num_workers: int = 0, gpu_id: int = None)

Make predictions over the validation set using the best model created during training.

Parameters
  • output_dir – output directory where predictions should be saved.

  • best_model_path – path pointing to the checkpoint with best performance.

  • data_partition – on which dataset to predict (one of 'train', 'valid', 'test').

  • data_config – configuration containing options for the data_partition set.

  • outputs_config – configuration specifying which outputs to activate.

  • batch_size – batch size to use for predicting, either an int or a BatchSizeConfig.

  • num_workers – number of parallel data loaders.

  • gpu_id – GPU to use for predicting; 0 for CPU.

Returns

Dictionary with predictions in the format {'target': predictions}.
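The data_partition parameter is annotated as Literal['train', 'valid', 'test'], but Python does not enforce Literal annotations at runtime. A caller that builds the partition name dynamically may want to check it first. This is a minimal sketch; DataPartition and check_partition are hypothetical names that simply mirror the documented annotation.

```python
from typing import Literal, get_args

# Mirrors the documented annotation of make_predictions' data_partition.
DataPartition = Literal["train", "valid", "test"]


def check_partition(name: str) -> str:
    """Hypothetical guard: reject partition names outside the documented Literal."""
    allowed = get_args(DataPartition)  # ('train', 'valid', 'test')
    if name not in allowed:
        raise ValueError(f"data_partition must be one of {allowed}, got {name!r}")
    return name
```

Deriving the allowed values with typing.get_args keeps the guard in sync with the annotation instead of duplicating the list.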

kiwi.lib.predict.setup_run(config: RunConfig, quiet=False, debug=False, anchor_dir: Path = None) → Path

Prepare for running the prediction pipeline.

This includes setting up the output directory, random seeds, and loggers.

Parameters
  • config – configuration options.

  • quiet – whether to suppress info log messages.

  • debug – whether to additionally log debug messages (quiet takes precedence).

  • anchor_dir – directory to use as root for paths.

Returns

The resolved path to the output directory.
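The output-directory and seeding steps of setup_run can be approximated as follows. This is a simplified, hypothetical stand-in (sketch_setup_run is not OpenKiwi code): it seeds only Python's random module, skips logger setup entirely, and assumes the default layout <anchor_dir>/runs/<run_id> when no output_dir is configured, based on the RunConfig.output_dir description.

```python
import random
import uuid
from pathlib import Path
from typing import Optional


def sketch_setup_run(
    seed: int = 42,
    run_id: Optional[str] = None,
    output_dir: Optional[Path] = None,
    anchor_dir: Optional[Path] = None,
) -> Path:
    """Hypothetical, simplified stand-in for setup_run (no logger setup)."""
    # Seed Python's RNG only; the real function may seed other libraries too.
    random.seed(seed)
    run_id = run_id or uuid.uuid4().hex
    # Relative paths are interpreted against anchor_dir (or the CWD).
    root = anchor_dir if anchor_dir is not None else Path.cwd()
    # Assumed default layout: runs/<run_id> when output_dir is not given.
    target = output_dir if output_dir is not None else Path("runs") / run_id
    if not target.is_absolute():
        target = root / target
    target.mkdir(parents=True, exist_ok=True)
    return target.resolve()
```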