Kiwi’s functionality is split into pipelines:
train, pretrain, predict, evaluate, search
Currently supported models are:
NuQE (http://www.aclweb.org/anthology/Q17-1015)
PredictorEstimator (http://www.aclweb.org/anthology/W17-4763)
BERT
XLM
XLM-Roberta
All pipelines and models are configured via:
YAML files, for CLI or API usage
a dictionary, for API usage
The predict pipeline additionally provides a simplified interface with explicit arguments.
For CLI usage, the general command is:
kiwi (train|pretrain|predict|evaluate|search) CONFIG_FILE
Example configuration files can be found in config/. Details are covered in Configuration, including how to override options in the CLI.
Alternatively, the same example configurations can be obtained by running:
kiwi (train|pretrain|predict|evaluate|search) --example
This will print the config to the terminal, but can easily be redirected to a file:
kiwi train --example > train.yaml
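The generated file can then be edited and loaded back through the CLI or the API. A minimal sketch, assuming train.yaml was created with the command above:

from kiwi.lib.utils import file_to_configuration

# Parse the generated YAML into a plain configuration dictionary
configuration_dict = file_to_configuration('train.yaml')
print(configuration_dict)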
The PredictorEstimator requires pretraining the Predictor model with parallel data. This type of model is called a TLM model, for “Translation Language Model”.
The BERT-based models are already pretrained. However, they can be fine-tuned before being used for QE. To do so, it is currently necessary to use the original scripts, which can be found in scripts/pre_finetuning_transformers.
Examples of how to call Kiwi to train BERT for QE:
kiwi train config/bert.yaml
Or:
from kiwi.lib.train import train_from_file
run_info = train_from_file('config/bert.yaml')
from kiwi.lib.train import train_from_configuration
from kiwi.lib.utils import file_to_configuration
configuration_dict = file_to_configuration('config/bert.yaml')
run_info = train_from_configuration(configuration_dict)
The configuration_dict is only validated inside train_from_configuration, which means other file formats can be used. In fact, file_to_configuration also supports JSON files (although this is not widely advertised, as YAML is the preferred format).
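For instance, a minimal sketch of loading a JSON config through the same helpers (the file name config/bert.json is hypothetical; only YAML examples ship with the repository):

from kiwi.lib.train import train_from_configuration
from kiwi.lib.utils import file_to_configuration

# file_to_configuration also accepts JSON; validation only happens inside
# train_from_configuration, so the on-disk format does not matter.
configuration_dict = file_to_configuration('config/bert.json')  # hypothetical file
run_info = train_from_configuration(configuration_dict)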
Pretraining the Predictor can be done by calling:
kiwi pretrain config/predictor.yaml
The predict pipeline takes a trained QE model as input and uses it to evaluate the quality of machine translations.
The API provides load_system(), which returns a runner object that can be used to generate predictions.
To predict the quality of a machine translation dataset via the CLI, use a YAML file specifying the model path, output directory, source and target data, etc. (details are explained in Configuration), and run:
kiwi predict config/predict.yaml
When using Kiwi as a package, there are a few alternatives, depending on the use case.
To load a trained model and produce predictions on a full dataset, use:
from kiwi.lib.predict import predict_from_configuration
from kiwi.lib.utils import file_to_configuration
configuration_dict = file_to_configuration('config/predict.yaml')
predictions, metrics = predict_from_configuration(configuration_dict)
To load a trained model and keep it in memory for predicting on-demand, use:
from kiwi.lib.predict import load_system
runner = load_system('trained_models/model.ckpt')
predictions = runner.predict(
    source=['Aqui vai um exemplo de texto'],
    target=['Here is an example text'],
)
The predictions object will contain one or more of the following attributes:
sentences_hter
target_tags_BAD_probabilities
target_tags_labels
source_tags_BAD_probabilities
source_tags_labels
gap_tags_BAD_probabilities
gap_tags_labels
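As a rough sketch of how these attributes can be used (assuming the runner from the previous example and a model trained to produce sentence- and word-level outputs):

predictions = runner.predict(
    source=['Aqui vai um exemplo de texto'],
    target=['Here is an example text'],
)
# Sentence-level HTER scores, one per input pair
print(predictions.sentences_hter)
# Word-level tag probabilities and labels, if the model produces them
print(predictions.target_tags_BAD_probabilities)
print(predictions.target_tags_labels)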
More details can be found in the Command Reference.
The evaluate pipeline takes predictions of a trained model and a reference (gold) file and evaluates the performance based on several metrics.
To evaluate one of your models via the CLI, create a YAML file specifying the format of the predictions, the format of the reference, and the location of these files, and run:
kiwi evaluate config/evaluate.yaml
Or alternatively:
from kiwi.lib.evaluate import evaluate_from_configuration
from kiwi.lib.utils import file_to_configuration
configuration_dict = file_to_configuration('config/evaluate.yaml')
report = evaluate_from_configuration(configuration_dict)
print(report)
You can check all the configuration options in Configuration.
The search pipeline enables hyperparameter search for the Kiwi models using the Optuna library.
Examples of how to call Kiwi to search hyperparameters for BERT for QE:
kiwi search config/search.yaml
from kiwi.lib.search import search_from_file
optuna_study = search_from_file('config/search.yaml')
from kiwi.lib.search import search_from_configuration
from kiwi.lib.utils import file_to_configuration
configuration_dict = file_to_configuration('config/search.yaml')
optuna_study = search_from_configuration(configuration_dict)
The search configuration file (search.yaml) points to the base training config (config/bert.yaml in the BERT example above), which defines the base model; the remaining options configure which hyperparameters to search and the ranges to search over.
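Once the search finishes, the returned object is a regular Optuna study, so the usual Optuna accessors apply. A minimal sketch:

from kiwi.lib.search import search_from_file

optuna_study = search_from_file('config/search.yaml')
# Standard optuna.Study attributes
print(optuna_study.best_params)  # best hyperparameter values found
print(optuna_study.best_value)   # objective value of the best trial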