Predictor-Estimator training

usage: kiwi train [-h] --train-source TRAIN_SOURCE
                  [--train-target TRAIN_TARGET]
                  [--train-source-tags TRAIN_SOURCE_TAGS]
                  [--train-target-tags TRAIN_TARGET_TAGS]
                  [--train-pe TRAIN_PE]
                  [--train-sentence-scores TRAIN_SENTENCE_SCORES]
                  [--split SPLIT] [--valid-source VALID_SOURCE]
                  [--valid-target VALID_TARGET]
                  [--valid-alignments VALID_ALIGNMENTS]
                  [--valid-source-tags VALID_SOURCE_TAGS]
                  [--valid-target-tags VALID_TARGET_TAGS]
                  [--valid-pe VALID_PE]
                  [--valid-sentence-scores VALID_SENTENCE_SCORES]
                  [--predict-side {tags,source_tags,gap_tags}]
                  [--wmt18-format [WMT18_FORMAT]]
                  [--source-max-length SOURCE_MAX_LENGTH]
                  [--source-min-length SOURCE_MIN_LENGTH]
                  [--target-max-length TARGET_MAX_LENGTH]
                  [--target-min-length TARGET_MIN_LENGTH]
                  [--source-vocab-size SOURCE_VOCAB_SIZE]
                  [--target-vocab-size TARGET_VOCAB_SIZE]
                  [--source-vocab-min-frequency SOURCE_VOCAB_MIN_FREQUENCY]
                  [--target-vocab-min-frequency TARGET_VOCAB_MIN_FREQUENCY]
                  [--extend-source-vocab EXTEND_SOURCE_VOCAB]
                  [--extend-target-vocab EXTEND_TARGET_VOCAB]
                  [--warmup WARMUP] [--rnn-layers-pred RNN_LAYERS_PRED]
                  [--dropout-pred DROPOUT_PRED] [--hidden-pred HIDDEN_PRED]
                  [--out-embeddings-size OUT_EMBEDDINGS_SIZE]
                  [--embedding-sizes EMBEDDING_SIZES]
                  [--share-embeddings [SHARE_EMBEDDINGS]]
                  [--predict-inverse [PREDICT_INVERSE]]
                  [--source-embeddings-size SOURCE_EMBEDDINGS_SIZE]
                  [--target-embeddings-size TARGET_EMBEDDINGS_SIZE]
                  [--start-stop [START_STOP]] [--predict-gaps [PREDICT_GAPS]]
                  [--predict-target [PREDICT_TARGET]]
                  [--predict-source [PREDICT_SOURCE]]
                  [--load-pred-source LOAD_PRED_SOURCE]
                  [--load-pred-target LOAD_PRED_TARGET]
                  [--rnn-layers-est RNN_LAYERS_EST]
                  [--dropout-est DROPOUT_EST] [--hidden-est HIDDEN_EST]
                  [--mlp-est [MLP_EST]] [--sentence-level [SENTENCE_LEVEL]]
                  [--sentence-ll [SENTENCE_LL]]
                  [--sentence-ll-predict-mean [SENTENCE_LL_PREDICT_MEAN]]
                  [--use-probs [USE_PROBS]] [--binary-level [BINARY_LEVEL]]
                  [--token-level [TOKEN_LEVEL]]
                  [--target-bad-weight TARGET_BAD_WEIGHT]
                  [--gaps-bad-weight GAPS_BAD_WEIGHT]
                  [--source-bad-weight SOURCE_BAD_WEIGHT]


--train-source Path to training source file
--train-target Path to training target file
--train-source-tags Path to training label file for source (WMT18 format)
--train-target-tags Path to training label file for target
--train-pe Path to file containing post-edited target.
--train-sentence-scores Path to file containing sentence-level scores.

validation data

--split Split the training dataset when no validation set is given.
--valid-source Path to validation source file
--valid-target Path to validation target file
--valid-alignments Path to validation alignments between source and target.
--valid-source-tags Path to validation label file for source (WMT18 format)
--valid-target-tags Path to validation label file for target
--valid-pe Path to file containing post-edited target.
--valid-sentence-scores Path to file containing sentence-level scores.
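
The effect of splitting the training set can be sketched as follows. This is only an illustration under a loud assumption: that SPLIT is the fraction of examples kept for training, with the remainder used for validation. The exact semantics of the option are not stated here, and the function name split_train_valid and its seed parameter are hypothetical.

```python
import random


def split_train_valid(examples, split, seed=42):
    """Shuffle examples and keep a `split` fraction for training.

    Assumption: `split` is a float strictly between 0 and 1; everything
    that is not kept for training becomes validation data.
    """
    if not 0.0 < split < 1.0:
        raise ValueError("split must be strictly between 0 and 1")
    shuffled = list(examples)
    # A dedicated Random instance keeps the split reproducible.
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * split)
    return shuffled[:cut], shuffled[cut:]


train, valid = split_train_valid(range(10), 0.8)
```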

data processing options


--predict-side Tagset to predict. Leave unchanged for WMT17 format.

Possible choices: tags, source_tags, gap_tags

Default: “tags”

--wmt18-format Read target tags in WMT18 format.

Default: False

--source-max-length Maximum source sequence length to keep.

Default: inf

--source-min-length Minimum source sequence length to keep.

Default: 0

--target-max-length Maximum target sequence length to keep.

Default: inf

--target-min-length Minimum target sequence length to keep.

Default: 0
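
The four length options can be read as one filter over sentence pairs. The sketch below assumes that the bounds are inclusive and that pairs outside them are dropped rather than truncated. The function name keep_pair is hypothetical; only the parameter names mirror the options above.

```python
import math


def keep_pair(source_tokens, target_tokens,
              source_min_length=0, source_max_length=math.inf,
              target_min_length=0, target_max_length=math.inf):
    """Return True if both sides fall within their (inclusive) length bounds."""
    return (source_min_length <= len(source_tokens) <= source_max_length
            and target_min_length <= len(target_tokens) <= target_max_length)


pairs = [
    ("a short source".split(), "eine kurze Quelle".split()),
    ("x".split(), "y".split()),
]
# Keep only pairs with at least two tokens on each side.
kept = [p for p in pairs if keep_pair(*p, source_min_length=2, target_min_length=2)]
```

With the defaults (0 and inf), every pair passes the filter.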

vocabulary options

Options for loading vocabulary from a previous run. This is used, for example, when training a source predictor with predict-inverse set to True. If set, other vocabulary options are ignored.

--source-vocab-size Size of the source vocabulary.
--target-vocab-size Size of the target vocabulary.

--source-vocab-min-frequency Min word frequency for source vocabulary.

Default: 1

--target-vocab-min-frequency Min word frequency for target vocabulary.

Default: 1
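
How a size limit and a minimum frequency interact can be illustrated with a small sketch. It assumes the vocabulary keeps the most frequent tokens first and applies the frequency threshold before the size cap. The function build_vocab is hypothetical, and special tokens such as unknown or padding symbols are left out for brevity.

```python
from collections import Counter


def build_vocab(sentences, max_size=None, min_frequency=1):
    """Return tokens sorted by frequency, filtered by threshold and size cap."""
    counts = Counter(token for sentence in sentences for token in sentence)
    # most_common() sorts by descending count; ties keep first-seen order.
    frequent = [tok for tok, n in counts.most_common() if n >= min_frequency]
    if max_size is not None:
        frequent = frequent[:max_size]
    return frequent


corpus = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "end"]]
vocab = build_vocab(corpus, max_size=2, min_frequency=2)
```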

PredEst data

Predictor Estimator specific data options. (POSTECH)

--extend-source-vocab Optionally load more data which is used only for vocabulary creation. Path to additional data (Predictor).
--extend-target-vocab Optionally load more data which is used only for vocabulary creation. Path to additional data (Predictor).

predictor training

Predictor Estimator (POSTECH)


--warmup Pretrain Predictor for this number of steps.

Default: 0

--rnn-layers-pred Number of layers in the predictor RNN.

Default: 2

--dropout-pred Dropout in predictor

Default: 0.0

--hidden-pred Size of hidden layers in LSTM

Default: 100

--out-embeddings-size Word embedding size in the output layer.

Default: 200

--embedding-sizes If set, takes precedence over other embedding params

Default: 0

--share-embeddings Tie input and output embeddings for target.

Default: False

--predict-inverse Predict target -> source instead of source -> target.

Default: False

Embedding layers size in case pre-trained embeddings are not used.

--source-embeddings-size Word embedding size for source.

Default: 50

--target-embeddings-size Word embedding size for target.

Default: 50

predictor-estimator training

Predictor Estimator (POSTECH). These settings are used to train the Predictor. They will be ignored if training a Predictor-Estimator and the load-model flag is set.


--start-stop Append start and stop symbols to estimator feature sequence.

Default: False

--predict-gaps Predict Gap Tags. Requires train-gap-tags, valid-gap-tags to be set.

Default: False

--predict-target Predict Target Tags. Requires train-target-tags, valid-target-tags to be set.

Default: True

--predict-source Predict Source Tags. Requires train-source-tags, valid-source-tags to be set.

Default: False

--load-pred-source Load pretrained predictor tgt->src. If set, model architecture and vocabulary parameters are ignored.
--load-pred-target Load pretrained predictor src->tgt. If set, model architecture and vocabulary parameters are ignored.

--rnn-layers-est Number of layers in the estimator RNN.

Default: 2

--dropout-est Dropout in estimator

Default: 0.0

--hidden-est Size of hidden layers in LSTM

Default: 100

--mlp-est Pass the Estimator input through a linear layer, reducing dimensionality before the RNN.

Default: False

--sentence-level Predict sentence-level scores. Requires setting train-sentence-scores, valid-sentence-scores.

Default: False

--sentence-ll Use a probabilistic loss for sentence scores instead of squared error. If set, the model outputs the mean and variance of a truncated Gaussian distribution over the interval [0, 1] and uses the negative log-likelihood (NLL) of the ground-truth HTER as the loss. This seems to improve performance, and gives you uncertainty estimates for sentence-level predictions as a byproduct. Has no effect if sentence-level == False.

Default: False
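
The loss described for sentence-ll can be written down directly from the density of a truncated Gaussian. The following is a minimal sketch using only the standard library, not the project's implementation. The function names and the parameterization by standard deviation sigma (rather than variance) are assumptions made for illustration.

```python
import math


def normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def truncated_gaussian_nll(y, mu, sigma, low=0.0, high=1.0):
    """Negative log-likelihood of y under N(mu, sigma^2) truncated to [low, high]."""
    if not low <= y <= high:
        raise ValueError("y must lie inside the truncation interval")
    z = (y - mu) / sigma
    # Log-density of the untruncated Gaussian at y.
    log_gauss = -0.5 * z * z - 0.5 * math.log(2.0 * math.pi) - math.log(sigma)
    # Probability mass the untruncated Gaussian places on [low, high];
    # dividing by it renormalizes the density on the interval.
    mass = normal_cdf((high - mu) / sigma) - normal_cdf((low - mu) / sigma)
    return -(log_gauss - math.log(mass))


# A target near the predicted mean is cheaper than one far from it.
loss_near = truncated_gaussian_nll(0.3, mu=0.3, sigma=0.1)
loss_far = truncated_gaussian_nll(0.8, mu=0.3, sigma=0.1)
```

A small sigma makes a confident prediction: it lowers the loss when the target is close to mu and raises it sharply otherwise, which is where the uncertainty estimate comes from.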

--sentence-ll-predict-mean If sentence-ll == True, by default the prediction for HTER is the mean of the Gaussian before truncation. After truncation, this is the mode of the distribution, but not the mean, as the truncated Gaussian is skewed to one side. Set this to True to use the true mean after truncation for prediction.

Default: False
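
The difference between the two predictions can be checked numerically with the standard closed form for the mean of a truncated Gaussian. This is a sketch under the assumption that the distribution is parameterized by mu and standard deviation sigma; the function names are hypothetical.

```python
import math


def normal_pdf(z):
    """Density of the standard normal."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def truncated_gaussian_mean(mu, sigma, low=0.0, high=1.0):
    """Mean of N(mu, sigma^2) after truncation to [low, high]."""
    alpha = (low - mu) / sigma
    beta = (high - mu) / sigma
    mass = normal_cdf(beta) - normal_cdf(alpha)
    return mu + sigma * (normal_pdf(alpha) - normal_pdf(beta)) / mass


# With mu close to 0, truncation cuts off more of the left tail,
# so the mean after truncation moves to the right of mu.
mu, sigma = 0.1, 0.2
print(mu, truncated_gaussian_mean(mu, sigma))
```

When mu sits at the center of the interval, the two tails cancel and both predictions coincide.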


--use-probs Predict scores as product/sum of word-level probabilities.

Default: False

--binary-level Predict binary sentence labels indicating HTER == 0.0. Requires setting train-sentence-scores, valid-sentence-scores.

Default: False

--token-level Continue training the predictor on the post-edited text. If set, the model does an additional forward pass through the predictor using the (SRC, PE) pair and adds the predictor loss for the tokens in the post-edited text PE. Recommended if you have access to PE. Requires setting train-pe, valid-pe.

Default: False


--target-bad-weight Relative weight for target bad labels.

Default: 3.0

--gaps-bad-weight Relative weight for gaps bad labels.

Default: 3.0

--source-bad-weight Relative weight for source bad labels.

Default: 3.0
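
One common way a relative weight for BAD labels enters training is as a class weight in the word-level cross-entropy. The sketch below assumes that OK labels have weight 1.0 and that the loss is normalized by the sum of weights. Both are assumptions for illustration, and weighted_tag_loss is a hypothetical name, not a function of the tool.

```python
import math


def weighted_tag_loss(bad_probs, gold_tags, bad_weight=3.0):
    """Weighted average cross-entropy over word-level OK/BAD tags.

    bad_probs: predicted probability of BAD for each token.
    gold_tags: "OK" or "BAD" for each token.
    """
    total, weight_sum = 0.0, 0.0
    for p_bad, tag in zip(bad_probs, gold_tags):
        if tag == "BAD":
            weight, p_gold = bad_weight, p_bad
        else:
            weight, p_gold = 1.0, 1.0 - p_bad
        total += -weight * math.log(p_gold)
        weight_sum += weight
    return total / weight_sum


# The model misses the BAD token; a higher BAD weight penalizes this more.
probs, tags = [0.1, 0.1], ["OK", "BAD"]
loss_unweighted = weighted_tag_loss(probs, tags, bad_weight=1.0)
loss_weighted = weighted_tag_loss(probs, tags, bad_weight=3.0)
```

Because BAD tags are usually much rarer than OK tags, a weight above 1.0 keeps the model from minimizing the loss by predicting OK everywhere.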