Predictor-Estimator training

usage: kiwi train [-h] --train-source TRAIN_SOURCE
                  [--train-target TRAIN_TARGET]
                  [--train-source-tags TRAIN_SOURCE_TAGS]
                  [--train-target-tags TRAIN_TARGET_TAGS]
                  [--train-pe TRAIN_PE]
                  [--train-sentence-scores TRAIN_SENTENCE_SCORES]
                  [--split SPLIT] [--valid-source VALID_SOURCE]
                  [--valid-target VALID_TARGET]
                  [--valid-alignments VALID_ALIGNMENTS]
                  [--valid-source-tags VALID_SOURCE_TAGS]
                  [--valid-target-tags VALID_TARGET_TAGS]
                  [--valid-pe VALID_PE]
                  [--valid-sentence-scores VALID_SENTENCE_SCORES]
                  [--predict-side {tags,source_tags,gap_tags}]
                  [--wmt18-format [WMT18_FORMAT]]
                  [--source-max-length SOURCE_MAX_LENGTH]
                  [--source-min-length SOURCE_MIN_LENGTH]
                  [--target-max-length TARGET_MAX_LENGTH]
                  [--target-min-length TARGET_MIN_LENGTH]
                  [--source-vocab-size SOURCE_VOCAB_SIZE]
                  [--target-vocab-size TARGET_VOCAB_SIZE]
                  [--source-vocab-min-frequency SOURCE_VOCAB_MIN_FREQUENCY]
                  [--target-vocab-min-frequency TARGET_VOCAB_MIN_FREQUENCY]
                  [--extend-source-vocab EXTEND_SOURCE_VOCAB]
                  [--extend-target-vocab EXTEND_TARGET_VOCAB]
                  [--warmup WARMUP] [--rnn-layers-pred RNN_LAYERS_PRED]
                  [--dropout-pred DROPOUT_PRED] [--hidden-pred HIDDEN_PRED]
                  [--out-embeddings-size OUT_EMBEDDINGS_SIZE]
                  [--embedding-sizes EMBEDDING_SIZES]
                  [--share-embeddings [SHARE_EMBEDDINGS]]
                  [--predict-inverse [PREDICT_INVERSE]]
                  [--source-embeddings-size SOURCE_EMBEDDINGS_SIZE]
                  [--target-embeddings-size TARGET_EMBEDDINGS_SIZE]
                  [--start-stop [START_STOP]] [--predict-gaps [PREDICT_GAPS]]
                  [--predict-target [PREDICT_TARGET]]
                  [--predict-source [PREDICT_SOURCE]]
                  [--load-pred-source LOAD_PRED_SOURCE]
                  [--load-pred-target LOAD_PRED_TARGET]
                  [--rnn-layers-est RNN_LAYERS_EST]
                  [--dropout-est DROPOUT_EST] [--hidden-est HIDDEN_EST]
                  [--mlp-est [MLP_EST]] [--sentence-level [SENTENCE_LEVEL]]
                  [--sentence-ll [SENTENCE_LL]]
                  [--sentence-ll-predict-mean [SENTENCE_LL_PREDICT_MEAN]]
                  [--use-probs [USE_PROBS]] [--binary-level [BINARY_LEVEL]]
                  [--token-level [TOKEN_LEVEL]]
                  [--target-bad-weight TARGET_BAD_WEIGHT]
                  [--gaps-bad-weight GAPS_BAD_WEIGHT]
                  [--source-bad-weight SOURCE_BAD_WEIGHT]

data

--train-source Path to training source file
--train-target Path to training target file
--train-source-tags
 Path to training label file for source (WMT18 format)
--train-target-tags
 Path to training label file for target
--train-pe Path to file containing post-edited target.
--train-sentence-scores
 Path to file containing sentence-level scores.

validation data

--split Split the training dataset in case no validation set is given.
--valid-source Path to validation source file
--valid-target Path to validation target file
--valid-alignments
 Path to valid alignments between source and target.
--valid-source-tags
 Path to validation label file for source (WMT18 format)
--valid-target-tags
 Path to validation label file for target
--valid-pe Path to file containing post-edited target.
--valid-sentence-scores
 Path to file containing sentence-level scores.
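Putting the data flags together, a basic word-level training run with WMT18-style tags might look like the following sketch. All file paths are hypothetical placeholders; the optional-value flags (e.g. --wmt18-format) are assumed to accept an explicit True:

```shell
# Train a Predictor-Estimator on word-level target tags (paths are placeholders).
kiwi train \
    --train-source data/train.src \
    --train-target data/train.mt \
    --train-target-tags data/train.tags \
    --valid-source data/dev.src \
    --valid-target data/dev.mt \
    --valid-target-tags data/dev.tags \
    --wmt18-format True
```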

data processing options

--predict-side

Possible choices: tags, source_tags, gap_tags

Tagset to predict. Leave unchanged for WMT17 format.

Default: “tags”

--wmt18-format

Read target tags in WMT18 format.

Default: False

--source-max-length

Maximum source sequence length to keep.

Default: inf

--source-min-length

Minimum source sequence length to keep.

Default: 0

--target-max-length

Maximum target sequence length to keep.

Default: inf

--target-min-length

Minimum target sequence length to keep.

Default: 0

vocabulary options

Options for loading the vocabulary from a previous run, e.g. when training a source predictor via predict-inverse: True. If set, the other vocabulary options are ignored.

--source-vocab-size
 Size of the source vocabulary.
--target-vocab-size
 Size of the target vocabulary.
--source-vocab-min-frequency

Min word frequency for source vocabulary.

Default: 1

--target-vocab-min-frequency

Min word frequency for target vocabulary.

Default: 1

PredEst data

Predictor Estimator specific data options. (POSTECH)

--extend-source-vocab
 Path to additional source data, loaded only for vocabulary creation (Predictor).
--extend-target-vocab
 Path to additional target data, loaded only for vocabulary creation (Predictor).

predictor training

Predictor Estimator (POSTECH)

--warmup

Pretrain Predictor for this number of steps.

Default: 0

--rnn-layers-pred

Number of layers in the Predictor RNN.

Default: 2

--dropout-pred

Dropout rate in the Predictor.

Default: 0.0

--hidden-pred

Size of the hidden layers in the Predictor LSTM.

Default: 100

--out-embeddings-size

Size of the word embeddings in the output layer.

Default: 200

--embedding-sizes

If set, takes precedence over the other embedding size parameters.

Default: 0

--share-embeddings

Tie input and output embeddings for the target.

Default: False

--predict-inverse

Predict target -> source instead of source -> target.

Default: False
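For example, pretraining an inverse (target -> source) Predictor, which is needed before predicting source tags, could be sketched as follows. Paths and hyperparameter values are illustrative only:

```shell
# Pretrain a target -> source Predictor for a number of warmup steps.
kiwi train \
    --train-source data/train.src \
    --train-target data/train.mt \
    --valid-source data/dev.src \
    --valid-target data/dev.mt \
    --warmup 10000 \
    --predict-inverse True \
    --rnn-layers-pred 2 \
    --hidden-pred 400 \
    --dropout-pred 0.5
```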

model-embeddings

Embedding layers size in case pre-trained embeddings are not used.

--source-embeddings-size

Word embedding size for source.

Default: 50

--target-embeddings-size

Word embedding size for target.

Default: 50

predictor-estimator training

Predictor Estimator (POSTECH). These settings are used to train the Estimator. They are ignored if training a Predictor-Estimator with the load-model flag set.

--start-stop

Append start and stop symbols to estimator feature sequence.

Default: False

--predict-gaps

Predict gap tags. Requires train-gap-tags and valid-gap-tags to be set.

Default: False

--predict-target

Predict target tags. Requires train-target-tags and valid-target-tags to be set.

Default: True

--predict-source

Predict source tags. Requires train-source-tags and valid-source-tags to be set.

Default: False

--load-pred-source
 Load a pretrained tgt->src Predictor. If set, model architecture and vocabulary parameters are ignored.
--load-pred-target
 Load a pretrained src->tgt Predictor. If set, model architecture and vocabulary parameters are ignored.
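A sketch of training the Estimator on top of pretrained Predictors in both directions; checkpoint and data paths are hypothetical:

```shell
# Load pretrained Predictors and train the Estimator for both tag sides.
kiwi train \
    --train-source data/train.src \
    --train-target data/train.mt \
    --train-target-tags data/train.tags \
    --train-source-tags data/train.source_tags \
    --valid-source data/dev.src \
    --valid-target data/dev.mt \
    --valid-target-tags data/dev.tags \
    --valid-source-tags data/dev.source_tags \
    --load-pred-target runs/predictor_src2tgt/model.torch \
    --load-pred-source runs/predictor_tgt2src/model.torch \
    --predict-target True \
    --predict-source True
```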
--rnn-layers-est

Number of layers in the Estimator RNN.

Default: 2

--dropout-est

Dropout rate in the Estimator.

Default: 0.0

--hidden-est

Size of the hidden layers in the Estimator LSTM.

Default: 100

--mlp-est

Pass the Estimator input through a linear layer to reduce its dimensionality before the RNN.

Default: False

--sentence-level

Predict sentence-level scores. Requires setting train-sentence-scores and valid-sentence-scores.

Default: False

--sentence-ll

Use a probabilistic loss for sentence scores instead of squared error. If set, the model outputs the mean and variance of a truncated Gaussian distribution over the interval [0, 1] and uses the negative log-likelihood of the ground-truth HTER as the loss. This tends to improve performance and yields uncertainty estimates for sentence-level predictions as a byproduct. Has no effect if sentence-level is False.

Default: False

--sentence-ll-predict-mean

If sentence-ll is True, the prediction for HTER is by default the mean of the Gaussian *before truncation*. After truncation, this value is the mode of the distribution, but not its mean, since a truncated Gaussian is skewed to one side. Set this to True to predict with the true mean after truncation.

Default: False

--use-probs

Predict sentence scores as the product/sum of word-level probabilities.

Default: False

--binary-level

Predict binary sentence labels indicating whether HTER == 0.0. Requires setting train-sentence-scores and valid-sentence-scores.

Default: False

--token-level

Continue training the Predictor on the post-edited text. If set, performs an additional forward pass through the Predictor using the (SRC, PE) pair and adds the Predictor loss for the tokens in the post-edited text PE. Recommended if post-edits are available. Requires setting train-pe and valid-pe.

Default: False
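Combining the options above, a run that adds sentence-level HTER prediction with the probabilistic loss and continues Predictor training on post-edits could look like this (all paths are placeholders):

```shell
# Word-level tags plus sentence-level HTER with the truncated-Gaussian loss,
# and continued Predictor training on the (SRC, PE) pairs.
kiwi train \
    --train-source data/train.src \
    --train-target data/train.mt \
    --train-target-tags data/train.tags \
    --train-sentence-scores data/train.hter \
    --train-pe data/train.pe \
    --valid-source data/dev.src \
    --valid-target data/dev.mt \
    --valid-target-tags data/dev.tags \
    --valid-sentence-scores data/dev.hter \
    --valid-pe data/dev.pe \
    --sentence-level True \
    --sentence-ll True \
    --token-level True
```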

--target-bad-weight

Relative weight for target BAD labels.

Default: 3.0

--gaps-bad-weight

Relative weight for gap BAD labels.

Default: 3.0

--source-bad-weight

Relative weight for source BAD labels.

Default: 3.0