Predictor-Estimator training

usage: kiwi train [-h] --train-source TRAIN_SOURCE
                  [--train-target TRAIN_TARGET]
                  [--train-source-tags TRAIN_SOURCE_TAGS]
                  [--train-target-tags TRAIN_TARGET_TAGS]
                  [--train-pe TRAIN_PE]
                  [--train-sentence-scores TRAIN_SENTENCE_SCORES]
                  [--split SPLIT] [--valid-source VALID_SOURCE]
                  [--valid-target VALID_TARGET]
                  [--valid-alignments VALID_ALIGNMENTS]
                  [--valid-source-tags VALID_SOURCE_TAGS]
                  [--valid-target-tags VALID_TARGET_TAGS]
                  [--valid-pe VALID_PE]
                  [--valid-sentence-scores VALID_SENTENCE_SCORES]
                  [--predict-side {tags,source_tags,gap_tags}]
                  [--wmt18-format [WMT18_FORMAT]]
                  [--source-max-length SOURCE_MAX_LENGTH]
                  [--source-min-length SOURCE_MIN_LENGTH]
                  [--target-max-length TARGET_MAX_LENGTH]
                  [--target-min-length TARGET_MIN_LENGTH]
                  [--source-vocab-size SOURCE_VOCAB_SIZE]
                  [--target-vocab-size TARGET_VOCAB_SIZE]
                  [--source-vocab-min-frequency SOURCE_VOCAB_MIN_FREQUENCY]
                  [--target-vocab-min-frequency TARGET_VOCAB_MIN_FREQUENCY]
                  [--extend-source-vocab EXTEND_SOURCE_VOCAB]
                  [--extend-target-vocab EXTEND_TARGET_VOCAB]
                  [--warmup WARMUP] [--rnn-layers-pred RNN_LAYERS_PRED]
                  [--dropout-pred DROPOUT_PRED] [--hidden-pred HIDDEN_PRED]
                  [--out-embeddings-size OUT_EMBEDDINGS_SIZE]
                  [--embedding-sizes EMBEDDING_SIZES]
                  [--share-embeddings [SHARE_EMBEDDINGS]]
                  [--predict-inverse [PREDICT_INVERSE]]
                  [--source-embeddings-size SOURCE_EMBEDDINGS_SIZE]
                  [--target-embeddings-size TARGET_EMBEDDINGS_SIZE]
                  [--start-stop [START_STOP]] [--predict-gaps [PREDICT_GAPS]]
                  [--predict-target [PREDICT_TARGET]]
                  [--predict-source [PREDICT_SOURCE]]
                  [--load-pred-source LOAD_PRED_SOURCE]
                  [--load-pred-target LOAD_PRED_TARGET]
                  [--rnn-layers-est RNN_LAYERS_EST]
                  [--dropout-est DROPOUT_EST] [--hidden-est HIDDEN_EST]
                  [--mlp-est [MLP_EST]] [--sentence-level [SENTENCE_LEVEL]]
                  [--sentence-ll [SENTENCE_LL]]
                  [--sentence-ll-predict-mean [SENTENCE_LL_PREDICT_MEAN]]
                  [--use-probs [USE_PROBS]] [--binary-level [BINARY_LEVEL]]
                  [--token-level [TOKEN_LEVEL]]
                  [--target-bad-weight TARGET_BAD_WEIGHT]
                  [--gaps-bad-weight GAPS_BAD_WEIGHT]
                  [--source-bad-weight SOURCE_BAD_WEIGHT]

data

--train-source Path to training source file
--train-target Path to training target file
--train-source-tags
 Path to training label file for source (WMT18 format)
--train-target-tags
 Path to training label file for target
--train-pe Path to file containing post-edited target.
--train-sentence-scores
 Path to file containing sentence-level scores.

validation data

--split Split the training dataset in case no validation set is given.
--valid-source Path to validation source file
--valid-target Path to validation target file
--valid-alignments
 Path to valid alignments between source and target.
--valid-source-tags
 Path to validation label file for source (WMT18 format)
--valid-target-tags
 Path to validation label file for target
--valid-pe Path to file containing post-edited target.
--valid-sentence-scores
 Path to file containing sentence-level scores.
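Putting the data flags together, a basic word-level training run with WMT18-style tags might look like the following sketch. All file paths are hypothetical placeholders; the optional-value flags (e.g. --wmt18-format) are assumed to accept an explicit True:

```shell
# Train a Predictor-Estimator on word-level target tags (paths are placeholders).
kiwi train \
    --train-source data/train.src \
    --train-target data/train.mt \
    --train-target-tags data/train.tags \
    --valid-source data/dev.src \
    --valid-target data/dev.mt \
    --valid-target-tags data/dev.tags \
    --wmt18-format True
```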

data processing options

--predict-side

Possible choices: tags, source_tags, gap_tags

Tagset to predict. Leave unchanged for WMT17 format.

Default: “tags”

--wmt18-format

Read target tags in WMT18 format.

Default: False

--source-max-length

Maximum source sequence length to keep.

Default: inf

--source-min-length

Minimum source sequence length to keep.

Default: 0

--target-max-length

Maximum target sequence length to keep.

Default: inf

--target-min-length

Minimum target sequence length to keep.

Default: 0

vocabulary options

Options for loading the vocabulary from a previous run, e.g. when training a source predictor via predict-inverse: True. If set, the other vocabulary options are ignored.

--source-vocab-size
 Size of the source vocabulary.
--target-vocab-size
 Size of the target vocabulary.
--source-vocab-min-frequency

Min word frequency for source vocabulary.

Default: 1

--target-vocab-min-frequency

Min word frequency for target vocabulary.

Default: 1

PredEst data

Predictor Estimator specific data options. (POSTECH)

--extend-source-vocab
 Path to additional source data, loaded only for vocabulary creation (Predictor).
--extend-target-vocab
 Path to additional target data, loaded only for vocabulary creation (Predictor).

predictor training

Predictor Estimator (POSTECH)

--warmup

Pretrain Predictor for this number of steps.

Default: 0

--rnn-layers-pred

Number of layers in the Predictor RNN.

Default: 2

--dropout-pred

Dropout rate in the Predictor.

Default: 0.0

--hidden-pred

Size of the hidden layers in the Predictor LSTM.

Default: 100

--out-embeddings-size

Size of the word embeddings in the output layer.

Default: 200

--embedding-sizes

If set, takes precedence over the other embedding size parameters.

Default: 0

--share-embeddings

Tie input and output embeddings for the target.

Default: False

--predict-inverse

Predict target -> source instead of source -> target.

Default: False
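For example, pretraining an inverse (target -> source) Predictor, which is needed before predicting source tags, could be sketched as follows. Paths and hyperparameter values are illustrative only:

```shell
# Pretrain a target -> source Predictor for a number of warmup steps.
kiwi train \
    --train-source data/train.src \
    --train-target data/train.mt \
    --valid-source data/dev.src \
    --valid-target data/dev.mt \
    --warmup 10000 \
    --predict-inverse True \
    --rnn-layers-pred 2 \
    --hidden-pred 400 \
    --dropout-pred 0.5
```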

model-embeddings

Embedding layers size in case pre-trained embeddings are not used.

--source-embeddings-size

Word embedding size for source.

Default: 50

--target-embeddings-size

Word embedding size for target.

Default: 50

predictor-estimator training

Predictor Estimator (POSTECH). These settings are used to train the Estimator. They are ignored if training a Predictor-Estimator with the load-model flag set.

--start-stop

Append start and stop symbols to estimator feature sequence.

Default: False

--predict-gaps

Predict gap tags. Requires train-gap-tags and valid-gap-tags to be set.

Default: False

--predict-target

Predict target tags. Requires train-target-tags and valid-target-tags to be set.

Default: True

--predict-source

Predict source tags. Requires train-source-tags and valid-source-tags to be set.

Default: False

--load-pred-source
 Load a pretrained tgt->src Predictor. If set, model architecture and vocabulary parameters are ignored.
--load-pred-target
 Load a pretrained src->tgt Predictor. If set, model architecture and vocabulary parameters are ignored.
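A sketch of training the Estimator on top of pretrained Predictors in both directions; checkpoint and data paths are hypothetical:

```shell
# Load pretrained Predictors and train the Estimator for both tag sides.
kiwi train \
    --train-source data/train.src \
    --train-target data/train.mt \
    --train-target-tags data/train.tags \
    --train-source-tags data/train.source_tags \
    --valid-source data/dev.src \
    --valid-target data/dev.mt \
    --valid-target-tags data/dev.tags \
    --valid-source-tags data/dev.source_tags \
    --load-pred-target runs/predictor_src2tgt/model.torch \
    --load-pred-source runs/predictor_tgt2src/model.torch \
    --predict-target True \
    --predict-source True
```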
--rnn-layers-est

Number of layers in the Estimator RNN.

Default: 2

--dropout-est

Dropout rate in the Estimator.

Default: 0.0

--hidden-est

Size of the hidden layers in the Estimator LSTM.

Default: 100

--mlp-est

Pass the Estimator input through a linear layer to reduce its dimensionality before the RNN.

Default: False

--sentence-level

Predict sentence-level scores. Requires setting train-sentence-scores and valid-sentence-scores.

Default: False

--sentence-ll

Use a probabilistic loss for sentence scores instead of squared error. If set, the model outputs the mean and variance of a truncated Gaussian distribution over the interval [0, 1] and uses the negative log-likelihood of the ground-truth HTER as the loss. This tends to improve performance and yields uncertainty estimates for sentence-level predictions as a byproduct. Has no effect if sentence-level is False.

Default: False

--sentence-ll-predict-mean

If sentence-ll is True, the prediction for HTER is by default the mean of the Gaussian *before truncation*. After truncation, this value is the mode of the distribution, but not its mean, since a truncated Gaussian is skewed to one side. Set this to True to predict with the true mean after truncation.

Default: False

--use-probs

Predict sentence scores as the product/sum of word-level probabilities.

Default: False

--binary-level

Predict binary sentence labels indicating whether HTER == 0.0. Requires setting train-sentence-scores and valid-sentence-scores.

Default: False

--token-level

Continue training the Predictor on the post-edited text. If set, performs an additional forward pass through the Predictor using the (SRC, PE) pair and adds the Predictor loss for the tokens in the post-edited text PE. Recommended if post-edits are available. Requires setting train-pe and valid-pe.

Default: False
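Combining the options above, a run that adds sentence-level HTER prediction with the probabilistic loss and continues Predictor training on post-edits could look like this (all paths are placeholders):

```shell
# Word-level tags plus sentence-level HTER with the truncated-Gaussian loss,
# and continued Predictor training on the (SRC, PE) pairs.
kiwi train \
    --train-source data/train.src \
    --train-target data/train.mt \
    --train-target-tags data/train.tags \
    --train-sentence-scores data/train.hter \
    --train-pe data/train.pe \
    --valid-source data/dev.src \
    --valid-target data/dev.mt \
    --valid-target-tags data/dev.tags \
    --valid-sentence-scores data/dev.hter \
    --valid-pe data/dev.pe \
    --sentence-level True \
    --sentence-ll True \
    --token-level True
```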

--target-bad-weight

Relative weight for target BAD labels.

Default: 3.0

--gaps-bad-weight

Relative weight for gap BAD labels.

Default: 3.0

--source-bad-weight

Relative weight for source BAD labels.

Default: 3.0