Linear training
usage: kiwi train [-h] [--train-source TRAIN_SOURCE]
                  [--train-target TRAIN_TARGET]
                  [--train-alignments TRAIN_ALIGNMENTS]
                  [--train-source-tags TRAIN_SOURCE_TAGS]
                  [--train-target-tags TRAIN_TARGET_TAGS]
                  [--train-source-pos TRAIN_SOURCE_POS]
                  [--train-target-pos TRAIN_TARGET_POS]
                  [--train-target-parse TRAIN_TARGET_PARSE]
                  [--train-target-ngram TRAIN_TARGET_NGRAM]
                  [--train-target-stacked TRAIN_TARGET_STACKED]
                  [--valid-source VALID_SOURCE] [--valid-target VALID_TARGET]
                  [--valid-alignments VALID_ALIGNMENTS]
                  [--valid-source-tags VALID_SOURCE_TAGS]
                  [--valid-target-tags VALID_TARGET_TAGS]
                  [--valid-source-pos VALID_SOURCE_POS]
                  [--valid-target-pos VALID_TARGET_POS]
                  [--valid-target-parse VALID_TARGET_PARSE]
                  [--valid-target-ngram VALID_TARGET_NGRAM]
                  [--valid-target-stacked VALID_TARGET_STACKED]
                  [--source-vocab-size SOURCE_VOCAB_SIZE]
                  [--target-vocab-size TARGET_VOCAB_SIZE]
                  [--source-vocab-min-frequency SOURCE_VOCAB_MIN_FREQUENCY]
                  [--target-vocab-min-frequency TARGET_VOCAB_MIN_FREQUENCY]
                  [--use-basic-features-only USE_BASIC_FEATURES_ONLY]
                  [--use-bigrams USE_BIGRAMS]
                  [--use-simple-bigram-features USE_SIMPLE_BIGRAM_FEATURES]
                  [--training-algorithm TRAINING_ALGORITHM]
                  [--regularization-constant REGULARIZATION_CONSTANT]
                  [--cost-false-positives COST_FALSE_POSITIVES]
                  [--cost-false-negatives COST_FALSE_NEGATIVES]
                  [--evaluation-metric EVALUATION_METRIC]
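As an illustration of how the flags in the usage listing fit together, the following sketch assembles a `kiwi train` command line from a mapping of options and prints it without running anything. The file paths and the `build_command` helper are hypothetical, only flags that appear in the listing are used, and which of the optional input files a given feature set actually requires is not stated on this page.

```python
import shlex

# Hypothetical file locations; replace them with your own data.
options = {
    "--train-source": "data/train.src",
    "--train-target": "data/train.tgt",
    "--train-alignments": "data/train.src-tgt.alignments",
    "--train-target-tags": "data/train.tags",
    "--valid-source": "data/dev.src",
    "--valid-target": "data/dev.tgt",
    "--valid-alignments": "data/dev.src-tgt.alignments",
    "--valid-target-tags": "data/dev.tags",
    "--training-algorithm": "svm_mira",
    "--evaluation-metric": "f1_mult",
}


def build_command(options):
    """Flatten an option mapping into an argument list for `kiwi train`."""
    args = ["kiwi", "train"]
    for flag, value in options.items():
        args.extend([flag, str(value)])
    return args


command = build_command(options)
# Print a shell-quoted version of the command for inspection.
print(shlex.join(command))
```

Keeping the options in a mapping makes it easy to swap the training and validation paths together, or to pass the list to a process launcher once the command looks right.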
data
--train-source | Path to training source file |
--train-target | Path to training target file |
--train-alignments | Path to training alignments between source and target |
--train-source-tags | Path to training label file for source (WMT18 format) |
--train-target-tags | Path to training label file for target |
--train-source-pos | Path to training PoS tags file for source |
--train-target-pos | Path to training PoS tags file for target |
--train-target-parse | Path to training dependency parsing file for target (tabular format) |
--train-target-ngram | Path to training highest-order n-gram file for target (tabular format) |
--train-target-stacked | Path to training stacked predictions file for target (tabular format) |
validation data
--valid-source | Path to validation source file |
--valid-target | Path to validation target file |
--valid-alignments | Path to validation alignments between source and target |
--valid-source-tags | Path to validation label file for source (WMT18 format) |
--valid-target-tags | Path to validation label file for target |
--valid-source-pos | Path to validation PoS tags file for source |
--valid-target-pos | Path to validation PoS tags file for target |
--valid-target-parse | Path to validation dependency parsing file for target (tabular format) |
--valid-target-ngram | Path to validation highest-order n-gram file for target (tabular format) |
--valid-target-stacked | Path to validation stacked predictions file for target (tabular format) |
vocabulary options
--source-vocab-size | Size of the source vocabulary. |
--target-vocab-size | Size of the target vocabulary. |
--source-vocab-min-frequency | Min word frequency for source vocabulary. Default: 1 |
--target-vocab-min-frequency | Min word frequency for target vocabulary. Default: 1 |
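A size limit and a minimum frequency are commonly combined as follows: words below the frequency threshold are dropped, and the most frequent survivors are kept up to the size limit. The sketch below shows that conventional behaviour only. The `build_vocab` function is hypothetical, special tokens are left out, and the tool's own vocabulary construction may differ in its details.

```python
from collections import Counter


def build_vocab(sentences, max_size=None, min_frequency=1):
    """Keep words seen at least `min_frequency` times, most frequent first,
    truncated to `max_size` entries when a size is given."""
    counts = Counter(word for sentence in sentences for word in sentence.split())
    kept = [(word, count) for word, count in counts.items() if count >= min_frequency]
    # Sort by descending count, then alphabetically so ties are deterministic.
    kept.sort(key=lambda item: (-item[1], item[0]))
    if max_size is not None:
        kept = kept[:max_size]
    return [word for word, _ in kept]


corpus = ["the cat sat", "the cat ran", "a dog ran"]
print(build_vocab(corpus, min_frequency=2))  # words seen at least twice
print(build_vocab(corpus, max_size=2))       # the two most frequent words
```

With the default minimum frequency of 1, every observed word is a candidate and only the size limit, if any, restricts the vocabulary.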
linear
Linear Quality Estimation
--use-basic-features-only | 1 for using only basic features (words). Default: 0 |
--use-bigrams | 1 for using bigram features (i.e. a CRF-like model). Default: 1 |
--use-simple-bigram-features | 1 for using only label indicators as bigram features. Default: 0 |
--training-algorithm | Algorithm for training the model (svm_mira, svm_sgd, perceptron). Default: “svm_mira” |
--regularization-constant | L2 regularization constant. Default: 0.001 |
--cost-false-positives | Cost for false positives (svm_mira and svm_sgd only). Default: 0.2 |
--cost-false-negatives | Cost for false negatives (svm_mira and svm_sgd only). Default: 0.8 |
--evaluation-metric | Evaluation metric (f1_mult or f1_bad). Default: “f1_mult” |
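To make the two metric names and the asymmetric costs concrete, here is a minimal sketch over OK/BAD word tags. It takes f1_bad to be the F1 score of the BAD class and f1_mult to be the product of the F1 scores of the OK and BAD classes, as in the WMT quality estimation shared tasks. The `weighted_cost` function assumes that BAD is the positive class, which this page does not state, and it only totals the per-token penalties; how the training algorithms use these costs is not shown here. All function names are hypothetical.

```python
def f1(gold, pred, label):
    """F1 score of `label`, treating it as the positive class."""
    tp = sum(g == label and p == label for g, p in zip(gold, pred))
    fp = sum(g != label and p == label for g, p in zip(gold, pred))
    fn = sum(g == label and p != label for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def f1_bad(gold, pred):
    """F1 score of the BAD class."""
    return f1(gold, pred, "BAD")


def f1_mult(gold, pred):
    """Product of the per-class F1 scores for OK and BAD."""
    return f1(gold, pred, "OK") * f1(gold, pred, "BAD")


def weighted_cost(gold, pred, cost_fp=0.2, cost_fn=0.8, positive="BAD"):
    """Sum of per-token error costs. Assumption: BAD is the positive class."""
    total = 0.0
    for g, p in zip(gold, pred):
        if p == positive and g != positive:
            total += cost_fp  # predicted positive, gold is negative
        elif p != positive and g == positive:
            total += cost_fn  # predicted negative, gold is positive
    return total


gold = ["OK", "OK", "BAD", "BAD", "OK", "BAD"]
pred = ["OK", "BAD", "BAD", "OK", "OK", "OK"]
print(f1_bad(gold, pred), f1_mult(gold, pred), weighted_cost(gold, pred))
```

Under the default costs, a missed positive is penalised four times as heavily as a false alarm, which pushes a cost-sensitive learner towards higher recall on the positive class.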