kiwi.data.encoders.parallel_data_encoder
InputFields
EmbeddingsConfig
Paths to the word embeddings file for each input field.
VocabularyConfig
Base class for all pydantic configs. Used to configure base behaviour of configs.
ParallelDataEncoder
kiwi.data.encoders.parallel_data_encoder.logger
T
Bases: pydantic.generics.GenericModel, Generic[T]
source
target
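InputFields holds one value of the same type for each input field. The following is a minimal stdlib sketch of that shape, assuming only the two fields listed above. `InputFieldsSketch` is hypothetical. The real class is a pydantic `GenericModel`, which validates values against `T`, while the dataclass below performs no runtime type checking.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar

T = TypeVar("T")


@dataclass
class InputFieldsSketch(Generic[T]):
    """Hypothetical stand-in for InputFields: one value of type T per input field."""

    source: T
    target: T


# Illustrative per-field setting: a different integer threshold for each side.
min_frequency = InputFieldsSketch[int](source=1, target=2)
print(min_frequency)  # InputFieldsSketch(source=1, target=2)
```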
Bases: kiwi.utils.io.BaseConfig
format
Word embeddings format. See the README for specific formatting instructions.
min_frequency
Only add words to the vocabulary if they occur more than this number of times in the training dataset (doesn’t apply to loaded or pretrained vocabularies).
max_size
Limit the vocabulary to at most this many words (doesn’t apply to loaded or pretrained vocabularies).
keep_rare_words_with_embeddings
Keep words that occur fewer times than min_frequency if they are in the embeddings vocabulary.
add_embeddings_vocab
Add words from the embeddings vocabulary to the source/target vocabularies.
check_nested_options
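The following sketch shows how the four vocabulary options above could interact when building one field's vocabulary from raw tokens. It is a hypothetical sketch, not the library's builder. `build_vocab_sketch` is an invented name. The frequency test uses "more than min_frequency" exactly as the option is worded, so check the actual implementation to see whether the threshold is inclusive. Appending embedding words after the size cap is also an assumption.

```python
from collections import Counter
from typing import Iterable, List, Optional, Set


def build_vocab_sketch(
    tokens: Iterable[str],
    min_frequency: int = 1,
    max_size: Optional[int] = None,
    embeddings_vocab: Optional[Set[str]] = None,
    keep_rare_words_with_embeddings: bool = False,
    add_embeddings_vocab: bool = False,
) -> List[str]:
    """Hypothetical vocabulary builder mirroring the VocabularyConfig options."""
    embeddings_vocab = embeddings_vocab or set()
    counts = Counter(tokens)
    # Most frequent words first; ties broken alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))

    vocab: List[str] = []
    for word, count in ranked:
        if max_size is not None and len(vocab) >= max_size:
            break
        # Literal reading of the option: strictly more than min_frequency.
        frequent = count > min_frequency
        rare_but_embedded = keep_rare_words_with_embeddings and word in embeddings_vocab
        if frequent or rare_but_embedded:
            vocab.append(word)

    if add_embeddings_vocab:
        # Assumption: embedding words are appended after the size cap is applied.
        seen = set(vocab)
        vocab.extend(sorted(w for w in embeddings_vocab if w not in seen))
    return vocab


vocab = build_vocab_sketch(
    "the cat sat on the mat the cat".split(),
    embeddings_vocab={"mat"},
    keep_rare_words_with_embeddings=True,
)
print(vocab)  # ['the', 'cat', 'mat']
```

Here "mat" occurs only once, so it passes the frequency test only because it appears in the embeddings vocabulary.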
Bases: kiwi.data.encoders.base.DataEncoders
Config
share_input_fields_encoders
Share encoders and vocabularies between the source and target fields.
vocab
embeddings
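One simple way to picture share_input_fields_encoders is that both fields refer to the same encoder object. The sketch below makes that assumption explicit. `ToyEncoder` and `make_field_encoders` are hypothetical, and a plain token list stands in for a real vocabulary. It does not show how the library wires its encoders.

```python
from typing import Dict, List


class ToyEncoder:
    """Hypothetical stand-in for a field encoder that owns a vocabulary."""

    def __init__(self) -> None:
        self.vocab: List[str] = []

    def fit(self, tokens: List[str]) -> None:
        # Append unseen tokens in first-seen order.
        for token in tokens:
            if token not in self.vocab:
                self.vocab.append(token)


def make_field_encoders(share_input_fields_encoders: bool) -> Dict[str, ToyEncoder]:
    source_encoder = ToyEncoder()
    # When sharing, both fields point at the same encoder object, so fitting
    # either field grows one common vocabulary.
    target_encoder = source_encoder if share_input_fields_encoders else ToyEncoder()
    return {"source": source_encoder, "target": target_encoder}


shared = make_field_encoders(share_input_fields_encoders=True)
shared["source"].fit(["ein", "Haus"])
shared["target"].fit(["a", "house"])

separate = make_field_encoders(share_input_fields_encoders=False)
separate["source"].fit(["ein", "Haus"])
separate["target"].fit(["a", "house"])
```

With sharing enabled, `shared["source"].vocab` contains all four tokens. Without it, each side keeps only its own.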
warn_missing_feature
fit_vocabularies
load_vocabularies
Load serialized vocabularies from disk into the fields.
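The following is a minimal save/load round trip illustrating the idea of restoring serialized vocabularies into fields. It assumes a JSON file that maps field names to token lists. That format, `FieldSketch`, and both helper functions are hypothetical. The encoder's actual serialization format is not described on this page.

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, List, Optional


@dataclass
class FieldSketch:
    """Hypothetical field holding an optional token list as its vocabulary."""

    vocab: Optional[List[str]] = None


def save_vocabularies_sketch(fields: Dict[str, FieldSketch], path: Path) -> None:
    # Hypothetical on-disk format: {"field name": ["token", ...], ...} as JSON.
    data = {name: f.vocab for name, f in fields.items() if f.vocab is not None}
    path.write_text(json.dumps(data), encoding="utf-8")


def load_vocabularies_sketch(fields: Dict[str, FieldSketch], path: Path) -> None:
    # Attach each stored vocabulary to the field with the same name;
    # stored names without a matching field are skipped (an assumption).
    stored = json.loads(path.read_text(encoding="utf-8"))
    for name, tokens in stored.items():
        if name in fields:
            fields[name].vocab = tokens


with tempfile.TemporaryDirectory() as tmp:
    vocab_path = Path(tmp) / "vocabs.json"
    trained = {"source": FieldSketch(["ein", "Haus"]), "target": FieldSketch(["a", "house"])}
    save_vocabularies_sketch(trained, vocab_path)

    fresh = {"source": FieldSketch(), "target": FieldSketch()}
    load_vocabularies_sketch(fresh, vocab_path)
```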
vocabularies_from_dict
vocabularies
Return the vocabularies for all encoders that have one.
Returns: A dict mapping encoder names to Vocabulary instances.
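The vocabularies property keeps only the encoders that carry a vocabulary. The following sketch shows that filtering. It assumes hypothetical encoder objects whose `vocab` attribute is `None` when they have no vocabulary. Plain token lists stand in for Vocabulary instances, and the field name "scores" is illustrative.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FieldEncoderSketch:
    """Hypothetical encoder: `vocab` is None for encoders without a vocabulary."""

    vocab: Optional[List[str]] = None


class EncodersSketch:
    def __init__(self, field_encoders: Dict[str, FieldEncoderSketch]) -> None:
        self.field_encoders = field_encoders

    @property
    def vocabularies(self) -> Dict[str, List[str]]:
        # Keep only the encoders that actually carry a vocabulary.
        return {
            name: encoder.vocab
            for name, encoder in self.field_encoders.items()
            if encoder.vocab is not None
        }


encoders = EncodersSketch(
    {
        "source": FieldEncoderSketch(vocab=["<unk>", "ein", "Haus"]),
        "target": FieldEncoderSketch(vocab=["<unk>", "a", "house"]),
        "scores": FieldEncoderSketch(),  # hypothetical field with no vocabulary
    }
)
print(sorted(encoders.vocabularies))  # ['source', 'target']
```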
collate_fn