kiwi.data.encoders.field_encoders

Module Contents

Classes

TextEncoder

Encode a field, handling vocabulary, tokenization and embeddings.

TagEncoder

Encode a field, handling vocabulary, tokenization and embeddings.

InputEncoder

ScoreEncoder

BinaryScoreEncoder

Transform HTER score into binary OK/BAD label.

AlignmentEncoder

kiwi.data.encoders.field_encoders.logger
class kiwi.data.encoders.field_encoders.TextEncoder(tokenize=tokenizers.tokenize, detokenize=tokenizers.detokenize, subtokenize=None, pad_token=PAD, unk_token=UNK, bos_token=START, eos_token=STOP, unaligned_token=UNALIGNED, specials_first=True, include_lengths=True, include_bounds=True)

Encode a field, handling vocabulary, tokenization and embeddings.

Heavily inspired by torchtext and torchnlp.

fit_vocab(self, samples, vocab_size=None, vocab_min_freq=0, embeddings_name=None, keep_rare_words_with_embeddings=False, add_embeddings_vocab=False)
property vocabulary(self)
property padding_index(self)
encode(self, example)
batch_encode(self, iterator)
class kiwi.data.encoders.field_encoders.TagEncoder(tokenize=tokenizers.tokenize, detokenize=tokenizers.detokenize, pad_token=PAD, include_lengths=True)

Bases: kiwi.data.encoders.field_encoders.TextEncoder

Encode a field, handling vocabulary, tokenization and embeddings.

Heavily inspired by torchtext and torchnlp.

class kiwi.data.encoders.field_encoders.InputEncoder
class kiwi.data.encoders.field_encoders.ScoreEncoder(dtype=torch.float)

Bases: kiwi.data.encoders.field_encoders.InputEncoder

encode(self, example)
batch_encode(self, iterator)
class kiwi.data.encoders.field_encoders.BinaryScoreEncoder(dtype=torch.float)

Bases: kiwi.data.encoders.field_encoders.ScoreEncoder

Transform HTER score into binary OK/BAD label.

encode(self, example)
class kiwi.data.encoders.field_encoders.AlignmentEncoder(dtype=torch.int, account_for_bos_token=True, account_for_eos_token=True)

Bases: kiwi.data.encoders.field_encoders.InputEncoder

encode(self, example)
batch_encode(self, iterator)