kiwi.data.vocabulary

Module Contents

Classes

Vocabulary

Define a vocabulary object that will be used to numericalize a field.

kiwi.data.vocabulary.logger
class kiwi.data.vocabulary.Vocabulary(counter, max_size=None, min_freq=1, unk_token=None, pad_token=None, bos_token=None, eos_token=None, specials=None, vectors=None, unk_init=None, vectors_cache=None, specials_first=True, rare_with_vectors=True, add_vectors_vocab=False)

Define a vocabulary object that will be used to numericalize a field.

counter

A collections.Counter object holding the frequencies of tokens in the data used to build the vocabulary.

stoi

A dictionary mapping token strings to numerical identifiers. NOTE: use token_to_id() for the conversion instead of indexing stoi directly.

itos

A list of token strings indexed by their numerical identifiers. NOTE: use id_to_token() for the conversion instead of indexing itos directly.
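
The relationship between counter, stoi and itos can be illustrated with a minimal, self-contained sketch. It is not Kiwi's implementation: build_lookups is a hypothetical helper, the special tokens "<unk>" and "<pad>" are illustrative choices, and placing the specials before all other tokens and breaking frequency ties alphabetically are assumptions made for this example.

```python
from collections import Counter


def build_lookups(counter, specials=("<unk>", "<pad>")):
    """Hypothetical helper (not part of kiwi): derive stoi and itos from a Counter."""
    # Assumption: special tokens take the lowest identifiers.
    itos = list(specials)
    # Order the remaining tokens by descending frequency; ties are broken
    # alphabetically so that the result is deterministic.
    for token, _ in sorted(counter.items(), key=lambda kv: (-kv[1], kv[0])):
        if token not in specials:
            itos.append(token)
    # stoi is the inverse of itos: token string -> position in the list.
    stoi = {token: idx for idx, token in enumerate(itos)}
    return stoi, itos


counter = Counter("the cat sat on the mat the end".split())
stoi, itos = build_lookups(counter)
```

In this sketch stoi and itos are exact inverses of each other, so itos[stoi[token]] returns the original token for every entry. The real class exposes the same round trip through token_to_id() and id_to_token().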

token_to_id(self, token)
id_to_token(self, idx)
property pad_id(self)
property bos_id(self)
property eos_id(self)
__len__(self)
net_length(self)
max_size(self, max_size)

Limit the vocabulary size.

This assumes that the vocabulary was created from a list of tokens sorted by descending frequency, so that cutting the list at max_size keeps the most frequent tokens.
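
Why the ordering assumption matters can be shown with a short sketch. It is not Kiwi's implementation: truncate_vocab is a hypothetical helper, and whether max_size counts special tokens is not stated in this reference, so the sketch assumes it does not.

```python
def truncate_vocab(itos, num_specials, max_size):
    """Hypothetical helper (not part of kiwi): cut a frequency-sorted token list."""
    # Assumption: max_size counts only regular tokens, so the specials at the
    # front of the list are always kept.
    kept = itos[: num_specials + max_size]
    # Rebuild the inverse mapping so identifiers stay consistent with the list.
    stoi = {token: idx for idx, token in enumerate(kept)}
    return stoi, kept


# Tokens after the two specials are assumed to be sorted by descending frequency.
itos = ["<unk>", "<pad>", "the", "of", "cat", "zebra"]
small_stoi, small_itos = truncate_vocab(itos, num_specials=2, max_size=2)

# A token that was cut off can fall back to the unknown-token identifier.
zebra_id = small_stoi.get("zebra", small_stoi["<unk>"])
```

Because the list is sorted by frequency, slicing drops only the rarest tokens ("cat" and "zebra" here). On an unsorted list the same slice would discard tokens arbitrarily.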

__getstate__(self)
__setstate__(self, state)