kiwi.data.datasets.wmt_qe_dataset
InputConfig
Base class for all pydantic configs. Used to configure base behaviour of configs.
OutputConfig
TrainingConfig
TestConfig
WMTQEDataset
An abstract class representing a Dataset.
read_file(path, reader)
kiwi.data.datasets.wmt_qe_dataset.logger
Bases: kiwi.utils.io.BaseConfig
source
Path to a corpus file in the source language.
target
Path to a corpus file in the target language.
alignments
Path to alignments between source and target.
post_edit
Path to file containing post-edited target.
source_pos
Path to input file with POS tags for source.
target_pos
Path to input file with POS tags for target.
target_tags
Path to label file for target.
source_tags
Path to label file for source.
sentence_scores
Path to file containing sentence level scores (HTER).
input
output
Bases: torch.utils.data.Dataset
All datasets that represent a map from keys to data samples should subclass it. All subclasses should override __getitem__(), which fetches a data sample for a given key. Subclasses may also override __len__(), which many Sampler implementations and the default options of DataLoader expect to return the size of the dataset.
Note
DataLoader by default constructs an index sampler that yields integral indices. To make it work with a map-style dataset with non-integral indices/keys, a custom sampler must be provided.
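The map-style contract and the note about non-integral keys can be sketched as follows. This is a plain-Python stand-in that deliberately avoids importing torch so it runs anywhere; KeyedDataset and KeySampler are hypothetical names, not part of kiwi or PyTorch. In real use the classes would subclass torch.utils.data.Dataset and torch.utils.data.Sampler, and the sampler would be passed to DataLoader through its sampler argument.

```python
class KeyedDataset:
    """Map-style dataset keyed by strings (hypothetical stand-in, no torch)."""

    def __init__(self, samples):
        self._samples = dict(samples)

    def __getitem__(self, key):
        # Fetch the data sample stored under the given key.
        return self._samples[key]

    def __len__(self):
        # Size of the dataset, as samplers and loaders expect.
        return len(self._samples)


class KeySampler:
    """Yields the dataset's own keys instead of integer indices."""

    def __init__(self, keys):
        self._keys = list(keys)

    def __iter__(self):
        return iter(self._keys)

    def __len__(self):
        return len(self._keys)


samples = {"seg-a": "ein Haus", "seg-b": "ein Baum"}
dataset = KeyedDataset(samples)
sampler = KeySampler(samples.keys())

# A loader would do the same thing: ask the sampler for keys, then index the dataset.
fetched = [dataset[key] for key in sampler]
```

A default integer sampler would request dataset[0], dataset[1], and so on, which fails here because the keys are strings; that is the situation the note describes.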
Config
buffer_size
Number of consecutive instances to be temporarily stored in the buffer, which will be used later for batching/bucketing.
train
valid
test
split
Split the training dataset when no validation set is given.
ensure_there_is_validation_data
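One way to read buffer_size is as a window over the instance stream: fill a buffer, sort it by length so similar instances sit together, then cut it into batches. The sketch below illustrates that idea only; bucketed_batches and its batch_size parameter are hypothetical and are not taken from kiwi, whose actual batching code may differ.

```python
from itertools import islice


def bucketed_batches(instances, buffer_size, batch_size):
    """Yield batches of similar-length instances, one buffer at a time."""
    iterator = iter(instances)
    while True:
        # Temporarily store up to buffer_size consecutive instances.
        buffer = list(islice(iterator, buffer_size))
        if not buffer:
            return
        # Bucketing step: sort within the buffer so lengths are close.
        buffer.sort(key=len)
        for start in range(0, len(buffer), batch_size):
            yield buffer[start:start + batch_size]


sentences = [s.split() for s in ["a b c d", "a", "a b c", "a b", "a b c d e", "a b"]]
batches = list(bucketed_batches(sentences, buffer_size=4, batch_size=2))
batch_lengths = [[len(sentence) for sentence in batch] for batch in batches]
```

A larger buffer gives the sort more instances to work with and so tighter length buckets, at the cost of holding more instances in memory at once.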
build
Build training, validation, and test datasets.
config – configuration object with file paths and processing flags; check out the docs for Config.
directory – if provided and paths in configuration are not absolute, use it to anchor them.
train – whether to build the training dataset.
valid – whether to build the validation dataset.
test – whether to build the testing dataset.
split (float) – if no validation set is provided, randomly sample a fraction 1 - split of the training examples as the validation set.
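The directory and split parameters can be illustrated with two small helpers. Both are assumptions written for this page, not kiwi functions: anchor_path shows how a relative path might be anchored to a directory, and split_train_valid shows a random split that keeps a split fraction for training and the remaining 1 - split for validation. The seed argument is likewise hypothetical.

```python
import random
from pathlib import Path


def anchor_path(path, directory=None):
    """Anchor a relative path to `directory`; leave absolute paths untouched."""
    path = Path(path)
    if directory is not None and not path.is_absolute():
        return Path(directory) / path
    return path


def split_train_valid(examples, split, seed=0):
    """Keep a `split` fraction for training; the other 1 - split is validation."""
    shuffled = list(examples)
    # A seeded generator keeps the random split reproducible.
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * split)
    return shuffled[:n_train], shuffled[n_train:]


train_set, valid_set = split_train_valid(range(10), split=0.8)
relative = anchor_path("train.src", directory="data")
```

With split=0.8 and ten examples, eight go to training and two to validation; which examples land in each part depends on the seed.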
__getitem__
Get a row with data from all fields, or all rows for a given field.
__len__
__contains__
sort_key
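The dual behaviour described for __getitem__ (a row across all fields, or all rows of one field) can be sketched with a small columnar container. ColumnarDataset is a hypothetical class for illustration; how WMTQEDataset stores its data, and what __contains__ and sort_key check, are not stated here, so the field-name lookup in __contains__ is an assumption.

```python
class ColumnarDataset:
    """Hypothetical container holding one list of values per field name."""

    def __init__(self, columns):
        # Assumes at least one field and equal-length columns.
        self._columns = columns

    def __getitem__(self, index_or_field):
        if isinstance(index_or_field, str):
            # A field name returns all rows for that field.
            return self._columns[index_or_field]
        # An integer index returns one row with data from all fields.
        return {name: values[index_or_field] for name, values in self._columns.items()}

    def __len__(self):
        # Number of rows, taken from the first column.
        return len(next(iter(self._columns.values())))

    def __contains__(self, field):
        # Assumption: membership tests whether a field is present.
        return field in self._columns


data = ColumnarDataset(
    {"source": ["ein Haus", "ein Baum"], "target": ["a house", "a tree"]}
)
row = data[1]
column = data["source"]
```

Dispatching on the key type keeps a single indexing syntax for both access patterns, which is convenient when some code works per example and other code works per field.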