`kiwi.data.datasets.wmt_qe_dataset`

Module Contents

Classes

`InputConfig`	Paths to the input files (source, target, and optional annotations) of a WMT QE dataset.
`OutputConfig`	Paths to the label files (source tags, target tags, and sentence scores).
`TrainingConfig`	Input and output (label) files for a labeled split, used for both training and validation.
`TestConfig`	Input files for the test split, without labels.
`WMTQEDataset`	Map-style `Dataset` holding WMT QE data as named columns.

Functions

read_file(path, reader)

kiwi.data.datasets.wmt_qe_dataset.logger

class kiwi.data.datasets.wmt_qe_dataset.InputConfig

Bases: kiwi.utils.io.BaseConfig

Paths to the input files (source, target, and optional annotations) of a WMT QE dataset.

source: FilePath – Path to a corpus file in the source language.

target: FilePath – Path to a corpus file in the target language.

alignments: Optional[FilePath] – Path to the alignments between source and target.

post_edit: Optional[FilePath] – Path to the file containing the post-edited target.

source_pos: Optional[FilePath] – Path to the input file with POS tags for the source.

target_pos: Optional[FilePath] – Path to the input file with POS tags for the target.
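A minimal sketch of how the files above relate, assuming they are line-aligned (line k of each file describes the same sentence pair) and that the alignments file uses whitespace-separated, zero-based `i-j` pairs, the format commonly used in WMT QE data. Both points are assumptions, and `parse_alignments` is a hypothetical helper, not OpenKiwi's own reader.

```python
from typing import List, Tuple


def parse_alignments(line: str) -> List[Tuple[int, int]]:
    """Hypothetical helper: turn '0-0 1-2' into [(0, 0), (1, 2)].

    Assumes zero-based 'source_index-target_index' pairs separated by whitespace.
    """
    pairs = []
    for token in line.split():
        src, tgt = token.split("-")
        pairs.append((int(src), int(tgt)))
    return pairs


# Assumption: the files are line-aligned, so line k of each refers to the same pair.
source = ["das ist gut"]
target = ["this is good"]
alignments = ["0-0 1-1 2-2"]
assert len(source) == len(target) == len(alignments)

print(parse_alignments(alignments[0]))  # [(0, 0), (1, 1), (2, 2)]
```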

class kiwi.data.datasets.wmt_qe_dataset.OutputConfig

Bases: kiwi.utils.io.BaseConfig

Paths to the label files (source tags, target tags, and sentence scores) of a WMT QE dataset.

target_tags: Optional[FilePath] – Path to the label file for the target.

source_tags: Optional[FilePath] – Path to the label file for the source.

sentence_scores: Optional[FilePath] – Path to the file containing sentence-level scores (HTER).
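A sketch of reading these label files, assuming one sentence per line, whitespace-separated `OK`/`BAD` tags in the tag files, and one floating-point score (such as HTER) per line in the sentence-score file. The file layout is an assumption based on common WMT QE conventions; `parse_tags` and `parse_score` are hypothetical helpers, not part of `kiwi`.

```python
from typing import List


def parse_tags(line: str) -> List[str]:
    # Assumption: whitespace-separated OK/BAD labels, one sentence per line.
    tags = line.split()
    unknown = set(tags) - {"OK", "BAD"}
    if unknown:
        raise ValueError(f"unexpected tags: {sorted(unknown)}")
    return tags


def parse_score(line: str) -> float:
    # Assumption: a single sentence-level score per line.
    return float(line.strip())


tag_lines = ["OK OK BAD", "OK"]
score_lines = ["0.333\n", "0.0\n"]
tags = [parse_tags(line) for line in tag_lines]
scores = [parse_score(line) for line in score_lines]
print(tags)    # [['OK', 'OK', 'BAD'], ['OK']]
print(scores)  # [0.333, 0.0]
```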

class kiwi.data.datasets.wmt_qe_dataset.TrainingConfig

Bases: kiwi.utils.io.BaseConfig

Input and output (label) files for a labeled split; used for both the train and valid splits of Config.

input: InputConfig

output: OutputConfig

class kiwi.data.datasets.wmt_qe_dataset.TestConfig

Bases: kiwi.utils.io.BaseConfig

Input files for the test split; unlike TrainingConfig, it has no output (label) files.

input: InputConfig

class kiwi.data.datasets.wmt_qe_dataset.WMTQEDataset(columns: Dict[Any, Union[Iterable, List]])

Bases: torch.utils.data.Dataset

Map-style dataset of WMT QE data, stored as named columns (one per field, such as source or target). It implements __getitem__(), which accepts either a row index or a field name, and __len__(), which DataLoader and many Sampler implementations use to get the size of the dataset.

Note

DataLoader by default constructs an index sampler that yields integral indices. To use a map-style dataset with non-integral indices or keys, a custom sampler must be provided.

class Config

Bases: kiwi.utils.io.BaseConfig

Configuration of the train, valid, and test files and of how they are buffered.

buffer_size: int – Number of consecutive instances to be temporarily stored in the buffer, which is later used for batching/bucketing.

train: TrainingConfig

valid: TrainingConfig

test: TestConfig

split: Optional[confloat(gt=0.0, lt=1.0)] – Fraction of the training examples to keep for training when no validation set is given; the remaining examples become the validation set.

ensure_there_is_validation_data(cls, v, values)
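A hypothetical data configuration mirroring the Config fields above, written as a plain Python dict (the file names are made up). `check_validation_data` is a guess at what `ensure_there_is_validation_data` enforces, inferred only from its name and from the description of `split`; the real validator may behave differently.

```python
from typing import Any, Dict

# Hypothetical configuration; keys follow the Config attribute names.
data_config: Dict[str, Any] = {
    "buffer_size": 1000,
    "train": {
        "input": {"source": "train.src", "target": "train.mt"},
        "output": {"target_tags": "train.tags", "sentence_scores": "train.hter"},
    },
    "valid": None,
    "test": None,
    "split": 0.9,  # keep 90% for training, use the other 10% for validation
}


def check_validation_data(config: Dict[str, Any]) -> None:
    # Hypothetical: if training data is given, require either a valid section
    # or a split fraction. The 0 < split < 1 range is already enforced by confloat.
    has_train = config.get("train") is not None
    has_valid = config.get("valid") is not None
    has_split = config.get("split") is not None
    if has_train and not (has_valid or has_split):
        raise ValueError("training data needs either a `valid` section or a `split`")


check_validation_data(data_config)  # passes: `split` stands in for `valid`
```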

static build(config: Config, directory=None, train=False, valid=False, test=False, split=0)

Build training, validation, and test datasets.

Parameters

config – configuration object with file paths and processing flags; see Config.
directory – if provided, relative paths in the configuration are resolved against it.
train – whether to build the training dataset.
valid – whether to build the validation dataset.
test – whether to build the testing dataset.
split (float) – if no validation set is provided, randomly sample a fraction \(1 - \text{split}\) of the training examples as the validation set.
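A sketch of the split arithmetic: with \(N\) training examples, roughly \(\text{split} \cdot N\) stay in training and the remaining \(1 - \text{split}\) fraction becomes validation. The seeded shuffle and the rounding rule are assumptions for illustration; `split_train_valid` is hypothetical and not how `build` is necessarily implemented.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_valid(
    examples: Sequence[T], split: float, seed: int = 42
) -> Tuple[List[T], List[T]]:
    """Hypothetical: keep a `split` fraction for training, the rest for validation."""
    if not 0.0 < split < 1.0:
        raise ValueError("split must be strictly between 0 and 1")
    indices = list(range(len(examples)))
    random.Random(seed).shuffle(indices)  # random but reproducible order
    n_train = int(round(split * len(examples)))
    train = [examples[i] for i in indices[:n_train]]
    valid = [examples[i] for i in indices[n_train:]]
    return train, valid


train, valid = split_train_valid(list(range(10)), split=0.8)
print(len(train), len(valid))  # 8 2
```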

__getitem__(self, index_or_field: Union[int, str]) → Union[List[Any], Dict[str, Any]]: Given an integer index, return that row with the data from all fields; given a field name, return all rows of that field.

__len__(self)

__contains__(self, item)

sort_key(self, field='source')
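A minimal column-oriented sketch of the indexing behaviour described for `__getitem__`: an integer returns one row as a dict over all fields, a field name returns that whole column. `ColumnDataset` is a hypothetical stand-in, not the real class; it skips subclassing `torch.utils.data.Dataset` to stay dependency-free, and the field-name semantics of `__contains__` are an assumption.

```python
from typing import Any, Dict, Iterable, List, Union


class ColumnDataset:
    """Hypothetical stand-in for WMTQEDataset: named columns of equal length."""

    def __init__(self, columns: Dict[str, Iterable]):
        self.columns: Dict[str, List[Any]] = {
            name: list(values) for name, values in columns.items()
        }
        lengths = {len(values) for values in self.columns.values()}
        if len(lengths) > 1:
            raise ValueError(f"columns have different lengths: {sorted(lengths)}")
        self._len = lengths.pop() if lengths else 0

    def __getitem__(
        self, index_or_field: Union[int, str]
    ) -> Union[List[Any], Dict[str, Any]]:
        if isinstance(index_or_field, str):
            # Field name: every value of that column.
            return self.columns[index_or_field]
        # Integer index: one row across all fields.
        return {name: values[index_or_field] for name, values in self.columns.items()}

    def __len__(self) -> int:
        return self._len

    def __contains__(self, item: str) -> bool:
        # Assumption: membership is checked against field names.
        return item in self.columns


ds = ColumnDataset({"source": ["a b", "c"], "target": ["x y", "z"]})
print(len(ds))         # 2
print(ds[0])           # {'source': 'a b', 'target': 'x y'}
print(ds["target"])    # ['x y', 'z']
print("source" in ds)  # True
```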

kiwi.data.datasets.wmt_qe_dataset.read_file(path, reader)
