kiwi.systems.encoders.xlm

Module Contents

Classes

XLMTextEncoder

Encode a field, handling vocabulary, tokenization and embeddings.

XLMEncoder

XLM model using Hugging Face’s transformers library.

kiwi.systems.encoders.xlm.logger
class kiwi.systems.encoders.xlm.XLMTextEncoder(tokenizer_name)

Bases: kiwi.data.encoders.field_encoders.TextEncoder

Encode a field, handling vocabulary, tokenization and embeddings.

Heavily inspired by torchtext and torchnlp.

fit_vocab(self, samples, vocab_size=None, vocab_min_freq=0, embeddings_name=None, keep_rare_words_with_embeddings=False, add_embeddings_vocab=False)
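
A minimal usage sketch; the sample sentences and the tokenizer name below are illustrative assumptions, not values taken from the library:

from kiwi.systems.encoders.xlm import XLMTextEncoder

# Hypothetical sample data; in the real pipeline samples come from the data readers.
samples = ['this is a sentence .', 'another sentence .']

encoder = XLMTextEncoder(tokenizer_name='xlm-mlm-tlm-xnli15-1024')
# Build the vocabulary from the samples using the documented defaults.
encoder.fit_vocab(samples, vocab_size=None, vocab_min_freq=0)
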
class kiwi.systems.encoders.xlm.XLMEncoder(vocabs: Dict[str, Vocabulary], config: Config, pre_load_model: bool = True)

Bases: kiwi.systems._meta_module.MetaModule

XLM model using Hugging Face’s transformers library.

The following command was used to fine-tune XLM on the in-domain data (retrieved from the .pth file):

python train.py --exp_name tlm_clm --dump_path './dumped/' \
    --data_path '/mnt/shared/datasets/kiwi/parallel/en_de_indomain' \
    --lgs 'ar-bg-de-el-en-es-fr-hi-ru-sw-th-tr-ur-vi-zh' \
    --clm_steps 'en-de,de-en' --mlm_steps 'en-de,de-en' \
    --reload_model 'models/mlm_tlm_xnli15_1024.pth' --encoder_only True \
    --emb_dim 1024 --n_layers 12 --n_heads 8 --dropout '0.1' \
    --attention_dropout '0.1' --gelu_activation true --batch_size 32 \
    --bptt 256 \
    --optimizer 'adam_inverse_sqrt,beta1=0.9,beta2=0.98,lr=0.0001,weight_decay=0' \
    --epoch_size 200000 --validation_metrics _valid_mlm_ppl --max_vocab 95000 \
    --tokens_per_batch 1200 --exp_id "5114"

The old version was converted using the hf-transformers util method:

convert_xlm_checkpoint_to_pytorch(
    self.config.model_name / 'indomain.pth',
    self.config.model_name / 'finetuned_wmt_en-de'
)

Old settings in QE that were not actually used for the best run and submission:

fb-causal-lambda: 0.0
fb-keep-prob: 0.1
fb-mask-prob: 0.8
fb-model: data/trained_models/fb_pretrain/xnli/indomain.pth
fb-pred-prob: 0.15
fb-rand-prob: 0.1
fb-src-lang: en
fb-tgt-lang: de
fb-tlm-lambda: 0.0
fb-vocab: data/trained_models/fb_pretrain/xnli/vocab_xnli_15.txt
class Config

Bases: kiwi.utils.io.BaseConfig

Base class for all pydantic configs. Used to configure base behaviour of configs.

model_name :Union[str, Path] = xlm-mlm-tlm-xnli15-1024

Pre-trained XLM model to use.

source_language :str = en
target_language :str = de
use_mismatch_features :bool = False

Use Alibaba’s mismatch features.

use_predictor_features :bool = False

Use features originally proposed in the Predictor model.

interleave_input :bool = False

Concatenate SOURCE and TARGET without internal padding (111222000 instead of 111002220)

freeze :bool = False

Freeze XLM during training.

use_mlp :bool = True

Apply a linear layer on top of XLM.

hidden_size :int = 100

Size of the linear layer on top of XLM.

fix_relative_path(cls, v)
no_implementation(cls, v)
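
A hedged sketch of instantiating this config programmatically (standard pydantic keyword construction; the nested-class access and the values are illustrative, the defaults are documented above):

from kiwi.systems.encoders.xlm import XLMEncoder

# Illustrative values only.
config = XLMEncoder.Config(
    model_name='xlm-mlm-tlm-xnli15-1024',
    source_language='en',
    target_language='de',
    interleave_input=False,  # keep the 111002220 concatenation layout
    freeze=False,
    use_mlp=True,
    hidden_size=100,
)
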
load_state_dict(self, state_dict: Dict[str, Tensor], strict: bool = True)

Copies parameters and buffers from state_dict into this module and its descendants. If strict is True, then the keys of state_dict must exactly match the keys returned by this module’s state_dict() function.

Parameters
  • state_dict (dict) – a dict containing parameters and persistent buffers.

  • strict (bool, optional) – whether to strictly enforce that the keys in state_dict match the keys returned by this module’s state_dict() function. Default: True

Returns

  • missing_keys is a list of str containing the missing keys

  • unexpected_keys is a list of str containing the unexpected keys

Return type

NamedTuple with missing_keys and unexpected_keys fields
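
Since this follows the standard PyTorch load_state_dict contract, a typical usage pattern looks like the sketch below; the checkpoint path is a placeholder and encoder is assumed to be an already constructed XLMEncoder:

import torch

# Hypothetical checkpoint path; vocabs/config construction omitted.
state_dict = torch.load('xlm_encoder.ckpt', map_location='cpu')
missing, unexpected = encoder.load_state_dict(state_dict, strict=False)
print('missing keys:', missing)
print('unexpected keys:', unexpected)
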

classmethod input_data_encoders(cls, config: Config)
size(self, field=None)
forward(self, batch_inputs, *args, include_target_logits=False, include_source_logits=False)
static concat_input(batch_a, batch_b, pad_id, lang_a=None, lang_b=None)

Concatenate tensors of two batches into one tensor.

Returns

the concatenation, a mask of types (a as zeroes and b as ones), and the concatenation of attention_mask.
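
The return values described above can be illustrated with a small hand-rolled sketch (not the library's implementation; the tensors and pad id are made up):

import torch

pad_id = 0
batch_a = torch.tensor([[5, 6, 7, 0]])  # three real tokens + one pad -> 1 1 1 0
batch_b = torch.tensor([[8, 9, 0]])     # two real tokens + one pad   -> 2 2 0

ids = torch.cat([batch_a, batch_b], dim=1)  # 1 1 1 0 2 2 0 (internal padding kept)
token_types = torch.cat(
    [torch.zeros_like(batch_a), torch.ones_like(batch_b)], dim=1
)                                           # a as zeroes, b as ones
attention_mask = (ids != pad_id).long()     # padding positions masked out
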

static interleave_input(batch_a, batch_b, pad_id, lang_a=None, lang_b=None)

Interleave the source + target embeddings into one tensor.

This means building the input as [batch, target [SEP] source].

Returns

the interleaved embeddings, a mask of target (as zeroes) and source (as ones), and the concatenation of attention_mask.
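
For contrast with concat_input, a rough sketch of the padding-free layout the interleaving aims for (111222000 instead of 111002220); purely illustrative, not the library's code:

import torch

pad_id = 0
batch_a = torch.tensor([[5, 6, 7, 0]])  # 1 1 1 0
batch_b = torch.tensor([[8, 9, 0]])     # 2 2 0

# Drop internal padding, join the real tokens, then pad only at the end,
# turning the concatenated 1 1 1 0 2 2 0 layout into 1 1 1 2 2 0 0.
joined = torch.cat([batch_a[batch_a != pad_id], batch_b[batch_b != pad_id]])
out = torch.full((batch_a.size(1) + batch_b.size(1),), pad_id, dtype=torch.long)
out[: joined.numel()] = joined
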

static split_outputs(features: torch.Tensor, batch_inputs, interleaved: bool = False, label_a: str = const.SOURCE, label_b: str = const.TARGET)

Split contexts to get tag_side outputs.

Parameters
  • features (tensor) – XLM output with layout <s> source </s> </s> target </s>; shape of (bs, 1 + source_len + 2 + target_len + 1, 2)

  • batch_inputs

  • interleaved (bool) – whether the concat strategy was ‘interleaved’.

  • label_a – dictionary key for sequence A in features.

  • label_b – dictionary key for sequence B in features.

Returns

dict of tensors, one per tag side.
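
A hedged sketch of the kind of slicing this describes, assuming the <s> source </s> </s> target </s> layout from the docstring; the lengths, the hidden size, and the dictionary keys are illustrative:

import torch

bs, source_len, target_len, hidden = 2, 4, 5, 8
features = torch.randn(bs, 1 + source_len + 2 + target_len + 1, hidden)

# Drop the wrapping special tokens around each side.
source_features = features[:, 1 : 1 + source_len]
target_features = features[:, 1 + source_len + 2 : -1]
outputs = {'source': source_features, 'target': target_features}
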