kiwi.systems.encoders.xlm
XLMTextEncoder
Encode a field, handling vocabulary, tokenization and embeddings.
XLMEncoder
XLM model using Hugging Face’s transformers library.
kiwi.systems.encoders.xlm.logger
Bases: kiwi.data.encoders.field_encoders.TextEncoder
Heavily inspired by torchtext and torchnlp.
fit_vocab
Bases: kiwi.systems._meta_module.MetaModule
The following command was used to fine-tune XLM on the in-domain data (as retrieved from the .pth file):
python train.py --exp_name tlm_clm --dump_path './dumped/' --data_path '/mnt/shared/datasets/kiwi/parallel/en_de_indomain' --lgs 'ar-bg-de-el-en-es-fr-hi-ru-sw-th-tr-ur-vi-zh' --clm_steps 'en-de,de-en' --mlm_steps 'en-de,de-en' --reload_model 'models/mlm_tlm_xnli15_1024.pth' --encoder_only True --emb_dim 1024 --n_layers 12 --n_heads 8 --dropout '0.1' --attention_dropout '0.1' --gelu_activation true --batch_size 32 --bptt 256 --optimizer 'adam_inverse_sqrt,beta1=0.9,beta2=0.98,lr=0.0001,weight_decay=0' --epoch_size 200000 --validation_metrics _valid_mlm_ppl --max_vocab 95000 --tokens_per_batch 1200 --exp_id "5114"
The old version was converted using the hf-transformers utility method:
convert_xlm_checkpoint_to_pytorch( self.config.model_name / 'indomain.pth', self.config.model_name / 'finetuned_wmt_en-de' )
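convert_xlm_checkpoint_to_pytorch comes from the checkpoint-conversion script shipped with Hugging Face transformers; its module path depends on the transformers version, so the import below is an assumption and the paths are illustrative:

# Sketch only: the conversion script moved between transformers releases
# (top-level module in 2.x/3.x, transformers.models.xlm in 4.x); adjust the import.
from transformers.models.xlm.convert_xlm_original_pytorch_checkpoint_to_pytorch import (
    convert_xlm_checkpoint_to_pytorch,
)
from pathlib import Path

model_dir = Path('models')  # hypothetical directory holding the fine-tuned .pth file
convert_xlm_checkpoint_to_pytorch(
    str(model_dir / 'indomain.pth'),         # original XLM checkpoint
    str(model_dir / 'finetuned_wmt_en-de'),  # output folder for the converted model
)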
Old QE settings, not actually used for the best run and submission:
fb-causal-lambda: 0.0
fb-keep-prob: 0.1
fb-mask-prob: 0.8
fb-model: data/trained_models/fb_pretrain/xnli/indomain.pth
fb-pred-prob: 0.15
fb-rand-prob: 0.1
fb-src-lang: en
fb-tgt-lang: de
fb-tlm-lambda: 0.0
fb-vocab: data/trained_models/fb_pretrain/xnli/vocab_xnli_15.txt
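The encoder builds on a pre-trained XLM checkpoint from Hugging Face's transformers library. A minimal standalone sketch of loading such a backbone, assuming a recent transformers version and the standard xlm-mlm-tlm-xnli15-1024 checkpoint (inside kiwi this is handled by XLMEncoder through the model_name option below):

from transformers import XLMModel, XLMTokenizer

# Sketch only: standalone use of the Hugging Face XLM backbone, not kiwi's own API.
tokenizer = XLMTokenizer.from_pretrained('xlm-mlm-tlm-xnli15-1024')
model = XLMModel.from_pretrained('xlm-mlm-tlm-xnli15-1024')

inputs = tokenizer('Das ist ein Test .', return_tensors='pt')
hidden_states = model(**inputs)[0]  # (batch, sequence_length, 1024)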
Config
Bases: kiwi.utils.io.BaseConfig
Base class for all pydantic configs. Used to configure base behaviour of configs.
model_name
Pre-trained XLM model to use.
source_language
target_language
use_mismatch_features
Use Alibaba’s mismatch features.
use_predictor_features
Use features originally proposed in the Predictor model.
interleave_input
Concatenate SOURCE and TARGET without internal padding (111222000 instead of 111002220).
freeze
Freeze XLM during training.
use_mlp
Apply a linear layer on top of XLM.
hidden_size
Size of the linear layer on top of XLM.
fix_relative_path
no_implementation
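A sketch of how the options above might be set programmatically; the values are illustrative and the actual defaults are defined by the pydantic Config class:

from kiwi.systems.encoders.xlm import XLMEncoder

# Illustrative values only; consult the Config fields above for actual semantics.
config = XLMEncoder.Config(
    model_name='xlm-mlm-tlm-xnli15-1024',  # pre-trained XLM model to use
    source_language='en',
    target_language='de',
    interleave_input=False,  # if True, concatenate without internal padding (111222000)
    freeze=False,            # freeze XLM during training
    use_mlp=True,            # apply a linear layer on top of XLM
    hidden_size=100,         # size of that linear layer (illustrative)
)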
load_state_dict
Copies parameters and buffers from state_dict into this module and its descendants. If strict is True, then the keys of state_dict must exactly match the keys returned by this module’s state_dict() function.
Parameters:
state_dict (dict) – a dict containing parameters and persistent buffers.
strict (bool, optional) – whether to strictly enforce that the keys in state_dict match the keys returned by this module’s state_dict() function. Default: True
Returns:
NamedTuple with missing_keys and unexpected_keys fields:
missing_keys is a list of str containing the missing keys
unexpected_keys is a list of str containing the unexpected keys
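For example, loading a partially matching checkpoint with strict=False and inspecting the result (standard torch.nn.Module behaviour; the encoder instance and checkpoint path are hypothetical):

import torch

state = torch.load('checkpoint.pt', map_location='cpu')  # hypothetical checkpoint
result = encoder.load_state_dict(state, strict=False)     # encoder: an XLMEncoder instance
print(result.missing_keys)     # keys the module expects but the state dict lacks
print(result.unexpected_keys)  # keys in the state dict the module does not expect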
input_data_encoders
size
forward
concat_input
Concatenate tensors of two batches into one tensor.
Returns: the concatenation, a mask of types (a as zeroes and b as ones), and the concatenation of attention_mask.
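A minimal sketch of this kind of concatenation on generic padded batches (illustrative helper, not the actual concat_input implementation):

import torch

def concat_sketch(ids_a, mask_a, ids_b, mask_b):
    """Illustrative only: join two padded batches along the sequence dimension."""
    ids = torch.cat([ids_a, ids_b], dim=1)
    # Type mask: zeroes for positions from batch A, ones for positions from batch B.
    token_types = torch.cat([torch.zeros_like(ids_a), torch.ones_like(ids_b)], dim=1)
    attention_mask = torch.cat([mask_a, mask_b], dim=1)
    return ids, token_types, attention_mask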
interleave_input
Interleave the source + target embeddings into one tensor.
This means making the input as [batch, target [SEP] source].
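Conceptually, this packs each example as one segment immediately followed by the other, with all padding at the end (the 111222000 layout mentioned for the interleave_input option). A rough sketch, not the actual implementation:

import torch

def interleave_sketch(ids_a, lengths_a, ids_b, lengths_b, pad_id=0):
    """Illustrative only: pack each row as [a tokens][b tokens][padding]."""
    batch_size = ids_a.size(0)
    max_len = int((lengths_a + lengths_b).max())
    out = ids_a.new_full((batch_size, max_len), pad_id)
    for i in range(batch_size):
        len_a, len_b = int(lengths_a[i]), int(lengths_b[i])
        out[i, :len_a] = ids_a[i, :len_a]
        out[i, len_a:len_a + len_b] = ids_b[i, :len_b]
    return out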
split_outputs
Split contexts to get tag_side outputs.
features (tensor) – XLM output: <s> source </s> </s> target </s>. Shape of (bs, 1 + source_len + 2 + target_len + 1, 2).
batch_inputs –
interleaved (bool) – whether the concat strategy was ‘interleaved’.
label_a – dictionary key for sequence A in features.
label_b – dictionary key for sequence B in features.
Returns: dict of tensors, one per tag side.
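A rough sketch of the splitting idea for the non-interleaved layout described above, assuming per-example source and target lengths are known (illustrative, not the actual split_outputs code):

import torch

def split_sketch(features, source_lengths, target_lengths):
    """Illustrative only: slice per-side contexts out of the joint XLM output."""
    outputs = {'source': [], 'target': []}
    for i in range(features.size(0)):
        src_len = int(source_lengths[i])
        tgt_len = int(target_lengths[i])
        # Row layout: <s> source </s> </s> target </s>
        outputs['source'].append(features[i, 1:1 + src_len])
        outputs['target'].append(features[i, 1 + src_len + 2:1 + src_len + 2 + tgt_len])
    return outputs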