Library Reference
Multilingual Encoders
Encoder Model base
Module defining the common interface between all pretrained encoder models.
- class comet.encoders.base.Encoder[source]
Base class for an encoder model.
- abstract forward(tokens: torch.Tensor, lengths: torch.Tensor) Dict[str, torch.Tensor] [source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- abstract classmethod from_pretrained(pretrained_model)[source]
Function that loads a pretrained encoder and the respective tokenizer.
- Returns
Encoder model
- abstract layerwise_lr(lr: float, decay: float)[source]
- Parameters
lr – Learning rate for the highest encoder layer.
decay – decay percentage for the lower layers.
- Returns
List of model parameters with layer-wise decay learning rate
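As a rough illustration of what layer-wise decay means, the sketch below computes one learning rate per layer. It assumes the top layer uses lr and each layer below it uses the rate of the layer above multiplied by decay. The helper name layerwise_learning_rates is hypothetical and not part of COMET; the encoders' own layerwise_lr returns model parameters with their learning rates rather than plain floats.

```python
from typing import List


def layerwise_learning_rates(lr: float, decay: float, num_layers: int) -> List[float]:
    """Return one learning rate per layer, ordered from bottom (index 0) to top.

    Assumption: the top layer uses `lr` and every layer below it uses the
    rate of the layer above multiplied by `decay`.
    """
    return [lr * decay ** (num_layers - 1 - i) for i in range(num_layers)]


rates = layerwise_learning_rates(lr=1e-05, decay=0.95, num_layers=4)
for index, rate in enumerate(rates):
    print(f"layer {index}: lr = {rate:.3e}")
```

With a decay below 1, lower layers change more slowly than the top layer during fine-tuning.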
- abstract property max_positions
Max number of tokens the encoder handles.
- abstract property num_layers
Number of model layers available.
- abstract property output_units
Size (number of hidden units) of the encoder output embeddings.
BERT Encoder
Pretrained BERT encoder from Hugging Face.
- class comet.encoders.bert.BERTEncoder(pretrained_model: str)[source]
BERT encoder.
- Parameters
pretrained_model – Pretrained model from Hugging Face.
- forward(input_ids: torch.Tensor, attention_mask: torch.Tensor, **kwargs) Dict[str, torch.Tensor] [source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- classmethod from_pretrained(pretrained_model: str) comet.encoders.base.Encoder [source]
Function that loads a pretrained encoder from Hugging Face.
- Parameters
pretrained_model – Name of the pretrained model to be loaded.
- Returns
Encoder model
- layerwise_lr(lr: float, decay: float)[source]
- Parameters
lr – Learning rate for the highest encoder layer.
decay – decay percentage for the lower layers.
- Returns
List of model parameters with layer-wise decay learning rate
- property max_positions
Max number of tokens the encoder handles.
- property num_layers
Number of model layers available.
- property output_units
Size (number of hidden units) of the encoder output embeddings.
MiniLM Encoder
Pretrained MiniLM encoder from Microsoft. This encoder uses a BERT architecture with an XLMR tokenizer.
XLM-RoBERTa Encoder
Pretrained XLM-RoBERTa encoder from Hugging Face.
- class comet.encoders.xlmr.XLMREncoder(pretrained_model: str)[source]
XLM-RoBERTa encoder.
- Parameters
pretrained_model – Pretrained model from Hugging Face.
- forward(input_ids: torch.Tensor, attention_mask: torch.Tensor, **kwargs) Dict[str, torch.Tensor] [source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- classmethod from_pretrained(pretrained_model: str) comet.encoders.base.Encoder [source]
Function that loads a pretrained encoder from Hugging Face.
- Parameters
pretrained_model – Name of the pretrained model to be loaded.
- Returns
Encoder model
Base Model
CometModel
Abstract model class that implements some of the PyTorch Lightning logic. Extend this class to create new models and metrics within COMET.
- class comet.models.base.CometModel(nr_frozen_epochs: Union[float, int] = 0.3, keep_embeddings_frozen: bool = False, optimizer: str = 'AdamW', encoder_learning_rate: float = 1e-05, learning_rate: float = 3e-05, layerwise_decay: float = 0.95, encoder_model: str = 'XLM-RoBERTa', pretrained_model: str = 'xlm-roberta-large', pool: str = 'avg', layer: Union[str, int] = 'mix', dropout: float = 0.1, batch_size: int = 4, train_data: Optional[str] = None, validation_data: Optional[str] = None, load_weights_from_checkpoint: Optional[str] = None, class_identifier: Optional[str] = None)[source]
CometModel:
- Parameters
nr_frozen_epochs – Number of epochs (% of epoch) that the encoder is frozen.
keep_embeddings_frozen – Keeps the encoder embedding layer frozen during training.
optimizer – Optimizer used during training.
encoder_learning_rate – Learning rate used to fine-tune the encoder model.
learning_rate – Learning rate used to fine-tune the top layers.
layerwise_decay – Learning rate % decay from top-to-bottom encoder layers.
encoder_model – Encoder model to be used.
pretrained_model – Pretrained model from Hugging Face.
pool – Pooling strategy to derive a sentence embedding [‘cls’, ‘max’, ‘avg’].
layer – Encoder layer to be used (‘mix’ for pooling info from all layers.)
dropout – Dropout used in the top-layers.
batch_size – Batch size used during training.
train_data – Path to a csv file containing the training data.
validation_data – Path to a csv file containing the validation data.
load_weights_from_checkpoint – Path to a checkpoint file.
class_identifier – subclass identifier.
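To make the pool options concrete, here is a minimal plain-Python sketch of the three strategies applied to one sentence. It assumes that 'cls' takes the first token's embedding, while 'max' and 'avg' reduce over the non-padded tokens only. The function pool_tokens and its list-based inputs are hypothetical; the library itself operates on batched tensors.

```python
from typing import List


def pool_tokens(embeddings: List[List[float]], mask: List[int], pool: str = "avg") -> List[float]:
    """Pool token embeddings [seq_len x hidden] into one sentence vector.

    `mask` holds 1 for real tokens and 0 for padding.
    """
    if pool == "cls":
        # Use the embedding of the first token (the sentence-start position).
        return list(embeddings[0])
    kept = [vec for vec, keep in zip(embeddings, mask) if keep]
    if pool == "max":
        # Element-wise maximum over the real tokens.
        return [max(column) for column in zip(*kept)]
    if pool == "avg":
        # Element-wise mean over the real tokens.
        return [sum(column) / len(kept) for column in zip(*kept)]
    raise ValueError(f"Unknown pooling strategy: {pool}")


tokens = [[1.0, 4.0], [3.0, 2.0], [9.0, 9.0]]  # the last row is padding
mask = [1, 1, 0]
for strategy in ("cls", "max", "avg"):
    print(strategy, pool_tokens(tokens, mask, strategy))
```

Ignoring padded positions matters for 'max' and 'avg': without the mask, the padding row above would dominate both results.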
- abstract configure_optimizers()[source]
Choose what optimizers and learning-rate schedulers to use in your optimization. Normally you’d need one. But in the case of GANs or similar you might have multiple.
- Returns
Any of these 6 options.
Single optimizer.
List or Tuple of optimizers.
Two lists - The first list has multiple optimizers, and the second has multiple LR schedulers (or multiple lr_scheduler_config).
Dictionary, with an "optimizer" key, and (optionally) a "lr_scheduler" key whose value is a single LR scheduler or lr_scheduler_config.
Tuple of dictionaries as described above, with an optional "frequency" key.
None - Fit will run without any optimizer.
The lr_scheduler_config is a dictionary which contains the scheduler and its associated configuration. The default configuration is shown below.

```python
lr_scheduler_config = {
    # REQUIRED: The scheduler instance
    "scheduler": lr_scheduler,
    # The unit of the scheduler's step size, could also be 'step'.
    # 'epoch' updates the scheduler on epoch end whereas 'step'
    # updates it after an optimizer update.
    "interval": "epoch",
    # How many epochs/steps should pass between calls to
    # `scheduler.step()`. 1 corresponds to updating the learning
    # rate after every epoch/step.
    "frequency": 1,
    # Metric to monitor for schedulers like `ReduceLROnPlateau`
    "monitor": "val_loss",
    # If set to `True`, will enforce that the value specified 'monitor'
    # is available when the scheduler is updated, thus stopping
    # training if not found. If set to `False`, it will only produce a warning
    "strict": True,
    # If using the `LearningRateMonitor` callback to monitor the
    # learning rate progress, this keyword can be used to specify
    # a custom logged name
    "name": None,
}
```
When there are schedulers in which the .step() method is conditioned on a value, such as the torch.optim.lr_scheduler.ReduceLROnPlateau scheduler, Lightning requires that the lr_scheduler_config contains the keyword "monitor" set to the metric name that the scheduler should be conditioned on.
Metrics can be made available to monitor by simply logging them using self.log('metric_to_track', metric_val) in your LightningModule.
Note
The frequency value specified in a dict along with the optimizer key is an int corresponding to the number of sequential batches optimized with the specific optimizer. It should be given to none or to all of the optimizers. There is a difference between passing multiple optimizers in a list, and passing multiple optimizers in dictionaries with a frequency of 1:
In the former case, all optimizers will operate on the given batch in each optimization step.
In the latter, only one optimizer will operate on the given batch at every step.
This is different from the frequency value specified in the lr_scheduler_config mentioned above.

```python
import torch


def configure_optimizers(self):
    optimizer_one = torch.optim.SGD(self.model.parameters(), lr=0.01)
    optimizer_two = torch.optim.SGD(self.model.parameters(), lr=0.01)
    return [
        {"optimizer": optimizer_one, "frequency": 5},
        {"optimizer": optimizer_two, "frequency": 10},
    ]
```
In this example, the first optimizer will be used for the first 5 steps, the second optimizer for the next 10 steps and that cycle will continue. If an LR scheduler is specified for an optimizer using the lr_scheduler key in the above dict, the scheduler will only be updated when its optimizer is being used.
Examples:
```python
from torch.optim import Adam
from torch.optim.lr_scheduler import CosineAnnealingLR, ExponentialLR


# most cases. no learning rate scheduler
def configure_optimizers(self):
    return Adam(self.parameters(), lr=1e-3)


# multiple optimizer case (e.g.: GAN)
def configure_optimizers(self):
    gen_opt = Adam(self.model_gen.parameters(), lr=0.01)
    dis_opt = Adam(self.model_dis.parameters(), lr=0.02)
    return gen_opt, dis_opt


# example with learning rate schedulers
def configure_optimizers(self):
    gen_opt = Adam(self.model_gen.parameters(), lr=0.01)
    dis_opt = Adam(self.model_dis.parameters(), lr=0.02)
    dis_sch = CosineAnnealingLR(dis_opt, T_max=10)
    return [gen_opt, dis_opt], [dis_sch]


# example with step-based learning rate schedulers
# each optimizer has its own scheduler
def configure_optimizers(self):
    gen_opt = Adam(self.model_gen.parameters(), lr=0.01)
    dis_opt = Adam(self.model_dis.parameters(), lr=0.02)
    gen_sch = {
        'scheduler': ExponentialLR(gen_opt, 0.99),
        'interval': 'step'  # called after each training step
    }
    dis_sch = CosineAnnealingLR(dis_opt, T_max=10)  # called every epoch
    return [gen_opt, dis_opt], [gen_sch, dis_sch]


# example with optimizer frequencies
# see training procedure in `Improved Training of Wasserstein GANs`, Algorithm 1
# https://arxiv.org/abs/1704.00028
def configure_optimizers(self):
    gen_opt = Adam(self.model_gen.parameters(), lr=0.01)
    dis_opt = Adam(self.model_dis.parameters(), lr=0.02)
    n_critic = 5
    return (
        {'optimizer': dis_opt, 'frequency': n_critic},
        {'optimizer': gen_opt, 'frequency': 1}
    )
```
Note
Some things to know:
Lightning calls .backward() and .step() on each optimizer and learning rate scheduler as needed.
If you use 16-bit precision (precision=16), Lightning will automatically handle the optimizers.
If you use multiple optimizers, training_step() will have an additional optimizer_idx parameter.
If you use torch.optim.LBFGS, Lightning handles the closure function automatically for you.
If you use multiple optimizers, gradients will be calculated only for the parameters of the current optimizer at each training step.
If you need to control how often those optimizers step or override the default .step() schedule, override the optimizer_step() hook.
- abstract forward(*args, **kwargs) Dict[str, torch.Tensor] [source]
Same as torch.nn.Module.forward().
- Parameters
*args – Whatever you decide to pass into the forward method.
**kwargs – Keyword arguments are also possible.
- Returns
Your model’s output
- get_sentence_embedding(input_ids: torch.Tensor, attention_mask: torch.Tensor) torch.Tensor [source]
Function that extracts sentence embeddings for a single sentence.
- Parameters
input_ids – token ids [batch_size x seq_len]
attention_mask – attention mask [batch_size x seq_len]
- Returns
torch.Tensor [batch_size x hidden_size]
- load_weights(checkpoint: str) None [source]
Function that loads the weights from a given checkpoint file.
Note
If the checkpoint model architecture is different from `self`, only the common parts will be loaded.
- Parameters
checkpoint – Path to the checkpoint containing the weights to be loaded.
- predict(samples: List[Dict[str, str]], batch_size: int = 8, gpus: int = 1, mc_dropout: Union[int, bool] = False, progress_bar: bool = True, accelerator: str = 'ddp', num_workers: Optional[int] = None, length_batching: bool = True) Union[Tuple[List[float], float], Tuple[List[float], List[float], float]] [source]
Function that receives a list of samples (dictionaries with translations, sources and/or references) and returns segment-level scores and a system-level score. If mc_dropout is set, it also returns a confidence value for each segment score.
- Parameters
samples – List with dictionaries with source, translations and/or references.
batch_size – Batch size used during inference.
gpus – Number of GPUs to be used.
mc_dropout – Number of inference steps to run using Monte Carlo dropout (MCD). It is disabled by default.
progress_bar – Flag that turns on and off the predict progress bar.
accelerator – PyTorch Lightning accelerator (e.g. dp, ddp).
num_workers – Number of workers to use when loading data from dataloaders.
length_batching – If set to true, reduces padding by sorting samples by MT length.
- Returns
List with segment-level scores and a system-score or segment-level scores, segment-level confidence and a system-score.
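The length_batching option can be pictured with the following sketch: samples are sorted by MT length so that each batch holds texts of similar size (less padding), and the scores are put back in the original order afterwards. This is an illustration only, not the library's implementation. The helpers length_sorted_batches and restore_order are hypothetical, the dictionary keys "src", "mt" and "ref" are assumed, and character length stands in for token length.

```python
from typing import Dict, List, Tuple

IndexedSample = Tuple[int, Dict[str, str]]


def length_sorted_batches(samples: List[Dict[str, str]], batch_size: int) -> List[List[IndexedSample]]:
    """Group samples into batches of similar MT length, keeping original indices."""
    indexed = sorted(enumerate(samples), key=lambda pair: len(pair[1]["mt"]))
    return [indexed[i:i + batch_size] for i in range(0, len(indexed), batch_size)]


def restore_order(batches: List[List[IndexedSample]], scores_per_batch: List[List[float]]) -> List[float]:
    """Put per-sample scores back into the order of the original sample list."""
    restored = {}
    for batch, scores in zip(batches, scores_per_batch):
        for (index, _), score in zip(batch, scores):
            restored[index] = score
    return [restored[i] for i in range(len(restored))]


samples = [
    {"src": "a", "mt": "a much longer translation", "ref": "r"},
    {"src": "b", "mt": "short", "ref": "r"},
    {"src": "c", "mt": "medium length", "ref": "r"},
]
batches = length_sorted_batches(samples, batch_size=2)
# Stand-in scores: the MT length, just to make the reordering visible.
fake_scores = [[float(len(s["mt"])) for _, s in batch] for batch in batches]
print(restore_order(batches, fake_scores))
```

Carrying the original index along with each sample is what lets the caller receive scores aligned with the input list even though inference ran in a different order.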
- predict_step(batch: Dict[str, torch.Tensor], batch_idx: Optional[int] = None, dataloader_idx: Optional[int] = None) torch.Tensor [source]
Runs one prediction step and returns the predicted values.
- Parameters
batch – The output of your prepare_sample function.
batch_idx – Integer displaying which batch this is.
dataloader_idx – Integer displaying which dataloader this is.
- prepare_for_inference(sample)[source]
Ideally this should be a lambda function, but for some reason Python does not copy local lambda functions. This function replaces collate_fn=lambda x: self.prepare_sample(x, inference=True) from line 434.
- retrieve_sentence_embedding(input_ids: torch.Tensor, attention_mask: torch.Tensor) torch.Tensor [source]
Wrapper for get_sentence_embedding function that caches results.
- setup(stage) None [source]
Data preparation function called before training by Lightning.
- Parameters
stage – either ‘fit’, ‘validate’, ‘test’, or ‘predict’
- train_dataloader() torch.utils.data.dataloader.DataLoader [source]
Function that loads the train set.
- training_step(batch: Tuple[Dict[str, torch.Tensor], Dict[str, torch.Tensor]], batch_nb: int) torch.Tensor [source]
Runs one training step and logs the training loss.
- Parameters
batch – The output of your prepare_sample function.
batch_nb – Integer displaying which batch this is.
- Returns
Loss value
- val_dataloader() torch.utils.data.dataloader.DataLoader [source]
Function that loads the validation set.
- validation_step(batch: Tuple[Dict[str, torch.Tensor], Dict[str, torch.Tensor]], batch_nb: int, dataloader_idx: int) None [source]
Runs one validation step and logs metrics.
- Parameters
batch – The output of your prepare_sample function.
batch_nb – Integer displaying which batch this is.
dataloader_idx – Integer displaying which dataloader this is.
Regression Models
RegressionMetric
Regression Metric that learns to predict a quality assessment by looking at source, translation and reference.
- class comet.models.regression.regression_metric.RegressionMetric(nr_frozen_epochs: Union[float, int] = 0.3, keep_embeddings_frozen: bool = False, optimizer: str = 'AdamW', encoder_learning_rate: float = 1e-05, learning_rate: float = 3e-05, layerwise_decay: float = 0.95, encoder_model: str = 'XLM-RoBERTa', pretrained_model: str = 'xlm-roberta-base', pool: str = 'avg', layer: Union[str, int] = 'mix', dropout: float = 0.1, batch_size: int = 4, train_data: Optional[str] = None, validation_data: Optional[str] = None, hidden_sizes: List[int] = [2304, 768], activations: str = 'Tanh', final_activation: Optional[str] = None, load_weights_from_checkpoint: Optional[str] = None)[source]
RegressionMetric:
- Parameters
nr_frozen_epochs – Number of epochs (% of epoch) that the encoder is frozen.
keep_embeddings_frozen – Keeps the encoder embedding layer frozen during training.
optimizer – Optimizer used during training.
encoder_learning_rate – Learning rate used to fine-tune the encoder model.
learning_rate – Learning rate used to fine-tune the top layers.
layerwise_decay – Learning rate % decay from top-to-bottom encoder layers.
encoder_model – Encoder model to be used.
pretrained_model – Pretrained model from Hugging Face.
pool – Pooling strategy to derive a sentence embedding [‘cls’, ‘max’, ‘avg’].
layer – Encoder layer to be used (‘mix’ for pooling info from all layers.)
dropout – Dropout used in the top-layers.
batch_size – Batch size used during training.
train_data – Path to a csv file containing the training data.
validation_data – Path to a csv file containing the validation data.
hidden_sizes – Hidden sizes for the Feed Forward regression.
activations – Feed Forward activation function.
load_weights_from_checkpoint – Path to a checkpoint file.
- configure_optimizers() Tuple[List[torch.optim.optimizer.Optimizer], List[torch.optim.lr_scheduler.LambdaLR]] [source]
Sets the optimizers to be used during training.
- forward(src_input_ids: torch.Tensor, src_attention_mask: torch.Tensor, mt_input_ids: torch.Tensor, mt_attention_mask: torch.Tensor, ref_input_ids: torch.Tensor, ref_attention_mask: torch.Tensor, **kwargs) Dict[str, torch.Tensor] [source]
Same as torch.nn.Module.forward().
- Parameters
*args – Whatever you decide to pass into the forward method.
**kwargs – Keyword arguments are also possible.
- Returns
Your model’s output
- prepare_sample(sample: List[Dict[str, Union[str, float]]], inference: bool = False) Union[Tuple[Dict[str, torch.Tensor], Dict[str, torch.Tensor]], Dict[str, torch.Tensor]] [source]
Function that prepares a sample to input the model.
- Parameters
sample – list of dictionaries.
inference – If set to true prepares only the model inputs.
- Returns
Tuple with 2 dictionaries (model inputs and targets). If inference=True returns only the model inputs.
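The input and output shapes of prepare_sample can be illustrated with a simplified sketch that only regroups the records column-wise and splits inputs from targets. The function collate is hypothetical, and the keys "src", "mt", "ref" and "score" are assumptions. The real method also has to turn the text into tensors for the model, which is omitted here.

```python
from typing import Dict, List, Union

Record = Dict[str, Union[str, float]]


def collate(sample: List[Record], inference: bool = False):
    """Turn a list of records into column-wise model inputs (and targets)."""
    # List of dicts -> dict of lists, one list per field.
    columns = {key: [record[key] for record in sample] for key in sample[0]}
    inputs = {key: columns[key] for key in ("src", "mt", "ref")}
    if inference:
        # At inference time there are no gold scores to return.
        return inputs
    targets = {"score": [float(value) for value in columns["score"]]}
    return inputs, targets


batch = [
    {"src": "Hallo Welt", "mt": "Hello world", "ref": "Hello, world", "score": 0.9},
    {"src": "Guten Tag", "mt": "Good night", "ref": "Good day", "score": 0.1},
]
inputs, targets = collate(batch)
print(inputs["mt"], targets["score"])
print(collate(batch, inference=True) == inputs)
```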
ReferencelessRegression
Referenceless Regression Metric that learns to predict a quality assessment by looking at source and translation.
- class comet.models.regression.referenceless.ReferencelessRegression(nr_frozen_epochs: Union[float, int] = 0.3, keep_embeddings_frozen: bool = False, optimizer: str = 'AdamW', encoder_learning_rate: float = 1e-05, learning_rate: float = 3e-05, layerwise_decay: float = 0.95, encoder_model: str = 'XLM-RoBERTa', pretrained_model: str = 'xlm-roberta-base', pool: str = 'avg', layer: Union[str, int] = 'mix', dropout: float = 0.1, batch_size: int = 4, train_data: Optional[str] = None, validation_data: Optional[str] = None, hidden_sizes: List[int] = [1024], activations: str = 'Tanh', final_activation: Optional[str] = None, load_weights_from_checkpoint: Optional[str] = None)[source]
ReferencelessRegression:
- Parameters
nr_frozen_epochs – Number of epochs (% of epoch) that the encoder is frozen.
keep_embeddings_frozen – Keeps the encoder embedding layer frozen during training.
optimizer – Optimizer used during training.
encoder_learning_rate – Learning rate used to fine-tune the encoder model.
learning_rate – Learning rate used to fine-tune the top layers.
layerwise_decay – Learning rate % decay from top-to-bottom encoder layers.
encoder_model – Encoder model to be used.
pretrained_model – Pretrained model from Hugging Face.
pool – Pooling strategy to derive a sentence embedding [‘cls’, ‘max’, ‘avg’].
layer – Encoder layer to be used (‘mix’ for pooling info from all layers.)
dropout – Dropout used in the top-layers.
batch_size – Batch size used during training.
train_data – Path to a csv file containing the training data.
validation_data – Path to a csv file containing the validation data.
hidden_sizes – Hidden sizes for the Feed Forward regression.
activations – Feed Forward activation function.
load_weights_from_checkpoint – Path to a checkpoint file.
- forward(src_input_ids: torch.Tensor, src_attention_mask: torch.Tensor, mt_input_ids: torch.Tensor, mt_attention_mask: torch.Tensor, **kwargs) Dict[str, torch.Tensor] [source]
Same as torch.nn.Module.forward().
- Parameters
*args – Whatever you decide to pass into the forward method.
**kwargs – Keyword arguments are also possible.
- Returns
Your model’s output
- prepare_sample(sample: List[Dict[str, Union[str, float]]], inference: bool = False) Union[Tuple[Dict[str, torch.Tensor], Dict[str, torch.Tensor]], Dict[str, torch.Tensor]] [source]
Function that prepares a sample to input the model.
- Parameters
sample – list of dictionaries.
inference – If set to true prepares only the model inputs.
- Returns
Tuple with 2 dictionaries (model inputs and targets). If inference=True returns only the model inputs.
Translation Ranking Model
Ranking Metric
Translation Ranking metric was introduced by [Rei, et al. 2020](https://aclanthology.org/2020.emnlp-main.213/) and it is trained on top of Direct Assessment Relative Ranks (DARR) to encode good translations closer to the anchors (source & reference) than worse translations.
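One common way to express "closer to the anchors than worse translations" is a triplet margin objective, sketched below in plain Python. This is an assumption for illustration: the Euclidean distance, the margin of 1.0, the summation over both anchors, and the function names are not taken from this page and may differ from the loss the model actually uses.

```python
import math
from typing import List


def distance(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_margin_loss(anchor: List[float], positive: List[float],
                        negative: List[float], margin: float = 1.0) -> float:
    """Zero once the positive is closer to the anchor than the negative by `margin`."""
    return max(distance(anchor, positive) - distance(anchor, negative) + margin, 0.0)


def ranking_loss(src: List[float], ref: List[float], pos: List[float],
                 neg: List[float], margin: float = 1.0) -> float:
    """Sum the triplet loss over both anchors (source and reference)."""
    return (triplet_margin_loss(src, pos, neg, margin)
            + triplet_margin_loss(ref, pos, neg, margin))


src, ref = [0.0, 0.0], [0.0, 1.0]
good, bad = [0.0, 0.5], [3.0, 4.0]
print(ranking_loss(src, ref, pos=good, neg=bad))  # well-ordered pair
print(ranking_loss(src, ref, pos=bad, neg=good))  # mis-ordered pair is penalised
```

The loss only becomes zero when the better translation sits nearer to both anchors by at least the margin, which is what pushes good hypotheses toward the source and reference in embedding space.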
- class comet.models.ranking.ranking_metric.RankingMetric(nr_frozen_epochs: Union[float, int] = 0.05, keep_embeddings_frozen: bool = False, optimizer: str = 'AdamW', encoder_learning_rate: float = 1e-05, learning_rate: float = 3e-05, layerwise_decay: float = 0.95, encoder_model: str = 'XLM-RoBERTa', pretrained_model: str = 'xlm-roberta-base', pool: str = 'avg', layer: Union[str, int] = 'mix', dropout: float = 0.1, batch_size: int = 8, train_data: Optional[str] = None, validation_data: Optional[str] = None, load_weights_from_checkpoint: Optional[str] = None)[source]
- Parameters
nr_frozen_epochs – Number of epochs (% of epoch) that the encoder is frozen.
keep_embeddings_frozen – Keeps the encoder embedding layer frozen during training.
optimizer – Optimizer used during training.
encoder_learning_rate – Learning rate used to fine-tune the encoder model.
learning_rate – Learning rate used to fine-tune the top layers.
layerwise_decay – Learning rate % decay from top-to-bottom encoder layers.
encoder_model – Encoder model to be used.
pretrained_model – Pretrained model from Hugging Face.
pool – Pooling strategy to derive a sentence embedding [‘cls’, ‘max’, ‘avg’].
layer – Encoder layer to be used (‘mix’ for pooling info from all layers.)
dropout – Dropout used in the top-layers.
batch_size – Batch size used during training.
train_data – Path to a csv file containing the training data.
validation_data – Path to a csv file containing the validation data.
load_weights_from_checkpoint – Path to a checkpoint file.
- configure_optimizers() Tuple[List[torch.optim.optimizer.Optimizer], List[torch.optim.lr_scheduler.LambdaLR]] [source]
Sets the optimizers to be used during training.
- forward(src_input_ids: torch.Tensor, ref_input_ids: torch.Tensor, pos_input_ids: torch.Tensor, neg_input_ids: torch.Tensor, src_attention_mask: torch.Tensor, ref_attention_mask: torch.Tensor, pos_attention_mask: torch.Tensor, neg_attention_mask: torch.Tensor, **kwargs) Dict[str, torch.Tensor] [source]
Same as torch.nn.Module.forward().
- Parameters
*args – Whatever you decide to pass into the forward method.
**kwargs – Keyword arguments are also possible.
- Returns
Your model’s output
- predict_step(batch: Dict[str, torch.Tensor], batch_idx: Optional[int] = None, dataloader_idx: Optional[int] = None) List[float] [source]
Runs one prediction step and returns the predicted values.
- Parameters
batch – The output of your prepare_sample function.
batch_idx – Integer displaying which batch this is.
dataloader_idx – Integer displaying which dataloader this is.
- read_csv(path: str, regression: bool = False) List[dict] [source]
Reads a comma separated value file.
- Parameters
path – path to a csv file.
- Returns
List of records as dictionaries
- training_step(batch: Tuple[Dict[str, torch.Tensor], Dict[str, torch.Tensor]], batch_nb: int) Dict[str, torch.Tensor] [source]
Runs one training step. This usually consists of the forward function followed by the loss function.
- Parameters
batch – The output of your prepare_sample function.
batch_nb – Integer displaying which batch this is.
- Returns
dictionary containing the loss and the metrics to be added to the lightning logger.
- validation_step(batch: Tuple[Dict[str, torch.Tensor], Dict[str, torch.Tensor]], batch_nb: int, dataloader_idx: int) Dict[str, torch.Tensor] [source]
Similar to the training step but with the model in eval mode.
- Parameters
batch – The output of your prepare_sample function.
batch_nb – Integer displaying which batch this is.
dataloader_idx – Integer displaying which dataloader this is.
- Returns
dictionary passed to the validation_end function.
Auxiliary Modules
Feed Forward
Feed Forward Neural Network module that can be used for classification or regression
- class comet.modules.feedforward.FeedForward(in_dim: int, out_dim: int = 1, hidden_sizes: List[int] = [3072, 768], activations: str = 'Sigmoid', final_activation: Optional[str] = None, dropout: float = 0.1)[source]
Feed Forward Neural Network.
- Parameters
in_dim – Number of input features.
out_dim – Number of output features. Default is just a score.
hidden_sizes – List with hidden layer sizes.
activations – Name of the activation function to be used in the hidden layers.
final_activation – Name of the final activation function if any.
dropout – dropout to be used in the hidden layers.
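To see how in_dim, hidden_sizes and out_dim fit together, the sketch below lists the (input, output) sizes of the linear layers such a network would need. It assumes one linear layer per hidden size plus a final projection to out_dim. The helper layer_shapes and the in_dim value of 4096 are hypothetical; activations and dropout are left out.

```python
from typing import List, Tuple


def layer_shapes(in_dim: int, hidden_sizes: List[int], out_dim: int = 1) -> List[Tuple[int, int]]:
    """Return the (in_features, out_features) pair of each linear layer."""
    sizes = [in_dim, *hidden_sizes, out_dim]
    return list(zip(sizes, sizes[1:]))


shapes = layer_shapes(in_dim=4096, hidden_sizes=[3072, 768], out_dim=1)
for in_features, out_features in shapes:
    print(f"Linear({in_features} -> {out_features})")
```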
- forward(in_features: torch.Tensor) torch.Tensor [source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
Layer-Wise Attention Mechanism
- Computes a parameterised scalar mixture of N tensors,
mixture = gamma * sum(s_k * tensor_k)
where s = softmax(w), with w and gamma scalar parameters.
If layer_norm=True then apply layer normalization.
If dropout > 0, then for each scalar weight, adjust its softmax weight mass to 0 with the dropout probability (i.e., setting the unnormalized weight to -inf). This effectively should redistribute dropped probability mass to all other weights.
- Original implementation:
- class comet.modules.layerwise_attention.LayerwiseAttention(num_layers: int, layer_norm: bool = False, layer_weights: Optional[List[int]] = None, dropout: Optional[float] = None)[source]
- forward(tensors: List[torch.Tensor], mask: Optional[torch.Tensor] = None) torch.Tensor [source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
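The scalar mixture formula given for the layer-wise attention mechanism, mixture = gamma * sum(s_k * tensor_k) with s = softmax(w), can be worked through in plain Python as below. This is a sketch on small lists rather than tensors. The function names are hypothetical, layer normalization is omitted, and weight dropout is imitated by setting a weight to -inf as the description suggests.

```python
import math
from typing import List


def softmax(weights: List[float]) -> List[float]:
    """Numerically stable softmax; a weight of -inf receives zero mass."""
    peak = max(weights)
    exps = [math.exp(w - peak) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]


def scalar_mix(tensors: List[List[float]], weights: List[float], gamma: float = 1.0) -> List[float]:
    """mixture = gamma * sum(s_k * tensor_k), with s = softmax(weights)."""
    s = softmax(weights)
    width = len(tensors[0])
    return [gamma * sum(s_k * t[i] for s_k, t in zip(s, tensors)) for i in range(width)]


layers = [[1.0, 2.0], [3.0, 4.0]]  # one vector per encoder layer
print(scalar_mix(layers, weights=[0.0, 0.0], gamma=2.0))
# Dropping the second layer: its unnormalized weight becomes -inf,
# so all probability mass moves to the remaining layer.
print(scalar_mix(layers, weights=[0.0, float("-inf")]))
```

With equal weights the mixture is a plain average scaled by gamma; learning w lets the model favour the layers that carry the most useful information.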