Library Reference

Multilingual Encoders

Encoder Model base

Module defining the common interface between all pretrained encoder models.

class comet.encoders.base.Encoder

Base class for an encoder model.

abstract forward(tokens: torch.Tensor, lengths: torch.Tensor) → Dict[str, torch.Tensor]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

freeze() → None: Freezes the entire encoder.

abstract freeze_embeddings() → None: Freezes the embedding layer.

abstract classmethod from_pretrained(pretrained_model)

Function that loads a pretrained encoder and the respective tokenizer.

Returns: Encoder model

abstract layerwise_lr(lr: float, decay: float)

Parameters

lr – Learning rate for the highest encoder layer.
decay – Decay factor applied to the learning rate of each successively lower layer.

Returns

List of model parameters with layer-wise decay learning rate
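As a rough illustration of layer-wise decay, the sketch below computes one learning rate per layer, assuming the highest layer gets lr and each layer below it gets the rate of the layer above multiplied by decay, with index 0 standing for the embedding layer. The helper layerwise_learning_rates is hypothetical: the real layerwise_lr returns PyTorch parameter groups rather than plain numbers, and its exact indexing may differ.

```python
from typing import List


def layerwise_learning_rates(lr: float, decay: float, num_layers: int) -> List[float]:
    """Hypothetical helper: per-layer learning rates, index 0 = embeddings.

    Assumes the top layer (index num_layers - 1) uses `lr` and every layer
    below it is scaled by one extra factor of `decay`.
    """
    return [lr * decay ** (num_layers - 1 - i) for i in range(num_layers)]


# Defaults similar to CometModel: encoder_learning_rate=1e-05, layerwise_decay=0.95.
for layer, rate in enumerate(layerwise_learning_rates(lr=1e-5, decay=0.95, num_layers=4)):
    print(f"layer {layer}: {rate:.3e}")
```

Lower layers, which hold more general features, end up with the smallest updates.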

abstract property max_positions: Max number of tokens the encoder handles.

abstract property num_layers: Number of model layers available.

abstract property output_units: Size of the encoder's output embeddings (hidden size).

prepare_sample(sample: List[str]) → Dict[str, torch.Tensor]

Receives a list of strings and applies tokenization and vectorization.

Parameters: sample – List with text segments to be tokenized and padded.
Returns: Dictionary with HF model inputs.
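The padding half of prepare_sample can be pictured with the pure-Python sketch below. It assumes the segments have already been converted to token ids (the real method delegates tokenization to the Hugging Face tokenizer) and that 0 is the padding id, whereas real tokenizers define their own. The helper pad_batch is hypothetical; its output mirrors the usual input_ids / attention_mask pair.

```python
from typing import Dict, List


def pad_batch(token_ids: List[List[int]], pad_id: int = 0) -> Dict[str, List[List[int]]]:
    """Hypothetical helper: right-pad token id sequences to a common length."""
    max_len = max(len(ids) for ids in token_ids)
    input_ids, attention_mask = [], []
    for ids in token_ids:
        padding = max_len - len(ids)
        input_ids.append(ids + [pad_id] * padding)
        # 1 marks a real token, 0 marks padding the encoder should ignore.
        attention_mask.append([1] * len(ids) + [0] * padding)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


batch = pad_batch([[5, 17, 42], [5, 9]])
print(batch["input_ids"])       # [[5, 17, 42], [5, 9, 0]]
print(batch["attention_mask"])  # [[1, 1, 1], [1, 1, 0]]
```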

unfreeze() → None: Unfreezes the entire encoder.

BERT Encoder

Pretrained BERT encoder from Hugging Face.

class comet.encoders.bert.BERTEncoder(pretrained_model: str)

BERT encoder.

Parameters: pretrained_model – Pretrained model from Hugging Face.

forward(input_ids: torch.Tensor, attention_mask: torch.Tensor, **kwargs) → Dict[str, torch.Tensor]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

freeze_embeddings() → None: Freezes the embedding layer.

classmethod from_pretrained(pretrained_model: str) → comet.encoders.base.Encoder

Function that loads a pretrained encoder from Hugging Face.

Parameters: pretrained_model – Name of the pretrained model to be loaded.

Returns: Encoder model

layerwise_lr(lr: float, decay: float)

Parameters

lr – Learning rate for the highest encoder layer.
decay – Decay factor applied to the learning rate of each successively lower layer.

Returns

List of model parameters with layer-wise decay learning rate

property max_positions: Max number of tokens the encoder handles.

property num_layers: Number of model layers available.

property output_units: Size of the encoder's output embeddings (hidden size).

MiniLM Encoder

Pretrained MiniLM encoder from Microsoft. This encoder uses a BERT architecture with an XLMR tokenizer.

class comet.encoders.minilm.MiniLMEncoder(pretrained_model: str)

MiniLM encoder.

Parameters: pretrained_model – Pretrained model from Hugging Face.

XLM-RoBERTa Encoder

Pretrained XLM-RoBERTa encoder from Hugging Face.

class comet.encoders.xlmr.XLMREncoder(pretrained_model: str)

XLM-RoBERTa encoder.

Parameters: pretrained_model – Pretrained model from Hugging Face.

forward(input_ids: torch.Tensor, attention_mask: torch.Tensor, **kwargs) → Dict[str, torch.Tensor]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

classmethod from_pretrained(pretrained_model: str) → comet.encoders.base.Encoder

Function that loads a pretrained encoder from Hugging Face.

Parameters: pretrained_model – Name of the pretrained model to be loaded.

Returns: Encoder model

Base Model

CometModel

Abstract model class that implements some of the PyTorch Lightning logic. Extend this class to create new models and metrics within COMET.

class comet.models.base.CometModel(nr_frozen_epochs: Union[float, int] = 0.3, keep_embeddings_frozen: bool = False, optimizer: str = 'AdamW', encoder_learning_rate: float = 1e-05, learning_rate: float = 3e-05, layerwise_decay: float = 0.95, encoder_model: str = 'XLM-RoBERTa', pretrained_model: str = 'xlm-roberta-large', pool: str = 'avg', layer: Union[str, int] = 'mix', dropout: float = 0.1, batch_size: int = 4, train_data: Optional[str] = None, validation_data: Optional[str] = None, load_weights_from_checkpoint: Optional[str] = None, class_identifier: Optional[str] = None)

CometModel:

Parameters

nr_frozen_epochs – Number of epochs (or fraction of an epoch) during which the encoder is kept frozen.
keep_embeddings_frozen – Keeps the encoder embedding layer frozen during training.
optimizer – Optimizer used during training.
encoder_learning_rate – Learning rate used to fine-tune the encoder model.
learning_rate – Learning rate used to fine-tune the top layers.
layerwise_decay – Learning-rate decay factor applied from the top to the bottom encoder layers.
encoder_model – Encoder model to be used.
pretrained_model – Pretrained model from Hugging Face.
pool – Pooling strategy to derive a sentence embedding [‘cls’, ‘max’, ‘avg’].
layer – Encoder layer to be used (‘mix’ to pool information from all layers).
dropout – Dropout used in the top layers.
batch_size – Batch size used during training.
train_data – Path to a CSV file containing the training data.
validation_data – Path to a CSV file containing the validation data.
load_weights_from_checkpoint – Path to a checkpoint file.
class_identifier – Subclass identifier.
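To make nr_frozen_epochs concrete, the sketch below decides whether the encoder should still be frozen at a given point in training. It assumes that a value below 1 is read as a fraction of the first epoch, measured in training batches, and that a value of 1 or more counts whole epochs. The function encoder_is_frozen and its steps_per_epoch argument are hypothetical, not part of COMET.

```python
def encoder_is_frozen(epoch: int, batch_idx: int, steps_per_epoch: int,
                      nr_frozen_epochs: float) -> bool:
    """Hypothetical helper: True while the encoder should stay frozen.

    `epoch` and `batch_idx` are zero-based.
    """
    if nr_frozen_epochs <= 0:
        return False
    if nr_frozen_epochs < 1:
        # Fractional value: unfreeze part-way through the first epoch.
        return epoch == 0 and batch_idx < nr_frozen_epochs * steps_per_epoch
    # Whole epochs: stay frozen until `nr_frozen_epochs` epochs have finished.
    return epoch < nr_frozen_epochs


# With the default of 0.3 and 1000 batches per epoch, the encoder is
# frozen for the first 300 batches.
print(encoder_is_frozen(0, 299, 1000, 0.3))  # True
print(encoder_is_frozen(0, 300, 1000, 0.3))  # False
```

Freezing the pretrained encoder briefly lets the randomly initialised top layers settle before their gradients reach the encoder.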

abstract configure_optimizers()

Choose what optimizers and learning-rate schedulers to use in your optimization. Normally you’d need one. But in the case of GANs or similar you might have multiple.

Returns

Any of these 6 options.

Single optimizer.
List or Tuple of optimizers.
Two lists - The first list has multiple optimizers, and the second has multiple LR schedulers (or multiple lr_scheduler_config).
Dictionary, with an "optimizer" key, and (optionally) a "lr_scheduler" key whose value is a single LR scheduler or lr_scheduler_config.
Tuple of dictionaries as described above, with an optional "frequency" key.
None - Fit will run without any optimizer.

The lr_scheduler_config is a dictionary which contains the scheduler and its associated configuration. The default configuration is shown below.

lr_scheduler_config = {
    # REQUIRED: The scheduler instance
    "scheduler": lr_scheduler,
    # The unit of the scheduler's step size, could also be 'step'.
    # 'epoch' updates the scheduler on epoch end whereas 'step'
    # updates it after an optimizer update.
    "interval": "epoch",
    # How many epochs/steps should pass between calls to
    # `scheduler.step()`. 1 corresponds to updating the learning
    # rate after every epoch/step.
    "frequency": 1,
    # Metric to monitor for schedulers like `ReduceLROnPlateau`
    "monitor": "val_loss",
    # If set to `True`, will enforce that the value specified in 'monitor'
    # is available when the scheduler is updated, thus stopping
    # training if not found. If set to `False`, it will only produce a warning
    "strict": True,
    # If using the `LearningRateMonitor` callback to monitor the
    # learning rate progress, this keyword can be used to specify
    # a custom logged name
    "name": None,
}

When there are schedulers in which the .step() method is conditioned on a value, such as the torch.optim.lr_scheduler.ReduceLROnPlateau scheduler, Lightning requires that the lr_scheduler_config contains the keyword "monitor" set to the metric name that the scheduler should be conditioned on.

Metrics can be made available for monitoring by logging them with self.log('metric_to_track', metric_val) in your LightningModule.

Note

The frequency value specified in a dict along with the optimizer key is an int corresponding to the number of sequential batches optimized with the specific optimizer. It should be given to none or to all of the optimizers. There is a difference between passing multiple optimizers in a list, and passing multiple optimizers in dictionaries with a frequency of 1:

In the former case, all optimizers will operate on the given batch in each optimization step.

In the latter, only one optimizer will operate on the given batch at every step.

This is different from the frequency value specified in the lr_scheduler_config mentioned above.

import torch

def configure_optimizers(self):
    optimizer_one = torch.optim.SGD(self.model.parameters(), lr=0.01)
    optimizer_two = torch.optim.SGD(self.model.parameters(), lr=0.01)
    return [
        {"optimizer": optimizer_one, "frequency": 5},
        {"optimizer": optimizer_two, "frequency": 10},
    ]

In this example, the first optimizer will be used for the first 5 steps, the second optimizer for the next 10 steps and that cycle will continue. If an LR scheduler is specified for an optimizer using the lr_scheduler key in the above dict, the scheduler will only be updated when its optimizer is being used.
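The cycling rule can be checked with a small pure-Python sketch that, assuming optimizers are visited in list order and each one is used for frequency consecutive batches, returns the index of the optimizer active at a given batch. The helper active_optimizer is hypothetical, not a Lightning API.

```python
from typing import List


def active_optimizer(batch_idx: int, frequencies: List[int]) -> int:
    """Hypothetical helper: index of the optimizer used for `batch_idx` (zero-based)."""
    position = batch_idx % sum(frequencies)  # position inside the current cycle
    for idx, freq in enumerate(frequencies):
        if position < freq:
            return idx
        position -= freq
    raise AssertionError("unreachable: position always falls inside the cycle")


# Frequencies 5 and 10 as in the example: batches 0-4 use optimizer 0,
# batches 5-14 use optimizer 1, and batch 15 starts a new cycle.
print([active_optimizer(i, [5, 10]) for i in range(16)])
```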

Examples:

from torch.optim import Adam
from torch.optim.lr_scheduler import CosineAnnealingLR, ExponentialLR

# most cases. no learning rate scheduler
def configure_optimizers(self):
    return Adam(self.parameters(), lr=1e-3)

# multiple optimizer case (e.g.: GAN)
def configure_optimizers(self):
    gen_opt = Adam(self.model_gen.parameters(), lr=0.01)
    dis_opt = Adam(self.model_dis.parameters(), lr=0.02)
    return gen_opt, dis_opt

# example with learning rate schedulers
def configure_optimizers(self):
    gen_opt = Adam(self.model_gen.parameters(), lr=0.01)
    dis_opt = Adam(self.model_dis.parameters(), lr=0.02)
    dis_sch = CosineAnnealingLR(dis_opt, T_max=10)
    return [gen_opt, dis_opt], [dis_sch]

# example with step-based learning rate schedulers
# each optimizer has its own scheduler
def configure_optimizers(self):
    gen_opt = Adam(self.model_gen.parameters(), lr=0.01)
    dis_opt = Adam(self.model_dis.parameters(), lr=0.02)
    gen_sch = {
        'scheduler': ExponentialLR(gen_opt, 0.99),
        'interval': 'step'  # called after each training step
    }
    dis_sch = CosineAnnealingLR(dis_opt, T_max=10) # called every epoch
    return [gen_opt, dis_opt], [gen_sch, dis_sch]

# example with optimizer frequencies
# see training procedure in `Improved Training of Wasserstein GANs`, Algorithm 1
# https://arxiv.org/abs/1704.00028
def configure_optimizers(self):
    gen_opt = Adam(self.model_gen.parameters(), lr=0.01)
    dis_opt = Adam(self.model_dis.parameters(), lr=0.02)
    n_critic = 5
    return (
        {'optimizer': dis_opt, 'frequency': n_critic},
        {'optimizer': gen_opt, 'frequency': 1}
    )

Note

Some things to know:

Lightning calls .backward() and .step() on each optimizer and learning rate scheduler as needed.
If you use 16-bit precision (precision=16), Lightning will automatically handle the optimizers.
If you use multiple optimizers, training_step() will have an additional optimizer_idx parameter.
If you use torch.optim.LBFGS, Lightning handles the closure function automatically for you.
If you use multiple optimizers, gradients will be calculated only for the parameters of the current optimizer at each training step.
If you need to control how often those optimizers step or override the default .step() schedule, override the optimizer_step() hook.

abstract forward(*args, **kwargs) → Dict[str, torch.Tensor]

Same as torch.nn.Module.forward().

Parameters

*args – Whatever you decide to pass into the forward method.
**kwargs – Keyword arguments are also possible.

Returns

Your model’s output

get_sentence_embedding(input_ids: torch.Tensor, attention_mask: torch.Tensor) → torch.Tensor

Function that extracts one sentence embedding per input sequence.

Parameters

input_ids – Token ids [batch_size x seq_len]
attention_mask – Attention mask [batch_size x seq_len]

Returns

torch.Tensor [batch_size x hidden_size]
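The pooling strategies accepted by the pool argument ('cls', 'max', 'avg') can be sketched over plain Python lists. The sketch assumes the token embeddings of one sequence arrive as a [seq_len x hidden_size] list and that padded positions (mask 0) must be ignored. The name pool_sentence is hypothetical; the real implementation works on batched PyTorch tensors.

```python
from typing import List


def pool_sentence(token_embeddings: List[List[float]], mask: List[int],
                  pool: str = "avg") -> List[float]:
    """Hypothetical helper: reduce [seq_len x hidden] embeddings to one vector."""
    if pool == "cls":
        # The first token (CLS / <s>) represents the whole sequence.
        return list(token_embeddings[0])
    real_tokens = [emb for emb, m in zip(token_embeddings, mask) if m == 1]
    if pool == "max":
        return [max(values) for values in zip(*real_tokens)]
    if pool == "avg":
        return [sum(values) / len(real_tokens) for values in zip(*real_tokens)]
    raise ValueError(f"unknown pooling strategy: {pool!r}")


embeddings = [[1.0, 4.0], [3.0, 0.0], [9.0, 9.0]]  # last row is padding
mask = [1, 1, 0]
print(pool_sentence(embeddings, mask, "avg"))  # [2.0, 2.0]
print(pool_sentence(embeddings, mask, "max"))  # [3.0, 4.0]
print(pool_sentence(embeddings, mask, "cls"))  # [1.0, 4.0]
```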

load_weights(checkpoint: str) → None

Function that loads the weights from a given checkpoint file.

Note

If the checkpoint model architecture is different from self, only the common parts will be loaded.

Parameters: checkpoint – Path to the checkpoint containing the weights to be loaded.
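Loading only the common parts can be sketched as filtering the checkpoint's entries down to the names the current model also has, then overwriting just those entries. The sketch uses plain dictionaries in place of PyTorch state dicts, and merge_common_weights is a hypothetical helper, not the method's actual code.

```python
from typing import Dict


def merge_common_weights(model_state: Dict[str, object],
                         checkpoint_state: Dict[str, object]) -> Dict[str, object]:
    """Hypothetical helper: copy checkpoint entries whose names exist in the model."""
    common = {name: value for name, value in checkpoint_state.items() if name in model_state}
    merged = dict(model_state)  # start from the model's current parameters
    merged.update(common)       # overwrite only the shared entries
    return merged


model_state = {"encoder.w": "random", "estimator.w": "random"}
checkpoint_state = {"encoder.w": "trained", "old_head.w": "trained"}
print(merge_common_weights(model_state, checkpoint_state))
# {'encoder.w': 'trained', 'estimator.w': 'random'}
```

Entries that exist only in the checkpoint are dropped, and entries that exist only in the model keep their current values.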

on_predict_start() → None: Called when predict begins.

on_train_epoch_end() → None: Hook used to unfreeze the encoder during training.

predict(samples: List[Dict[str, str]], batch_size: int = 8, gpus: int = 1, mc_dropout: Union[int, bool] = False, progress_bar: bool = True, accelerator: str = 'ddp', num_workers: Optional[int] = None, length_batching: bool = True) → Union[Tuple[List[float], float], Tuple[List[float], List[float], float]]

Function that receives a list of samples (dictionaries with translations, sources and/or references) and returns segment-level scores and a system-level score. If mc_dropout is set, it also returns a confidence value for each segment score.

Parameters

samples – List of dictionaries with sources, translations and/or references.
batch_size – Batch size used during inference.
gpus – Number of GPUs to be used.
mc_dropout – Number of inference steps to run using Monte Carlo dropout (MCD). It is disabled by default.
progress_bar – Flag that turns the predict progress bar on and off.
accelerator – PyTorch Lightning accelerator (e.g., dp, ddp).
num_workers – Number of workers to use when loading data from dataloaders.
length_batching – If set to True, reduces padding by sorting samples by MT length.

Returns

Either a tuple with the segment-level scores and the system-level score or, if mc_dropout is set, a tuple with the segment-level scores, the segment-level confidences and the system-level score.
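The shape of these return values can be illustrated with a pure-Python sketch of how scores might be aggregated. It assumes the system-level score is the mean of the segment-level scores and that, with mc_dropout set, each segment score is the mean over the stochastic forward passes while its confidence is their standard deviation (the population variant here is an arbitrary choice). The function aggregate_scores and its input layout are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def aggregate_scores(runs: List[List[float]]) -> Tuple[List[float], List[float], float]:
    """Hypothetical helper: `runs[i][j]` is the score of segment j in dropout pass i."""
    per_segment = list(zip(*runs))  # regroup scores by segment
    segment_scores = [mean(scores) for scores in per_segment]
    confidences = [pstdev(scores) for scores in per_segment]  # spread across passes
    system_score = mean(segment_scores)
    return segment_scores, confidences, system_score


# Three dropout passes over two segments: the second segment is stable
# across passes, so its standard deviation is 0.
seg, conf, sys_score = aggregate_scores([[0.50, 0.80], [0.70, 0.80], [0.60, 0.80]])
print(seg, conf, sys_score)
```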

predict_step(batch: Dict[str, torch.Tensor], batch_idx: Optional[int] = None, dataloader_idx: Optional[int] = None) → torch.Tensor

Runs one prediction step and returns the predicted values.

Parameters

batch – The output of your prepare_sample function.
batch_idx – Integer displaying which batch this is.
dataloader_idx – Integer displaying which dataloader this is.

prepare_for_inference(sample): Ideally this would be a lambda function, but Python cannot pickle local lambda functions. This function replaces collate_fn=lambda x: self.prepare_sample(x, inference=True) from line 434 of the source.
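The restriction is easy to reproduce with the standard pickle module, which matters when data-loading workers run in separate processes: a lambda defined inside a function cannot be pickled, while a bound method of a module-level class can, because pickle stores the instance and the method name. The Collator class and make_local_lambda function below are hypothetical stand-ins used only to show the difference.

```python
import pickle


class Collator:
    """Hypothetical stand-in for a model exposing prepare_sample()."""

    def prepare_sample(self, batch, inference=False):
        return {"batch": batch, "inference": inference}

    def prepare_for_inference(self, batch):
        # Named method instead of a lambda, so it can be pickled.
        return self.prepare_sample(batch, inference=True)


def make_local_lambda(collator):
    return lambda x: collator.prepare_sample(x, inference=True)


def is_picklable(obj) -> bool:
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


collator = Collator()
print(is_picklable(make_local_lambda(collator)))     # False
print(is_picklable(collator.prepare_for_inference))  # True
```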

retrieve_sentence_embedding(input_ids: torch.Tensor, attention_mask: torch.Tensor) → torch.Tensor: Wrapper for get_sentence_embedding function that caches results.

set_embedding_cache(): Turns embedding caching on.
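A minimal sketch of the caching idea behind retrieve_sentence_embedding and set_embedding_cache: embeddings are memoized by their token ids, so a segment that appears repeatedly (for example, the same source paired with many translations) is encoded only once. EmbeddingCache, the tuple key, and the toy_encode stand-in for the encoder are all assumptions for illustration.

```python
from typing import Callable, Dict, List, Tuple


class EmbeddingCache:
    """Hypothetical memoizing wrapper around an embedding function."""

    def __init__(self, encode: Callable[[List[int]], List[float]]):
        self.encode = encode
        self.cache: Dict[Tuple[int, ...], List[float]] = {}
        self.misses = 0  # number of times the encoder actually ran

    def __call__(self, input_ids: List[int]) -> List[float]:
        key = tuple(input_ids)  # lists are unhashable; tuples can be dict keys
        if key not in self.cache:
            self.misses += 1
            self.cache[key] = self.encode(input_ids)
        return self.cache[key]


def toy_encode(input_ids: List[int]) -> List[float]:
    # Stand-in for the encoder: mean token id as a 1-d "embedding".
    return [sum(input_ids) / len(input_ids)]


embed = EmbeddingCache(toy_encode)
embed([5, 7, 9])
embed([5, 7, 9])  # served from the cache
print(embed.misses)  # 1
```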

setup(stage) → None

Data preparation function called before training by Lightning.

Parameters: stage – either ‘fit’, ‘validate’, ‘test’, or ‘predict’

train_dataloader() → torch.utils.data.dataloader.DataLoader: Function that loads the train set.

training_step(batch: Tuple[Dict[str, torch.Tensor], Dict[str, torch.Tensor]], batch_nb: int) → torch.Tensor

Runs one training step and logs the training loss.

Parameters

batch – The output of your prepare_sample function.
batch_nb – Integer displaying which batch this is.

Returns

Loss value

val_dataloader() → torch.utils.data.dataloader.DataLoader: Function that loads the validation set.

validation_epoch_end(*args, **kwargs) → None: Computes and logs metrics.

validation_step(batch: Tuple[Dict[str, torch.Tensor], Dict[str, torch.Tensor]], batch_nb: int, dataloader_idx: int) → None

Runs one validation step and logs metrics.

Parameters

batch – The output of your prepare_sample function.
batch_nb – Integer displaying which batch this is.
dataloader_idx – Integer displaying which dataloader this is.

class comet.models.base.OrderedSampler(indices: List[int]): Sampler that returns the indices in a deterministic order.
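A deterministic order pairs naturally with the length_batching option of predict(): sorting samples by translation length keeps similarly sized segments in the same batch, and the original order can be restored afterwards. The sketch mimics a sampler in plain Python. SimpleOrderedSampler, measuring length in whitespace-separated words, the longest-first sort, and the restore step are assumptions for illustration; a real sampler would subclass torch.utils.data.Sampler.

```python
from typing import Iterator, List


class SimpleOrderedSampler:
    """Hypothetical sampler that yields indices in a fixed, given order."""

    def __init__(self, indices: List[int]):
        self.indices = indices

    def __iter__(self) -> Iterator[int]:
        return iter(self.indices)

    def __len__(self) -> int:
        return len(self.indices)


samples = [{"mt": "a b c"}, {"mt": "a"}, {"mt": "a b c d e"}, {"mt": "a b"}]
# Sort indices by MT length (longest first) so each batch needs less padding.
order = sorted(range(len(samples)), key=lambda i: len(samples[i]["mt"].split()), reverse=True)
sampler = SimpleOrderedSampler(order)
print(list(sampler))  # [2, 0, 3, 1]

# Scores come back in sampler order; put them back in the original order.
scores_in_sampler_order = [0.9, 0.7, 0.5, 0.3]
restored = [0.0] * len(samples)
for position, index in enumerate(sampler):
    restored[index] = scores_in_sampler_order[position]
print(restored)  # [0.7, 0.3, 0.9, 0.5]
```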

Regression Models

RegressionMetric

Regression Metric that learns to predict a quality assessment by looking at source, translation and reference.

class comet.models.regression.regression_metric.RegressionMetric(nr_frozen_epochs: Union[float, int] = 0.3, keep_embeddings_frozen: bool = False, optimizer: str = 'AdamW', encoder_learning_rate: float = 1e-05, learning_rate: float = 3e-05, layerwise_decay: float = 0.95, encoder_model: str = 'XLM-RoBERTa', pretrained_model: str = 'xlm-roberta-base', pool: str = 'avg', layer: Union[str, int] = 'mix', dropout: float = 0.1, batch_size: int = 4, train_data: Optional[str] = None, validation_data: Optional[str] = None, hidden_sizes: List[int] = [2304, 768], activations: str = 'Tanh', final_activation: Optional[str] = None, load_weights_from_checkpoint: Optional[str] = None)

RegressionMetric:

Parameters

nr_frozen_epochs – Number of epochs (or fraction of an epoch) during which the encoder is kept frozen.
keep_embeddings_frozen – Keeps the encoder embedding layer frozen during training.
optimizer – Optimizer used during training.
encoder_learning_rate – Learning rate used to fine-tune the encoder model.
learning_rate – Learning rate used to fine-tune the top layers.
layerwise_decay – Learning-rate decay factor applied from the top to the bottom encoder layers.
encoder_model – Encoder model to be used.
pretrained_model – Pretrained model from Hugging Face.
pool – Pooling strategy to derive a sentence embedding [‘cls’, ‘max’, ‘avg’].
layer – Encoder layer to be used (‘mix’ to pool information from all layers).
dropout – Dropout used in the top layers.
batch_size – Batch size used during training.
train_data – Path to a CSV file containing the training data.
validation_data – Path to a CSV file containing the validation data.
hidden_sizes – Hidden layer sizes of the feed-forward regressor.
activations – Activation function of the feed-forward regressor.
final_activation – Name of the final activation function of the feed-forward regressor, if any.
load_weights_from_checkpoint – Path to a checkpoint file.
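The COMET paper (Rei et al., 2020) describes the estimator input as a concatenation of the translation embedding h, the reference embedding r, and their element-wise interactions with the source embedding s and with r: x = [h; r; h⊙s; h⊙r; |h−s|; |h−r|]. The sketch builds that vector with plain lists. The function name is hypothetical, and the feature order in the library code may differ. With six blocks of size d, the feed-forward regressor receives 6·d input features.

```python
from typing import List


def regression_features(src: List[float], mt: List[float], ref: List[float]) -> List[float]:
    """Hypothetical helper: x = [h; r; h*s; h*r; |h-s|; |h-r|] with h = mt, s = src, r = ref."""
    prod_src = [h * s for h, s in zip(mt, src)]
    prod_ref = [h * r for h, r in zip(mt, ref)]
    diff_src = [abs(h - s) for h, s in zip(mt, src)]
    diff_ref = [abs(h - r) for h, r in zip(mt, ref)]
    return mt + ref + prod_src + prod_ref + diff_src + diff_ref


x = regression_features(src=[1.0, 2.0], mt=[3.0, 1.0], ref=[2.0, 2.0])
print(len(x))  # 12, i.e. 6 * hidden_size
print(x)       # [3.0, 1.0, 2.0, 2.0, 3.0, 2.0, 6.0, 2.0, 2.0, 1.0, 1.0, 1.0]
```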

configure_optimizers() → Tuple[List[torch.optim.optimizer.Optimizer], List[torch.optim.lr_scheduler.LambdaLR]]: Sets the optimizers to be used during training.

forward(src_input_ids: torch.Tensor, src_attention_mask: torch.Tensor, mt_input_ids: torch.Tensor, mt_attention_mask: torch.Tensor, ref_input_ids: torch.Tensor, ref_attention_mask: torch.Tensor, **kwargs) → Dict[str, torch.Tensor]

Same as torch.nn.Module.forward().

Parameters

src_input_ids, src_attention_mask – Token ids and attention mask of the source.
mt_input_ids, mt_attention_mask – Token ids and attention mask of the translation.
ref_input_ids, ref_attention_mask – Token ids and attention mask of the reference.
**kwargs – Additional keyword arguments.

Returns

Dictionary with the model outputs.

prepare_sample(sample: List[Dict[str, Union[str, float]]], inference: bool = False) → Union[Tuple[Dict[str, torch.Tensor], Dict[str, torch.Tensor]], Dict[str, torch.Tensor]]

Function that prepares a sample to be fed to the model.

Parameters

sample – List of dictionaries.
inference – If set to True, prepares only the model inputs.

Returns

Tuple with 2 dictionaries (model inputs and targets). If inference=True returns only the model inputs.

read_csv(path: str) → List[dict]

Reads a comma-separated values file.

Parameters: path – path to a csv file.
Returns: List of records as dictionaries

ReferencelessRegression

Referenceless Regression Metric that learns to predict a quality assessment by looking at source and translation.

class comet.models.regression.referenceless.ReferencelessRegression(nr_frozen_epochs: Union[float, int] = 0.3, keep_embeddings_frozen: bool = False, optimizer: str = 'AdamW', encoder_learning_rate: float = 1e-05, learning_rate: float = 3e-05, layerwise_decay: float = 0.95, encoder_model: str = 'XLM-RoBERTa', pretrained_model: str = 'xlm-roberta-base', pool: str = 'avg', layer: Union[str, int] = 'mix', dropout: float = 0.1, batch_size: int = 4, train_data: Optional[str] = None, validation_data: Optional[str] = None, hidden_sizes: List[int] = [1024], activations: str = 'Tanh', final_activation: Optional[str] = None, load_weights_from_checkpoint: Optional[str] = None)

ReferencelessRegression:

Parameters

nr_frozen_epochs – Number of epochs (or fraction of an epoch) during which the encoder is kept frozen.
keep_embeddings_frozen – Keeps the encoder embedding layer frozen during training.
optimizer – Optimizer used during training.
encoder_learning_rate – Learning rate used to fine-tune the encoder model.
learning_rate – Learning rate used to fine-tune the top layers.
layerwise_decay – Learning-rate decay factor applied from the top to the bottom encoder layers.
encoder_model – Encoder model to be used.
pretrained_model – Pretrained model from Hugging Face.
pool – Pooling strategy to derive a sentence embedding [‘cls’, ‘max’, ‘avg’].
layer – Encoder layer to be used (‘mix’ to pool information from all layers).
dropout – Dropout used in the top layers.
batch_size – Batch size used during training.
train_data – Path to a CSV file containing the training data.
validation_data – Path to a CSV file containing the validation data.
hidden_sizes – Hidden layer sizes of the feed-forward regressor.
activations – Activation function of the feed-forward regressor.
final_activation – Name of the final activation function of the feed-forward regressor, if any.
load_weights_from_checkpoint – Path to a checkpoint file.
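Without a reference, the estimator can only combine the translation embedding h with the source embedding s. By analogy with the reference-based features, the sketch assumes the four-block vector [h; s; h⊙s; |h−s|], giving 4·d inputs to the feed-forward regressor. This layout is an assumption, and referenceless_features is a hypothetical name.

```python
from typing import List


def referenceless_features(src: List[float], mt: List[float]) -> List[float]:
    """Hypothetical helper: x = [h; s; h*s; |h-s|] with h = mt and s = src."""
    prod_src = [h * s for h, s in zip(mt, src)]
    diff_src = [abs(h - s) for h, s in zip(mt, src)]
    return mt + src + prod_src + diff_src


print(referenceless_features(src=[1.0, 2.0], mt=[3.0, 1.0]))
# [3.0, 1.0, 1.0, 2.0, 3.0, 2.0, 2.0, 1.0]
```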

forward(src_input_ids: torch.Tensor, src_attention_mask: torch.Tensor, mt_input_ids: torch.Tensor, mt_attention_mask: torch.Tensor, **kwargs) → Dict[str, torch.Tensor]

Same as torch.nn.Module.forward().

Parameters

src_input_ids, src_attention_mask – Token ids and attention mask of the source.
mt_input_ids, mt_attention_mask – Token ids and attention mask of the translation.
**kwargs – Additional keyword arguments.

Returns

Dictionary with the model outputs.

prepare_sample(sample: List[Dict[str, Union[str, float]]], inference: bool = False) → Union[Tuple[Dict[str, torch.Tensor], Dict[str, torch.Tensor]], Dict[str, torch.Tensor]]

Function that prepares a sample to be fed to the model.

Parameters

sample – List of dictionaries.
inference – If set to True, prepares only the model inputs.

Returns

Tuple with 2 dictionaries (model inputs and targets). If inference=True returns only the model inputs.

read_csv(path: str) → List[dict]

Reads a comma-separated values file.

Parameters: path – path to a csv file.
Returns: List of records as dictionaries

Translation Ranking Model

Ranking Metric

The Translation Ranking metric was introduced by [Rei et al., 2020](https://aclanthology.org/2020.emnlp-main.213/). It is trained on Direct Assessment Relative Ranks (DARR) to encode better translations closer to the anchors (source and reference) than worse translations.
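Following the description in Rei et al. (2020), training uses a triplet margin loss with the source and the reference as anchors, the better translation as positive and the worse one as negative. At inference, the distances of a translation to the source and to the reference are combined with a harmonic mean and turned into a similarity in (0, 1]. The pure-Python sketch assumes Euclidean distance and a margin of 1.0 (the default of PyTorch's TripletMarginLoss); the function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def triplet_margin(anchor: Vector, pos: Vector, neg: Vector, margin: float = 1.0) -> float:
    # Zero once the better translation is closer to the anchor by at least `margin`.
    return max(0.0, math.dist(anchor, pos) - math.dist(anchor, neg) + margin)


def ranking_loss(src: Vector, ref: Vector, pos: Vector, neg: Vector,
                 margin: float = 1.0) -> float:
    # Source and reference each act as an anchor.
    return triplet_margin(src, pos, neg, margin) + triplet_margin(ref, pos, neg, margin)


def ranking_score(src: Vector, ref: Vector, mt: Vector) -> float:
    d_src, d_ref = math.dist(src, mt), math.dist(ref, mt)
    if d_src + d_ref == 0.0:
        return 1.0  # translation coincides with both anchors
    harmonic = 2 * d_src * d_ref / (d_src + d_ref)
    return 1.0 / (1.0 + harmonic)  # distance -> similarity in (0, 1]


src, ref = [0.0, 0.0], [0.0, 1.0]
better, worse = [0.0, 0.5], [2.0, 2.0]
print(ranking_loss(src, ref, better, worse))  # 0.0
print(ranking_score(src, ref, better) > ranking_score(src, ref, worse))  # True
```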

class comet.models.ranking.ranking_metric.RankingMetric(nr_frozen_epochs: Union[float, int] = 0.05, keep_embeddings_frozen: bool = False, optimizer: str = 'AdamW', encoder_learning_rate: float = 1e-05, learning_rate: float = 3e-05, layerwise_decay: float = 0.95, encoder_model: str = 'XLM-RoBERTa', pretrained_model: str = 'xlm-roberta-base', pool: str = 'avg', layer: Union[str, int] = 'mix', dropout: float = 0.1, batch_size: int = 8, train_data: Optional[str] = None, validation_data: Optional[str] = None, load_weights_from_checkpoint: Optional[str] = None)

Parameters

nr_frozen_epochs – Number of epochs (or fraction of an epoch) during which the encoder is kept frozen.
keep_embeddings_frozen – Keeps the encoder embedding layer frozen during training.
optimizer – Optimizer used during training.
encoder_learning_rate – Learning rate used to fine-tune the encoder model.
learning_rate – Learning rate used to fine-tune the top layers.
layerwise_decay – Learning-rate decay factor applied from the top to the bottom encoder layers.
encoder_model – Encoder model to be used.
pretrained_model – Pretrained model from Hugging Face.
pool – Pooling strategy to derive a sentence embedding [‘cls’, ‘max’, ‘avg’].
layer – Encoder layer to be used (‘mix’ to pool information from all layers).
dropout – Dropout used in the top layers.
batch_size – Batch size used during training.
train_data – Path to a CSV file containing the training data.
validation_data – Path to a CSV file containing the validation data.
load_weights_from_checkpoint – Path to a checkpoint file.

configure_optimizers() → Tuple[List[torch.optim.optimizer.Optimizer], List[torch.optim.lr_scheduler.LambdaLR]]: Sets the optimizers to be used during training.

forward(src_input_ids: torch.Tensor, ref_input_ids: torch.Tensor, pos_input_ids: torch.Tensor, neg_input_ids: torch.Tensor, src_attention_mask: torch.Tensor, ref_attention_mask: torch.Tensor, pos_attention_mask: torch.Tensor, neg_attention_mask: torch.Tensor, **kwargs) → Dict[str, torch.Tensor]

Same as torch.nn.Module.forward().

Parameters

src_input_ids, src_attention_mask – Token ids and attention mask of the source.
ref_input_ids, ref_attention_mask – Token ids and attention mask of the reference.
pos_input_ids, pos_attention_mask – Token ids and attention mask of the better translation.
neg_input_ids, neg_attention_mask – Token ids and attention mask of the worse translation.
**kwargs – Additional keyword arguments.

Returns

Dictionary with the model outputs.

predict_step(batch: Dict[str, torch.Tensor], batch_idx: Optional[int] = None, dataloader_idx: Optional[int] = None) → List[float]

Runs one prediction step and returns the predicted values.

Parameters

batch – The output of your prepare_sample function.
batch_idx – Integer displaying which batch this is.
dataloader_idx – Integer displaying which dataloader this is.

read_csv(path: str, regression: bool = False) → List[dict]

Reads a comma-separated values file.

Parameters: path – path to a csv file.
Returns: List of records as dictionaries

training_step(batch: Tuple[Dict[str, torch.Tensor], Dict[str, torch.Tensor]], batch_nb: int) → Dict[str, torch.Tensor]

Runs one training step. This usually consists of the forward function followed by the loss function.

Parameters

batch – The output of your prepare_sample function.
batch_nb – Integer displaying which batch this is.

Returns

Dictionary containing the loss and the metrics to be added to the Lightning logger.

validation_step(batch: Tuple[Dict[str, torch.Tensor], Dict[str, torch.Tensor]], batch_nb: int, dataloader_idx: int) → Dict[str, torch.Tensor]

Similar to the training step but with the model in eval mode.

Parameters

batch – The output of your prepare_sample function.
batch_nb – Integer displaying which batch this is.
dataloader_idx – Integer displaying which dataloader this is.

Returns

Dictionary passed to the validation_end function.

Auxiliary Modules

Feed Forward

Feed Forward Neural Network module that can be used for classification or regression.

class comet.modules.feedforward.FeedForward(in_dim: int, out_dim: int = 1, hidden_sizes: List[int] = [3072, 768], activations: str = 'Sigmoid', final_activation: Optional[str] = None, dropout: float = 0.1)

Feed Forward Neural Network.

Parameters

in_dim – Number of input features.
out_dim – Number of output features. The default (1) produces a single score.
hidden_sizes – List with hidden layer sizes.
activations – Name of the activation function to be used in the hidden layers.
final_activation – Name of the final activation function, if any.
dropout – Dropout to be used in the hidden layers.
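To see how the constructor arguments fit together, the sketch lists the (input, output) size of each linear layer. It assumes one linear layer per entry in hidden_sizes followed by a final projection to out_dim, with the hidden activation and dropout applied between layers and final_activation (if any) applied after the last one. The helper layer_shapes is hypothetical, and the in_dim in the example is arbitrary.

```python
from typing import List, Sequence, Tuple


def layer_shapes(in_dim: int, out_dim: int = 1,
                 hidden_sizes: Sequence[int] = (3072, 768)) -> List[Tuple[int, int]]:
    """Hypothetical helper: (in_features, out_features) of each linear layer."""
    sizes = [in_dim, *hidden_sizes, out_dim]
    return list(zip(sizes[:-1], sizes[1:]))


print(layer_shapes(in_dim=6144))  # [(6144, 3072), (3072, 768), (768, 1)]
```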

forward(in_features: torch.Tensor) → torch.Tensor

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

Layer-Wise Attention Mechanism

Computes a parameterised scalar mixture of N tensors,
mixture = gamma * sum(s_k * tensor_k)

where s = softmax(w), with w and gamma scalar parameters.

If layer_norm=True then apply layer normalization.

If dropout > 0, each scalar weight has its softmax weight mass set to 0 with probability dropout (i.e., its unnormalized weight is set to -inf). This redistributes the dropped probability mass to all other weights.

Original implementation:

https://github.com/Hyperparticle/udify
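The mixture formula and the layer-dropout trick can be sketched with plain Python lists standing in for the per-layer tensors. This is an illustration under assumptions, not the module's code: the layer_norm option and the mask argument are not modeled, and dropped weights are filled with a large negative number instead of -inf so that the corner case where every layer is dropped still yields a valid (uniform) softmax.

```python
import math
import random
from typing import List, Optional

DROPPED = -1e20  # large negative fill used instead of -inf


def scalar_mix(tensors: List[List[float]], weights: List[float], gamma: float,
               dropout: float = 0.0, rng: Optional[random.Random] = None) -> List[float]:
    """mixture = gamma * sum_k s_k * tensor_k, where s = softmax(weights)."""
    rng = rng or random.Random()
    if dropout > 0:
        # Layer dropout: each weight is dropped with probability `dropout`.
        weights = [DROPPED if rng.random() < dropout else w for w in weights]
    # Numerically stable softmax over the (possibly dropped) weights.
    top = max(weights)
    exps = [math.exp(w - top) for w in weights]
    total = sum(exps)
    s = [e / total for e in exps]
    # Weighted sum of the layers, element by element, scaled by gamma.
    return [gamma * sum(s_k * layer[i] for s_k, layer in zip(s, tensors))
            for i in range(len(tensors[0]))]


layers = [[1.0, 2.0], [3.0, 4.0]]  # two "layers" with hidden size 2
print(scalar_mix(layers, [0.0, 0.0], gamma=1.0))  # [2.0, 3.0]
```

With equal weights the mixture reduces to gamma times the average of the layers; training the weights lets the model favour the layers most useful for the task.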

class comet.modules.layerwise_attention.LayerwiseAttention(num_layers: int, layer_norm: bool = False, layer_weights: Optional[List[int]] = None, dropout: Optional[float] = None)

forward(tensors: List[torch.Tensor], mask: Optional[torch.Tensor] = None) → torch.Tensor

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.