COMET Metrics

Since COMET was released, we have trained and released a number of different models. This page briefly explains how they differ and points you to the papers that describe them.

Model Architectures:

All COMET metrics follow one of the following architectures:

Figure: COMET model architectures (regression, ranking, referenceless and unified metrics)

  1. Regression Metric (top-left diagram): This is the architecture used by most models. The model is trained on a regression task: given the source, the machine translation (MT) and the reference, it predicts a quality score.

  2. Ranking Metric (top-middle diagram): Models with this architecture are trained on a translation ranking task using a triplet margin loss. The model learns an embedding space in which better translations are encoded closer to the anchors (source and reference) and worse translations are pushed away from them.

  3. Referenceless Metric (top-right diagram): This architecture resembles architecture 1, but it does not use the reference translation. It is purely a Quality Estimation (QE) system.

  4. Unified Metric (bottom diagram): The unified architecture was proposed in (Wan et al., ACL 2022) and is closely related to the BLEURT and OpenKiwi models. It can be trained both with and without references.
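To make the difference between the regression, referenceless and ranking set-ups above more concrete, here is a minimal, framework-free sketch. It assumes the pooled sentence embeddings are already available as plain lists of floats; in the real models they come from the XLM-R or InfoXLM encoder. The feature layout in combine_features follows the combination of embeddings, element-wise products and absolute differences used by COMET-style estimators. triplet_margin_loss computes the same per-example quantity as PyTorch's nn.TripletMarginLoss with p=2, ignoring its small eps term. The function names, the margin of 1.0 (PyTorch's default) and the toy vectors are illustrative assumptions, not the library's code. The unified architecture encodes its inputs jointly and is not covered here.

```python
import math
from typing import List, Optional

Vector = List[float]


def _prod(a: Vector, b: Vector) -> Vector:
    # Element-wise product of two embeddings.
    return [x * y for x, y in zip(a, b)]


def _absdiff(a: Vector, b: Vector) -> Vector:
    # Element-wise absolute difference of two embeddings.
    return [abs(x - y) for x, y in zip(a, b)]


def combine_features(h: Vector, s: Vector, r: Optional[Vector] = None) -> Vector:
    """Concatenate pooled embeddings into the input of a feed-forward regressor.

    h: MT hypothesis, s: source, r: reference (None for a referenceless model).
    """
    if r is not None:
        # Regression metric: uses source, MT and reference.
        return h + r + _prod(h, s) + _prod(h, r) + _absdiff(h, s) + _absdiff(h, r)
    # Referenceless (QE) metric: only source and MT are available.
    return h + s + _prod(h, s) + _absdiff(h, s)


def euclidean(a: Vector, b: Vector) -> float:
    # L2 distance between two embeddings.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_margin_loss(anchor: Vector, positive: Vector, negative: Vector,
                        margin: float = 1.0) -> float:
    # Zero once the better translation is at least `margin` closer to the anchor
    # than the worse one; otherwise it penalises the gap.
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def ranking_loss(s: Vector, r: Vector, better: Vector, worse: Vector,
                 margin: float = 1.0) -> float:
    # Ranking metric: both the source and the reference act as anchors.
    return (triplet_margin_loss(s, better, worse, margin)
            + triplet_margin_loss(r, better, worse, margin))
```

For example, with s = [0.0, 0.0], r = [1.0, 0.0], a better translation at [0.5, 0.0] and a worse one at [3.0, 4.0], ranking_loss returns 0.0 because the better translation is already more than the margin closer to both anchors. Swapping the two translations gives a positive loss.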

Available Evaluation Models

The two main COMET models are:

  • Default model: Unbabel/wmt22-comet-da - This model uses a reference-based regression approach and is built on top of XLM-R. It has been trained on direct assessments from WMT17 to WMT20 and provides scores ranging from 0 to 1, where 1 represents a perfect translation.

  • Upcoming model: Unbabel/wmt22-cometkiwi-da - This reference-free model uses a regression approach and is built on top of InfoXLM. It has been trained on direct assessments from WMT17 to WMT20, as well as direct assessments from the MLQE-PE corpus. Like the default model, it produces scores ranging from 0 to 1.

These two models were part of the final ensembles used in our submissions to the WMT22 Metrics and QE shared tasks.

For versions prior to 2.0, you can still use Unbabel/wmt20-comet-da, which is the primary metric for those versions, and Unbabel/wmt20-comet-qe-da for its reference-free counterpart.

All other models developed through the years can be accessed through the following links:

Model                    Download Link   Paper
emnlp20-comet-rank       🔗              🔗
wmt20-comet-qe-da        🔗              🔗
wmt21-comet-da           🔗              🔗
wmt21-comet-mqm          🔗              🔗
wmt21-comet-qe-da        🔗              🔗
wmt21-comet-qe-mqm       🔗              🔗
wmt21-cometinho-mqm      🔗              🔗
wmt21-cometinho-da       🔗              🔗
eamt22-cometinho-da      🔗              🔗
eamt22-prune-comet-da    🔗              🔗

Example: downloading one of these checkpoints and scoring with it:

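# Download and unpack the checkpoint archive
wget https://unbabel-experimental-models.s3.amazonaws.com/comet/eamt22/eamt22-cometinho-da.tar.gz
tar -xf eamt22-cometinho-da.tar.gz
# Score the hypotheses in hyp1.en against the source (src.de) and reference (ref.en) using the unpacked checkpoint
comet-score -s src.de -t hyp1.en -r ref.en --model eamt22-cometinho-da/checkpoints/model.ckpt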
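The same segment-aligned files can also be scored from Python. This is a minimal sketch that assumes the COMET 2.x Python API (load_from_checkpoint and model.predict, which this page does not otherwise describe). That API takes a list of dictionaries with "src", "mt" and, for reference-based models, "ref" keys. read_segments and build_samples are hypothetical helpers. The model call itself is left as a comment because it needs the downloaded checkpoint.

```python
from pathlib import Path
from typing import Dict, List, Optional


def read_segments(path: str) -> List[str]:
    # One segment per line, the same layout comet-score reads.
    return Path(path).read_text(encoding="utf-8").splitlines()


def build_samples(sources: List[str], hypotheses: List[str],
                  references: Optional[List[str]] = None) -> List[Dict[str, str]]:
    """Pair aligned segments into {"src", "mt"[, "ref"]} dictionaries."""
    if len(sources) != len(hypotheses):
        raise ValueError("source and hypothesis files differ in length")
    if references is not None and len(references) != len(sources):
        raise ValueError("reference file differs in length from the source file")
    samples = []
    for i, (src, mt) in enumerate(zip(sources, hypotheses)):
        sample = {"src": src, "mt": mt}
        if references is not None:
            # Only add "ref" when references are given; referenceless (QE)
            # models need just "src" and "mt".
            sample["ref"] = references[i]
        samples.append(sample)
    return samples


# Usage with the files from the command-line example (assumes COMET 2.x is installed):
# from comet import load_from_checkpoint
# samples = build_samples(read_segments("src.de"), read_segments("hyp1.en"),
#                         read_segments("ref.en"))
# model = load_from_checkpoint("eamt22-cometinho-da/checkpoints/model.ckpt")
# output = model.predict(samples, batch_size=8, gpus=1)
```

The length checks mirror the assumption that all files are line-aligned. A silent mismatch would otherwise pair segments with the wrong translations.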