# Train your own Metric

To train your own metric we recommend installing COMET directly from source:

```bash
git clone https://github.com/Unbabel/COMET.git
cd COMET
poetry install
```

After installing the repo locally, you can train your own model/metric with the following command:

```bash
comet-train --cfg configs/models/{your_model_config}.yaml
```

and then use the resulting checkpoint to score translations:

```bash
comet-score -s src.de -t hyp1.en -r ref.en --model PATH/TO/CHECKPOINT
```

You can also upload your model to the [Hugging Face Hub](https://huggingface.co/docs/hub/index); use [`Unbabel/wmt22-comet-da`](https://huggingface.co/Unbabel/wmt22-comet-da) as an example. Your model can then be used directly from the hub.

## Config Files

COMET uses [PyTorch Lightning](https://pytorch-lightning.readthedocs.io/en/stable/) to train models, so YAML config files are used to initialize the various Lightning objects. Config files for the Lightning classes:

- [trainer.yaml](https://github.com/Unbabel/COMET/blob/master/configs/trainer.yaml): used to initialize the [PyTorch Lightning Trainer](https://pytorch-lightning.readthedocs.io/en/stable/common/trainer.html#trainer-class-api).
- [model_checkpoint.yaml](https://github.com/Unbabel/COMET/blob/master/configs/model_checkpoint.yaml): used to initialize the [PyTorch Lightning ModelCheckpoint](https://pytorch-lightning.readthedocs.io/en/stable/api/pytorch_lightning.callbacks.ModelCheckpoint.html) callback.
- [early_stopping.yaml](https://github.com/Unbabel/COMET/blob/master/configs/early_stopping.yaml): used to initialize the [PyTorch Lightning EarlyStopping](https://pytorch-lightning.readthedocs.io/en/stable/api/pytorch_lightning.callbacks.EarlyStopping.html) callback.

After setting up these Lightning classes you can set up your model architecture. There are four different model architectures:

- [RegressionMetric](https://github.com/Unbabel/COMET/blob/master/comet/models/regression/regression_metric.py#L32): used to build metrics that regress on a score given a source, hypothesis, and reference.
- [ReferencelessRegression](https://github.com/Unbabel/COMET/blob/master/comet/models/regression/referenceless.py#L30): used to build metrics that regress on a score **without a reference translation!** (using only the source and hypothesis).
- [RankingMetric](https://github.com/Unbabel/COMET/blob/master/comet/models/ranking/ranking_metric.py#L36): used to build metrics that learn to rank *good* translations above *bad* translations.
- [UnifiedMetric](https://github.com/Unbabel/COMET/blob/master/comet/models/multitask/unified_metric.py#L39): proposed in [(Wan et al., ACL 2022)](https://aclanthology.org/2022.acl-long.558.pdf) and closely related to the [BLEURT](https://aclanthology.org/2020.acl-main.704/) and [OpenKiwi](https://aclanthology.org/P19-3020/) models. This model can be trained with or without references.

For each class you can find a config example in [configs/models/](https://github.com/Unbabel/COMET/tree/master/configs/models). The `init_args` will then be used to initialize your model/metric.

## Input Data

To train your models you need to pass a train set and a validation set using the `training_data` and `validation_data` arguments, respectively. Depending on the underlying model, your data needs to be formatted differently.

RegressionMetric expects the following format:

| src | mt | ref | score |
| :----: | :----: | :----: | :----: |
| isto é um exemplo | this is a example | this is an example | 0.2 |

For ReferencelessRegression you can drop the `ref` column; if passed, it is ignored.
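For concreteness, here is a minimal sketch of building such a training file with pandas. The sentences and scores are toy values and `train.csv` is just a placeholder name; the assumption is only that your data ends up as a CSV with the columns shown above:

```python
import pandas as pd

# Toy examples in the RegressionMetric format described above
# (for ReferencelessRegression, simply omit the "ref" column).
train = pd.DataFrame(
    {
        "src": ["isto é um exemplo", "isto é outro exemplo"],
        "mt": ["this is a example", "this is another examples"],
        "ref": ["this is an example", "this is another example"],
        "score": [0.2, 0.1],
    }
)

# Point the `training_data` argument of your model config at this file.
train.to_csv("train.csv", index=False)
```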
Finally, RankingMetric expects two contrastive examples, e.g.:

| src | neg | pos | ref |
| :----: | :----: | :----: | :----: |
| isto é um exemplo | this is a example | this is an example | this is an example |

where the `pos` column contains a positive sample and the `neg` column a negative sample. You can check the available [data from previous WMT editions here](https://unbabel.github.io/COMET/html/faqs.html#where-can-i-find-the-data-used-to-train-comet-models).

## Available Encoders

All COMET models depend on an underlying encoder. We currently support the following encoders:

- BERT
- XLM-RoBERTa
- MiniLM
- XLM-RoBERTa-XL
- RemBERT

You can change the underlying encoder architecture using the ``encoder_model`` argument in your config file. Then, you can select any compatible model from [HuggingFace Transformers](https://huggingface.co/models) using the ``pretrained_model`` argument.
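Once your metric is trained, you can also load the checkpoint from Python instead of calling `comet-score`. A minimal sketch, assuming COMET v2.x, where `predict` returns an object exposing segment-level `scores` and a corpus-level `system_score`:

```python
from comet import load_from_checkpoint

# Placeholder path: the checkpoint written by comet-train.
model = load_from_checkpoint("PATH/TO/CHECKPOINT")

data = [
    {
        "src": "isto é um exemplo",
        "mt": "this is a example",
        "ref": "this is an example",  # omit for referenceless models
    }
]

output = model.predict(data, batch_size=8, gpus=0)  # gpus=1 to use a GPU
print(output.scores)        # one score per segment
print(output.system_score)  # corpus-level score
```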