kiwi.lib.evaluate

Module Contents

Classes

OutputConfig

Base class for all pydantic configs. Used to configure base behaviour of configs.

Configuration

Base class for all pydantic configs. Used to configure base behaviour of configs.

MetricsReport

Object with information for both word and sentence level metrics.

Functions

evaluate_from_configuration(configuration_dict: Dict[str, Any])

Evaluate a model’s predictions based on the flags received from the configuration files.

run(config: Configuration) → MetricsReport

Runs the evaluation pipeline for evaluating a model’s predictions, essentially calculating metrics using gold_targets and prediction_files.

retrieve_gold_standard(config: OutputConfig)

normalize_prediction_files(predicted_files_config: List[OutputConfig], predicted_dir_config: List[Path])

split_wmt18_tags(tags: List[List[Any]])

Split tags list of lists in WMT18 format into target and gap tags.

read_sentence_scores_file(sent_file)

Read file with numeric scores for sentences.

to_numeric_values(predictions: Union[str, List[str], List[List[str]]]) → Union[int, float, List[int], List[float], List[List[int]], List[List[float]]]

Convert text labels or string probabilities (for BAD) to int or float values, respectively.

to_numeric_binary_labels(predictions: Union[str, float, List[str], List[List[str]], List[float], List[List[float]]], threshold: float = 0.5)

Generate numeric labels from text labels or probabilities (for BAD).

report_lengths_mismatch(gold, prediction)

Checks if the number of gold and prediction labels match. Prints a warning and returns False if they do not.

lengths_match(gold, prediction)

Checks if the number of gold and prediction labels match. Returns False if they do not.

word_level_scores(true_targets, predicted_targets, labels=const.LABELS)

eval_word_level(true_targets, predictions: Dict[str, List[List[int]]]) → np.ndarray

sentence_level_scores(true_targets: List[float], predicted_targets: List[float]) → Tuple[Tuple, Tuple]

eval_sentence_level(true_targets, predictions: Dict[str, List[float]]) → Tuple[np.ndarray, np.ndarray]

_extract_path_prefix(file_names)

kiwi.lib.evaluate.logger
class kiwi.lib.evaluate.OutputConfig

Bases: kiwi.data.datasets.wmt_qe_dataset.OutputConfig

Base class for all pydantic configs. Used to configure base behaviour of configs.

gap_tags :Optional[FilePath]

Path to label file for gaps (only for predictions).

targetgaps_tags :Optional[FilePath]

Path to label file for target+gaps (only for predictions).

class kiwi.lib.evaluate.Configuration

Bases: kiwi.utils.io.BaseConfig

Base class for all pydantic configs. Used to configure base behaviour of configs.

gold_files :wmt_qe_dataset.OutputConfig
predicted_files :Optional[List[OutputConfig]]
predicted_dir :Optional[List[Path]]

One or more directories from which to read predicted files (using standard output names).

verbose :bool = False
quiet :bool = False
ensure_list(cls, v)
check_consistency(cls, v, values)
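
For orientation, the options above could be expressed as a plain dictionary of the shape accepted by evaluate_from_configuration(). This is only a sketch: the nested keys under gold_files and predicted_files come from kiwi.data.datasets.wmt_qe_dataset.OutputConfig, and the specific key names and paths shown here are illustrative assumptions, not the definitive schema.

    # Hypothetical evaluation options; the keys under gold_files and
    # predicted_files are assumed to mirror wmt_qe_dataset.OutputConfig.
    config_dict = {
        "gold_files": {
            "target_tags": "data/dev/dev.tags",       # assumed key and path
            "sentence_scores": "data/dev/dev.hter",   # assumed key and path
        },
        "predicted_files": [
            {
                "target_tags": "runs/model_a/target_tags",  # assumed path
                "gap_tags": "runs/model_a/gap_tags",        # only for predictions
            }
        ],
        "verbose": False,
        "quiet": False,
    }
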
class kiwi.lib.evaluate.MetricsReport
word_scores :Dict[str, np.ndarray]
sentence_scores :Dict[str, np.ndarray]
add_word_level_scores(self, name: str, scores: np.ndarray)
add_sentence_level_scores(self, name: str, scores: np.ndarray)
print_scores_table(self)
__str__(self)

Return str(self).

static _scores_str(scores: np.ndarray) → str
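
A small usage sketch of this class, based only on the attributes and methods listed above; it assumes MetricsReport can be instantiated without arguments, and the metric values are placeholders rather than real library output.

    import numpy as np

    from kiwi.lib.evaluate import MetricsReport

    report = MetricsReport()  # assumed to start with empty score dicts
    # Register scores under a name identifying the predictions they describe.
    report.add_word_level_scores("model_a.target_tags", np.array([0.91, 0.55, 0.73]))
    report.add_sentence_level_scores("model_a.sentence_scores", np.array([0.70, 0.68]))
    report.print_scores_table()  # pretty-print everything collected so far
    print(report)                # __str__ renders the same summary
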
kiwi.lib.evaluate.evaluate_from_configuration(configuration_dict: Dict[str, Any])

Evaluate a model’s predictions based on the flags received from the configuration files.

Refer to configuration for a list of available configuration flags for the evaluate pipeline.

Parameters

configuration_dict – options read from file or CLI
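
A possible invocation, assuming the options live in a YAML file with the shape sketched in the Configuration section above (the file name is hypothetical):

    import yaml

    from kiwi.lib.evaluate import evaluate_from_configuration

    with open("evaluate.yaml") as f:  # hypothetical file name
        configuration_dict = yaml.safe_load(f)

    # Assumed to hand the validated options to run(); the return value is
    # taken here to be the resulting MetricsReport, which is not stated above.
    report = evaluate_from_configuration(configuration_dict)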

kiwi.lib.evaluate.run(config: Configuration) → MetricsReport

Runs the evaluation pipeline for evaluating a model’s predictions, essentially calculating metrics using gold_targets and prediction_files.

Refer to configuration for a list of available options for this pipeline.

Parameters

config – Configuration Namespace

Returns

Object with information for both word and sentence level metrics

Return type

MetricsReport
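
Equivalently, the pipeline can be driven programmatically by validating the options into a Configuration and calling run(). A sketch, assuming standard pydantic keyword construction and the config_dict from the earlier example:

    from kiwi.lib.evaluate import Configuration, run

    config = Configuration(**config_dict)  # pydantic validation of the options
    report = run(config)                   # MetricsReport with word- and sentence-level scores
    report.print_scores_table()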

kiwi.lib.evaluate.retrieve_gold_standard(config: OutputConfig)
kiwi.lib.evaluate.normalize_prediction_files(predicted_files_config: List[OutputConfig], predicted_dir_config: List[Path])
kiwi.lib.evaluate.split_wmt18_tags(tags: List[List[Any]])

Split tags list of lists in WMT18 format into target and gap tags.
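
In the WMT18 word-level format, gap tags and target-word tags are interleaved in one sequence (a gap slot before each word plus one after the last word, i.e. 2*n + 1 tags for n words). A minimal sketch of the split under this assumption; the library's own implementation may differ in details:

    # Illustration of the assumed WMT18 interleaving, not the library code.
    wmt18_tags = [["OK", "OK", "BAD", "OK", "OK"]]  # gap, word, gap, word, gap

    gap_tags = [sent[0::2] for sent in wmt18_tags]     # even positions: gaps
    target_tags = [sent[1::2] for sent in wmt18_tags]  # odd positions: target words

    # gap_tags    -> [["OK", "BAD", "OK"]]
    # target_tags -> [["OK", "OK"]]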

kiwi.lib.evaluate.read_sentence_scores_file(sent_file)

Read file with numeric scores for sentences.
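
Sentence scores (e.g. HTER) are kept in plain text files; a minimal reader sketch, assuming one numeric score per line (the helper name is ours, not the library's):

    def read_scores(path):
        """Read one float score per line into a list."""
        with open(path, encoding="utf8") as f:
            return [float(line.strip()) for line in f if line.strip()]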

kiwi.lib.evaluate.to_numeric_values(predictions: Union[str, List[str], List[List[str]]]) → Union[int, float, List[int], List[float], List[List[int]], List[List[float]]]

Convert text labels or string probabilities (for BAD) to int or float values, respectively.
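
A sketch of this conversion for a flat list, assuming OK maps to 0, BAD maps to 1, and a string holding a BAD probability becomes a float; the actual mapping follows kiwi's label constants and also recurses over nested lists:

    def to_numeric(value):
        """Illustrative only: text labels -> ints, probability strings -> floats."""
        if isinstance(value, list):
            return [to_numeric(v) for v in value]
        if value == "OK":
            return 0  # assumed numeric value for OK
        if value == "BAD":
            return 1  # assumed numeric value for BAD
        return float(value)  # e.g. "0.87" -> 0.87 (probability of BAD)

    to_numeric(["OK", "BAD", "0.87"])  # -> [0, 1, 0.87]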

kiwi.lib.evaluate.to_numeric_binary_labels(predictions: Union[str, float, List[str], List[List[str]], List[float], List[List[float]]], threshold: float = 0.5)

Generate numeric labels from text labels or probabilities (for BAD).
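
Thresholding those values into binary labels, sketched for a flat list with the default threshold of 0.5 and reusing the to_numeric sketch above (again, the OK/BAD-to-0/1 mapping is an assumption):

    def to_binary_labels(values, threshold=0.5):
        """Illustrative only: BAD probability above the threshold -> 1, else 0."""
        return [1 if to_numeric(v) > threshold else 0 for v in values]

    to_binary_labels(["0.2", "0.9", "BAD"])  # -> [0, 1, 1]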

kiwi.lib.evaluate.report_lengths_mismatch(gold, prediction)

Checks if the number of gold and prediction labels match. Prints a warning and returns False if they do not.

Parameters
  • gold – list of gold labels

  • prediction – list of predicted labels

Returns

True if all lengths match, False otherwise

Return type

bool

kiwi.lib.evaluate.lengths_match(gold, prediction)

Checks if the number of gold and prediction labels match. Returns False if they do not.

Parameters
  • gold – list of gold labels

  • prediction – list of predicted labels

Returns

True if all lengths match, False otherwise

Return type

bool
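
Both helpers guard against scoring misaligned gold and prediction files; a hypothetical use before computing metrics:

    from kiwi.lib.evaluate import lengths_match, report_lengths_mismatch

    gold_tags = [["OK", "OK", "BAD"], ["OK"]]        # hypothetical gold labels
    predicted_tags = [["OK", "BAD", "BAD"], ["OK"]]  # hypothetical predictions

    lengths_match(gold_tags, predicted_tags)  # True; silent check
    if report_lengths_mismatch(gold_tags, predicted_tags):
        # Lengths agree (a warning would have been printed otherwise),
        # so it is safe to go on and compute word-level metrics.
        pass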

kiwi.lib.evaluate.word_level_scores(true_targets, predicted_targets, labels=const.LABELS)
kiwi.lib.evaluate.eval_word_level(true_targets, predictions: Dict[str, List[List[int]]]) → np.ndarray
kiwi.lib.evaluate.sentence_level_scores(true_targets: List[float], predicted_targets: List[float]) → Tuple[Tuple, Tuple]
kiwi.lib.evaluate.eval_sentence_level(true_targets, predictions: Dict[str, List[float]]) → Tuple[np.ndarray, np.ndarray]
kiwi.lib.evaluate._extract_path_prefix(file_names)