kiwi.lib.evaluate

Module Contents

Classes

OutputConfig

Base class for all pydantic configs. Used to configure base behaviour of configs.

Configuration

Base class for all pydantic configs. Used to configure base behaviour of configs.

MetricsReport

Object with information for both word and sentence level metrics.

Functions

evaluate_from_configuration(configuration_dict: Dict[str, Any])

Evaluate a model’s predictions based on the flags received from the configuration files.

run(config: Configuration) → MetricsReport

Runs the evaluation pipeline for evaluating a model’s predictions, essentially calculating metrics using gold_targets and prediction_files.

retrieve_gold_standard(config: OutputConfig)

normalize_prediction_files(predicted_files_config: List[OutputConfig], predicted_dir_config: List[Path])

split_wmt18_tags(tags: List[List[Any]])

Split tags list of lists in WMT18 format into target and gap tags.

read_sentence_scores_file(sent_file)

Read file with numeric scores for sentences.

to_numeric_values(predictions: Union[str, List[str], List[List[str]]]) → Union[int, float, List[int], List[float], List[List[int]], List[List[float]]]

Convert text labels or string probabilities (for BAD) to int or float values, respectively.

to_numeric_binary_labels(predictions: Union[str, float, List[str], List[List[str]], List[float], List[List[float]]], threshold: float = 0.5)

Generate numeric labels from text labels or probabilities (for BAD).

report_lengths_mismatch(gold, prediction)

Checks if the number of gold and prediction labels match. Prints a warning and returns False if they do not.

lengths_match(gold, prediction)

Checks if the number of gold and prediction labels match. Returns False if they do not.

word_level_scores(true_targets, predicted_targets, labels=const.LABELS)

eval_word_level(true_targets, predictions: Dict[str, List[List[int]]]) → np.ndarray

sentence_level_scores(true_targets: List[float], predicted_targets: List[float]) → Tuple[Tuple, Tuple]

eval_sentence_level(true_targets, predictions: Dict[str, List[float]]) → Tuple[np.ndarray, np.ndarray]

_extract_path_prefix(file_names)

kiwi.lib.evaluate.logger
class kiwi.lib.evaluate.OutputConfig

Bases: kiwi.data.datasets.wmt_qe_dataset.OutputConfig

Base class for all pydantic configs. Used to configure base behaviour of configs.

gap_tags :Optional[FilePath]

Path to label file for gaps (only for predictions).

targetgaps_tags :Optional[FilePath]

Path to label file for target+gaps (only for predictions).

class kiwi.lib.evaluate.Configuration

Bases: kiwi.utils.io.BaseConfig

Base class for all pydantic configs. Used to configure base behaviour of configs.

gold_files :wmt_qe_dataset.OutputConfig
predicted_files :Optional[List[OutputConfig]]
predicted_dir :Optional[List[Path]]

One or more directories from which to read predicted files (using standard output names).

verbose :bool = False
quiet :bool = False
ensure_list(cls, v)
check_consistency(cls, v, values)
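
For orientation, the options above could be expressed as a plain dictionary of the shape accepted by evaluate_from_configuration(). This is only a sketch: the nested keys under gold_files and predicted_files come from kiwi.data.datasets.wmt_qe_dataset.OutputConfig, and the specific key names and paths shown here are illustrative assumptions, not the definitive schema.

    # Hypothetical evaluation options; the keys under gold_files and
    # predicted_files are assumed to mirror wmt_qe_dataset.OutputConfig.
    config_dict = {
        "gold_files": {
            "target_tags": "data/dev/dev.tags",       # assumed key and path
            "sentence_scores": "data/dev/dev.hter",   # assumed key and path
        },
        "predicted_files": [
            {
                "target_tags": "runs/model_a/target_tags",  # assumed path
                "gap_tags": "runs/model_a/gap_tags",        # only for predictions
            }
        ],
        "verbose": False,
        "quiet": False,
    }
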
class kiwi.lib.evaluate.MetricsReport
word_scores :Dict[str, np.ndarray]
sentence_scores :Dict[str, np.ndarray]
add_word_level_scores(self, name: str, scores: np.ndarray)
add_sentence_level_scores(self, name: str, scores: np.ndarray)
print_scores_table(self)
__str__(self)

Return str(self).

static _scores_str(scores: np.ndarray) → str
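
A small usage sketch of this class, based only on the attributes and methods listed above; it assumes MetricsReport can be instantiated without arguments, and the metric values are placeholders rather than real library output.

    import numpy as np

    from kiwi.lib.evaluate import MetricsReport

    report = MetricsReport()  # assumed to start with empty score dicts
    # Register scores under a name identifying the predictions they describe.
    report.add_word_level_scores("model_a.target_tags", np.array([0.91, 0.55, 0.73]))
    report.add_sentence_level_scores("model_a.sentence_scores", np.array([0.70, 0.68]))
    report.print_scores_table()  # pretty-print everything collected so far
    print(report)                # __str__ renders the same summary
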
kiwi.lib.evaluate.evaluate_from_configuration(configuration_dict: Dict[str, Any])

Evaluate a model’s predictions based on the flags received from the configuration files.

Refer to configuration for a list of available configuration flags for the evaluate pipeline.

Parameters

configuration_dict – options read from file or CLI
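
A possible invocation, assuming the options live in a YAML file with the shape sketched in the Configuration section above (the file name is hypothetical):

    import yaml

    from kiwi.lib.evaluate import evaluate_from_configuration

    with open("evaluate.yaml") as f:  # hypothetical file name
        configuration_dict = yaml.safe_load(f)

    # Assumed to hand the validated options to run(); the return value is
    # taken here to be the resulting MetricsReport, which is not stated above.
    report = evaluate_from_configuration(configuration_dict)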

kiwi.lib.evaluate.run(config: Configuration) → MetricsReport

Runs the evaluation pipeline for evaluating a model’s predictions, essentially calculating metrics using gold_targets and prediction_files.

Refer to configuration for a list of available options for this pipeline.

Parameters

config – Configuration Namespace

Returns

Object with information for both word and sentence level metrics

Return type

MetricsReport
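
Equivalently, the pipeline can be driven programmatically by validating the options into a Configuration and calling run(). A sketch, assuming standard pydantic keyword construction and the config_dict from the earlier example:

    from kiwi.lib.evaluate import Configuration, run

    config = Configuration(**config_dict)  # pydantic validation of the options
    report = run(config)                   # MetricsReport with word- and sentence-level scores
    report.print_scores_table()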

kiwi.lib.evaluate.retrieve_gold_standard(config: OutputConfig)
kiwi.lib.evaluate.normalize_prediction_files(predicted_files_config: List[OutputConfig], predicted_dir_config: List[Path])
kiwi.lib.evaluate.split_wmt18_tags(tags: List[List[Any]])

Split tags list of lists in WMT18 format into target and gap tags.
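
In the WMT18 word-level format, gap tags and target-word tags are interleaved in one sequence (a gap slot before each word plus one after the last word, i.e. 2*n + 1 tags for n words). A minimal sketch of the split under this assumption; the library's own implementation may differ in details:

    # Illustration of the assumed WMT18 interleaving, not the library code.
    wmt18_tags = [["OK", "OK", "BAD", "OK", "OK"]]  # gap, word, gap, word, gap

    gap_tags = [sent[0::2] for sent in wmt18_tags]     # even positions: gaps
    target_tags = [sent[1::2] for sent in wmt18_tags]  # odd positions: target words

    # gap_tags    -> [["OK", "BAD", "OK"]]
    # target_tags -> [["OK", "OK"]]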

kiwi.lib.evaluate.read_sentence_scores_file(sent_file)

Read file with numeric scores for sentences.
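
Sentence scores (e.g. HTER) are kept in plain text files; a minimal reader sketch, assuming one numeric score per line (the helper name is ours, not the library's):

    def read_scores(path):
        """Read one float score per line into a list."""
        with open(path, encoding="utf8") as f:
            return [float(line.strip()) for line in f if line.strip()]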

kiwi.lib.evaluate.to_numeric_values(predictions: Union[str, List[str], List[List[str]]]) → Union[int, float, List[int], List[float], List[List[int]], List[List[float]]]

Convert text labels or string probabilities (for BAD) to int or float values, respectively.
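
A sketch of this conversion for a flat list, assuming OK maps to 0, BAD maps to 1, and a string holding a BAD probability becomes a float; the actual mapping follows kiwi's label constants and also recurses over nested lists:

    def to_numeric(value):
        """Illustrative only: text labels -> ints, probability strings -> floats."""
        if isinstance(value, list):
            return [to_numeric(v) for v in value]
        if value == "OK":
            return 0  # assumed numeric value for OK
        if value == "BAD":
            return 1  # assumed numeric value for BAD
        return float(value)  # e.g. "0.87" -> 0.87 (probability of BAD)

    to_numeric(["OK", "BAD", "0.87"])  # -> [0, 1, 0.87]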

kiwi.lib.evaluate.to_numeric_binary_labels(predictions: Union[str, float, List[str], List[List[str]], List[float], List[List[float]]], threshold: float = 0.5)

Generate numeric labels from text labels or probabilities (for BAD).
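
Thresholding those values into binary labels, sketched for a flat list with the default threshold of 0.5 and reusing the to_numeric sketch above (again, the OK/BAD-to-0/1 mapping is an assumption):

    def to_binary_labels(values, threshold=0.5):
        """Illustrative only: BAD probability above the threshold -> 1, else 0."""
        return [1 if to_numeric(v) > threshold else 0 for v in values]

    to_binary_labels(["0.2", "0.9", "BAD"])  # -> [0, 1, 1]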

kiwi.lib.evaluate.report_lengths_mismatch(gold, prediction)

Checks if the number of gold and prediction labels match. Prints a warning and returns False if they do not.

Parameters
  • gold – list of gold labels

  • prediction – list of predicted labels

Returns

True if all lengths match, False otherwise

Return type

bool

kiwi.lib.evaluate.lengths_match(gold, prediction)

Checks if the number of gold and prediction labels match. Returns False if they do not.

Parameters
  • gold – list of gold labels

  • prediction – list of predicted labels

Returns

True if all lengths match, False otherwise

Return type

bool
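
Both helpers guard against scoring misaligned gold and prediction files; a hypothetical use before computing metrics:

    from kiwi.lib.evaluate import lengths_match, report_lengths_mismatch

    gold_tags = [["OK", "OK", "BAD"], ["OK"]]        # hypothetical gold labels
    predicted_tags = [["OK", "BAD", "BAD"], ["OK"]]  # hypothetical predictions

    lengths_match(gold_tags, predicted_tags)  # True; silent check
    if report_lengths_mismatch(gold_tags, predicted_tags):
        # Lengths agree (a warning would have been printed otherwise),
        # so it is safe to go on and compute word-level metrics.
        pass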

kiwi.lib.evaluate.word_level_scores(true_targets, predicted_targets, labels=const.LABELS)
kiwi.lib.evaluate.eval_word_level(true_targets, predictions: Dict[str, List[List[int]]]) → np.ndarray
kiwi.lib.evaluate.sentence_level_scores(true_targets: List[float], predicted_targets: List[float]) → Tuple[Tuple, Tuple]
kiwi.lib.evaluate.eval_sentence_level(true_targets, predictions: Dict[str, List[float]]) → Tuple[np.ndarray, np.ndarray]
kiwi.lib.evaluate._extract_path_prefix(file_names)