kiwi.lib.evaluate
OutputConfig
Base class for all pydantic configs. Used to configure base behaviour of configs.
Configuration
MetricsReport
evaluate_from_configuration(configuration_dict: Dict[str, Any])
Evaluate a model’s predictions based on the flags received from the configuration files.
run(config: Configuration) → MetricsReport
Runs the evaluation pipeline for evaluating a model’s predictions, essentially calculating metrics using gold_targets and prediction_files.
retrieve_gold_standard(config: OutputConfig)
normalize_prediction_files(predicted_files_config: List[OutputConfig], predicted_dir_config: List[Path])
split_wmt18_tags(tags: List[List[Any]])
Split tags list of lists in WMT18 format into target and gap tags (see the sketch after this listing).
read_sentence_scores_file(sent_file)
Read file with numeric scores for sentences.
to_numeric_values(predictions: Union[str, List[str], List[List[str]]]) → Union[int, float, List[int], List[float], List[List[int]], List[List[float]]]
Convert text labels or string probabilities (for BAD) to int or float values, respectively.
to_numeric_binary_labels(predictions: Union[str, float, List[str], List[List[str]], List[float], List[List[float]]], threshold: float = 0.5)
Generate numeric labels from text labels or probabilities (for BAD).
report_lengths_mismatch(gold, prediction)
Checks if the number of gold and prediction labels match. Prints a warning and returns False if they do not.
lengths_match(gold, prediction)
Checks if the number of gold and prediction labels match. Returns False if they do not.
word_level_scores(true_targets, predicted_targets, labels=const.LABELS)
eval_word_level(true_targets, predictions: Dict[str, List[List[int]]]) → np.ndarray
sentence_level_scores(true_targets: List[float], predicted_targets: List[float]) → Tuple[Tuple, Tuple]
eval_sentence_level(true_targets, predictions: Dict[str, List[float]]) → Tuple[np.ndarray, np.ndarray]
_extract_path_prefix(file_names)
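In the WMT18 word-level format, gap tags and target-word tags are interleaved in a single sequence per sentence that starts and ends with a gap tag (2 * n_words + 1 tags). A minimal sketch of that layout and of splitting it, assuming gaps occupy the even positions; this illustrates the format, not the library’s own implementation:

# Illustrative only: WMT18 interleaved tags for a 2-word sentence.
# Assumed layout: gap, word, gap, word, gap.
wmt18_tags = [["OK", "OK", "BAD", "OK", "OK"]]
gap_tags = [sentence[0::2] for sentence in wmt18_tags]     # [["OK", "BAD", "OK"]]
target_tags = [sentence[1::2] for sentence in wmt18_tags]  # [["OK", "OK"]]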
kiwi.lib.evaluate.logger
Bases: kiwi.data.datasets.wmt_qe_dataset.OutputConfig
gap_tags
Path to label file for gaps (only for predictions).
targetgaps_tags
Path to label file for target+gaps (only for predictions).
Bases: kiwi.utils.io.BaseConfig
gold_files
predicted_files
predicted_dir
One or more directories from which to read predicted files (using standard output names).
verbose
quiet
ensure_list
check_consistency
word_scores
sentence_scores
add_word_level_scores
add_sentence_level_scores
print_scores_table
__str__
Return str(self).
_scores_str
Evaluate a model’s predictions based on the flags received from the configuration files.
Refer to configuration for a list of available configuration flags for the evaluate pipeline.
configuration_dict – options read from file or CLI
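A minimal sketch of invoking the pipeline programmatically with a plain dictionary. The top-level keys mirror the Configuration fields documented above (gold_files, predicted_files, verbose); the nested key sentence_scores and all paths are placeholders assumed for this sketch, not names confirmed by this page:

from kiwi.lib.evaluate import evaluate_from_configuration

configuration_dict = {
    "gold_files": {"sentence_scores": "data/dev.hter"},  # assumed nested key, placeholder path
    "predicted_files": [{"sentence_scores": "runs/0/sentence_scores"}],
    "verbose": True,
}

# Computes word- and sentence-level metrics for the predictions (see MetricsReport).
evaluate_from_configuration(configuration_dict)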
Runs the evaluation pipeline for evaluating a model’s predictions, essentially calculating metrics using gold_targets and prediction_files.
Refer to configuration for a list of available options for this pipeline.
config – Configuration Namespace
Object with information for both word and sentence level metrics
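A sketch of driving the same pipeline through an explicit Configuration object; run returns the MetricsReport described above. Paths are placeholders and the nested key name is an assumption:

from kiwi.lib.evaluate import Configuration, run

config = Configuration(
    gold_files={"sentence_scores": "data/dev.hter"},  # placeholder; validated as an OutputConfig
    predicted_files=[{"sentence_scores": "runs/0/sentence_scores"}],
)

report = run(config)  # MetricsReport with word- and sentence-level metrics
print(report)         # MetricsReport defines __str__, so it can be printed directly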
Convert text labels or string probabilities (for BAD) to int or float values, respectively.
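Both conversion helpers accept either text labels or probabilities of the BAD class. A hypothetical standalone sketch of the thresholding step only (not the library code); the 0/1 encoding is assumed purely for illustration:

def binarize_bad_probabilities(probabilities, threshold=0.5):
    # Probabilities are for the BAD class; values above the threshold map to 1.
    return [int(p > threshold) for p in probabilities]

binarize_bad_probabilities([0.1, 0.7, 0.5])  # -> [0, 1, 0]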
Checks if the number of gold and prediction labels match. Prints a warning and returns False if they do not.
gold – list of gold labels
prediction – list of predicted labels
True if all lengths match, False if not
bool
Checks if the number of gold and prediction labels match. Returns False if they do not.
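For context on the scoring helpers listed earlier (word_level_scores, eval_word_level, sentence_level_scores, eval_sentence_level): word-level QE is conventionally scored with per-class F1 and their product (F1-mult), and sentence-level QE with correlation against the gold scores. A sketch of those standard metrics under that assumption, not the library’s own implementation:

import numpy as np
from scipy.stats import pearsonr, spearmanr
from sklearn.metrics import f1_score

# Word level: per-class F1 for OK (0) and BAD (1) and their product (F1-mult).
true_tags = [0, 0, 1, 0, 1, 0]
pred_tags = [0, 1, 1, 0, 1, 0]
f1_ok, f1_bad = f1_score(true_tags, pred_tags, average=None, labels=[0, 1])
f1_mult = f1_ok * f1_bad

# Sentence level: correlation between gold and predicted quality scores.
gold_scores = np.array([0.1, 0.4, 0.35, 0.8])
pred_scores = np.array([0.15, 0.5, 0.3, 0.7])
pearson, _ = pearsonr(gold_scores, pred_scores)
spearman, _ = spearmanr(gold_scores, pred_scores)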