kiwi.models.linear package¶

Submodules¶

kiwi.models.linear.label_dictionary module¶

This implements a dictionary of labels.

class kiwi.models.linear.label_dictionary.LabelDictionary(label_names=None)[source]¶

Bases: dict

This class implements a dictionary of labels. Labels are mapped to integers, and it is efficient to retrieve a label name from its integer representation and vice versa.

add(name)[source]¶: Add new label.

get_label_id(name)[source]¶: Get label id from name.

get_label_name(label_id)[source]¶: Get label name from id.

load(label_file)[source]¶: Load labels from a file.

save(label_file)[source]¶: Save labels to a file.
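The two-way lookup described above can be sketched with a dict for name-to-id lookups plus a list for id-to-name lookups. This is a minimal illustration, not kiwi's implementation: `TinyLabelDictionary` is a hypothetical name, and returning the existing id when a label is added twice is an assumption.

```python
class TinyLabelDictionary(dict):
    """Hypothetical stand-in, not kiwi's LabelDictionary."""

    def __init__(self, label_names=None):
        super().__init__()
        self.names = []  # position i holds the name of label id i
        for name in label_names or []:
            self.add(name)

    def add(self, name):
        # Assumption: a known label keeps its id; a new one gets the next free id.
        if name not in self:
            self[name] = len(self.names)
            self.names.append(name)
        return self[name]

    def get_label_id(self, name):
        return self[name]

    def get_label_name(self, label_id):
        return self.names[label_id]


labels = TinyLabelDictionary(["OK", "BAD"])
```

Both directions are constant-time: the dict answers name-to-id queries and the list answers id-to-name queries.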

kiwi.models.linear.linear_model module¶

This implements a linear model.

class kiwi.models.linear.linear_model.LinearModel[source]¶

Bases: object

An abstract linear model.

clear()[source]¶: Clear all weights.

compute_score(features)[source]¶: Compute a score by taking the inner product with a feature vector.

compute_score_binary_features(binary_features)[source]¶: Compute a score by taking the inner product with a binary feature vector.

finalize(t)[source]¶: Finalize by setting the weights as the running average. This is a no-op if use_average=False.

load(model_file, average=False, feature_indices=None)[source]¶: Load the model from a file.

make_gradient_step(features, eta, t, gradient)[source]¶: Make a gradient step with step size eta.

read_fnames(fnames_file)[source]¶: Read file mapping from integers to feature descriptions.

save(model_file, average=False, feature_indices=None)[source]¶: Save the model to a file.

write_fnames(fnames_file, fnames)[source]¶: Write file mapping from integers to feature descriptions.
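As a rough illustration of the scoring and update methods listed for LinearModel, the sketch below stores weights in a plain dict. The free functions and the update rule (weight minus eta times gradient times feature value) are assumptions for illustration. They do not reproduce kiwi's averaging logic or its use of the step counter t.

```python
def compute_score(weights, features):
    """Inner product between a weight dict and a sparse feature dict."""
    return sum(weights.get(key, 0.0) * value for key, value in features.items())


def compute_score_binary_features(weights, binary_features):
    """Same inner product when every active feature has value 1."""
    return sum(weights.get(key, 0.0) for key in binary_features)


def make_gradient_step(weights, features, eta, gradient):
    """Assumed update: move each active weight against the gradient."""
    for key, value in features.items():
        weights[key] = weights.get(key, 0.0) - eta * gradient * value


weights = {"bias": 0.5, "token=the": -1.0}
features = {"bias": 1.0, "token=the": 2.0}
score = compute_score(weights, features)  # 0.5*1.0 + (-1.0)*2.0
binary_score = compute_score_binary_features(weights, ["bias", "token=the"])
make_gradient_step(weights, features, eta=0.1, gradient=1.0)
```

Only features present in the input are touched, so both scoring and updating cost time proportional to the number of active features rather than the size of the model.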

kiwi.models.linear.linear_trainer module¶

A generic implementation of a basic trainer.

class kiwi.models.linear.linear_trainer.LinearTrainer(classifier, checkpointer, algorithm='svm_mira', regularization_constant=1000000000000.0)[source]¶

Bases: object

run(train_iterator, valid_iterator, epochs=50)[source]¶: Train with a general online algorithm.

save(output_directory)[source]¶

kiwi.models.linear.linear_word_qe_decoder module¶

Decoder for word-level quality estimation.

class kiwi.models.linear.linear_word_qe_decoder.LinearWordQEDecoder(estimator, cost_false_positives=0.5, cost_false_negatives=0.5)[source]¶

Bases: kiwi.models.linear.structured_decoder.StructuredDecoder

A decoder for word-level quality estimation.

decode(instance, parts, scores)[source]¶: Decoder. Return the most likely sequence of OK/BAD labels.

decode_mira(instance, parts, scores, gold_outputs, old_mira=False)[source]¶

Cost-augmented decoder. Allows a compromise between precision and recall. In general:

p = a - (a+b)*z0
q = b*sum(z0)
p'*z + q = a*sum(z) - (a+b)*z0'*z + b*sum(z0)
         = a*(1-z0)'*z + b*(1-z)'*z0

a => penalty for predicting 1 when it is 0 (FP)
b => penalty for predicting 0 when it is 1 (FN)

F1: a = 0.5, b = 0.5
recall: a = 0, b = 1
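The identity in the decode_mira description can be checked numerically. The sketch below computes the cost both ways, once as the linear form p'z + q and once as a weighted count of false positives and false negatives. The function names are hypothetical; z is the predicted 0/1 vector and z0 the gold 0/1 vector.

```python
def cost_linear_form(z, z0, a, b):
    """p'z + q with p = a - (a+b)*z0 and q = b*sum(z0)."""
    p = [a - (a + b) * z0_i for z0_i in z0]
    q = b * sum(z0)
    return sum(p_i * z_i for p_i, z_i in zip(p, z)) + q


def cost_fp_fn(z, z0, a, b):
    """a * (number of false positives) + b * (number of false negatives)."""
    false_positives = sum((1 - z0_i) * z_i for z_i, z0_i in zip(z, z0))
    false_negatives = sum((1 - z_i) * z0_i for z_i, z0_i in zip(z, z0))
    return a * false_positives + b * false_negatives


gold = [1, 0, 1, 0]       # z0
predicted = [1, 1, 0, 0]  # z: one false positive, one false negative
balanced = cost_linear_form(predicted, gold, a=0.5, b=0.5)
recall_only = cost_linear_form(predicted, gold, a=0.0, b=1.0)
```

Because the cost is linear in z, it can be added to the part scores before decoding, which is what makes cost-augmented decoding no harder than ordinary decoding.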

decode_with_bigrams(instance, parts, scores)[source]¶: Decoder for a sequential model (with bigrams).

decode_with_unigrams(instance, parts, scores)[source]¶: Decoder for a non-sequential model (unigrams only).

run_viterbi(initial_scores, transition_scores, final_scores, emission_scores)[source]¶: Computes the Viterbi trellis for a given sequence. Receives:

- Initial scores: (num_states) array
- Transition scores: (length-1, num_states, num_states) array
- Final scores: (num_states) array
- Emission scores: (length, num_states) array
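A pure-Python sketch of Viterbi decoding over inputs with the shapes listed for run_viterbi is given below. It is not kiwi's implementation. The index order transition_scores[i][current][previous], the additive (log-domain) scores, and the (path, score) return value are all assumptions made for illustration.

```python
def run_viterbi(initial_scores, transition_scores, final_scores, emission_scores):
    """Return (best_path, best_score) for additive (log-domain) scores.

    transition_scores[i][cur][prev] scores moving from state prev at
    position i to state cur at position i + 1 (an assumed index order).
    """
    length = len(emission_scores)
    num_states = len(initial_scores)

    # Trellis of best scores ending in each state, plus backpointers.
    trellis = [[initial_scores[s] + emission_scores[0][s] for s in range(num_states)]]
    backpointers = []
    for pos in range(1, length):
        row, back = [], []
        for cur in range(num_states):
            candidates = [
                trellis[pos - 1][prev] + transition_scores[pos - 1][cur][prev]
                for prev in range(num_states)
            ]
            best_prev = max(range(num_states), key=candidates.__getitem__)
            row.append(candidates[best_prev] + emission_scores[pos][cur])
            back.append(best_prev)
        trellis.append(row)
        backpointers.append(back)

    # Add final scores, then follow the backpointers from the best last state.
    last = [trellis[-1][s] + final_scores[s] for s in range(num_states)]
    state = max(range(num_states), key=last.__getitem__)
    best_score = last[state]
    path = [state]
    for back in reversed(backpointers):
        state = back[state]
        path.append(state)
    path.reverse()
    return path, best_score


emissions = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
sticky = [[[0.5, -0.5], [-0.5, 0.5]]] * 2  # reward staying in the same state
path, score = run_viterbi([0.0, 0.0], sticky, [0.0, 0.0], emissions)
```

With the transition bonus for staying in the same state, the best path keeps state 0 throughout even though the middle emission favors state 1.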

kiwi.models.linear.linear_word_qe_decoder.logzero()[source]¶: Return log of zero.

kiwi.models.linear.linear_word_qe_features module¶

A class for handling features for word-level quality estimation.

class kiwi.models.linear.linear_word_qe_features.LinearWordQEFeatures(use_basic_features_only=True, use_simple_bigram_features=True, use_parse_features=False, use_stacked_features=False, save_to_cache=False, load_from_cache=False, cached_features_file=None)[source]¶

Bases: kiwi.models.linear.sparse_feature_vector.SparseFeatureVector

This class implements a feature vector for word-level quality estimation.

compute_bigram_features(sentence_word_features, part)[source]¶: Compute bigram features (that depend on consecutive labels).

compute_unigram_features(sentence_word_features, part)[source]¶: Compute unigram features (depending only on a single label).

get_head(sentence_word_features, index)[source]¶

get_siblings(sentence_word_features, index)[source]¶

kiwi.models.linear.linear_word_qe_features.quantize(value, bins_down)[source]¶: Quantize a numeric feature into bins. Example: bins = [50, 40, 30, 25, 20, 18, 16, 14, 12, 10].
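One plausible reading of quantize, given the decreasing example bins, is that a value maps to the largest threshold it exceeds. The sketch below follows that reading. The boundary handling (strict comparison, and falling back to the smallest threshold) is an assumption and may differ from kiwi's behavior.

```python
def quantize(value, bins_down):
    """Map value to the first (largest) threshold it exceeds.

    bins_down must be sorted in decreasing order. Values at or below the
    smallest threshold fall back to that threshold (an assumed convention).
    """
    for threshold in bins_down:
        if value > threshold:
            return threshold
    return bins_down[-1]


bins = [50, 40, 30, 25, 20, 18, 16, 14, 12, 10]
bucket = quantize(33, bins)
```

Quantizing this way turns a numeric feature into a small set of categorical values, which a linear model can weight independently.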

kiwi.models.linear.linear_word_qe_sentence module¶

class kiwi.models.linear.linear_word_qe_sentence.LinearWordQESentence[source]¶

Bases: object

Represents a sentence (word features and their labels).

create_from_sentence_pair(source_words, target_words, alignments, source_pos_tags=None, target_pos_tags=None, target_parse_heads=None, target_parse_relations=None, target_ngram_left=None, target_ngram_right=None, target_stacked_features=None, labels=None)[source]¶: Creates an instance from source/target token and alignment information.

static create_stop_symbol()[source]¶: Generates dummy features for a stop symbol.

num_words()[source]¶: Returns the number of words of the sentence.

class kiwi.models.linear.linear_word_qe_sentence.LinearWordQETokenFeatures(stacked_features=None, source_token_count=-1, target_token_count=-1, source_target_token_count_ratio=0.0, token='', left_context='', right_context='', first_aligned_token='', left_alignment='', right_alignment='', is_stopword=False, is_punctuation=False, is_proper_noun=False, is_digit=False, highest_order_ngram_left=-1, highest_order_ngram_right=-1, backoff_behavior_left=0.0, backoff_behavior_middle=0.0, backoff_behavior_right=0.0, source_highest_order_ngram_left=-1, source_highest_order_ngram_right=-1, pseudo_reference=False, target_pos='', target_morph='', target_head=-1, target_deprel='', aligned_source_pos_list='', polysemy_count_source=0, polysemy_count_target=0)[source]¶

Bases: object

kiwi.models.linear.sequence_parts module¶

class kiwi.models.linear.sequence_parts.SequenceBigramPart(index, label, previous_label)[source]¶

Bases: object

A part for bigrams (two labels at consecutive word positions). Necessary for the model to be sequential.

class kiwi.models.linear.sequence_parts.SequenceUnigramPart(index, label)[source]¶

Bases: object

A part for unigrams (a single label at a word position).
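To illustrate how the two part types relate, the sketch below enumerates unigram and bigram parts for a short sentence. The namedtuples are hypothetical stand-ins with the same fields as the constructors above. The enumeration omits any start or stop boundary bigrams, which a real sequential model may also need.

```python
from collections import namedtuple

# Hypothetical lightweight stand-ins for the two part classes.
UnigramPart = namedtuple("UnigramPart", ["index", "label"])
BigramPart = namedtuple("BigramPart", ["index", "label", "previous_label"])


def make_parts(num_words, labels, use_bigrams=True):
    """Enumerate every unigram part and, optionally, every bigram part."""
    parts = [UnigramPart(i, label) for i in range(num_words) for label in labels]
    if use_bigrams:
        parts += [
            BigramPart(i, label, previous)
            for i in range(1, num_words)
            for label in labels
            for previous in labels
        ]
    return parts


parts = make_parts(num_words=3, labels=["OK", "BAD"])
```

For 3 words and 2 labels this yields 6 unigram parts and 8 bigram parts. Dropping the bigram parts leaves a model in which each word is labeled independently.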

kiwi.models.linear.sparse_feature_vector module¶

This module defines the classes for sparse feature vectors in linear models.

class kiwi.models.linear.sparse_feature_vector.SparseBinaryFeatureVector(feature_indices=None, save_to_cache=False, load_from_cache=False, cached_features_file=None)[source]¶

Bases: list

A generic class for a sparse binary feature vector.

add_binary_feature(name)[source]¶: Add a binary feature.

add_categorical_feature(name, value)[source]¶: Add a categorical feature, represented internally as a binary feature.

load_cached_features()[source]¶: Load features from file.

save_cached_features()[source]¶: Save features to file.

to_sparse_vector()[source]¶: Convert to a SparseVector.

class kiwi.models.linear.sparse_feature_vector.SparseFeatureVector(save_to_cache=False, load_from_cache=False, cached_features_file=None)[source]¶

Bases: kiwi.models.linear.sparse_vector.SparseVector

A generic class for a sparse feature vector.

add_binary_feature(name)[source]¶: Add a binary feature.

add_categorical_feature(name, value, allow_duplicates=False)[source]¶: Add a categorical feature, represented internally as a binary feature.

add_numeric_feature(name, value)[source]¶: Add a numeric feature.

load_cached_features()[source]¶: Load features from file.

save_cached_features()[source]¶: Save features to file.
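The statement that a categorical feature is represented internally as a binary feature can be pictured as follows. This is a sketch under assumptions: `TinySparseFeatureVector` is a hypothetical class, and the "name=value" key format is chosen for illustration rather than taken from kiwi.

```python
class TinySparseFeatureVector(dict):
    """Hypothetical stand-in: maps feature names to values."""

    def add_binary_feature(self, name):
        self[name] = 1.0

    def add_categorical_feature(self, name, value):
        # One binary indicator per (name, value) pair; key format is assumed.
        self.add_binary_feature("{}={}".format(name, value))

    def add_numeric_feature(self, name, value):
        self[name] = float(value)


features = TinySparseFeatureVector()
features.add_binary_feature("BIAS")
features.add_categorical_feature("target_pos", "NOUN")
features.add_numeric_feature("token_count_ratio", 0.8)
```

Each distinct category value becomes its own dimension, so the linear model learns a separate weight for every (feature, value) pair.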

kiwi.models.linear.sparse_vector module¶

This defines a generic class for sparse vectors.

class kiwi.models.linear.sparse_vector.SparseVector[source]¶

Bases: dict

Implementation of a sparse vector using a dictionary.

add(vector, scalar=1.0)[source]¶: Adds this vector and a given vector.

add_constant(scalar)[source]¶: Adds a constant to each element of the vector.

as_string()[source]¶: Returns a string representation.

copy()[source]¶: Returns a copy of the current vector.

dot_product(vector)[source]¶: Computes the dot product with a given vector. Note: this iterates through the self vector, so it may be inefficient if the number of nonzeros in self is much larger than the number of nonzeros in vector. Hence the function reverts to vector.dot_product(self) if that is beneficial.

load(f, dtype=<class 'str'>)[source]¶: Load vector from file.

normalize()[source]¶: Normalize the vector. Note: if the norm is zero, do nothing.

save(f)[source]¶: Save vector to file.

scale(scalar)[source]¶: Scales this vector by a scale factor.

squared_norm()[source]¶: Computes the squared norm of the vector.
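A dict-backed sparse vector along the lines described above might look like the following sketch. `TinySparseVector` is a hypothetical name, and treating `add` as an in-place update of self by scalar times the given vector is an assumption based on the signature.

```python
import math


class TinySparseVector(dict):
    """Hypothetical dict-backed sparse vector; missing keys count as zero."""

    def add(self, vector, scalar=1.0):
        # Assumed meaning: in-place update self += scalar * vector.
        for key, value in vector.items():
            self[key] = self.get(key, 0.0) + scalar * value

    def dot_product(self, vector):
        # Iterate over whichever operand has fewer nonzeros.
        if len(self) > len(vector):
            return vector.dot_product(self)
        return sum(value * vector.get(key, 0.0) for key, value in self.items())

    def squared_norm(self):
        return sum(value * value for value in self.values())

    def normalize(self):
        norm = math.sqrt(self.squared_norm())
        if norm == 0.0:
            return  # zero vector: leave unchanged
        for key in self:
            self[key] /= norm


u = TinySparseVector({"a": 3.0, "b": 4.0})
v = TinySparseVector({"b": 2.0})
product = u.dot_product(v)
```

Swapping the operands in `dot_product` keeps the cost proportional to the smaller number of nonzeros, which is the optimization the dot_product note describes.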

kiwi.models.linear.structured_classifier module¶

A generic implementation of an abstract structured linear classifier.

class kiwi.models.linear.structured_classifier.StructuredClassifier[source]¶

Bases: object

An abstract structured classifier.

compute_scores(instance, parts, features)[source]¶: Compute a score for every part in the instance using the current model and the part-specific features.

create_instances(dataset)[source]¶: Preprocess the dataset if needed to create instances. Default is returning the dataset itself. Override if needed.

create_prediction(instance, parts, predicted_output)[source]¶: Create a prediction for an instance.

evaluate(instances, predictions, print_scores=True)[source]¶: Evaluate the structured classifier, computing a task-dependent evaluation metric.

label_instance(instance, parts, predicted_output)[source]¶: Return a labeled instance by adding the predicted output information.

load(model_path)[source]¶: Load the full configuration and model.

make_features(instance, parts)[source]¶: Create a feature vector for each part.

make_parts(instance)[source]¶: Compute the task-specific parts for this instance.

run(instance)[source]¶: Run the structured classifier on a single instance.

save(model_path)[source]¶: Save the full configuration and model.

test(instances)[source]¶: Run the structured classifier on dev/test data.
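The methods above suggest a pipeline of parts, features, scores, and decoding. The toy sketch below strings those steps together for a unigram-only labeling task. Every function here is a hypothetical stand-in written as a free function, and the feature naming and per-word argmax decoder are assumptions rather than kiwi's behavior.

```python
LABELS = ["OK", "BAD"]


def make_parts(instance):
    # One part per (word position, label) pair.
    return [(i, label) for i in range(len(instance)) for label in LABELS]


def make_features(instance, parts):
    # One list of binary feature names per part.
    return [["word={}|label={}".format(instance[i], label)] for i, label in parts]


def compute_scores(weights, features):
    return [sum(weights.get(name, 0.0) for name in names) for names in features]


def decode(instance, parts, scores):
    # Independent per-word argmax; returns a 0/1 vector aligned with parts.
    best = {}
    for k, (i, _) in enumerate(parts):
        if i not in best or scores[k] > scores[best[i]]:
            best[i] = k
    selected = set(best.values())
    return [1 if k in selected else 0 for k in range(len(parts))]


def run(instance, weights):
    parts = make_parts(instance)
    features = make_features(instance, parts)
    scores = compute_scores(weights, features)
    predicted_output = decode(instance, parts, scores)
    return [label for (_, label), z in zip(parts, predicted_output) if z]


weights = {"word=teh|label=BAD": 1.0, "word=cat|label=OK": 1.0}
predicted_labels = run(["teh", "cat"], weights)
```

The decoder returns one 0/1 entry per part, and the final step turns the selected parts back into one label per word.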

kiwi.models.linear.structured_decoder module¶

class kiwi.models.linear.structured_decoder.StructuredDecoder[source]¶

Bases: object

An abstract decoder for structured prediction.

decode(instance, parts, scores)[source]¶: Decode, computing the highest-scoring output. Must return a vector of 0/1 predicted_outputs of the same size as parts.

decode_cost_augmented(instance, parts, scores, gold_outputs)[source]¶: Perform cost-augmented decoding.

decode_mira(instance, parts, scores, gold_outputs, old_mira=False)[source]¶: Perform cost-augmented decoding or classical MIRA.

kiwi.models.linear.utils module¶

Several utility functions.

kiwi.models.linear.utils.nearly_binary_tol(a, tol)[source]¶: Checks if a number is binary up to a tolerance.

kiwi.models.linear.utils.nearly_eq_tol(a, b, tol)[source]¶: Checks if two numbers are equal up to a tolerance.

kiwi.models.linear.utils.nearly_zero_tol(a, tol)[source]¶: Checks if a number is zero up to a tolerance.
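The three tolerance checks can be sketched in a few lines. Whether the comparison is strict (less than) or inclusive (at most) is not stated above, so the inclusive bound used here is an assumption.

```python
def nearly_eq_tol(a, b, tol):
    """True if a and b differ by at most tol (inclusive bound assumed)."""
    return abs(a - b) <= tol


def nearly_zero_tol(a, tol):
    """True if a is within tol of zero."""
    return abs(a) <= tol


def nearly_binary_tol(a, tol):
    """True if a is within tol of either 0 or 1."""
    return nearly_zero_tol(a, tol) or nearly_eq_tol(a, 1.0, tol)
```

Checks like these are useful when floating-point arithmetic yields values such as 0.1 + 0.2, which differs from 0.3 only by rounding error.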
