Metrics

Distance metrics

aline

ALINE, a feature-based algorithm for aligning phonetic sequences (https://webdocs.cs.ualberta.ca/~kondrak/). Copyright 2002 by Grzegorz Kondrak.

binary_distance(label1, label2)

Simple equality test.

custom_distance(file)

edit_distance(s1, s2[, substitution_cost, ...])

Calculate the Levenshtein edit-distance between two strings.
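As an illustration of what this entry computes, here is a minimal pure-Python sketch of Levenshtein edit distance via dynamic programming; the library version additionally supports options such as transpositions, which are omitted here.

```python
def edit_distance(s1, s2, substitution_cost=1):
    """Levenshtein distance via dynamic programming (illustrative sketch)."""
    m, n = len(s1), len(s2)
    # prev[j] holds the distance between s1[:i-1] and s2[:j]
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            sub = 0 if s1[i - 1] == s2[j - 1] else substitution_cost
            curr[j] = min(prev[j] + 1,        # deletion
                          curr[j - 1] + 1,    # insertion
                          prev[j - 1] + sub)  # substitution / match
        prev = curr
    return prev[n]

print(edit_distance("rain", "shine"))  # 3
```

Turning "rain" into "shine" takes two substitutions (r→s, a→h) and one insertion (e), hence a distance of 3.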

edit_distance_align(s1, s2[, substitution_cost])

Calculate the minimum Levenshtein edit-distance based alignment mapping between two strings.

fractional_presence(label)

interval_distance(label1, label2)

Krippendorff's interval distance metric.

jaccard_distance(label1, label2)

Distance metric comparing set-similarity.
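The set-similarity comparison here is the Jaccard distance, 1 − |A ∩ B| / |A ∪ B|. A minimal sketch:

```python
def jaccard_distance(label1, label2):
    """1 - |A ∩ B| / |A ∪ B| for two sets of labels (illustrative sketch)."""
    union = label1 | label2
    if not union:          # two empty sets: treat as identical
        return 0.0
    return 1 - len(label1 & label2) / len(union)

print(jaccard_distance({1, 2, 3}, {2, 3, 4}))  # 0.5
```

The sets {1, 2, 3} and {2, 3, 4} share 2 of 4 distinct items, so their distance is 1 − 2/4 = 0.5.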

masi_distance(label1, label2)

Distance metric that takes into account partial agreement when multiple labels are assigned.

presence(label)

Higher-order function to test presence of a given label.

Scores

AnnotationTask

Represents an annotation task, i.e. people assigning labels to items.

ConfusionMatrix

The confusion matrix between a list of reference values and a corresponding list of test values.

Paice

Class for storing lemmas, stems and evaluation metrics.

accuracy(reference, test)

Given a list of reference values and a corresponding list of test values, return the fraction of corresponding values that are equal.
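A sketch of this computation, assuming the two lists are position-aligned:

```python
def accuracy(reference, test):
    """Fraction of positions where reference and test agree (sketch)."""
    if len(reference) != len(test):
        raise ValueError("lists must have the same length")
    return sum(r == t for r, t in zip(reference, test)) / len(reference)

print(accuracy(["A", "B", "B", "A"], ["A", "B", "A", "A"]))  # 0.75
```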

approxrand(a, b, **kwargs)

Returns an approximate significance level between two lists of independently generated test values.

f_measure(reference, test[, alpha])

Given a set of reference values and a set of test values, return the f-measure of the test values, when compared against the reference values.

log_likelihood(reference, test)

Given a list of reference values and a corresponding list of test probability distributions, return the average log likelihood of the reference values, given the probability distributions.

precision(reference, test)

Given a set of reference values and a set of test values, return the fraction of test values that appear in the reference set.

recall(reference, test)

Given a set of reference values and a set of test values, return the fraction of reference values that appear in the test set.
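The three set-based scores above fit together as follows; this is a hedged sketch of the standard definitions (f_measure as the weighted harmonic mean of precision and recall), not the library's exact code:

```python
def precision(reference, test):
    """Fraction of test values that appear in the reference set."""
    return len(reference & test) / len(test) if test else None

def recall(reference, test):
    """Fraction of reference values that appear in the test set."""
    return len(reference & test) / len(reference) if reference else None

def f_measure(reference, test, alpha=0.5):
    """Weighted harmonic mean of precision and recall."""
    p, r = precision(reference, test), recall(reference, test)
    if not p or not r:
        return 0.0
    return 1.0 / (alpha / p + (1 - alpha) / r)

ref, hyp = {"a", "b", "c", "d"}, {"b", "c", "e"}
print(precision(ref, hyp), recall(ref, hyp))  # 0.6666666666666666 0.5
```

With alpha = 0.5 the f-measure is the ordinary harmonic mean: here 1 / (0.5/(2/3) + 0.5/0.5) = 4/7 ≈ 0.571.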

Segmentation

ghd(ref, hyp[, ins_cost, del_cost, ...])

Compute the Generalized Hamming Distance between a reference and a hypothetical segmentation: the cost of transforming the hypothetical segmentation into the reference through boundary insertion, deletion, and shift operations.

pk(ref, hyp[, k, boundary])

Compute the Pk metric for a pair of segmentations. A segmentation is any sequence over a vocabulary of two items (e.g. "0", "1"), where the specified boundary value is used to mark the edge of a segmentation.

windowdiff(seg1, seg2, k[, boundary, weighted])

Compute the windowdiff score for a pair of segmentations.
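A sketch of the unweighted WindowDiff computation (Pevzner & Hearst): slide a window of width k over the two same-length segmentation strings and count the windows in which they disagree on the number of boundaries. Parameter names follow the entry above, but this is an illustrative implementation, not the library's exact code.

```python
def windowdiff(seg1, seg2, k, boundary="1"):
    """WindowDiff: fraction of width-k windows in which the two
    segmentations contain a different number of boundary marks."""
    if len(seg1) != len(seg2):
        raise ValueError("segmentations must have the same length")
    wd = 0
    for i in range(len(seg1) - k + 1):
        ndiff = abs(seg1[i:i + k].count(boundary) - seg2[i:i + k].count(boundary))
        wd += min(1, ndiff)  # unweighted: each mismatched window counts once
    return wd / (len(seg1) - k + 1)

print(windowdiff("0100", "0100", 2))  # 0.0
```

Comparing "0100" against "0000" with k = 2 gives 2/3: two of the three windows cover the spurious boundary.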

Spearman

ranks_from_scores(scores[, rank_gap])

Given a sequence of (key, score) tuples, yields each key with an increasing rank, tying with previous key's rank if the difference between their scores is less than rank_gap.

ranks_from_sequence(seq)

Given a sequence, yields each element with an increasing rank, suitable for use as an argument to spearman_correlation.

spearman_correlation(ranks1, ranks2)

Returns the Spearman correlation coefficient for two rankings, which should be dicts or sequences of (key, rank).
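A sketch using the closed form ρ = 1 − 6 Σd² / (n(n² − 1)), which is valid when there are no tied ranks; the library version works from the rank dicts produced by the helpers above and may handle edge cases differently.

```python
def spearman_correlation(ranks1, ranks2):
    """Spearman's rho over the keys shared by two rank assignments
    (sketch; assumes no tied ranks)."""
    ranks1, ranks2 = dict(ranks1), dict(ranks2)
    keys = ranks1.keys() & ranks2.keys()
    n = len(keys)
    d_squared = sum((ranks1[k] - ranks2[k]) ** 2 for k in keys)
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

r1 = {"a": 1, "b": 2, "c": 3, "d": 4}
r2 = {"a": 2, "b": 1, "c": 3, "d": 4}
print(spearman_correlation(r1, r2))  # 0.8
```

Swapping one adjacent pair of ranks gives Σd² = 2, hence ρ = 1 − 12/60 = 0.8; identical rankings give 1.0 and fully reversed ones give −1.0.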

Translation

bleu(references, hypothesis[, weights, ...])

Calculate BLEU score (Bilingual Evaluation Understudy) from Papineni, Kishore, Salim Roukos, Todd Ward, and Wei-Jing Zhu.
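A compact sketch of sentence-level BLEU: the geometric mean of clipped n-gram precisions times a brevity penalty. This illustrative version uses no smoothing, so any zero n-gram precision drives the score to 0; the library implementation offers smoothing and other options.

```python
from collections import Counter
from math import exp, log

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(references, hypothesis, weights=(0.25, 0.25, 0.25, 0.25)):
    """Sentence-level BLEU sketch: clipped n-gram precisions + brevity penalty."""
    precisions = []
    for n, w in enumerate(weights, start=1):
        hyp_counts = Counter(ngrams(hypothesis, n))
        # clip each hypothesis n-gram count by its max count in any reference
        max_ref = Counter()
        for ref in references:
            for ng, c in Counter(ngrams(ref, n)).items():
                max_ref[ng] = max(max_ref[ng], c)
        clipped = sum(min(c, max_ref[ng]) for ng, c in hyp_counts.items())
        precisions.append(clipped / max(1, sum(hyp_counts.values())))
    if min(precisions) == 0:
        return 0.0  # unsmoothed: a single empty precision zeroes the score
    # brevity penalty against the reference closest in length
    ref_len = min((abs(len(r) - len(hypothesis)), len(r)) for r in references)[1]
    bp = 1.0 if len(hypothesis) > ref_len else exp(1 - ref_len / len(hypothesis))
    return bp * exp(sum(w * log(p) for w, p in zip(weights, precisions)))

ref = "the cat is on the mat".split()
print(bleu([ref], ref))  # 1.0
```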

ribes(references, hypothesis[, alpha, beta])

The RIBES (Rank-based Intuitive Bilingual Evaluation Score) from Hideki Isozaki, Tsutomu Hirao, Kevin Duh, Katsuhito Sudoh and Hajime Tsukada.

meteor(references, hypothesis[, preprocess, ...])

Calculates METEOR score for a hypothesis with multiple references as described in "Meteor: An Automatic Metric for MT Evaluation with High Levels of Correlation with Human Judgments" by Alon Lavie and Abhaya Agarwal, in Proceedings of ACL.

alignment_error_rate(reference, hypothesis)

Return the Alignment Error Rate (AER) of an alignment with respect to a "gold standard" reference alignment.
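A sketch of the usual AER formula, 1 − (|A ∩ S| + |A ∩ P|) / (|A| + |S|), where S is the set of sure gold links, P ⊇ S the possible links, and A the hypothesis alignment, with each alignment represented as a set of (source, target) index pairs. The `possible` parameter is part of this sketch's assumed interface.

```python
def alignment_error_rate(reference, hypothesis, possible=None):
    """AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|) (illustrative sketch)."""
    if possible is None:
        possible = reference  # no possible links given: take P = S
    return 1 - (len(hypothesis & reference) + len(hypothesis & possible)) / (
        len(hypothesis) + len(reference))

ref = {(0, 0), (1, 1), (2, 2)}
hyp = {(0, 0), (1, 1), (2, 3)}
print(alignment_error_rate(ref, hyp))  # ≈ 0.333
```

Two of the three hypothesized links are correct, so AER = 1 − 4/6 ≈ 0.333; a perfect alignment scores 0.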

nist(references, hypothesis[, n])

Calculate NIST score from George Doddington.

chrf(reference, hypothesis[, min_len, ...])

Calculates the sentence-level chrF (Character n-gram F-score) described in Maja Popović (2015), "chrF: character n-gram F-score for automatic MT evaluation".

gleu(references, hypothesis[, min_len, max_len])

Calculates the sentence-level GLEU (Google-BLEU) score described in Wu et al. (2016), "Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation".