nltk.metrics.BigramAssocMeasures
- class nltk.metrics.BigramAssocMeasures
Bases: NgramAssocMeasures
A collection of bigram association measures. Each association measure is provided as a function with three arguments:
bigram_score_fn(n_ii, (n_ix, n_xi), n_xx)
The arguments constitute the marginals of a contingency table, counting the occurrences of particular events in a corpus. The letter i in the suffix refers to the appearance of the word in question, while x indicates the appearance of any word. Thus, for example:
- n_ii counts (w1, w2), i.e. the bigram being scored
- n_ix counts (w1, *)
- n_xi counts (*, w2)
- n_xx counts (*, *), i.e. any bigram
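The sketch below illustrates the notation. The hypothetical helper `bigram_marginals` counts the four marginals for one bigram directly over the adjacent-token pairs of a plain Python token list, following the definitions above. NLTK's own collocation finders may derive these counts differently, so this illustrates the notation and does not reproduce the library's method.

```python
from collections import Counter


def bigram_marginals(tokens, w1, w2):
    """Return (n_ii, (n_ix, n_xi), n_xx) for the bigram (w1, w2).

    Hypothetical helper: every count is taken over the adjacent-token
    bigrams of `tokens`, following the definitions of the marginals.
    """
    bigrams = list(zip(tokens, tokens[1:]))
    counts = Counter(bigrams)
    n_ii = counts[(w1, w2)]                                   # (w1, w2)
    n_ix = sum(c for (a, _), c in counts.items() if a == w1)  # (w1, *)
    n_xi = sum(c for (_, b), c in counts.items() if b == w2)  # (*, w2)
    n_xx = len(bigrams)                                       # (*, *)
    return n_ii, (n_ix, n_xi), n_xx


tokens = "new york is in new york state".split()
print(bigram_marginals(tokens, "new", "york"))  # (2, (2, 2), 6)
```

The returned tuple is in the same order as `bigram_score_fn(n_ii, (n_ix, n_xi), n_xx)`, so it can be unpacked straight into a score function with `*`.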
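The same counts can be arranged in a contingency table, where o marks the absence of the word in question (so n_oi counts bigrams (~w1, w2), i.e. those ending in w2 but not starting with w1):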
                w1    ~w1
             ------ ------
         w2 | n_ii | n_oi | = n_xi
             ------ ------
        ~w2 | n_io | n_oo |
             ------ ------
             = n_ix        TOTAL = n_xx
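The four marginals fix the whole 2x2 table, so the inner cells can be recovered by subtraction. The sketch below does this with a hypothetical `contingency` function. Its return order (n_ii, n_oi, n_io, n_oo) is an assumption chosen to read the table row by row.

```python
def contingency(n_ii, n_ix_xi_tuple, n_xx):
    """Recover the four cells of the 2x2 table from the marginals.

    Hypothetical helper; returns (n_ii, n_oi, n_io, n_oo), i.e. the
    table read row by row.
    """
    n_ix, n_xi = n_ix_xi_tuple
    n_oi = n_xi - n_ii                # (~w1, w2): ends in w2, does not start with w1
    n_io = n_ix - n_ii                # (w1, ~w2): starts with w1, does not end in w2
    n_oo = n_xx - n_ii - n_oi - n_io  # (~w1, ~w2): everything else
    return n_ii, n_oi, n_io, n_oo


print(contingency(8, (10, 12), 100))  # (8, 4, 2, 86)
```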
- classmethod phi_sq(*marginals)
Scores bigrams using phi-square, the square of the Pearson correlation coefficient.
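For a 2x2 table, phi-square has a closed form. It is the squared difference of the diagonal products divided by the product of the four row and column totals: (n_ii·n_oo − n_io·n_oi)² / ((n_ii + n_io)(n_ii + n_oi)(n_io + n_oo)(n_oi + n_oo)). Below is a minimal sketch with the cell recovery inlined so the block stands alone. The name `phi_sq_sketch` is hypothetical, the code is not NLTK's implementation, and it assumes no row or column total is zero.

```python
def phi_sq_sketch(n_ii, n_ix_xi_tuple, n_xx):
    """Phi-square of the 2x2 contingency table built from the marginals.

    Assumes every row and column total is non-zero (else ZeroDivisionError).
    """
    n_ix, n_xi = n_ix_xi_tuple
    n_oi = n_xi - n_ii
    n_io = n_ix - n_ii
    n_oo = n_xx - n_ii - n_oi - n_io
    numerator = (n_ii * n_oo - n_io * n_oi) ** 2
    # The four totals: w1 column, w2 row, ~w2 row, ~w1 column.
    denominator = (n_ii + n_io) * (n_ii + n_oi) * (n_io + n_oo) * (n_oi + n_oo)
    return numerator / denominator


print(phi_sq_sketch(3, (3, 3), 6))  # 1.0: w1 and w2 always co-occur
print(phi_sq_sketch(1, (2, 2), 4))  # 0.0: no association
```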
- classmethod chi_sq(n_ii, n_ix_xi_tuple, n_xx)
Scores bigrams using chi-square, i.e. phi-square multiplied by the number of bigrams (n_xx), as in Manning and Schütze 5.3.3.
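The relation chi-square = n_xx × phi-square can be checked numerically against the textbook Pearson statistic. That statistic is the sum over the four cells of (observed − expected)² / expected, where each expected count is row total × column total / n_xx. The function names below are hypothetical. The sketch only illustrates the identity and does not reproduce NLTK's code.

```python
def cells(n_ii, n_ix, n_xi, n_xx):
    # Cells of the table, row by row: (n_ii, n_oi, n_io, n_oo).
    n_oi = n_xi - n_ii
    n_io = n_ix - n_ii
    return n_ii, n_oi, n_io, n_xx - n_ii - n_oi - n_io


def chi_sq_via_phi(n_ii, n_ix_xi_tuple, n_xx):
    # n_xx times the closed-form phi-square of the 2x2 table.
    n_ix, n_xi = n_ix_xi_tuple
    a, b, c, d = cells(n_ii, n_ix, n_xi, n_xx)
    phi_sq = (a * d - c * b) ** 2 / ((a + c) * (a + b) * (c + d) * (b + d))
    return n_xx * phi_sq


def chi_sq_pearson(n_ii, n_ix_xi_tuple, n_xx):
    # Sum of (O - E)^2 / E over the four cells.
    n_ix, n_xi = n_ix_xi_tuple
    observed = cells(n_ii, n_ix, n_xi, n_xx)
    rows = (n_xi, n_xx - n_xi)  # w2 row, ~w2 row
    cols = (n_ix, n_xx - n_ix)  # w1 column, ~w1 column
    expected = [rows[r] * cols[k] / n_xx for r in (0, 1) for k in (0, 1)]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


print(round(chi_sq_via_phi(8, (10, 12), 100), 4))  # 48.6532
print(round(chi_sq_pearson(8, (10, 12), 100), 4))  # 48.6532
```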
- classmethod fisher(*marginals)
Scores bigrams using Fisher’s Exact Test (Pedersen 1996). Less sensitive to small counts than PMI or chi-square, but also more expensive to compute. Requires SciPy.
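Fisher's exact test treats n_ii as hypergeometric once the margins are fixed. With n_xi bigrams ending in w2 drawn from n_xx bigrams, of which n_ix start with w1, P(n_ii = k) = C(n_ix, k) · C(n_xx − n_ix, n_xi − k) / C(n_xx, n_xi). The sketch below computes the one-sided p-value for observing n_ii or more joint occurrences. It uses only the standard library (`math.comb`, Python 3.8+) and so avoids the SciPy dependency. The function name is hypothetical, and the sketch does not assert which tail or alternative NLTK's `fisher` reports.

```python
from math import comb


def fisher_right_tail(n_ii, n_ix_xi_tuple, n_xx):
    """One-sided Fisher p-value P(X >= n_ii) with all margins held fixed.

    X is hypergeometric: n_xi draws from n_xx bigrams, n_ix of which
    start with w1. Summed exactly in integers, divided once at the end.
    """
    n_ix, n_xi = n_ix_xi_tuple
    k_max = min(n_ix, n_xi)
    # comb(n, k) returns 0 when k > n, so impossible cells add nothing.
    tail = sum(
        comb(n_ix, k) * comb(n_xx - n_ix, n_xi - k)
        for k in range(n_ii, k_max + 1)
    )
    return tail / comb(n_xx, n_xi)


print(fisher_right_tail(3, (3, 3), 6))  # 0.05
```

With SciPy installed, `scipy.stats.fisher_exact([[n_ii, n_oi], [n_io, n_oo]], alternative="greater")` should report the same right-tail p-value.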
- classmethod likelihood_ratio(*marginals)
Scores ngrams using likelihood ratios as in Manning and Schütze 5.3.4.
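A common way to write the likelihood-ratio score for a 2x2 table is the G² statistic, 2 Σ O ln(O / E), using the same expected counts as chi-square. The sketch below assumes that formulation (Dunning's log-likelihood ratio). It is an illustration with a hypothetical function name, not NLTK's code. Cells with O = 0 contribute nothing to the sum.

```python
import math


def likelihood_ratio_sketch(n_ii, n_ix_xi_tuple, n_xx):
    """G^2 = 2 * sum(O * ln(O / E)) over the four cells of the 2x2 table."""
    n_ix, n_xi = n_ix_xi_tuple
    n_oi = n_xi - n_ii
    n_io = n_ix - n_ii
    observed = (n_ii, n_oi, n_io, n_xx - n_ii - n_oi - n_io)
    rows = (n_xi, n_xx - n_xi)  # w2 row, ~w2 row
    cols = (n_ix, n_xx - n_ix)  # w1 column, ~w1 column
    expected = [rows[r] * cols[k] / n_xx for r in (0, 1) for k in (0, 1)]
    # Empty cells are skipped: O * ln(O / E) tends to 0 as O -> 0.
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


print(likelihood_ratio_sketch(1, (2, 2), 4))  # 0.0: no association
```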
- static mi_like(*marginals, **kwargs)
Scores ngrams using a variant of mutual information. The keyword argument power sets an exponent (default 3) for the numerator. No logarithm of the result is calculated.
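Below is a minimal sketch of the score described above. It assumes, though the text does not say so, that the denominator is the product of the two unigram marginals n_ix · n_xi. Under that assumption, with power=1 the score is the ratio inside PMI up to the constant factor n_xx, with no logarithm taken. The function name is hypothetical, and the explicit `power` parameter stands in for the `**kwargs` of the documented signature.

```python
def mi_like_sketch(n_ii, n_ix_xi_tuple, n_xx, power=3):
    """n_ii ** power / (n_ix * n_xi); no logarithm is taken.

    Assumption: the denominator is the product of the unigram marginals.
    n_xx is accepted only to keep the three-argument calling convention.
    """
    n_ix, n_xi = n_ix_xi_tuple
    return n_ii ** power / (n_ix * n_xi)


print(mi_like_sketch(2, (4, 5), 100))           # 0.4
print(mi_like_sketch(2, (4, 5), 100, power=1))  # 0.1
```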