nltk.metrics.QuadgramAssocMeasures

class nltk.metrics.QuadgramAssocMeasures

Bases: NgramAssocMeasures

A collection of quadgram association measures. Each association measure is provided as a function with five arguments:

quadgram_score_fn(n_iiii,
                  (n_iiix, n_iixi, n_ixii, n_xiii),
                  (n_iixx, n_ixix, n_ixxi, n_xixi, n_xxii, n_xiix),
                  (n_ixxx, n_xixx, n_xxix, n_xxxi),
                  n_all)

The arguments constitute the marginals of a contingency table, counting the occurrences of particular events in a corpus. Each of the four letters in the suffix stands for one position in the quadgram: i means that position holds the corresponding word of the quadgram being scored, while x means it may hold any word. Thus, for example:

  • n_iiii counts (w1, w2, w3, w4), i.e. the quadgram being scored

  • n_ixxi counts (w1, *, *, w4)

  • n_xxxx (passed as n_all) counts (*, *, *, *), i.e. the total number of quadgrams
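To make the argument layout concrete, here is a minimal sketch that collects the five arguments for one target quadgram from a plain list of observed quadgrams. The helper quadgram_marginals is hypothetical and not part of NLTK; it only illustrates which positions each marginal fixes, in the tuple order of the signature above.

```python
def quadgram_marginals(quadgrams, target):
    """Collect the five scoring arguments for `target`, given a list of
    observed quadgrams (4-tuples of words). Hypothetical helper."""
    # Positions where the scored word must appear, in the argument order above.
    triples = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]    # iiix iixi ixii xiii
    pairs = [(0, 1), (0, 2), (0, 3), (1, 3), (2, 3), (1, 2)]  # iixx ixix ixxi xixi xxii xiix
    unigrams = [(0,), (1,), (2,), (3,)]                       # ixxx xixx xxix xxxi

    def count(positions):
        # Quadgrams agreeing with `target` at every listed position;
        # all other positions act as the wildcard x.
        return sum(1 for q in quadgrams if all(q[k] == target[k] for k in positions))

    return (
        count((0, 1, 2, 3)),                # n_iiii
        tuple(count(p) for p in triples),
        tuple(count(p) for p in pairs),
        tuple(count(p) for p in unigrams),
        len(quadgrams),                     # n_all
    )


words = "the cat sat on the mat the cat sat on the rug".split()
quads = [tuple(words[i:i + 4]) for i in range(len(words) - 3)]
print(quadgram_marginals(quads, ("the", "cat", "sat", "on")))
# → (2, (2, 2, 2, 2), (2, 2, 2, 2, 2, 2), (3, 2, 2, 2), 9)
```

Here "the" appears in the first position of three quadgrams, so n_ixxx is 3, while every marginal that fixes cat, sat, or on is 2.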

classmethod chi_sq(*marginals)

Scores ngrams using Pearson’s chi-square test as in Manning and Schütze 5.3.3.
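A chi-square score needs all 16 cells of the contingency table, not just the marginals. The sketch below is not NLTK's code: the hypothetical helper contingency recovers each cell (quadgrams matching the scored words at exactly a given set of positions) by inclusion-exclusion, and chi_sq compares each cell with the count expected if the four positions were independent. Positional parameters mirror the signature above; as an assumption of this sketch, cells whose expected count is zero are skipped rather than smoothed.

```python
from math import prod

# Positions (0-3) fixed to the scored word, in the argument order shown above.
TRIPLES = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
PAIRS = [(0, 1), (0, 2), (0, 3), (1, 3), (2, 3), (1, 2)]


def contingency(n_iiii, triples, pairs, unigrams, n_all):
    """Map each set S of positions to the number of quadgrams that match
    the scored quadgram exactly at S and nowhere else."""
    marg = {frozenset(): n_all, frozenset(range(4)): n_iiii}
    marg.update((frozenset(k), v) for k, v in zip(TRIPLES, triples))
    marg.update((frozenset(k), v) for k, v in zip(PAIRS, pairs))
    marg.update((frozenset([k]), v) for k, v in enumerate(unigrams))
    # Inclusion-exclusion over all supersets T of S (16 keys in total).
    return {
        s: sum((-1) ** (len(t) - len(s)) * v for t, v in marg.items() if s <= t)
        for s in marg
    }


def chi_sq(n_iiii, triples, pairs, unigrams, n_all):
    """Pearson's chi-square over the 16 cells, with expected counts
    computed under independence of the four positions."""
    observed = contingency(n_iiii, triples, pairs, unigrams, n_all)
    p = [u / n_all for u in unigrams]  # P(scored word at position k)
    score = 0.0
    for s, obs in observed.items():
        exp = n_all * prod(p[k] if k in s else 1 - p[k] for k in range(4))
        if exp > 0:  # skip empty expected cells instead of smoothing them
            score += (obs - exp) ** 2 / exp
    return score
```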

classmethod jaccard(*marginals)

Scores ngrams using the Jaccard index.
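Read as a Jaccard index, the score divides the quadgrams that match the scored words at all four positions (the intersection) by those that match at one or more positions (the union). A minimal sketch under that reading, with a hypothetical standalone jaccard function: the union follows from the marginals by inclusion-exclusion, so no full contingency table is needed. n_all is accepted only so the parameters mirror the signature above.

```python
def jaccard(n_iiii, triples, pairs, unigrams, n_all):
    """|match at all positions| / |match at one or more positions|."""
    # Inclusion-exclusion: singles - pairs + triples - all four.
    union = sum(unigrams) - sum(pairs) + sum(triples) - n_iiii
    return n_iiii / union if union else 0.0


print(jaccard(2, (2, 2, 2, 2), (2, 2, 2, 2, 2, 2), (3, 2, 2, 2), 9))
# → 0.6666666666666666
```

With these counts, three quadgrams share at least one word position with the scored quadgram and two share all four, giving 2/3.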

classmethod likelihood_ratio(*marginals)

Scores ngrams using likelihood ratios as in Manning and Schütze 5.3.4.
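The likelihood-ratio statistic compares observed and expected cell counts as G² = 2 Σ obs · ln(obs / exp). A minimal sketch, assuming the 16 contingency cells are already available as a dict keyed by the set of positions (0-3) at which a quadgram matches the scored words (for example, derived from the marginals by inclusion-exclusion). likelihood_ratio is a hypothetical name, and expected counts assume the four positions are independent.

```python
from math import log, prod


def likelihood_ratio(cells):
    """G^2 = 2 * sum(obs * ln(obs / exp)) over the 16 contingency cells.

    `cells` maps a frozenset of matching positions to its observed count."""
    n_all = sum(cells.values())
    # Recover P(scored word at position k) from the cells themselves.
    p = [sum(v for s, v in cells.items() if k in s) / n_all for k in range(4)]
    g2 = 0.0
    for s, obs in cells.items():
        if obs > 0:  # 0 * ln(0) is taken as 0; obs > 0 also implies exp > 0
            exp = n_all * prod(p[k] if k in s else 1 - p[k] for k in range(4))
            g2 += obs * log(obs / exp)
    return 2 * g2
```

When every cell equals its expected count the statistic is 0; it grows as the observed cells drift from independence.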

static mi_like(*marginals, **kwargs)

Scores ngrams using a variant of mutual information. The keyword argument power sets an exponent (default 3) for the numerator. No logarithm of the result is calculated.
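A minimal sketch of this score, assuming the numerator is n_iiii raised to power and the denominator is the product of the four single-position marginals (n_ixxx, n_xixx, n_xxix, n_xxxi). The standalone function mi_like below is illustrative only; its positional parameters mirror the signature above, and the unused ones are accepted just for that symmetry.

```python
from math import prod


def mi_like(n_iiii, triples, pairs, unigrams, n_all, power=3):
    """n_iiii ** power / (n_ixxx * n_xixx * n_xxix * n_xxxi); no log taken."""
    return n_iiii ** power / prod(unigrams)


print(mi_like(2, (2, 2, 2, 2), (2, 2, 2, 2, 2, 2), (2, 2, 2, 2), 16))  # → 0.5
```

Raising power above 1 favours frequent quadgrams, damping the bias of plain mutual information toward rare events.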

classmethod pmi(*marginals)

Scores ngrams by pointwise mutual information, as in Manning and Schütze 5.4.
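Extended from bigrams to four words, pointwise mutual information compares the quadgram's probability with the product of its words' individual probabilities: PMI = log2(P(w1 w2 w3 w4) / (P(w1) P(w2) P(w3) P(w4))). A minimal sketch, assuming each probability is estimated as a count divided by n_all; pmi here is a hypothetical standalone function whose positional parameters mirror the signature above.

```python
from math import log2, prod


def pmi(n_iiii, triples, pairs, unigrams, n_all):
    """log2 of P(quadgram) / (P(w1) * P(w2) * P(w3) * P(w4))."""
    p_joint = n_iiii / n_all
    p_indep = prod(u / n_all for u in unigrams)
    return log2(p_joint / p_indep)


print(pmi(2, (2, 2, 2, 2), (2, 2, 2, 2, 2, 2), (2, 2, 2, 2), 16))  # → 9.0
```

Equivalently, PMI = log2(n_iiii · n_all³ / (n_ixxx · n_xixx · n_xxix · n_xxxi)); here 2 · 16³ / 2⁴ = 512 = 2⁹.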

classmethod poisson_stirling(*marginals)

Scores ngrams using the Poisson-Stirling measure.
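The Poisson-Stirling measure can be written as n_iiii · (log(n_iiii / E) - 1), where E = n_ixxx · n_xixx · n_xxix · n_xxxi / n_all³ is the count expected if the four words occurred independently. A minimal sketch using base-2 logarithms (the choice of base is an assumption); poisson_stirling here is a hypothetical standalone function, and it requires n_iiii > 0.

```python
from math import log2, prod


def poisson_stirling(n_iiii, triples, pairs, unigrams, n_all):
    """n_iiii * (log2(n_iiii / expected) - 1); requires n_iiii > 0."""
    # Count expected under independence: n_all * prod(u / n_all).
    expected = prod(unigrams) / n_all ** 3
    return n_iiii * (log2(n_iiii / expected) - 1)


print(poisson_stirling(2, (2, 2, 2, 2), (2, 2, 2, 2, 2, 2), (2, 2, 2, 2), 16))  # → 16.0
```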

static raw_freq(*marginals)

Scores ngrams by their relative frequency, n_iiii / n_all.

classmethod student_t(*marginals)

Scores ngrams using Student’s t test under the hypothesis that the unigrams occur independently, as in Manning and Schütze 5.3.1.
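With the approximation used by Manning and Schütze (sample variance taken to be the observed relative frequency), the t statistic reduces to t = (n_iiii - E) / sqrt(n_iiii), where E = n_ixxx · n_xixx · n_xxix · n_xxxi / n_all³ is the count expected under independence. A minimal sketch; student_t here is a hypothetical standalone function with positional parameters mirroring the signature above, and it requires n_iiii > 0.

```python
from math import prod, sqrt


def student_t(n_iiii, triples, pairs, unigrams, n_all):
    """(observed - expected) / sqrt(observed); requires n_iiii > 0."""
    # Count expected if the four words occurred independently.
    expected = prod(unigrams) / n_all ** 3
    return (n_iiii - expected) / sqrt(n_iiii)


print(student_t(4, (4, 4, 4, 4), (4, 4, 4, 4, 4, 4), (2, 2, 2, 2), 16))  # → 1.998046875
```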