nltk.lm.Lidstone

class nltk.lm.Lidstone[source]

Bases: LanguageModel

Provides Lidstone-smoothed scores.

In addition to the initialization arguments inherited from LanguageModel, this class requires a number, gamma, by which to increase the counts.
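
For example, a minimal sketch of training and scoring (the gamma value 0.1 and the toy corpus are illustrative only):

>>> from nltk.lm import Lidstone
>>> lm = Lidstone(0.1, 2)  # gamma=0.1, bigram (order 2) model
>>> lm.fit([[("a", "b"), ("b", "c")]], vocabulary_text=["a", "b", "c"])
>>> s = lm.score("b", ["a"])  # (count + gamma) / (context total + gamma * |V|)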

__init__(gamma, *args, **kwargs)[source]

Creates a new LanguageModel.

Parameters
  • vocabulary (nltk.lm.Vocabulary or None) – If provided, this vocabulary will be used instead of creating a new one when training.

  • counter (nltk.lm.NgramCounter or None) – If provided, use this object to count ngrams.

  • ngrams_fn (function or None) – If given, defines how sentences in training text are turned to ngram sequences.

  • pad_fn (function or None) – If given, defines how sentences in training text are padded.
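
For example, a sketch of passing a pre-built vocabulary instead of letting fit create one (the cutoff and word list are illustrative):

>>> from nltk.lm import Lidstone, Vocabulary
>>> vocab = Vocabulary(["a", "b", "c"], unk_cutoff=1)
>>> lm = Lidstone(0.1, 2, vocabulary=vocab)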

unmasked_score(word, context=None)[source]

Add-gamma smoothing: Lidstone in general, or Laplace in the special case gamma = 1.

To see which kind, look at the gamma attribute on the class.
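
A sketch of the add-gamma computation, assuming counts behaves like the frequency distribution returned by context_counts (this mirrors the formula, not necessarily the exact source):

    def lidstone_unmasked_score(lm, word, context=None):
        counts = lm.context_counts(context)  # counts of words seen after `context`
        word_count = counts[word]            # raw count of `word` in that context
        total = counts.N()                   # total observations for that context
        # Add gamma to the count and normalize by the gamma-inflated total.
        return (word_count + lm.gamma) / (total + len(lm.vocab) * lm.gamma)

With gamma = 1 this reduces to Laplace (add-one) smoothing.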

context_counts(context)[source]

Helper method for retrieving counts for a given context.

Assumes context has been checked and OOV words in it masked.

Parameters

context (tuple(str) or None)
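
For example, a sketch of inspecting the counts for a masked context (lm is the toy bigram model from the example above):

>>> counts = lm.context_counts(lm.vocab.lookup(("a",)))
>>> total = counts.N()  # total observations following "a"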

entropy(text_ngrams)[source]

Calculate the cross-entropy of the model for the given evaluation text.

Parameters

text_ngrams (Iterable(tuple(str))) – A sequence of ngram tuples.

Return type

float
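
For example, a sketch of evaluating held-out bigrams (the test sentence is illustrative; the result is the average negative base-2 log score):

>>> from nltk.util import bigrams
>>> test = list(bigrams(["a", "b", "c"]))
>>> h = lm.entropy(test)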

fit(text, vocabulary_text=None)[source]

Trains the model on a text.

Parameters
  • text – Training text as a sequence of sentences.

  • vocabulary_text – If the model was created without a vocabulary, text from which to build one; ignored when the vocabulary is already set up.
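
For example, a sketch using NLTK's padded everygram pipeline to prepare both arguments (the toy corpus is illustrative):

>>> from nltk.lm.preprocessing import padded_everygram_pipeline
>>> train, vocab = padded_everygram_pipeline(2, [["a", "b", "c"]])
>>> lm = Lidstone(0.1, 2)
>>> lm.fit(train, vocab)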

generate(num_words=1, text_seed=None, random_seed=None)[source]

Generate words from the model.

Parameters
  • num_words (int) – How many words to generate. By default 1.

  • text_seed – Generation can be conditioned on preceding context.

  • random_seed – A random seed or an instance of random.Random. If provided, makes the random sampling part of generation reproducible.

Returns

One (str) word or a list of words generated from the model.

Examples:

>>> from nltk.lm import MLE
>>> lm = MLE(2)
>>> lm.fit([[("a", "b"), ("b", "c")]], vocabulary_text=['a', 'b', 'c'])
>>> lm.fit([[("a",), ("b",), ("c",)]])
>>> lm.generate(random_seed=3)
'a'
>>> lm.generate(text_seed=['a'])
'b'

logscore(word, context=None)[source]

Evaluate the log score (base 2) of this word in this context.

The arguments are the same as for score and unmasked_score.
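
For example, a sketch checking the relationship to score (using the toy model from above):

>>> import math
>>> math.isclose(lm.logscore("b", ["a"]), math.log2(lm.score("b", ["a"])))
True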

perplexity(text_ngrams)[source]

Calculates the perplexity of the given text.

This is simply 2 ** cross-entropy for the text, so the arguments are the same as for entropy.
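
For example, a sketch of the identity with entropy (reusing test and math from the examples above):

>>> math.isclose(lm.perplexity(test), 2 ** lm.entropy(test))
True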

score(word, context=None)[source]

Masks out-of-vocabulary (OOV) words, then computes the model score for the word in the given context.

For model-specific logic of calculating scores, see the unmasked_score method.
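
For example, a sketch of how an OOV word is handled, assuming the default unknown label "<UNK>" (using the toy model from above, where "z" is not in the vocabulary):

>>> lm.score("z", ["a"]) == lm.unmasked_score("<UNK>", ("a",))
True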