nltk.lm.Lidstone

class nltk.lm.Lidstone[source]

Bases: LanguageModel

Provides Lidstone-smoothed scores.

In addition to the initialization arguments inherited from LanguageModel, this class requires a number, gamma, by which to increase the counts.
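
For example, a minimal sketch of training and scoring (the gamma value 0.1 and the toy corpus are illustrative only):

>>> from nltk.lm import Lidstone
>>> lm = Lidstone(0.1, 2)  # gamma=0.1, bigram (order 2) model
>>> lm.fit([[("a", "b"), ("b", "c")]], vocabulary_text=["a", "b", "c"])
>>> s = lm.score("b", ["a"])  # (count + gamma) / (context total + gamma * |V|)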

__init__(gamma, *args, **kwargs)[source]

Creates a new LanguageModel.

Parameters
  • vocabulary (nltk.lm.Vocabulary or None) – If provided, this vocabulary will be used instead of creating a new one when training.

  • counter (nltk.lm.NgramCounter or None) – If provided, use this object to count ngrams.

  • ngrams_fn (function or None) – If given, defines how sentences in training text are turned to ngram sequences.

  • pad_fn (function or None) – If given, defines how sentences in training text are padded.
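
For example, a sketch of passing a pre-built vocabulary instead of letting fit create one (the cutoff and word list are illustrative):

>>> from nltk.lm import Lidstone, Vocabulary
>>> vocab = Vocabulary(["a", "b", "c"], unk_cutoff=1)
>>> lm = Lidstone(0.1, 2, vocabulary=vocab)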

unmasked_score(word, context=None)[source]

Add-gamma smoothing: Lidstone in general, or Laplace in the special case gamma = 1.

To see which kind, look at the gamma attribute on the class.
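
A sketch of the add-gamma computation, assuming counts behaves like the frequency distribution returned by context_counts (this mirrors the formula, not necessarily the exact source):

    def lidstone_unmasked_score(lm, word, context=None):
        counts = lm.context_counts(context)  # counts of words seen after `context`
        word_count = counts[word]            # raw count of `word` in that context
        total = counts.N()                   # total observations for that context
        # Add gamma to the count and normalize by the gamma-inflated total.
        return (word_count + lm.gamma) / (total + len(lm.vocab) * lm.gamma)

With gamma = 1 this reduces to Laplace (add-one) smoothing.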

context_counts(context)[source]

Helper method for retrieving counts for a given context.

Assumes context has been checked and OOV words in it masked.

Parameters

context (tuple(str) or None)
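
For example, a sketch of inspecting the counts for a masked context (lm is the toy bigram model from the example above):

>>> counts = lm.context_counts(lm.vocab.lookup(("a",)))
>>> total = counts.N()  # total observations following "a"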

entropy(text_ngrams)[source]

Calculate the cross-entropy of the model for the given evaluation text.

Parameters

text_ngrams (Iterable(tuple(str))) – A sequence of ngram tuples.

Return type

float
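
For example, a sketch of evaluating held-out bigrams (the test sentence is illustrative; the result is the average negative base-2 log score):

>>> from nltk.util import bigrams
>>> test = list(bigrams(["a", "b", "c"]))
>>> h = lm.entropy(test)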

fit(text, vocabulary_text=None)[source]

Trains the model on a text.

Parameters
  • text – Training text as a sequence of sentences.

  • vocabulary_text – If the model was created without a vocabulary, text from which to build one; ignored when the vocabulary is already set up.
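
For example, a sketch using NLTK's padded everygram pipeline to prepare both arguments (the toy corpus is illustrative):

>>> from nltk.lm.preprocessing import padded_everygram_pipeline
>>> train, vocab = padded_everygram_pipeline(2, [["a", "b", "c"]])
>>> lm = Lidstone(0.1, 2)
>>> lm.fit(train, vocab)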

generate(num_words=1, text_seed=None, random_seed=None)[source]

Generate words from the model.

Parameters
  • num_words (int) – How many words to generate. By default 1.

  • text_seed – Generation can be conditioned on preceding context.

  • random_seed – A random seed or an instance of random.Random. If provided, makes the random sampling part of generation reproducible.

Returns

One (str) word or a list of words generated from the model.

Examples:

>>> from nltk.lm import MLE
>>> lm = MLE(2)
>>> lm.fit([[("a", "b"), ("b", "c")]], vocabulary_text=['a', 'b', 'c'])
>>> lm.fit([[("a",), ("b",), ("c",)]])
>>> lm.generate(random_seed=3)
'a'
>>> lm.generate(text_seed=['a'])
'b'

logscore(word, context=None)[source]

Evaluate the log score (base 2) of this word in this context.

The arguments are the same as for score and unmasked_score.
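
For example, a sketch checking the relationship to score (using the toy model from above):

>>> import math
>>> math.isclose(lm.logscore("b", ["a"]), math.log2(lm.score("b", ["a"])))
True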

perplexity(text_ngrams)[source]

Calculates the perplexity of the given text.

This is simply 2 ** cross-entropy for the text, so the arguments are the same as for entropy.
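
For example, a sketch of the identity with entropy (reusing test and math from the examples above):

>>> math.isclose(lm.perplexity(test), 2 ** lm.entropy(test))
True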

score(word, context=None)[source]

Masks out-of-vocabulary (OOV) words, then computes the model score for the word in the given context.

For model-specific logic of calculating scores, see the unmasked_score method.
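
For example, a sketch of how an OOV word is handled, assuming the default unknown label "<UNK>" (using the toy model from above, where "z" is not in the vocabulary):

>>> lm.score("z", ["a"]) == lm.unmasked_score("<UNK>", ("a",))
True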