nltk.lm.Lidstone
- class nltk.lm.Lidstone
Bases: LanguageModel
Provides Lidstone-smoothed scores.
In addition to the initialization arguments from BaseNgramModel, this class requires gamma, a number by which to increase the counts.
- __init__(gamma, *args, **kwargs)
Creates a new LanguageModel.
- Parameters
vocabulary (nltk.lm.Vocabulary or None) – If provided, this vocabulary will be used instead of creating a new one when training.
counter (nltk.lm.NgramCounter or None) – If provided, use this object to count ngrams.
ngrams_fn (function or None) – If given, defines how sentences in training text are turned to ngram sequences.
pad_fn (function or None) – If given, defines how sentences in training text are padded.
- unmasked_score(word, context=None)
Additive smoothing: Lidstone (add-gamma) or, when gamma is 1, Laplace (add-one).
To see which kind is in use, check the gamma attribute of the model.
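As an illustration of the additive estimate described above, here is a minimal pure-Python sketch. It does not call NLTK; the function name lidstone_score, its parameters, and the counts are hypothetical, and it assumes the vocabulary size already includes any unknown-word token.

```python
from collections import Counter


def lidstone_score(word, context_counts, vocab_size, gamma):
    """Additive (Lidstone) estimate of P(word | context).

    context_counts: Counter of words observed after one fixed context.
    vocab_size: number of word types in the vocabulary (assumed to
        include any unknown-word token).
    gamma: amount added to every count; gamma = 1 gives Laplace smoothing.
    """
    total = sum(context_counts.values())
    return (context_counts[word] + gamma) / (total + vocab_size * gamma)


# Hypothetical counts of words seen after the context ("the",).
counts = Counter({"cat": 3, "dog": 1})
vocab_size = 4  # e.g. cat, dog, fish, <UNK>

# A seen word: (3 + 0.5) / (4 + 4 * 0.5)
print(lidstone_score("cat", counts, vocab_size, gamma=0.5))
# An unseen word still gets non-zero mass: (0 + 0.5) / (4 + 4 * 0.5)
print(lidstone_score("fish", counts, vocab_size, gamma=0.5))
```

With any gamma greater than 0, every vocabulary word receives a non-zero score, and the scores over the whole vocabulary still sum to 1.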
- context_counts(context)
Helper method for retrieving counts for a given context.
Assumes context has been checked and OOV words in it masked.
- Parameters
context (tuple(str) or None) – The context to retrieve counts for.
- entropy(text_ngrams)
Calculate cross-entropy of model for given evaluation text.
- Parameters
text_ngrams (Iterable(tuple(str))) – A sequence of ngram tuples.
- Return type
float
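The cross-entropy computation can be sketched as the average negative log score over the evaluation ngrams. This is a sketch under stated assumptions, not NLTK's code: the helper names cross_entropy and uniform_score are hypothetical, logs are taken in base 2, the last item of each ngram is treated as the word and the rest as its context, and every score is assumed to be positive (which additive smoothing with gamma greater than 0 ensures).

```python
import math


def cross_entropy(score_fn, text_ngrams):
    """Average negative log2 score over a sequence of ngram tuples.

    The last element of each ngram is the word being scored and the
    preceding elements are its context. Assumes all scores are > 0.
    """
    log_scores = [
        math.log2(score_fn(ngram[-1], ngram[:-1])) for ngram in text_ngrams
    ]
    return -sum(log_scores) / len(log_scores)


def uniform_score(word, context):
    """Hypothetical stand-in model: four equally likely words."""
    return 0.25


ngrams = [("a", "b"), ("b", "c"), ("c", "a")]
h = cross_entropy(uniform_score, ngrams)
print(h)       # 2.0 (bits per word)
print(2 ** h)  # 4.0 (the corresponding perplexity)
```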
- fit(text, vocabulary_text=None)
Trains the model on a text.
- Parameters
text – Training text as a sequence of sentences.
- generate(num_words=1, text_seed=None, random_seed=None)
Generate words from the model.
- Parameters
num_words (int) – How many words to generate. By default 1.
text_seed – Generation can be conditioned on preceding context.
random_seed – A random seed or an instance of random.Random. If provided, makes the random sampling part of generation reproducible.
- Returns
One word (str) or a list of words generated from the model.
Examples:
>>> from nltk.lm import MLE
>>> lm = MLE(2)
>>> lm.fit([[("a", "b"), ("b", "c")]], vocabulary_text=['a', 'b', 'c'])
>>> lm.fit([[("a",), ("b",), ("c",)]])
>>> lm.generate(random_seed=3)
'a'
>>> lm.generate(text_seed=['a'])
'b'
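To illustrate how a random_seed argument can make sampling reproducible, here is a small stdlib-only sketch. The function sample_next and the distribution are hypothetical and are not NLTK's implementation; the sketch only assumes, as the parameter description says, that the seed may be a number or a random.Random instance.

```python
import random


def sample_next(distribution, random_seed=None):
    """Draw one word from a {word: probability} mapping.

    random_seed may be None, an int seed, or a random.Random instance.
    """
    if isinstance(random_seed, random.Random):
        rng = random_seed
    else:
        rng = random.Random(random_seed)
    # Sort the words so that a given seed always sees the same ordering.
    words = sorted(distribution)
    weights = [distribution[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]


# Hypothetical next-word distribution after some context.
dist = {"b": 0.7, "c": 0.3}

first = sample_next(dist, random_seed=3)
second = sample_next(dist, random_seed=3)
print(first == second)  # True: the same seed reproduces the same draw
```

Passing a shared random.Random instance instead of an integer lets several successive draws advance one generator, so a whole generated sequence is reproducible rather than each word being drawn from an identically seeded state.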
- logscore(word, context=None)
Evaluate the log score of this word in this context.
The arguments are the same as for score and unmasked_score.
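A log score of this kind can be sketched as the logarithm of the probability score. The helper below is hypothetical and makes two assumptions that the text above does not state: that the log is base 2, and that a zero score maps to negative infinity rather than raising an error.

```python
import math


def log_base2(score):
    """Base-2 log of a probability, mapping 0 to negative infinity."""
    if score == 0.0:
        return float("-inf")
    return math.log2(score)


print(log_base2(0.5))    # -1.0
print(log_base2(0.125))  # -3.0
print(log_base2(0.0))    # -inf
```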