nltk.classify.util module
Utility functions and classes for classifiers.
- nltk.classify.util.apply_features(feature_func, toks, labeled=None)
Use the LazyMap class to construct a lazy list-like object that is analogous to map(feature_func, toks). In particular, if labeled=False, then the returned list-like object's values are equal to:
[feature_func(tok) for tok in toks]
If labeled=True, then the returned list-like object's values are equal to:
[(feature_func(tok), label) for (tok, label) in toks]
The primary purpose of this function is to avoid the memory overhead involved in storing all the featuresets for every token in a corpus. Instead, these featuresets are constructed lazily, as needed. The reduction in memory overhead can be especially significant when the underlying list of tokens is itself lazy (as is the case with many corpus readers).
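The lazy construction described above can be illustrated without NLTK. The following is a minimal sketch only, not NLTK's LazyMap implementation: LazyFeatures is a hypothetical stand-in class, and the feature function, tokens, and labels are made up for illustration. It computes a featureset only when the corresponding element is accessed.

```python
from collections.abc import Sequence


class LazyFeatures(Sequence):
    """Hypothetical stand-in for a lazy map; not NLTK's LazyMap."""

    def __init__(self, feature_func, toks, labeled=False):
        self._func = feature_func
        self._toks = toks
        self._labeled = labeled

    def __len__(self):
        return len(self._toks)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Slicing stays lazy: wrap the sliced tokens again.
            return LazyFeatures(self._func, self._toks[index], self._labeled)
        item = self._toks[index]
        if self._labeled:
            tok, label = item
            return (self._func(tok), label)
        return self._func(item)


calls = []  # records the tokens for which features were actually computed


def word_features(word):
    calls.append(word)
    return {"length": len(word), "last_letter": word[-1]}


labeled_toks = [("Anna", "female"), ("John", "male"), ("Mark", "male")]
lazy = LazyFeatures(word_features, labeled_toks, labeled=True)

calls_before_access = list(calls)  # no featureset has been built yet
second = lazy[1]                   # builds the featureset for "John" only
calls_after_access = list(calls)
```

Because nothing is stored, accessing the same element twice in this sketch recomputes its featureset; that trades some repeated computation for lower memory use.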
- Parameters
feature_func – The function that will be applied to each token. It should return a featureset – i.e., a dict mapping feature names to feature values.
toks – The list of tokens to which feature_func should be applied. If labeled=False, then the list elements will be passed directly to feature_func(). If labeled=True, then the list elements should be tuples (tok, label), and tok will be passed to feature_func().
labeled – If true, then toks contains labeled tokens, i.e., tuples of the form (tok, label). (Default: auto-detect based on types.)
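As a usage sketch, assuming NLTK is installed: the feature function last_letter_features and the small name lists below are made up for illustration. The labeled argument is passed explicitly here so the example does not rely on auto-detection.

```python
from nltk.classify.util import apply_features


def last_letter_features(name):
    # Illustrative feature function: returns a featureset (a dict).
    return {"last_letter": name[-1]}


names = ["Anna", "John", "Mark"]
labeled_names = [("Anna", "female"), ("John", "male"), ("Mark", "male")]

# Unlabeled tokens: each element is passed directly to the feature function.
unlabeled_sets = apply_features(last_letter_features, names, labeled=False)

# Labeled tokens: each element is a (tok, label) tuple; only tok is featurized
# and the label is carried through unchanged.
labeled_sets = apply_features(last_letter_features, labeled_names, labeled=True)

print(list(unlabeled_sets))
print(list(labeled_sets))
```

The returned objects can be indexed and iterated like lists, so they can typically be passed wherever a list of featuresets is expected.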
- nltk.classify.util.attested_labels(tokens)
- Returns
A list of all labels that are attested in the given list of tokens.
- Return type
list of (immutable)
- Parameters
tokens (list) – The list of classified tokens from which to extract labels. A classified token has the form (token, label).
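A short usage sketch, assuming NLTK is installed; the classified tokens are made up for illustration. The result is sorted before printing because this sketch makes no assumption about the order in which labels are returned.

```python
from nltk.classify.util import attested_labels

# Illustrative classified tokens of the form (token, label).
classified = [("Anna", "female"), ("John", "male"), ("Mark", "male")]

labels = attested_labels(classified)

# Sort for a stable display order.
print(sorted(labels))
```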