nltk.tag.SennaNERTagger
- class nltk.tag.SennaNERTagger[source]
Bases: Senna
- tag_sents(sentences)[source]
Applies the tag method over a list of sentences. This method will return, for each sentence, a list of (word, tag) tuples.
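The shape of that return value can be sketched with a stand-in tagger (SennaNERTagger itself requires the external Senna binary, so the tagging rule and the tag labels below are purely illustrative, not Senna's behaviour):

```python
# Illustration of the tag_sents contract: each input sentence (a list of
# tokens) maps to a list of (word, tag) tuples. The "tagger" here is a
# made-up stand-in, not Senna.
def toy_tag_sents(sentences):
    # Tag capitalized tokens with a fake entity label, everything else "O".
    return [[(w, "I-PER" if w[0].isupper() else "O") for w in sent]
            for sent in sentences]

sents = [["Shyam", "lives", "in", "Oxford"]]
print(toy_tag_sents(sents))
# [[('Shyam', 'I-PER'), ('lives', 'O'), ('in', 'O'), ('Oxford', 'I-PER')]]
```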
- SUPPORTED_OPERATIONS = ['pos', 'chk', 'ner']
- accuracy(gold)[source]
Score the accuracy of the tagger against the gold standard. Strip the tags from the gold standard text, retag it using the tagger, then compute the accuracy score.
- Parameters
gold (list(list(tuple(str, str)))) – The list of tagged sentences to score the tagger on.
- Return type
float
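The scoring loop described above (strip the gold tags, retag, compare token by token) can be sketched in plain Python; the stand-in tagger below is hypothetical, used only so the example runs without Senna:

```python
# Hedged sketch of the accuracy computation: strip the gold tags, retag
# the bare tokens, then score token-level agreement.
def accuracy(tag_sents, gold):
    total = correct = 0
    for gold_sent in gold:
        words = [w for (w, _t) in gold_sent]      # strip the tags
        predicted = tag_sents([words])[0]         # retag with the tagger
        for (_, gold_tag), (_, pred_tag) in zip(gold_sent, predicted):
            total += 1
            correct += gold_tag == pred_tag
    return correct / total

# Stand-in tagger that labels every token "NN" (purely illustrative).
tagger = lambda sents: [[(w, "NN") for w in s] for s in sents]
gold = [[("dogs", "NNS"), ("bark", "VBP"), ("loudly", "RB")],
        [("fire", "NN")]]
print(accuracy(tagger, gold))  # 1 of 4 tokens correct -> 0.25
```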
- confusion(gold)[source]
Return a ConfusionMatrix with the tags from gold as the reference values, with the predictions from tag_sents as the predicted values.

>>> from nltk.tag import PerceptronTagger
>>> from nltk.corpus import treebank
>>> tagger = PerceptronTagger()
>>> gold_data = treebank.tagged_sents()[:10]
>>> print(tagger.confusion(gold_data))
       |     -
       |     N
       |     O                               P
       |     N               J J     N N P P R   R       V V V V V W
       | '   E   C C D E I J J J M N N N O R P R B R T V B B B B B D `
       | ' , - . C D T X N J R S D N P S S P $ B R P O B D G N P Z T `
-------+----------------------------------------------------------------------------------------------+
    '' | <1> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . |
     , | .<15> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . |
-NONE- | . . <.> . . 2 . . . 2 . . . 5 1 . . . . 2 . . . . . . . . . . . |
     . | . . .<10> . . . . . . . . . . . . . . . . . . . . . . . . . . . |
    CC | . . . . <1> . . . . . . . . . . . . . . . . . . . . . . . . . . |
    CD | . . . . . <5> . . . . . . . . . . . . . . . . . . . . . . . . . |
    DT | . . . . . .<20> . . . . . . . . . . . . . . . . . . . . . . . . |
    EX | . . . . . . . <1> . . . . . . . . . . . . . . . . . . . . . . . |
    IN | . . . . . . . .<22> . . . . . . . . . . 3 . . . . . . . . . . . |
    JJ | . . . . . . . . .<16> . . . . 1 . . . . 1 . . . . . . . . . . . |
   JJR | . . . . . . . . . . <.> . . . . . . . . . . . . . . . . . . . . |
   JJS | . . . . . . . . . . . <1> . . . . . . . . . . . . . . . . . . . |
    MD | . . . . . . . . . . . . <1> . . . . . . . . . . . . . . . . . . |
    NN | . . . . . . . . . . . . .<28> 1 1 . . . . . . . . . . . . . . . |
   NNP | . . . . . . . . . . . . . .<25> . . . . . . . . . . . . . . . . |
   NNS | . . . . . . . . . . . . . . .<19> . . . . . . . . . . . . . . . |
   POS | . . . . . . . . . . . . . . . . <1> . . . . . . . . . . . . . . |
   PRP | . . . . . . . . . . . . . . . . . <4> . . . . . . . . . . . . . |
  PRP$ | . . . . . . . . . . . . . . . . . . <2> . . . . . . . . . . . . |
    RB | . . . . . . . . . . . . . . . . . . . <4> . . . . . . . . . . . |
   RBR | . . . . . . . . . . 1 . . . . . . . . . <1> . . . . . . . . . . |
    RP | . . . . . . . . . . . . . . . . . . . . . <1> . . . . . . . . . |
    TO | . . . . . . . . . . . . . . . . . . . . . . <5> . . . . . . . . |
    VB | . . . . . . . . . . . . . . . . . . . . . . . <3> . . . . . . . |
   VBD | . . . . . . . . . . . . . 1 . . . . . . . . . . <6> . . . . . . |
   VBG | . . . . . . . . . . . . . 1 . . . . . . . . . . . <4> . . . . . |
   VBN | . . . . . . . . . . . . . . . . . . . . . . . . 1 . <4> . . . . |
   VBP | . . . . . . . . . . . . . . . . . . . . . . . . . . . <3> . . . |
   VBZ | . . . . . . . . . . . . . . . . . . . . . . . . . . . . <7> . . |
   WDT | . . . . . . . . 2 . . . . . . . . . . . . . . . . . . . . <.> . |
    `` | . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . <1>|
-------+----------------------------------------------------------------------------------------------+
(row = reference; col = test)
- Parameters
gold (list(list(tuple(str, str)))) – The list of tagged sentences to run the tagger with, also used as the reference values in the generated confusion matrix.
- Return type
ConfusionMatrix
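The tabulation behind the matrix is simple: count each (reference tag, predicted tag) pair, mirroring the "(row = reference; col = test)" layout. A minimal sketch (a toy counter, not NLTK's ConfusionMatrix class):

```python
from collections import Counter

# Toy confusion tabulation over tagged sentences: keys are
# (gold_tag, predicted_tag) pairs, i.e. (row, column) in the matrix.
def confusion_counts(gold_sents, pred_sents):
    counts = Counter()
    for gold_sent, pred_sent in zip(gold_sents, pred_sents):
        for (_, g), (_, p) in zip(gold_sent, pred_sent):
            counts[(g, p)] += 1
    return counts

gold = [[("rose", "NN"), ("rose", "VBD")]]
pred = [[("rose", "NN"), ("rose", "NN")]]
print(confusion_counts(gold, pred))
# Counter({('NN', 'NN'): 1, ('VBD', 'NN'): 1})
```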
- evaluate_per_tag(gold, alpha=0.5, truncate=None, sort_by_count=False)[source]
Tabulate the recall, precision and f-measure for each tag from gold or from running tag on the tokenized sentences from gold.

>>> from nltk.tag import PerceptronTagger
>>> from nltk.corpus import treebank
>>> tagger = PerceptronTagger()
>>> gold_data = treebank.tagged_sents()[:10]
>>> print(tagger.evaluate_per_tag(gold_data))
   Tag | Prec.  | Recall | F-measure
-------+--------+--------+-----------
    '' | 1.0000 | 1.0000 | 1.0000
     , | 1.0000 | 1.0000 | 1.0000
-NONE- | 0.0000 | 0.0000 | 0.0000
     . | 1.0000 | 1.0000 | 1.0000
    CC | 1.0000 | 1.0000 | 1.0000
    CD | 0.7143 | 1.0000 | 0.8333
    DT | 1.0000 | 1.0000 | 1.0000
    EX | 1.0000 | 1.0000 | 1.0000
    IN | 0.9167 | 0.8800 | 0.8980
    JJ | 0.8889 | 0.8889 | 0.8889
   JJR | 0.0000 | 0.0000 | 0.0000
   JJS | 1.0000 | 1.0000 | 1.0000
    MD | 1.0000 | 1.0000 | 1.0000
    NN | 0.8000 | 0.9333 | 0.8615
   NNP | 0.8929 | 1.0000 | 0.9434
   NNS | 0.9500 | 1.0000 | 0.9744
   POS | 1.0000 | 1.0000 | 1.0000
   PRP | 1.0000 | 1.0000 | 1.0000
  PRP$ | 1.0000 | 1.0000 | 1.0000
    RB | 0.4000 | 1.0000 | 0.5714
   RBR | 1.0000 | 0.5000 | 0.6667
    RP | 1.0000 | 1.0000 | 1.0000
    TO | 1.0000 | 1.0000 | 1.0000
    VB | 1.0000 | 1.0000 | 1.0000
   VBD | 0.8571 | 0.8571 | 0.8571
   VBG | 1.0000 | 0.8000 | 0.8889
   VBN | 1.0000 | 0.8000 | 0.8889
   VBP | 1.0000 | 1.0000 | 1.0000
   VBZ | 1.0000 | 1.0000 | 1.0000
   WDT | 0.0000 | 0.0000 | 0.0000
    `` | 1.0000 | 1.0000 | 1.0000
- Parameters
gold (list(list(tuple(str, str)))) – The list of tagged sentences to score the tagger on.
alpha (float) – Ratio of the cost of false negatives compared to false positives, as used in the f-measure computation. Defaults to 0.5, where the costs are equal.
truncate (int, optional) – If specified, then only show the specified number of values. Any sorting (e.g., sort_by_count) will be performed before truncation. Defaults to None.
sort_by_count (bool, optional) – Whether to sort the outputs on the number of occurrences of that tag in the gold data. Defaults to False.
- Returns
A tabulated recall, precision and f-measure string
- Return type
str
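The per-tag tabulation reduces to counting true positives, false positives and false negatives per tag, then applying the precision, recall and alpha-weighted f-measure formulas. A hedged re-implementation (not NLTK's code, just the same arithmetic):

```python
from collections import Counter

# Per-tag precision, recall and f-measure from token-level counts.
# With alpha = 0.5 the f-measure is the harmonic mean of p and r.
def per_tag_scores(gold_sents, pred_sents, alpha=0.5):
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold_sent, pred_sent in zip(gold_sents, pred_sents):
        for (_, g), (_, p) in zip(gold_sent, pred_sent):
            if g == p:
                tp[g] += 1
            else:
                fp[p] += 1   # predicted p where the gold tag differed
                fn[g] += 1   # missed an actual occurrence of g
    rows = {}
    for tag in set(tp) | set(fp) | set(fn):
        prec = tp[tag] / (tp[tag] + fp[tag]) if tp[tag] + fp[tag] else 0.0
        rec = tp[tag] / (tp[tag] + fn[tag]) if tp[tag] + fn[tag] else 0.0
        f = 1 / (alpha / prec + (1 - alpha) / rec) if prec and rec else 0.0
        rows[tag] = (prec, rec, f)
    return rows

gold = [[("the", "DT"), ("dog", "NN"), ("barks", "VBZ")]]
pred = [[("the", "DT"), ("dog", "NN"), ("barks", "NN")]]
for tag, (p, r, f) in sorted(per_tag_scores(gold, pred).items()):
    print(f"{tag:>4} | {p:.4f} | {r:.4f} | {f:.4f}")
```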
- executable(base_path)[source]
The function that determines the system-specific binary that should be used in the pipeline. If the system is not known, the default Senna binary will be used.
- f_measure(gold, alpha=0.5)[source]
Compute the f-measure for each tag from gold or from running tag on the tokenized sentences from gold. Then, return the dictionary with mappings from tag to f-measure. The f-measure is the harmonic mean of the precision and recall, weighted by alpha. In particular, given the precision p and recall r defined by:

p = true positive / (true positive + false positive)
r = true positive / (true positive + false negative)

The f-measure is:

1/(alpha/p + (1-alpha)/r)

With alpha = 0.5, this reduces to:

2pr / (p + r)
- Parameters
gold (list(list(tuple(str, str)))) – The list of tagged sentences to score the tagger on.
alpha (float) – Ratio of the cost of false negatives compared to false positives. Defaults to 0.5, where the costs are equal.
- Returns
A mapping from tags to f-measure
- Return type
Dict[str, float]
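The reduction at alpha = 0.5 is easy to check numerically; a short sketch of the weighting formula (values chosen arbitrarily for illustration):

```python
# The alpha-weighted f-measure: 1/(alpha/p + (1-alpha)/r).
# With alpha = 0.5 this equals the harmonic mean 2pr/(p + r).
def f_measure(p, r, alpha=0.5):
    return 1 / (alpha / p + (1 - alpha) / r)

p, r = 0.8, 0.5
print(f_measure(p, r))              # equals 2*p*r/(p + r)
print(2 * p * r / (p + r))
print(f_measure(p, r, alpha=0.75))  # higher alpha weights precision more
```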
- precision(gold)[source]
Compute the precision for each tag from gold or from running tag on the tokenized sentences from gold. Then, return the dictionary with mappings from tag to precision. The precision is defined as:

p = true positive / (true positive + false positive)
- Parameters
gold (list(list(tuple(str, str)))) – The list of tagged sentences to score the tagger on.
- Returns
A mapping from tags to precision
- Return type
Dict[str, float]
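In other words: of all tokens the tagger labeled with a given tag, the fraction that truly carry that tag in the gold data. A minimal sketch of that computation (not NLTK's implementation):

```python
from collections import Counter

# Precision per tag, following p = tp / (tp + fp). A tag the tagger
# never predicted gets no entry, since its denominator would be zero.
def precision_per_tag(gold_sents, pred_sents):
    tp, fp = Counter(), Counter()
    for gold_sent, pred_sent in zip(gold_sents, pred_sents):
        for (_, g), (_, p) in zip(gold_sent, pred_sent):
            (tp if g == p else fp)[p] += 1
    return {t: tp[t] / (tp[t] + fp[t]) for t in set(tp) | set(fp)}

gold = [[("run", "VB"), ("fast", "RB"), ("home", "NN")]]
pred = [[("run", "NN"), ("fast", "RB"), ("home", "NN")]]
print(precision_per_tag(gold, pred))  # NN: 1 of 2 predictions correct -> 0.5
```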
- recall(gold) → Dict[str, float][source]
Compute the recall for each tag from gold or from running tag on the tokenized sentences from gold. Then, return the dictionary with mappings from tag to recall. The recall is defined as:

r = true positive / (true positive + false negative)
- Parameters
gold (list(list(tuple(str, str)))) – The list of tagged sentences to score the tagger on.
- Returns
A mapping from tags to recall
- Return type
Dict[str, float]
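That is: of all tokens that truly carry a given tag in the gold data, the fraction the tagger recovered. A minimal sketch of that computation (not NLTK's implementation):

```python
from collections import Counter

# Recall per tag, following r = tp / (tp + fn). A tag that never occurs
# in the gold data gets no entry, since its denominator would be zero.
def recall_per_tag(gold_sents, pred_sents):
    tp, fn = Counter(), Counter()
    for gold_sent, pred_sent in zip(gold_sents, pred_sents):
        for (_, g), (_, p) in zip(gold_sent, pred_sent):
            (tp if g == p else fn)[g] += 1
    return {t: tp[t] / (tp[t] + fn[t]) for t in set(tp) | set(fn)}

gold = [[("run", "VB"), ("fast", "RB"), ("home", "NN")]]
pred = [[("run", "NN"), ("fast", "RB"), ("home", "NN")]]
print(recall_per_tag(gold, pred))  # VB was missed -> 0.0; RB, NN -> 1.0
```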