nltk.chunk.RegexpChunkParser
- class nltk.chunk.RegexpChunkParser
Bases: ChunkParserI
A regular expression based chunk parser. RegexpChunkParser uses a sequence of “rules” to find chunks of a single type within a text. The chunking of the text is encoded using a ChunkString, and each rule acts by modifying the chunking in the ChunkString. The rules are all implemented using regular expression matching and substitution.
The RegexpChunkRule class and its subclasses (ChunkRule, StripRule, UnChunkRule, MergeRule, and SplitRule) define the rules that are used by RegexpChunkParser. Each rule defines an apply() method, which modifies the chunking encoded by a given ChunkString.
- Variables
_rules – The list of rules that should be applied to a text.
_trace – The default level of tracing.
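The rule mechanism described above can be sketched by applying rules to a ChunkString by hand. This is a minimal illustration, not the parser's internal code: the sentence, the tag patterns, and the rule descriptions are made up for the example, and it assumes ChunkString, ChunkRule, and SplitRule are importable from nltk.chunk.regexp.

```python
from nltk.chunk.regexp import ChunkString, ChunkRule, SplitRule
from nltk.tree import Tree

# A tagged sentence wrapped in a Tree (example data, not from the NLTK docs).
tagged = Tree("S", [
    ("the", "DT"), ("cat", "NN"),
    ("the", "DT"), ("dog", "NN"),
    ("slept", "VBD"),
])

# The ChunkString holds the current chunking of the sentence.
chunkstr = ChunkString(tagged)

# Each rule rewrites the chunking through regular expression substitution.
rules = [
    # First put every run of determiners and nouns into one chunk.
    ChunkRule(r"<DT|NN>+", "Chunk runs of determiners and nouns"),
    # Then split that chunk wherever a noun is followed by a determiner.
    SplitRule(r"<NN>", r"<DT>", "Split between a noun and a determiner"),
]
for rule in rules:
    rule.apply(chunkstr)

# Convert the encoded chunking back into a tree with "NP" chunk nodes.
tree = chunkstr.to_chunkstruct("NP")
print(tree)
```

RegexpChunkParser performs the same kind of loop over its rule list, so rule order matters: here the split only has something to act on because the chunk rule ran first.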
- __init__(rules, chunk_label='NP', root_label='S', trace=0)
Construct a new RegexpChunkParser.
- Parameters
rules (list(RegexpChunkRule)) – The sequence of rules that should be used to generate the chunking for a tagged text.
chunk_label (str) – The node value that should be used for chunk subtrees. This is typically a short string describing the type of information contained by the chunk, such as "NP" for base noun phrases.
root_label (str) – The node value that should be used for the top node of the chunk structure.
trace (int) – The level of tracing that should be used when parsing a text. 0 will generate no tracing output; 1 will generate normal tracing output; and 2 or higher will generate verbose tracing output.
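As a rough sketch of how the constructor arguments show up in the output, the example below builds a verb-group chunker with non-default labels. The "VP" and "ROOT" labels, the tag pattern, and the sentence are illustrative choices, and the example assumes that parse() wraps a plain list of tagged tokens in a tree labelled with root_label.

```python
from nltk.chunk.regexp import ChunkRule, RegexpChunkParser

# One rule: chunk any run of verb tags (VB, VBZ, VBN, ...).
verb_rule = ChunkRule(r"<VB.*>+", "Chunk runs of verb tags")

parser = RegexpChunkParser(
    [verb_rule],
    chunk_label="VP",    # label given to each chunk subtree
    root_label="ROOT",   # label given to the top node
    trace=0,             # no tracing output by default
)

# A plain list of (word, tag) pairs; assumed to be wrapped in a ROOT tree.
result = parser.parse([("she", "PRP"), ("has", "VBZ"), ("left", "VBN")])
print(result)
```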
- parse(chunk_struct, trace=None)
- Parameters
chunk_struct (Tree) – the chunk structure to be (further) chunked
trace (int) – The level of tracing that should be used when parsing a text. 0 will generate no tracing output; 1 will generate normal tracing output; and 2 or higher will generate verbose tracing output. This value overrides the trace level value that was given to the constructor.
- Return type
Tree
- Returns
a chunk structure that encodes the chunks in a given tagged sentence. A chunk is a non-overlapping linguistic group, such as a noun phrase. The set of chunks identified in the chunk structure depends on the rules used to define this RegexpChunkParser.
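The per-call trace override can be illustrated as follows. This is a small sketch with an invented sentence and rule; it parses the same input once silently and once with trace=1, which prints the ChunkString after each rule to standard output without changing the returned tree.

```python
from nltk.chunk.regexp import ChunkRule, RegexpChunkParser
from nltk.tree import Tree

parser = RegexpChunkParser(
    [ChunkRule(r"<DT>?<JJ>*<NN>", "Chunk determiner, adjectives, and noun")],
    chunk_label="NP",
    trace=0,  # constructor default: no tracing
)

sentence = Tree("S", [
    ("the", "DT"), ("little", "JJ"), ("dog", "NN"), ("barked", "VBD"),
])

# Uses the constructor's trace level (silent).
quiet = parser.parse(sentence)

# Overrides the trace level for this call only; tracing goes to stdout.
traced = parser.parse(sentence, trace=1)

print(quiet)
```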
- rules()
- Returns
the sequence of rules used by RegexpChunkParser.
- Return type
list(RegexpChunkRule)
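A short sketch of inspecting a parser's rules: the two rules and their descriptions are invented for the example, and descr() is the description accessor defined on RegexpChunkRule.

```python
from nltk.chunk.regexp import ChunkRule, MergeRule, RegexpChunkParser

parser = RegexpChunkParser([
    ChunkRule(r"<NN.*>", "Chunk each noun on its own"),
    MergeRule(r"<NN.*>", r"<NN.*>", "Merge adjacent noun chunks"),
])

# rules() returns the rules in the order they will be applied.
descriptions = [rule.descr() for rule in parser.rules()]
for text in descriptions:
    print(text)
```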
- accuracy(gold)
Score the accuracy of the chunker against the gold standard. Remove the chunking from the gold standard text, rechunk it using the chunker, and return a ChunkScore object reflecting the performance of this chunk parser.
- Parameters
gold (list(Tree)) – The list of chunked sentences to score the chunker on.
- Return type
ChunkScore