nltk.parse.corenlp module¶
- exception nltk.parse.corenlp.CoreNLPServerError[source]¶
Bases:
OSError
Exceptions associated with the CoreNLP server.
- class nltk.parse.corenlp.CoreNLPServer[source]¶
Bases:
object
- __init__(path_to_jar=None, path_to_models_jar=None, verbose=False, java_options=None, corenlp_options=None, port=None, strict_json=True)[source]¶
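The constructor's parameters are not described here, so the sketch below is a hedged illustration of typical usage: the jar paths, memory setting, and port are placeholders that depend on your local CoreNLP download, and the server-launching calls are commented out because they require Java and the CoreNLP jars.

```python
from nltk.parse.corenlp import CoreNLPServer, CoreNLPParser

# Illustrative settings; the jar locations depend on where CoreNLP was unpacked.
jar = 'stanford-corenlp-4.5.7.jar'               # hypothetical path
models_jar = 'stanford-corenlp-4.5.7-models.jar' # hypothetical path

# server = CoreNLPServer(path_to_jar=jar, path_to_models_jar=models_jar,
#                        java_options='-mx4g', port=9000)
# server.start()   # launches the Java server process
# parser = CoreNLPParser(url='http://localhost:9000')
# ...
# server.stop()    # shut the server down when done
```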
- class nltk.parse.corenlp.GenericCoreNLPParser[source]¶
Bases:
ParserI, TokenizerI, TaggerI
Interface to the CoreNLP Parser.
- parse_sents(sentences, *args, **kwargs)[source]¶
Parse multiple sentences.
Takes multiple sentences as a list where each sentence is a list of words. Each sentence will be automatically tagged with this CoreNLPParser instance’s tagger.
If a token contains whitespace, it will be split into several tokens.
- Parameters
sentences (list(list(str))) – Input sentences to parse
- Return type
iter(iter(Tree))
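A minimal sketch of the expected input shape, assuming a CoreNLP server is already running at the usual local URL; the sentences are illustrative, and the server-dependent call is commented out:

```python
from nltk.parse.corenlp import CoreNLPParser

parser = CoreNLPParser(url='http://localhost:9000')  # assumes a running server

# parse_sents expects pre-tokenized input: one list of word strings per sentence.
sentences = [
    'The dog barks .'.split(),
    'A cat sleeps .'.split(),
]
# Each element of the outer iterable is itself an iterator of Tree objects:
# for parse_iter in parser.parse_sents(sentences):
#     for tree in parse_iter:
#         tree.pretty_print()
```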
- raw_parse(sentence, properties=None, *args, **kwargs)[source]¶
Parse a sentence.
Takes a sentence as a string; before parsing, it will be automatically tokenized and tagged by the CoreNLP Parser.
- Parameters
sentence (str) – Input sentence to parse
- Return type
iter(Tree)
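A hedged sketch of the calling convention, assuming a running local server; note the input is a raw string, so no pre-tokenization is needed (the server-dependent call is commented out):

```python
from nltk.parse.corenlp import CoreNLPParser

parser = CoreNLPParser(url='http://localhost:9000')  # assumes a running server

sentence = 'The dog barks.'  # raw string; tokenization and tagging happen server-side
# raw_parse returns an iterator of Tree objects (typically the single best parse):
# tree = next(parser.raw_parse(sentence))
# tree.pretty_print()
```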
- raw_parse_sents(sentences, verbose=False, properties=None, *args, **kwargs)[source]¶
Parse multiple sentences.
Takes multiple sentences as a list of strings. Each sentence will be automatically tokenized and tagged.
- Parameters
sentences (list(str)) – Input sentences to parse.
- Return type
iter(iter(Tree))
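For contrast with parse_sents, a sketch of the raw-string batch form, again assuming a running server (the network call is commented out):

```python
from nltk.parse.corenlp import CoreNLPParser

parser = CoreNLPParser(url='http://localhost:9000')  # assumes a running server

# Unlike parse_sents, the input here is raw strings; the server tokenizes them.
sentences = [
    'The quick brown fox jumps over the lazy dog.',
    'The quick grey wolf jumps over the lazy fox.',
]
# Each outer element is an iterator over the Tree parses of one sentence:
# for parse_iter in parser.raw_parse_sents(sentences):
#     tree = next(parse_iter)
```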
- parse_text(text, *args, **kwargs)[source]¶
Parse a piece of text.
The text might contain several sentences, which will be split by CoreNLP.
- Parameters
text (str) – Text to parse; it may contain several sentences.
- Returns
an iterable of syntactic structures, one per sentence found by CoreNLP.
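A hedged sketch of the convenience this method provides over raw_parse, assuming a running server; sentence splitting is done server-side (the network call is commented out):

```python
from nltk.parse.corenlp import CoreNLPParser

parser = CoreNLPParser(url='http://localhost:9000')  # assumes a running server

text = 'John loves Mary. Mary walks.'  # two sentences in one string
# parse_text yields one syntactic Tree per sentence CoreNLP finds in the text:
# for tree in parser.parse_text(text):
#     print(tree)
```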
- tokenize(text, properties=None)[source]¶
Tokenize a string of text.
>>> parser = CoreNLPParser(url='http://localhost:9000')
>>> text = 'Good muffins cost $3.88\nin New York. Please buy me\ntwo of them.\nThanks.'
>>> list(parser.tokenize(text))
['Good', 'muffins', 'cost', '$', '3.88', 'in', 'New', 'York', '.', 'Please', 'buy', 'me', 'two', 'of', 'them', '.', 'Thanks', '.']
>>> s = "The colour of the wall is blue."
>>> list(
...     parser.tokenize(
...         'The colour of the wall is blue.',
...         properties={'tokenize.options': 'americanize=true'},
...     )
... )
['The', 'color', 'of', 'the', 'wall', 'is', 'blue', '.']
- tag_sents(sentences)[source]¶
Tag multiple sentences.
Takes multiple sentences as a list where each sentence is a list of tokens.
- Parameters
sentences (list(list(str))) – Input sentences to tag
- Return type
list(list(tuple(str, str)))
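A hedged sketch of batch tagging, assuming a running server; the tagtype argument (here 'pos', as in the tag examples below) selects which annotator supplies the tags, and the server-dependent call is commented out:

```python
from nltk.parse.corenlp import CoreNLPParser

# tagtype selects the annotator: 'pos' or 'ner' (assumes a running server).
tagger = CoreNLPParser(url='http://localhost:9000', tagtype='pos')

sentences = [
    'What is the airspeed ?'.split(),
    'Colourless green ideas sleep furiously .'.split(),
]
# Returns one list of (token, tag) pairs per input sentence:
# tagged = tagger.tag_sents(sentences)
```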
- tag(sentence)[source]¶
Tag a list of tokens.
- Return type
list(tuple(str, str))
>>> parser = CoreNLPParser(url='http://localhost:9000', tagtype='ner')
>>> tokens = 'Rami Eid is studying at Stony Brook University in NY'.split()
>>> parser.tag(tokens)
[('Rami', 'PERSON'), ('Eid', 'PERSON'), ('is', 'O'), ('studying', 'O'), ('at', 'O'), ('Stony', 'ORGANIZATION'), ('Brook', 'ORGANIZATION'), ('University', 'ORGANIZATION'), ('in', 'O'), ('NY', 'O')]
>>> parser = CoreNLPParser(url='http://localhost:9000', tagtype='pos')
>>> tokens = "What is the airspeed of an unladen swallow ?".split()
>>> parser.tag(tokens)
[('What', 'WP'), ('is', 'VBZ'), ('the', 'DT'), ('airspeed', 'NN'), ('of', 'IN'), ('an', 'DT'), ('unladen', 'JJ'), ('swallow', 'VB'), ('?', '.')]
- class nltk.parse.corenlp.CoreNLPParser[source]¶
Bases:
GenericCoreNLPParser
>>> parser = CoreNLPParser(url='http://localhost:9000')
>>> next(
...     parser.raw_parse('The quick brown fox jumps over the lazy dog.')
... ).pretty_print()
                     ROOT
                      |
                      S
       _______________|__________________________
      |                          VP              |
      |                _________|___             |
      |               |             PP           |
      |               |     ________|___         |
      NP              |    |            NP       |
  ____|__________     |    |     _______|____    |
 DT   JJ    JJ   NN  VBZ   IN   DT      JJ   NN  .
 |    |     |    |    |    |    |       |    |   |
 The quick brown fox jumps over the    lazy dog  .
>>> (parse_fox, ), (parse_wolf, ) = parser.raw_parse_sents(
...     [
...         'The quick brown fox jumps over the lazy dog.',
...         'The quick grey wolf jumps over the lazy fox.',
...     ]
... )
>>> parse_fox.pretty_print()
                     ROOT
                      |
                      S
       _______________|__________________________
      |                          VP              |
      |                _________|___             |
      |               |             PP           |
      |               |     ________|___         |
      NP              |    |            NP       |
  ____|__________     |    |     _______|____    |
 DT   JJ    JJ   NN  VBZ   IN   DT      JJ   NN  .
 |    |     |    |    |    |    |       |    |   |
 The quick brown fox jumps over the    lazy dog  .
>>> parse_wolf.pretty_print()
                     ROOT
                      |
                      S
       _______________|__________________________
      |                          VP              |
      |                _________|___             |
      |               |             PP           |
      |               |     ________|___         |
      NP              |    |            NP       |
  ____|_________      |    |     _______|____    |
 DT   JJ   JJ   NN   VBZ   IN   DT      JJ   NN  .
 |    |    |    |     |    |    |       |    |   |
 The quick grey wolf jumps over the    lazy fox  .
>>> (parse_dog, ), (parse_friends, ) = parser.parse_sents(
...     [
...         "I 'm a dog".split(),
...         "This is my friends ' cat ( the tabby )".split(),
...     ]
... )
>>> parse_dog.pretty_print()
         ROOT
          |
          S
   _______|____
  |            VP
  |    ________|___
  NP  |            NP
  |   |         ___|___
 PRP VBP       DT      NN
  |   |        |       |
  I   'm       a      dog
>>> parse_friends.pretty_print()
     ROOT
      |
      S
  ____|___________
 |                VP
 |     ___________|_____________
 |    |                         NP
 |    |                  _______|_________
 |    |                 NP                PRN
 |    |            _____|_______      ____|______________
 NP   |           NP            |    |            NP      |
 |    |     ______|_________    |    |         ___|____   |
 DT  VBZ  PRP$             NNS POS   NN     -LRB-  DT  NN -RRB-
 |    |    |                |   |    |        |    |   |    |
This  is   my           friends '   cat    -LRB- the tabby -RRB-
>>> parse_john, parse_mary, = parser.parse_text(
...     'John loves Mary. Mary walks.'
... )
>>> parse_john.pretty_print()
      ROOT
       |
       S
  _____|_____________
 |         VP        |
 |     ____|___      |
 NP   |        NP    |
 |    |        |     |
 NNP VBZ      NNP    .
 |    |        |     |
John loves    Mary   .
>>> parse_mary.pretty_print()
      ROOT
       |
       S
  _____|____
 NP    VP   |
 |     |    |
 NNP  VBZ   .
 |     |    |
Mary walks  .
Special cases
>>> next(
...     parser.raw_parse(
...         'NASIRIYA, Iraq—Iraqi doctors who treated former prisoner of war '
...         'Jessica Lynch have angrily dismissed claims made in her biography '
...         'that she was raped by her Iraqi captors.'
...     )
... ).height()
20
>>> next(
...     parser.raw_parse(
...         "The broader Standard & Poor's 500 Index <.SPX> was 0.46 points lower, or "
...         '0.05 percent, at 997.02.'
...     )
... ).height()
9
- parser_annotator = 'parse'¶
- class nltk.parse.corenlp.CoreNLPDependencyParser[source]¶
Bases:
GenericCoreNLPParser
Dependency parser.
>>> dep_parser = CoreNLPDependencyParser(url='http://localhost:9000')
>>> parse, = dep_parser.raw_parse(
...     'The quick brown fox jumps over the lazy dog.'
... )
>>> print(parse.to_conll(4))
The     DT      4       det
quick   JJ      4       amod
brown   JJ      4       amod
fox     NN      5       nsubj
jumps   VBZ     0       ROOT
over    IN      9       case
the     DT      9       det
lazy    JJ      9       amod
dog     NN      5       nmod
.       .       5       punct
>>> print(parse.tree()) (jumps (fox The quick brown) (dog over the lazy) .)
>>> for governor, dep, dependent in parse.triples():
...     print(governor, dep, dependent)
('jumps', 'VBZ') nsubj ('fox', 'NN')
('fox', 'NN') det ('The', 'DT')
('fox', 'NN') amod ('quick', 'JJ')
('fox', 'NN') amod ('brown', 'JJ')
('jumps', 'VBZ') nmod ('dog', 'NN')
('dog', 'NN') case ('over', 'IN')
('dog', 'NN') det ('the', 'DT')
('dog', 'NN') amod ('lazy', 'JJ')
('jumps', 'VBZ') punct ('.', '.')
>>> (parse_fox, ), (parse_dog, ) = dep_parser.raw_parse_sents(
...     [
...         'The quick brown fox jumps over the lazy dog.',
...         'The quick grey wolf jumps over the lazy fox.',
...     ]
... )
>>> print(parse_fox.to_conll(4))
The     DT      4       det
quick   JJ      4       amod
brown   JJ      4       amod
fox     NN      5       nsubj
jumps   VBZ     0       ROOT
over    IN      9       case
the     DT      9       det
lazy    JJ      9       amod
dog     NN      5       nmod
.       .       5       punct
>>> print(parse_dog.to_conll(4))
The     DT      4       det
quick   JJ      4       amod
grey    JJ      4       amod
wolf    NN      5       nsubj
jumps   VBZ     0       ROOT
over    IN      9       case
the     DT      9       det
lazy    JJ      9       amod
fox     NN      5       nmod
.       .       5       punct
>>> (parse_dog, ), (parse_friends, ) = dep_parser.parse_sents(
...     [
...         "I 'm a dog".split(),
...         "This is my friends ' cat ( the tabby )".split(),
...     ]
... )
>>> print(parse_dog.to_conll(4))
I       PRP     4       nsubj
'm      VBP     4       cop
a       DT      4       det
dog     NN      0       ROOT
>>> print(parse_friends.to_conll(4))
This    DT      6       nsubj
is      VBZ     6       cop
my      PRP$    4       nmod:poss
friends NNS     6       nmod:poss
'       POS     4       case
cat     NN      0       ROOT
-LRB-   -LRB-   9       punct
the     DT      9       det
tabby   NN      6       appos
-RRB-   -RRB-   9       punct
>>> parse_john, parse_mary, = dep_parser.parse_text(
...     'John loves Mary. Mary walks.'
... )
>>> print(parse_john.to_conll(4))
John    NNP     2       nsubj
loves   VBZ     0       ROOT
Mary    NNP     2       dobj
.       .       2       punct
>>> print(parse_mary.to_conll(4))
Mary    NNP     2       nsubj
walks   VBZ     0       ROOT
.       .       2       punct
Special cases
A non-breaking space inside a token.
>>> len(
...     next(
...         dep_parser.raw_parse(
...             'Anhalt said children typically treat a 20-ounce soda bottle as one '
...             'serving, while it actually contains 2 1/2 servings.'
...         )
...     ).nodes
... )
21
Phone numbers.
>>> len(
...     next(
...         dep_parser.raw_parse('This is not going to crash: 01 111 555.')
...     ).nodes
... )
10
>>> print(
...     next(
...         dep_parser.raw_parse('The underscore _ should not simply disappear.')
...     ).to_conll(4)
... )
The         DT      3       det
underscore  VBP     3       amod
_           NN      7       nsubj
should      MD      7       aux
not         RB      7       neg
simply      RB      7       advmod
disappear   VB      0       ROOT
.           .       7       punct
>>> print(
...     '\n'.join(
...         next(
...             dep_parser.raw_parse(
...                 'for all of its insights into the dream world of teen life , and its electronic expression through '
...                 'cyber culture , the film gives no quarter to anyone seeking to pull a cohesive story out of its 2 '
...                 '1/2-hour running time .'
...             )
...         ).to_conll(4).split('\n')[-8:]
...     )
... )
its     PRP$    40      nmod:poss
2 1/2   CD      40      nummod
-       :       40      punct
hour    NN      31      nmod
running VBG     42      amod
time    NN      40      dep
.       .       24      punct
- parser_annotator = 'depparse'¶