nltk.corpus.reader.TEICorpusView

class nltk.corpus.reader.TEICorpusView

__init__(corpus_file, tagged, group_by_sent, group_by_para, tagset=None, head_len=0, textids=None)

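Create a new corpus view based on the file corpus_file, read block by block (see read_block). See the class documentation for more information.

Note that the parameter list below names fileid, startpos, and encoding, none of which appear in the signature above. The text appears to be inherited from a more general corpus view constructor, in which case corpus_file plays the role of fileid and head_len presumably that of startpos.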

Parameters

fileid – The path to the file that is read by this corpus view. fileid can either be a string or a PathPointer.
startpos – The file position at which the view will start reading. This can be used to skip over preface sections.
encoding – The unicode encoding that should be used to read the file’s contents. If no encoding is specified, then the file’s contents will be read as a non-unicode string (i.e., a str).
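
The effect of startpos and encoding can be illustrated without NLTK. The following is a minimal sketch, not the library's implementation: read_after_preface is a hypothetical helper, and returning raw bytes when no encoding is given is this sketch's own Python 3 analogue of the behaviour described above.

```python
import io

def read_after_preface(stream, startpos=0, encoding=None):
    """Hypothetical helper: skip `startpos` bytes, then return the remaining lines."""
    stream.seek(startpos)              # jump over the preface section
    data = stream.read()
    if encoding is not None:
        data = data.decode(encoding)   # decode only when an encoding is given
    return data.splitlines()

preface = b"<!-- header to skip -->\n"
body = "na\u00efve text\nsecond line\n".encode("utf-8")
stream = io.BytesIO(preface + body)

# With an encoding, the lines come back as decoded text.
decoded = read_after_preface(stream, startpos=len(preface), encoding="utf-8")
# Without one, this sketch returns the undecoded bytes.
raw = read_after_preface(stream, startpos=len(preface))
```

Passing the length of the preface as startpos is what lets the reader begin at the first line of real content.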

read_block(stream)

Read a block from the input stream.
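
To make the idea of a block concrete, here is a minimal sketch of a block reader. It is an assumption for illustration only: read_line_block is a hypothetical function that treats one line as one block and splits it on whitespace, whereas the actual TEI reader works on XML markup.

```python
import io

def read_line_block(stream):
    """Hypothetical block reader: consume one line, return its whitespace-separated tokens."""
    line = stream.readline()   # returns "" once the stream is exhausted
    return line.split()

stream = io.StringIO("the quick fox\njumps over\n")
first = read_line_block(stream)
second = read_line_block(stream)
exhausted = read_line_block(stream)   # nothing left to read
```

Each call advances the stream, so repeated calls walk through the file one block at a time.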

close()

Close the file stream associated with this corpus view. This can be useful if you are worried about running out of file handles (although the stream should automatically be closed upon garbage collection of the corpus view). If the corpus view is accessed after it is closed, it will be automatically re-opened.
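
The close-then-reopen behaviour can be sketched with a small stand-in class. ReopeningView and its first_line method are hypothetical names used only for illustration; the sketch shows the general pattern of reopening a stream on demand, not NLTK's own code.

```python
import os
import tempfile

class ReopeningView:
    """Hypothetical stand-in for a view that reopens its file on demand."""

    def __init__(self, path):
        self._path = path
        self._stream = None

    def _open(self):
        # Open lazily, and reopen if close() was called earlier.
        if self._stream is None or self._stream.closed:
            self._stream = open(self._path, encoding="utf-8")
        return self._stream

    def first_line(self):
        stream = self._open()
        stream.seek(0)
        return stream.readline().rstrip("\n")

    def close(self):
        # Release the file handle; a later access will reopen the file.
        if self._stream is not None:
            self._stream.close()
        self._stream = None

with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write("hello corpus\n")
    path = tmp.name

view = ReopeningView(path)
before = view.first_line()
view.close()
after = view.first_line()   # access after close() reopens the file
view.close()
os.remove(path)
```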

count(value)

Return the number of times this list contains value.

property fileid

The fileid of the file that is accessed by this view.

index(value, start=None, stop=None)

Return the index of the first occurrence of value in this list that is greater than or equal to start and less than stop. Negative start and stop values are treated like negative slice bounds – i.e., they count from the end of the list.
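
The bound handling described for index can be sketched on an ordinary list. index_in_range is a hypothetical function, not part of NLTK, and raising ValueError when nothing is found is an assumption that mirrors the built-in list.index.

```python
def index_in_range(seq, value, start=None, stop=None):
    """Return the first i with start <= i < stop and seq[i] == value."""
    # slice.indices() applies the usual slice rules: None means unbounded,
    # negative bounds count from the end, out-of-range bounds are clamped.
    lo, hi, _ = slice(start, stop).indices(len(seq))
    for i in range(lo, hi):
        if seq[i] == value:
            return i
    # Assumption: mirror the built-in list.index when the value is absent.
    raise ValueError("value not found in the requested range")

tokens = ["a", "b", "a", "c", "a"]

from_start = index_in_range(tokens, "a")            # whole list
after_first = index_in_range(tokens, "a", 1)        # skip position 0
last_two = index_in_range(tokens, "a", -2)          # start counts from the end
inner = index_in_range(tokens, "a", 1, -1)          # stop counts from the end
```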

iterate_from(start_tok)

Return an iterator that generates the tokens in the corpus file underlying this corpus view, starting at token number start_tok. If start_tok >= len(self), then this iterator will generate no tokens.
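
The contract of iterate_from can be sketched with an in-memory stand-in. ListBackedView is a hypothetical class that holds its tokens in a list; a real corpus view reads them from a file, so this only illustrates the start offset and the empty result past the end.

```python
from itertools import islice

class ListBackedView:
    """Hypothetical in-memory stand-in for a corpus view."""

    def __init__(self, tokens):
        self._tokens = list(tokens)

    def __len__(self):
        return len(self._tokens)

    def iterate_from(self, start_tok):
        # islice skips the first start_tok tokens; if start_tok is at or
        # past the end, the resulting iterator yields nothing.
        return islice(iter(self._tokens), start_tok, None)

view = ListBackedView(["a", "b", "c", "d"])
tail = list(view.iterate_from(2))
at_end = list(view.iterate_from(len(view)))
past_end = list(view.iterate_from(10))
```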
