nltk.corpus.reader.TEICorpusView

class nltk.corpus.reader.TEICorpusView

Bases: StreamBackedCorpusView

__init__(corpus_file, tagged, group_by_sent, group_by_para, tagset=None, head_len=0, textids=None)

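Create a new corpus view over corpus_file. The parameters listed below are documented by the base class, StreamBackedCorpusView, which reads the file fileid block by block with block_reader; TEICorpusView forwards corpus_file as fileid and head_len as startpos, and reads blocks with its own read_block method. See the class documentation for more information.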

Parameters
  • fileid – The path to the file that is read by this corpus view. fileid can be either a string or a PathPointer.

  • startpos – The file position at which the view will start reading. This can be used to skip over preface sections.

  • encoding – The unicode encoding that should be used to read the file’s contents. If encoding is None, the file’s contents are read as undecoded bytes rather than as str.
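The startpos parameter lets a view begin reading past a fixed-size preface; in TEICorpusView this offset comes from the head_len argument. The following is a minimal sketch of the idea using an in-memory byte stream. read_after_header is a hypothetical helper, not part of NLTK, and it assumes the header length is known in advance.

```python
import io


def read_after_header(stream, head_len=0):
    """Return the contents of `stream` that follow the first `head_len` bytes.

    Hypothetical helper: it mimics starting a view at file position
    `head_len`, so a preface (such as a TEI header) is never tokenized.
    """
    stream.seek(head_len)  # jump over the preface instead of reading it
    return stream.read()


header = b"<teiHeader>meta</teiHeader>"
body = b"<text><p><s><w>word</w></s></p></text>"
print(read_after_header(io.BytesIO(header + body), head_len=len(header)))
```

With head_len=0 (the default in the signature above) nothing is skipped and the whole stream is read.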

read_block(stream)

Read a block from the input stream.

Returns

a block of tokens from the input stream

Return type

list(any)

Parameters

stream (stream) – an input stream
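In TEICorpusView, read_block is where raw TEI markup becomes tokens, and the group_by_sent and group_by_para constructor flags suggest how deeply those tokens are nested. The sketch below is a simplified, hypothetical block reader written for illustration: the <p>/<s>/<w> patterns, the name read_tei_block, and the pagesize default are assumptions, not NLTK's actual regular expressions or code. It extends each block until no paragraph is left open, then nests words into sentences and sentences into paragraphs on request.

```python
import io
import re

# Simplified TEI-style patterns, assumed for this sketch only.
PARA_RE = re.compile(r"<p>(.*?)</p>", re.S)
SENT_RE = re.compile(r"<s>(.*?)</s>", re.S)
WORD_RE = re.compile(r"<w>(.*?)</w>", re.S)


def read_tei_block(stream, group_by_sent=True, group_by_para=False, pagesize=4096):
    """Read roughly `pagesize` characters of whole lines and return their tokens."""
    block = "".join(stream.readlines(pagesize))
    # Keep reading whole lines until no <p> element is left open.
    while block.count("<p>") > block.count("</p>"):
        line = stream.readline()
        if not line:
            break
        block += line
    block = block.replace("\n", "")

    output = []
    for para_str in PARA_RE.findall(block):
        para = []
        for sent_str in SENT_RE.findall(para_str):
            sent = WORD_RE.findall(sent_str)
            if group_by_sent:
                para.append(sent)  # keep the sentence as a nested list
            else:
                para.extend(sent)  # flatten words into the paragraph
        if group_by_para:
            output.append(para)  # keep the paragraph as a nested list
        else:
            output.extend(para)
    return output


text = "<p><s><w>The</w><w>cat</w></s><s><w>sat</w></s></p>\n<p><s><w>End</w></s></p>\n"
print(read_tei_block(io.StringIO(text)))
# [['The', 'cat'], ['sat'], ['End']]
```

With both flags false the result is a flat token list; group_by_sent yields a list of sentences, and adding group_by_para wraps those sentences in paragraphs. An empty list signals that the stream is exhausted.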

close()

Close the file stream associated with this corpus view. This can be useful if you are worried about running out of file handles (although the stream should automatically be closed upon garbage collection of the corpus view). If the corpus view is accessed after it is closed, it will be automatically re-opened.
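The close-and-reopen behavior described above can be sketched with a small wrapper that opens its file lazily. ReopeningFileView and its read_all and is_open members are hypothetical names chosen for illustration; NLTK's own implementation differs in detail.

```python
import os
import tempfile


class ReopeningFileView:
    """Hypothetical view that releases its file handle on close() and
    transparently reopens the file the next time it is read."""

    def __init__(self, path, encoding="utf-8"):
        self._path = path
        self._encoding = encoding
        self._stream = None  # no file handle is held until first access

    def _ensure_open(self):
        if self._stream is None:
            self._stream = open(self._path, encoding=self._encoding)
        return self._stream

    def read_all(self):
        stream = self._ensure_open()  # reopens automatically after close()
        stream.seek(0)
        return stream.read()

    def close(self):
        if self._stream is not None:
            self._stream.close()
            self._stream = None

    @property
    def is_open(self):
        return self._stream is not None


fd, path = tempfile.mkstemp(suffix=".xml")
with os.fdopen(fd, "w", encoding="utf-8") as f:
    f.write("<text>example</text>")

view = ReopeningFileView(path)
print(view.read_all())  # opens the file on first access
view.close()            # releases the handle
print(view.read_all())  # reopened transparently
view.close()
os.remove(path)
```

Opening lazily keeps idle views from holding file handles, at the cost of a reopen on the next access.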

count(value)

Return the number of times this list contains value.

property fileid

The fileid of the file that is accessed by this view.

Type

str or PathPointer

index(value, start=None, stop=None)

Return the index of the first occurrence of value in this list whose index is greater than or equal to start and less than stop. Negative start and stop values are treated like negative slice bounds; that is, they count from the end of the list.
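The bounds handling described for index can be reproduced with Python's built-in slice.indices, which normalizes None and negative values exactly as slicing does. index_in_range is a hypothetical helper that applies the documented rule to any sequence; it is a sketch, not NLTK's implementation.

```python
def index_in_range(seq, value, start=None, stop=None):
    """Return the first i with start <= i < stop and seq[i] == value.

    Hypothetical helper: start and stop are normalized like slice bounds,
    so None means open-ended and negative values count from the end.
    """
    lo, hi, _ = slice(start, stop).indices(len(seq))
    for i in range(lo, hi):
        if seq[i] == value:
            return i
    raise ValueError(f"{value!r} is not in list")


tokens = ["a", "b", "a", "c", "a"]
print(index_in_range(tokens, "a", -2))  # 4, same as tokens.index("a", -2)
```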

iterate_from(start_tok)

Return an iterator that generates the tokens in the corpus file underlying this corpus view, starting at token number start_tok. If start_tok >= len(self), the iterator generates no tokens.
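Because a corpus view reads its file block by block, starting at an arbitrary token means skipping whole blocks that end before start_tok and slicing into the first block that reaches it. iterate_tokens_from is a hypothetical sketch over an iterable of already-read blocks; a real view would typically also remember where earlier blocks began in the file so it can seek rather than rescan.

```python
from typing import Iterable, Iterator, List


def iterate_tokens_from(blocks: Iterable[List[str]], start_tok: int) -> Iterator[str]:
    """Yield tokens from consecutive blocks, beginning at token number start_tok."""
    offset = 0  # token number of the first token in the current block
    for block in blocks:
        end = offset + len(block)
        if end > start_tok:
            # Drop the tokens of this block that precede start_tok
            # (none for blocks that start at or after start_tok).
            yield from block[max(0, start_tok - offset):]
        offset = end


blocks = [["The", "cat"], ["sat"], ["on", "the", "mat"]]
print(list(iterate_tokens_from(blocks, 2)))  # ['sat', 'on', 'the', 'mat']
```

When start_tok is at or beyond the total token count, no block passes the end > start_tok test and the iterator yields nothing, matching the behavior documented above.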