nltk.tokenize.BlanklineTokenizer¶
- class nltk.tokenize.BlanklineTokenizer[source]¶
Bases:
RegexpTokenizer
Tokenize a string, treating any sequence of blank lines as a delimiter. Blank lines are defined as lines containing no characters, except for space or tab characters.
- span_tokenize(text)[source]¶
Identify the tokens using integer offsets
(start_i, end_i)
, wheres[start_i:end_i]
is the corresponding token.- Return type
Iterator[Tuple[int, int]]