nltk.tokenize.TabTokenizer
- class nltk.tokenize.TabTokenizer
Bases: StringTokenizer
Tokenize a string using the tab character as a delimiter, the same as s.split('\t').

    >>> from nltk.tokenize import TabTokenizer
    >>> TabTokenizer().tokenize('a\tb c\n\t d')
    ['a', 'b c\n', ' d']
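The "Bases: StringTokenizer" line suggests that TabTokenizer is a delimiter-based tokenizer that differs from its base only in the delimiter it splits on. The sketch below illustrates that pattern in plain Python, without NLTK. The class names `SimpleStringTokenizer` and `SimpleTabTokenizer` and the `_string` attribute are hypothetical stand-ins, not a copy of NLTK's source.

```python
from typing import List


class SimpleStringTokenizer:
    """Hypothetical stand-in for a delimiter-based tokenizer base class."""

    # Subclasses must override this with the (non-empty) delimiter to split on.
    _string: str = ""

    def tokenize(self, s: str) -> List[str]:
        # Plain str.split on the configured delimiter: all other characters
        # (spaces, newlines) stay inside the tokens, and empty tokens are kept.
        return s.split(self._string)


class SimpleTabTokenizer(SimpleStringTokenizer):
    """Hypothetical tab tokenizer: only the delimiter differs from the base."""

    _string = "\t"


tokens = SimpleTabTokenizer().tokenize("a\tb c\n\t d")
print(tokens)  # ['a', 'b c\n', ' d']
```

Because the split is a plain `str.split('\t')`, two adjacent tabs produce an empty token, and spaces or newlines are never treated as separators.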
- span_tokenize(s)
Identify the tokens using integer offsets (start_i, end_i), where s[start_i:end_i] is the corresponding token.
Return type: Iterator[Tuple[int, int]]
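To make the offset contract concrete, here is a minimal pure-Python sketch of span tokenization on a tab delimiter. The function name `tab_span_tokenize` is hypothetical. This version keeps empty tokens so that its spans line up one-to-one with `s.split('\t')`; NLTK's own `span_tokenize` may handle empty tokens at the start or end of the string differently.

```python
from typing import Iterator, Tuple


def tab_span_tokenize(s: str, sep: str = "\t") -> Iterator[Tuple[int, int]]:
    """Yield (start_i, end_i) offsets such that s[start_i:end_i] is a token."""
    if not sep:
        raise ValueError("delimiter must not be empty")
    start = 0
    while True:
        end = s.find(sep, start)
        if end == -1:
            # No more delimiters: the last token runs to the end of the string.
            yield start, len(s)
            return
        yield start, end
        # Skip past the delimiter itself before looking for the next one.
        start = end + len(sep)


text = "a\tb c\n\t d"
spans = list(tab_span_tokenize(text))
print(spans)                          # [(0, 1), (2, 6), (7, 9)]
print([text[i:j] for i, j in spans])  # ['a', 'b c\n', ' d']
```

Returning offsets rather than strings lets a caller map each token back to its exact position in the original text, for example to align annotations with the source string.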