nltk.tokenize.SpaceTokenizer
- class nltk.tokenize.SpaceTokenizer
Bases: StringTokenizer
Tokenize a string using the space character as a delimiter, which is the same as s.split(' ').

>>> from nltk.tokenize import SpaceTokenizer
>>> s = "Good muffins cost $3.88\nin New York.  Please buy me\ntwo of them.\n\nThanks."
>>> SpaceTokenizer().tokenize(s)
['Good', 'muffins', 'cost', '$3.88\nin', 'New', 'York.', '', 'Please', 'buy', 'me\ntwo', 'of', 'them.\n\nThanks.']
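To make the consequences of splitting on a single space concrete, the plain-Python sketch below uses s.split(' ') directly, relying on the equivalence stated above so that it runs without NLTK installed, and contrasts it with argument-less s.split(), which splits on any run of whitespace. The example string is assumed to contain two spaces after "York.", which is what produces the empty token in the output above.

```python
# Sketch only: relies on the documented equivalence
# SpaceTokenizer().tokenize(s) == s.split(' '), so NLTK is not imported here.
s = "Good muffins cost $3.88\nin New York.  Please buy me\ntwo of them.\n\nThanks."

# Split on the single space character only: newlines stay inside tokens,
# and the two consecutive spaces after "York." yield an empty string.
space_tokens = s.split(' ')

# For contrast: with no argument, str.split() splits on any run of
# whitespace (spaces and newlines) and never produces empty strings.
ws_tokens = s.split()

print(space_tokens)
print(ws_tokens)
```

The difference matters when newlines carry meaning: the space-only split keeps "$3.88\nin" as one token, whereas the whitespace split separates "$3.88" and "in".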
- span_tokenize(s)
Identify the tokens using integer offsets (start_i, end_i), where s[start_i:end_i] is the corresponding token.
- Return type: Iterator[Tuple[int, int]]
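One way to compute such offsets is sketched below. `space_spans` is a hypothetical helper, not part of NLTK: it walks the string with str.find and yields one (start_i, end_i) pair per token of s.split(' '), so that slicing recovers exactly those tokens, empty ones included. NLTK's own span_tokenize may treat empty tokens at the very start or end of the string differently, so this illustrates the offset contract rather than reproducing the library's implementation.

```python
from typing import Iterator, Tuple


def space_spans(s: str) -> Iterator[Tuple[int, int]]:
    """Hypothetical helper (not NLTK): yield (start, end) offsets such that
    [s[i:j] for i, j in space_spans(s)] == s.split(' ')."""
    start = 0
    while True:
        end = s.find(' ', start)
        if end == -1:
            # No delimiter left: the remainder is the final token.
            yield start, len(s)
            return
        yield start, end
        # Skip over the single-space delimiter.
        start = end + 1


s = "Good muffins cost $3.88\nin New York.  Please buy me\ntwo of them.\n\nThanks."
spans = list(space_spans(s))
tokens = [s[i:j] for i, j in spans]

# Slicing with the offsets reproduces the space-split tokens.
assert tokens == s.split(' ')
print(spans[:3])  # [(0, 4), (5, 12), (13, 17)]
```

Because the generator advances with str.find from an explicit start position, offsets are produced lazily without building an intermediate token list, which fits the Iterator[Tuple[int, int]] return type.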