nltk.tokenize.SpaceTokenizer

class nltk.tokenize.SpaceTokenizer[source]

Bases: StringTokenizer

Tokenize a string using the space character as a delimiter, which is the same as s.split(' ').

>>> from nltk.tokenize import SpaceTokenizer
>>> s = "Good muffins cost $3.88\nin New York.  Please buy me\ntwo of them.\n\nThanks."
>>> SpaceTokenizer().tokenize(s)
['Good', 'muffins', 'cost', '$3.88\nin', 'New', 'York.', '',
'Please', 'buy', 'me\ntwo', 'of', 'them.\n\nThanks.']
span_tokenize(s)[source]

Identify the tokens using integer offsets (start_i, end_i), where s[start_i:end_i] is the corresponding token.

Return type

Iterator[Tuple[int, int]]

span_tokenize_sents(strings: List[str]) Iterator[List[Tuple[int, int]]][source]

Apply self.span_tokenize() to each element of strings. I.e.:

return [self.span_tokenize(s) for s in strings]

Yield

List[Tuple[int, int]]

Parameters

strings (List[str]) –

Return type

Iterator[List[Tuple[int, int]]]

tokenize(s)[source]

Return a tokenized copy of s.

Return type

List[str]

tokenize_sents(strings: List[str]) List[List[str]][source]

Apply self.tokenize() to each element of strings. I.e.:

return [self.tokenize(s) for s in strings]

Return type

List[List[str]]

Parameters

strings (List[str]) –