letters_digits_tokenizer
allennlp.data.tokenizers.letters_digits_tokenizer
LettersDigitsTokenizer
@Tokenizer.register("letters_digits")
class LettersDigitsTokenizer(Tokenizer)
A Tokenizer which keeps runs of (unicode) letters and runs of digits together, while every other non-whitespace character becomes a separate word.
Registered as a Tokenizer with name "letters_digits".
tokenize
class LettersDigitsTokenizer(Tokenizer):
    ...
    def tokenize(self, text: str) -> List[Token]
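The splitting rule described above (letter runs stay together, digit runs stay together, every other non-whitespace character stands alone) can be illustrated with a small standalone sketch. This is not the library's code: the function name `letters_digits_split` and the regular expression are assumptions chosen for illustration, and the sketch returns plain strings rather than `Token` objects so that it runs without AllenNLP installed.

```python
import re
from typing import List

# Assumed pattern for illustration:
#   [^\W\d_]+  one or more word characters that are neither digits nor "_",
#              which in Python 3 covers Unicode letters
#   \d+        one or more digits
#   \S         any other single non-whitespace character
_LETTERS_DIGITS_PATTERN = re.compile(r"[^\W\d_]+|\d+|\S")


def letters_digits_split(text: str) -> List[str]:
    """Hypothetical helper: split text into letter runs, digit runs,
    and single non-whitespace characters. Whitespace is discarded."""
    return [match.group() for match in _LETTERS_DIGITS_PATTERN.finditer(text)]


if __name__ == "__main__":
    print(letters_digits_split("abc123 foo-bar!"))
```

With this pattern, letters and digits that touch each other (as in "abc123") end up in separate pieces, and punctuation such as "-" or "!" becomes its own piece. With AllenNLP installed, the documented entry point is `LettersDigitsTokenizer().tokenize(text)`, which returns a `List[Token]` per the signature above.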