[ allennlp.data.tokenizers.sentence_splitter ]
SentenceSplitter splits strings into sentences.
default_implementation = "spacy"
| def split_sentences(self, text: str) -> List[str]
| def batch_split_sentences(self, texts: List[str]) -> List[List[str]]
The default implementation is to iterate over the texts and call split_sentences on each one.
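The interface above can be sketched as follows. This is an illustrative re-implementation of the default batch behavior, not the actual allennlp source; the PeriodSplitter subclass is a toy example invented here to exercise the default method.

```python
from typing import List

class SentenceSplitter:
    """Illustrative sketch of the SentenceSplitter interface."""

    def split_sentences(self, text: str) -> List[str]:
        raise NotImplementedError

    def batch_split_sentences(self, texts: List[str]) -> List[List[str]]:
        # Default implementation: iterate over the texts and call
        # split_sentences on each one.
        return [self.split_sentences(text) for text in texts]

# Toy subclass (hypothetical) to exercise the default batch method.
class PeriodSplitter(SentenceSplitter):
    def split_sentences(self, text: str) -> List[str]:
        return [s.strip() + "." for s in text.split(".") if s.strip()]

splitter = PeriodSplitter()
print(splitter.batch_split_sentences(["A b. C d.", "E f."]))
# -> [['A b.', 'C d.'], ['E f.']]
```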
class SpacySentenceSplitter(SentenceSplitter):
| def __init__(self, language: str = "en_core_web_sm", rule_based: bool = False) -> None
SentenceSplitter that uses spaCy's built-in sentence boundary detection.
spaCy's default sentence splitter uses a dependency parse to detect sentence boundaries, so it is slow, but accurate.
Another option is rule-based sentence boundary detection. It is fast and has a small memory footprint,
since it uses punctuation to detect sentence boundaries. This can be activated with the rule_based flag.
By default, SpacySentenceSplitter calls spaCy's default sentence boundary detector.
Registered as a SentenceSplitter with name "spacy".
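To illustrate the rule-based approach described above, here is a minimal punctuation-based splitter sketch. It mimics the idea behind spaCy's rule-based sentencizer, not its actual rules; rule_based_split is a hypothetical helper, not part of the library.

```python
import re
from typing import List

def rule_based_split(text: str) -> List[str]:
    """Hypothetical sketch: split on sentence-final punctuation
    followed by whitespace, keeping the punctuation with its sentence."""
    # Lookbehind keeps '.', '!', '?' attached to the preceding sentence.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

print(rule_based_split("Hello there! How are you? Fine."))
# -> ['Hello there!', 'How are you?', 'Fine.']
```

Because it never builds a dependency parse, a splitter like this is fast and memory-light, at the cost of mis-splitting text where punctuation is ambiguous (abbreviations, decimals, etc.).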
| @overrides | def split_sentences(self, text: str) -> List[str]
| @overrides | def batch_split_sentences(self, texts: List[str]) -> List[List[str]]
This method lets you take advantage of spaCy's batch processing.
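Batch processing amortizes pipeline overhead by handing all texts to spaCy in a single call (spaCy's Language.pipe) instead of invoking the pipeline once per text. A sketch of that pattern, using a hypothetical stand-in pipeline so the example is self-contained:

```python
from typing import Iterable, List

class FakePipeline:
    """Hypothetical stand-in for a spaCy Language object;
    pipe() yields one result per input text."""

    def pipe(self, texts: Iterable[str]) -> Iterable[List[str]]:
        for text in texts:
            # Pretend each document arrives already sentence-segmented.
            yield [s.strip() + "." for s in text.split(".") if s.strip()]

def batch_split_sentences(nlp: FakePipeline, texts: List[str]) -> List[List[str]]:
    # One pipe() call over all texts, rather than a Python loop of
    # per-text pipeline invocations.
    return [sentences for sentences in nlp.pipe(texts)]

print(batch_split_sentences(FakePipeline(), ["A b. C d.", "E f."]))
# -> [['A b.', 'C d.'], ['E f.']]
```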