allennlp.data.dataset_readers.sequence_tagging
class allennlp.data.dataset_readers.sequence_tagging.SequenceTaggingDatasetReader(word_tag_delimiter: str = '###', token_delimiter: str = None, token_indexers: Dict[str, allennlp.data.token_indexers.token_indexer.TokenIndexer] = None, lazy: bool = False)
Bases: allennlp.data.dataset_readers.dataset_reader.DatasetReader
Reads instances from a pretokenised file where each line is in the following format:
WORD###TAG [TAB] WORD###TAG [TAB] ...
and converts it into a ``Dataset`` suitable for sequence tagging. You can also specify alternative delimiters in the constructor.
Parameters
- word_tag_delimiter: ``str``, optional (default=``"###"``)
The text that separates each WORD from its TAG.
- token_delimiter: ``str``, optional (default=``None``)
The text that separates each WORD-TAG pair from the next pair. If ``None``, the line will just be split on whitespace.
- token_indexers: ``Dict[str, TokenIndexer]``, optional (default=``{"tokens": SingleIdTokenIndexer()}``)
We use this to define the input representation for the text. See ``TokenIndexer``. Note that the output tags will always correspond to single token IDs based on how they are pre-tokenised in the data file.
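
To make the line format and the two delimiter parameters concrete, here is a minimal plain-Python sketch of the splitting described above. It is an illustration only, not the reader's actual implementation: the helper name `parse_tagged_line` is hypothetical, and splitting each pair on the last occurrence of the delimiter is an assumption made for the example.

```python
from typing import List, Optional, Tuple


def parse_tagged_line(
    line: str,
    word_tag_delimiter: str = "###",
    token_delimiter: Optional[str] = None,
) -> Tuple[List[str], List[str]]:
    """Split one line of WORD###TAG pairs into parallel word and tag lists.

    Hypothetical helper that mirrors the documented parameters; it is not
    part of AllenNLP.
    """
    # With token_delimiter=None, str.split() splits on any run of
    # whitespace, which covers the tab-separated format shown above.
    pairs = line.strip().split(token_delimiter)

    words: List[str] = []
    tags: List[str] = []
    for pair in pairs:
        if not pair:
            continue  # skip empty fragments left by repeated delimiters
        # Assumption: split on the LAST delimiter occurrence, so a word
        # that itself contains the delimiter still yields one tag.
        # Raises ValueError if the pair has no delimiter at all.
        word, tag = pair.rsplit(word_tag_delimiter, 1)
        words.append(word)
        tags.append(tag)
    return words, tags


# Default delimiters: "###" between word and tag, whitespace between pairs.
words, tags = parse_tagged_line("The###DET\tcat###NOUN\tsleeps###VERB")

# Alternative delimiters, as the constructor allows.
alt_words, alt_tags = parse_tagged_line("a/X|b/Y", word_tag_delimiter="/", token_delimiter="|")
```

Each word and its tag end up at the same index in the two lists, which is the shape a sequence tagging model expects: one label per token.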