allennlp.data.dataset_readers.ontonotes_ner
- class allennlp.data.dataset_readers.ontonotes_ner.OntonotesNamedEntityRecognition(token_indexers: Dict[str, allennlp.data.token_indexers.token_indexer.TokenIndexer] = None, domain_identifier: str = None, coding_scheme: str = 'BIO', lazy: bool = False)
- Bases: allennlp.data.dataset_readers.dataset_reader.DatasetReader
- This DatasetReader is designed to read in the English OntoNotes v5.0 data for fine-grained named entity recognition. It returns a dataset of instances with the following fields:
- tokens : TextField
- The tokens in the sentence.
- tags : SequenceLabelField
- A sequence of tags for the NER classes, in BIO format by default (see coding_scheme).
- Note that the “/pt/” directory of the OntoNotes dataset, which contains annotations on the New and Old Testaments of the Bible, is excluded because it does not contain NER annotations.
- Parameters
- token_indexers : Dict[str, TokenIndexer], optional
- Defines how the tokens in each sentence are indexed, that is, the input representation for the text. See TokenIndexer. Default is {"tokens": SingleIdTokenIndexer()}.
- domain_identifier : str, optional (default = None)
- A string denoting a sub-domain of the Ontonotes 5.0 dataset to use. If present, only conll files under paths containing this domain identifier will be processed. 
- coding_scheme : str, optional (default = 'BIO')
- The coding scheme to use for the NER labels. Valid options are “BIO” or “BIOUL”. 
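To make the difference between the two coding schemes concrete, here is a minimal sketch of a BIO to BIOUL conversion. It is written from scratch for illustration and assumes a well-formed BIO input; the function name `bio_to_bioul` is hypothetical, and the library's own conversion may differ, for example in how it treats malformed sequences.

```python
from typing import List


def bio_to_bioul(tags: List[str]) -> List[str]:
    """Convert a well-formed BIO tag sequence to BIOUL (illustrative sketch only)."""
    converted = []
    for i, tag in enumerate(tags):
        if tag == "O":
            converted.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        # The current span continues only if the next tag is I- with the same label.
        continues = next_tag == f"I-{label}"
        if prefix == "B":
            # A span of length one becomes U (unit); otherwise it stays B (begin).
            converted.append(f"B-{label}" if continues else f"U-{label}")
        else:
            # The final token of a multi-token span becomes L (last).
            converted.append(f"I-{label}" if continues else f"L-{label}")
    return converted


bio = ["B-PERSON", "I-PERSON", "O", "B-GPE", "O", "B-ORG", "I-ORG", "I-ORG"]
print(bio_to_bioul(bio))
```

In BIOUL, single-token entities get a U- tag and the final token of a longer entity gets an L- tag, so span boundaries are explicit at both ends.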
 
- Returns
- A Dataset of Instances for fine-grained NER.
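The file selection described above (skipping the “/pt/” directory and, when domain_identifier is given, keeping only files under matching paths) can be pictured with the sketch below. The helper `select_conll_files` and the example paths are hypothetical, and matching the identifier as a whole path component ("/bn/") is an assumption rather than a statement about how the reader itself matches.

```python
from typing import Iterable, List, Optional


def select_conll_files(paths: Iterable[str],
                       domain_identifier: Optional[str] = None) -> List[str]:
    """Pick the conll files a reader like this one would process (illustrative sketch)."""
    selected = []
    for path in paths:
        # The Bible portion carries no NER annotations, so it is always skipped.
        if "/pt/" in path:
            continue
        # Assumption: the identifier must appear as a directory name in the path.
        if domain_identifier is not None and f"/{domain_identifier}/" not in path:
            continue
        selected.append(path)
    return selected


# Hypothetical paths, used only to exercise the filter.
paths = [
    "train/bn/cnn/a.conll",
    "train/nw/wsj/b.conll",
    "train/pt/nt/c.conll",
]
print(select_conll_files(paths))
print(select_conll_files(paths, domain_identifier="bn"))
```

With no identifier, everything outside “/pt/” is kept; with an identifier, only the files under that sub-domain remain.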
 