allennlp.data.dataset_readers.universal_dependencies¶
- 
class allennlp.data.dataset_readers.universal_dependencies.UniversalDependenciesDatasetReader(token_indexers: Dict[str, allennlp.data.token_indexers.token_indexer.TokenIndexer] = None, use_language_specific_pos: bool = False, tokenizer: allennlp.data.tokenizers.tokenizer.Tokenizer = None, lazy: bool = False)[source]¶
- Bases: - allennlp.data.dataset_readers.dataset_reader.DatasetReader- Reads a file in the conllu Universal Dependencies format. - Parameters
- token_indexersDict[str, TokenIndexer], optional (default=``{“tokens”: SingleIdTokenIndexer()}``)
- The token indexers to be applied to the words TextField. 
- use_language_specific_posbool, optional (default = False)
- Whether to use UD POS tags, or to use the language specific POS tags provided in the conllu format. 
- tokenizerTokenizer, optional, default = None
- A tokenizer to use to split the text. This is useful when the tokens that you pass into the model need to have some particular attribute. Typically it is not necessary. 
 
- token_indexers
 - 
text_to_instance(self, words: List[str], upos_tags: List[str], dependencies: List[Tuple[str, int]] = None) → allennlp.data.instance.Instance[source]¶
- Parameters
- wordsList[str], required.
- The words in the sentence to be encoded. 
- upos_tagsList[str], required.
- The universal dependencies POS tags for each word. 
- dependenciesList[Tuple[str, int]], optional (default = None)
- A list of (head tag, head index) tuples. Indices are 1 indexed, meaning an index of 0 corresponds to that word being the root of the dependency tree. 
 
- words
- Returns
- An instance containing words, upos tags, dependency head tags and head
- indices as fields.