srl
allennlp_models.structured_prediction.dataset_readers.srl
SrlReader#
@DatasetReader.register("srl")
class SrlReader(DatasetReader):
| def __init__(
| self,
| token_indexers: Dict[str, TokenIndexer] = None,
| domain_identifier: str = None,
| bert_model_name: str = None,
| **kwargs
| ) -> None
This DatasetReader is designed to read in the English OntoNotes v5.0 data for semantic role labelling. It returns a dataset of instances with the following fields:
tokens : TextField
The tokens in the sentence.
verb_indicator : SequenceLabelField
A sequence of binary indicators for whether the word is the verb for this frame.
tags : SequenceLabelField
A sequence of Propbank tags for the given verb in a BIO format.
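To make the field layout concrete, here is a hypothetical single-frame example (the sentence, verb position, and tags below are invented for illustration, not taken from OntoNotes):

```python
# Hypothetical example of the three parallel fields for one verb frame.
tokens = ["The", "cat", "chased", "the", "mouse", "."]
# One-hot indicator marking "chased" as the verb of this frame.
verb_indicator = [0, 0, 1, 0, 0, 0]
# PropBank argument tags in BIO format for that verb.
tags = ["B-ARG0", "I-ARG0", "B-V", "B-ARG1", "I-ARG1", "O"]

# All three sequences are aligned token-for-token.
assert len(tokens) == len(verb_indicator) == len(tags)
```

A sentence with several verbs yields one instance per verb, each with its own `verb_indicator` and `tags`.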
Parameters¶
- token_indexers : Dict[str, TokenIndexer], optional (default = {"tokens": SingleIdTokenIndexer()})
We use these indexers to define the input representation for the tokens of the sentence. See TokenIndexer.
- domain_identifier : str, optional (default = None)
A string denoting a sub-domain of the OntoNotes 5.0 dataset to use. If present, only conll files under paths containing this domain identifier will be processed.
- bert_model_name : Optional[str], optional (default = None)
The BERT model to be wrapped. If you specify a bert_model here, then we will assume you want to use BERT throughout; we will use the BERT tokenizer, and will expand your tags and verb indicators accordingly. If not, the tokens will be indexed as normal with the token_indexers.
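To illustrate what "expand your tags and verb indicators" means, here is a rough, self-contained sketch. It is not the reader's actual code, and the wordpiece splits are invented; it assumes a simplified scheme in which each token's tag is repeated over its wordpieces, with a `B-` tag continuing as the matching `I-` tag:

```python
from typing import List, Tuple

def expand_bio(wordpieces_per_token: List[List[str]],
               tags: List[str],
               verb_indicator: List[int]) -> Tuple[List[str], List[int]]:
    """Spread token-level BIO tags and the verb indicator over wordpieces.

    Simplified, illustrative scheme: the first wordpiece keeps the original
    tag; continuation pieces of a B- tag become the matching I- tag.
    """
    new_tags: List[str] = []
    new_verb: List[int] = []
    for pieces, tag, v in zip(wordpieces_per_token, tags, verb_indicator):
        for i, _ in enumerate(pieces):
            if i == 0 or not tag.startswith("B-"):
                new_tags.append(tag)
            else:
                new_tags.append("I-" + tag[2:])
            new_verb.append(v)
    return new_tags, new_verb

# "chased" splits into two hypothetical wordpieces.
pieces = [["The"], ["cat"], ["cha", "##sed"], ["the"], ["mouse"]]
tags = ["B-ARG0", "I-ARG0", "B-V", "B-ARG1", "I-ARG1"]
verb = [0, 0, 1, 0, 0]
new_tags, new_verb = expand_bio(pieces, tags, verb)
# new_tags  -> ["B-ARG0", "I-ARG0", "B-V", "I-V", "B-ARG1", "I-ARG1"]
# new_verb  -> [0, 0, 1, 1, 0, 0]
```

The key invariant is that the expanded sequences stay the same length as the wordpiece sequence, so the model's per-wordpiece predictions line up with labels.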
Returns¶
- A Dataset of Instances for Semantic Role Labelling.
text_to_instance#
class SrlReader(DatasetReader):
| ...
| def text_to_instance(
| self,
| tokens: List[Token],
| verb_label: List[int],
| tags: List[str] = None
| ) -> Instance
We take pre-tokenized input here, along with a verb label. The verb label should be a one-hot binary vector, the same length as the tokens, indicating the position of the verb to find arguments for.
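A minimal sketch of building that one-hot `verb_label` from a verb's position. The helper below is illustrative, not part of the reader's API, and in a real call the tokens would be allennlp `Token` objects rather than strings:

```python
from typing import List

def make_verb_label(num_tokens: int, verb_index: int) -> List[int]:
    """Build the one-hot verb indicator expected by text_to_instance."""
    return [1 if i == verb_index else 0 for i in range(num_tokens)]

tokens = ["The", "cat", "chased", "the", "mouse"]
verb_label = make_verb_label(len(tokens), 2)
# verb_label -> [0, 0, 1, 0, 0]: one-hot over the tokens, marking "chased".
```

With a constructed reader, this vector would then be passed as the `verb_label` argument, optionally with gold `tags` of the same length during training.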