
srl

allennlp_models.structured_prediction.models.srl



write_bio_formatted_tags_to_file

def write_bio_formatted_tags_to_file(
    prediction_file: TextIO,
    gold_file: TextIO,
    verb_index: Optional[int],
    sentence: List[str],
    prediction: List[str],
    gold_labels: List[str]
)

Prints predicate argument predictions and gold labels for a single verbal predicate in a sentence to two provided file references.

The CoNLL SRL format is described in the shared task data README.

This function expects IOB2-formatted tags, where the B- tag is used at the beginning of every chunk (i.e. all chunks start with the B- tag).

Parameters

  • prediction_file : TextIO
    A file reference to print predictions to.
  • gold_file : TextIO
    A file reference to print gold labels to.
  • verb_index : Optional[int]
    The index of the verbal predicate in the sentence which the gold labels are the arguments for, or None if the sentence contains no verbal predicate.
  • sentence : List[str]
    The word tokens.
  • prediction : List[str]
    The predicted BIO labels.
  • gold_labels : List[str]
    The gold BIO labels.

write_conll_formatted_tags_to_file

def write_conll_formatted_tags_to_file(
    prediction_file: TextIO,
    gold_file: TextIO,
    verb_index: Optional[int],
    sentence: List[str],
    conll_formatted_predictions: List[str],
    conll_formatted_gold_labels: List[str]
)

Prints predicate argument predictions and gold labels for a single verbal predicate in a sentence to two provided file references.

The CoNLL SRL format is described in the shared task data README.

This function expects labels that have already been converted from IOB2 tags to the CoNLL span-based format (for example, with convert_bio_tags_to_conll_format).

Parameters

  • prediction_file : TextIO
    A file reference to print predictions to.
  • gold_file : TextIO
    A file reference to print gold labels to.
  • verb_index : Optional[int]
    The index of the verbal predicate in the sentence which the gold labels are the arguments for, or None if the sentence contains no verbal predicate.
  • sentence : List[str]
    The word tokens.
  • conll_formatted_predictions : List[str]
    The predicted CoNLL-formatted labels.
  • conll_formatted_gold_labels : List[str]
    The gold CoNLL-formatted labels.
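The writing step can be sketched as follows. This is a minimal illustration, not the library code: the function name write_conll_columns is hypothetical, and the layout (one line per token, a predicate column holding the verb or "-", then a right-justified label column, with a blank line ending each sentence) is an assumption modelled on the CoNLL-2005 props files that srl-eval.pl reads.

```python
import io
from typing import List, Optional, TextIO


def write_conll_columns(
    prediction_file: TextIO,
    gold_file: TextIO,
    verb_index: Optional[int],
    sentence: List[str],
    conll_predictions: List[str],
    conll_gold: List[str],
) -> None:
    # Hypothetical helper. Only the predicate keeps its word form; every other token becomes "-".
    verb_column = ["-"] * len(sentence)
    if verb_index is not None:
        verb_column[verb_index] = sentence[verb_index]

    for word, predicted, gold_label in zip(verb_column, conll_predictions, conll_gold):
        # Left-justified predicate column, right-justified label column (assumed widths).
        prediction_file.write(word.ljust(15) + predicted.rjust(15) + "\n")
        gold_file.write(word.ljust(15) + gold_label.rjust(15) + "\n")

    # A blank line ends the sentence in both files.
    prediction_file.write("\n")
    gold_file.write("\n")


pred_out, gold_out = io.StringIO(), io.StringIO()
write_conll_columns(
    pred_out,
    gold_out,
    verb_index=1,
    sentence=["John", "ate", "pizza"],
    conll_predictions=["(ARG0*)", "(V*)", "*"],
    conll_gold=["(ARG0*)", "(V*)", "(ARG1*)"],
)
print(pred_out.getvalue())
```

In-memory io.StringIO objects stand in for real files here; any object with a write method works the same way.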

convert_bio_tags_to_conll_format

def convert_bio_tags_to_conll_format(labels: List[str])

Converts BIO-formatted SRL tags to the format required for evaluation with the official CoNLL 2005 perl script. Spans are represented by bracketed labels, and words that neither begin nor end a span are labelled "*", exactly like words outside any span. A word that begins a span gets an opening bracket, the label and an asterisk (e.g. "(ARG-1*"), and a word that ends a span gets an asterisk followed by a closing bracket (e.g. "*)"). This applies even to length-1 spans (e.g. "(ARG-0*)").

A full example of the conversion performed:

["B-ARG-1", "I-ARG-1", "I-ARG-1", "I-ARG-1", "I-ARG-1", "O"]
["(ARG-1*", "*", "*", "*", "*)", "*"]

Parameters

  • labels : List[str]
    A list of BIO tags to convert to the CoNLL span-based format.

Returns

  • A list of labels in the CoNLL span-based format.
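The conversion rules above can be sketched in a few lines of Python. This is a minimal reimplementation for illustration only; the name bio_to_conll is hypothetical, and it assumes well-formed IOB2 input in which every tag is O, B-XXX or I-XXX.

```python
from typing import List


def bio_to_conll(labels: List[str]) -> List[str]:
    """Hypothetical sketch: convert IOB2 tags such as "B-ARG1" into CoNLL-2005 bracket labels."""
    conll_labels = []
    for i, label in enumerate(labels):
        if label == "O":
            conll_labels.append("*")
            continue
        role = label[2:]  # strip the "B-" / "I-" prefix
        new_label = "*"
        # A span opens on a B- tag, at the first word, or when the role changes.
        if label.startswith("B-") or i == 0 or labels[i - 1][2:] != role:
            new_label = "(" + role + new_label
        # A span closes at the last word, before a B- tag, or before a different role (or O).
        if i == len(labels) - 1 or labels[i + 1].startswith("B-") or labels[i + 1][2:] != role:
            new_label += ")"
        conll_labels.append(new_label)
    return conll_labels


converted = bio_to_conll(["B-ARG-1", "I-ARG-1", "I-ARG-1", "I-ARG-1", "I-ARG-1", "O"])
print(converted)  # ['(ARG-1*', '*', '*', '*', '*)', '*']
```

Because "O"[2:] is the empty string, a neighbouring O tag never matches a role, so spans open and close correctly at their boundaries.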

SemanticRoleLabeler

@Model.register("srl")
class SemanticRoleLabeler(Model):
 | def __init__(
 |     self,
 |     vocab: Vocabulary,
 |     text_field_embedder: TextFieldEmbedder,
 |     encoder: Seq2SeqEncoder,
 |     binary_feature_dim: int,
 |     embedding_dropout: float = 0.0,
 |     initializer: InitializerApplicator = InitializerApplicator(),
 |     label_smoothing: float = None,
 |     ignore_span_metric: bool = False,
 |     srl_eval_path: str = DEFAULT_SRL_EVAL_PATH,
 |     **kwargs
 | ) -> None

This model performs semantic role labeling using BIO tags and PropBank semantic roles. Specifically, it is an implementation of Deep Semantic Role Labeling: What Works and What's Next.

This implementation is effectively a series of stacked interleaved LSTMs with highway connections, applied to embedded sequences of words concatenated with a binary indicator of whether each word is the verbal predicate for which predictions are being generated. Additionally, during inference, Viterbi decoding is applied to constrain the predictions to valid BIO sequences.

The model expects and outputs IOB2-formatted tags, where the B- tag is used at the beginning of every chunk (i.e. all chunks start with the B- tag).

Parameters

  • vocab : Vocabulary
    A Vocabulary, required in order to compute sizes for input/output projections.
  • text_field_embedder : TextFieldEmbedder
    Used to embed the tokens TextField we get as input to the model.
  • encoder : Seq2SeqEncoder
    The encoder (with its own internal stacking) that we will use in between embedding tokens and predicting output tags.
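  • binary_feature_dim : int
    The dimensionality of the embedding of the binary verb predicate features.
  • embedding_dropout : float, optional (default = 0.0)
    The dropout probability applied to the token embeddings.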
  • initializer : InitializerApplicator, optional (default = InitializerApplicator())
    Used to initialize the model parameters.
  • label_smoothing : float, optional (default = None)
    The amount of label smoothing to apply to the labels when computing the cross-entropy loss. If None, no label smoothing is used.
  • ignore_span_metric : bool, optional (default = False)
    Whether to skip computing the span-based metric, which is irrelevant when predicting BIO tags for Open Information Extraction.
  • srl_eval_path : str, optional (default = DEFAULT_SRL_EVAL_PATH)
    The path to the srl-eval.pl script. By default, the srl-eval.pl included with allennlp, located at allennlp/tools/srl-eval.pl, is used. If None, srl-eval.pl is not used.
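To make label_smoothing concrete, the sketch below shows what a smoothing value does to a single one-hot target. It assumes one common formulation, in which the smoothing mass is spread uniformly over all classes and the gold class keeps the remainder; smoothed_targets is a hypothetical helper, not part of the model's API.

```python
from typing import List


def smoothed_targets(gold_index: int, num_classes: int, label_smoothing: float) -> List[float]:
    """Hypothetical sketch of uniform label smoothing for one target."""
    # Every class receives an equal share of the smoothing mass...
    share = label_smoothing / num_classes
    targets = [share] * num_classes
    # ...and the gold class also keeps the remaining (1 - label_smoothing).
    targets[gold_index] += 1.0 - label_smoothing
    return targets


smoothed = smoothed_targets(gold_index=2, num_classes=4, label_smoothing=0.2)
print(smoothed)  # roughly [0.05, 0.05, 0.85, 0.05]
```

The smoothed targets still sum to 1, so they remain a valid distribution for the cross-entropy loss.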

forward

class SemanticRoleLabeler(Model):
 | ...
 | def forward(
 |     self,
 |     tokens: TextFieldTensors,
 |     verb_indicator: torch.LongTensor,
 |     tags: torch.LongTensor = None,
 |     metadata: List[Dict[str, Any]] = None
 | ) -> Dict[str, torch.Tensor]

Parameters

  • tokens : TextFieldTensors
    The output of TextField.as_array(), which should typically be passed directly to a TextFieldEmbedder. This output is a dictionary mapping keys to TokenIndexer tensors. At its most basic, using a SingleIdTokenIndexer, this is: {"tokens": Tensor(batch_size, num_tokens)}. This dictionary will have the same keys as were used for the TokenIndexers when you created the TextField representing your sequence. The dictionary is designed to be passed directly to a TextFieldEmbedder, which knows how to combine different word representations into a single vector per token in your input.
  • verb_indicator : torch.LongTensor
    An integer SequenceFeatureField representation of the position of the verb in the sentence. This should have shape (batch_size, num_tokens) and importantly, can be all zeros, in the case that the sentence has no verbal predicate.
  • tags : torch.LongTensor, optional (default = None)
    A torch tensor representing the sequence of integer gold class labels, of shape (batch_size, num_tokens).
  • metadata : List[Dict[str, Any]], optional (default = None)
    Metadata containing the original words in the sentence and the verb to compute the frame for, under the 'words' and 'verb' keys, respectively.
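A minimal sketch of how a verb_indicator row could be built for each sentence, assuming the verb position is already known. make_verb_indicator is a hypothetical helper, the sentences are assumed to have the same length, and padding and tensor conversion are not shown.

```python
from typing import List, Optional


def make_verb_indicator(num_tokens: int, verb_index: Optional[int]) -> List[int]:
    """Hypothetical helper: 1 at the predicate position, 0 elsewhere."""
    indicator = [0] * num_tokens
    # A sentence without a verbal predicate keeps an all-zero row.
    if verb_index is not None:
        indicator[verb_index] = 1
    return indicator


batch = [make_verb_indicator(4, 1), make_verb_indicator(4, None)]
print(batch)  # [[0, 1, 0, 0], [0, 0, 0, 0]]
```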

Returns

An output dictionary consisting of:

  • logits : torch.FloatTensor
    A tensor of shape (batch_size, num_tokens, tag_vocab_size) representing unnormalised log probabilities of the tag classes.
  • class_probabilities : torch.FloatTensor
    A tensor of shape (batch_size, num_tokens, tag_vocab_size) representing a distribution of the tag classes per word.
  • loss : torch.FloatTensor, optional
    A scalar loss to be optimised. Only present when gold tags are provided.
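Class probabilities are normally obtained from logits with a softmax over the tag dimension. The sketch below assumes exactly that and works on plain Python lists for a single sentence; the logit values are made up for illustration.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one token's tag logits."""
    # Subtracting the max leaves the result unchanged but avoids overflow in exp.
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for two tokens and three tags: shape (num_tokens, tag_vocab_size).
logits = [[2.0, 1.0, 0.1], [0.0, 0.0, 0.0]]
class_probabilities = [softmax(row) for row in logits]
print(class_probabilities)
```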

make_output_human_readable

class SemanticRoleLabeler(Model):
 | ...
 | def make_output_human_readable(
 |     self,
 |     output_dict: Dict[str, torch.Tensor]
 | ) -> Dict[str, torch.Tensor]

Performs constrained Viterbi decoding on the class probabilities output by forward. The constraint simply specifies that the output tags must form a valid BIO sequence. We add a "tags" key to the dictionary with the result.
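To illustrate the decoding step, here is a pure-Python Viterbi sketch that takes per-token scores, a pairwise transition matrix and start potentials. It is a hypothetical stand-in for the tensor-based viterbi_decode used by the model, and it assumes scores are additive along a path (as with log-probabilities); -inf entries make a transition impossible.

```python
from typing import List, Sequence


def constrained_viterbi(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
    start: Sequence[float],
) -> List[int]:
    """Hypothetical sketch returning the label indices of the best path.

    emissions[t][j]: score of label j at token t.
    transitions[i][j]: potential for label i followed by label j.
    start[j]: potential for label j on the first token.
    """
    num_labels = len(start)
    # scores[j] is the best total score of any path ending in label j so far.
    scores = [start[j] + emissions[0][j] for j in range(num_labels)]
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_scores, pointers = [], []
        for j in range(num_labels):
            # Best previous label i for reaching label j at token t.
            best_i = max(range(num_labels), key=lambda i: scores[i] + transitions[i][j])
            pointers.append(best_i)
            new_scores.append(scores[best_i] + transitions[best_i][j] + emissions[t][j])
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best label at the last token.
    best = max(range(num_labels), key=lambda j: scores[j])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    return path[::-1]


NEG = float("-inf")
labels = ["O", "B-ARG0", "I-ARG0"]
# I-ARG0 may not follow O; every other transition is allowed.
transitions = [[0.0, 0.0, NEG], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
# A sequence may not start with I-ARG0.
start = [0.0, 0.0, NEG]
# Made-up scores: I-ARG0 looks best for both tokens in isolation.
emissions = [[-2.0, -1.0, -0.1], [-2.0, -3.0, -0.1]]
path = constrained_viterbi(emissions, transitions, start)
print([labels[k] for k in path])  # ['B-ARG0', 'I-ARG0']
```

Taken token by token, the highest score picks I-ARG0 twice, which is not a valid BIO sequence; the start potentials forbid I-ARG0 on the first token, so the decoder returns B-ARG0 followed by I-ARG0.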

get_metrics

class SemanticRoleLabeler(Model):
 | ...
 | def get_metrics(self, reset: bool = False)

get_viterbi_pairwise_potentials

class SemanticRoleLabeler(Model):
 | ...
 | def get_viterbi_pairwise_potentials(self)

Generate a matrix of pairwise transition potentials for the BIO labels. The only constraint implemented here is that I-XXX labels must be preceded by either an identical I-XXX tag or a B-XXX tag. In order to achieve this constraint, pairs of labels which do not satisfy this constraint have a pairwise potential of -inf.

Returns

  • transition_matrix : torch.Tensor
    A (num_labels, num_labels) matrix of pairwise potentials.
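A minimal sketch of the constraint described above, using nested lists instead of a torch.Tensor. The label list and the helper name bio_transition_matrix are assumptions for illustration.

```python
import math
from typing import List


def bio_transition_matrix(labels: List[str]) -> List[List[float]]:
    """Hypothetical sketch: matrix[i][j] is the potential for labels[i] followed by labels[j]."""
    matrix = [[0.0] * len(labels) for _ in labels]
    for i, previous in enumerate(labels):
        for j, current in enumerate(labels):
            # I-XXX may only follow B-XXX or I-XXX with the same role XXX.
            if current.startswith("I-") and previous not in ("B-" + current[2:], current):
                matrix[i][j] = -math.inf
    return matrix


labels = ["O", "B-ARG0", "I-ARG0", "B-ARG1", "I-ARG1"]
matrix = bio_transition_matrix(labels)
```

Every allowed pair keeps a potential of 0.0, so the matrix only rules transitions out and never prefers one valid transition over another.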

get_start_transitions

class SemanticRoleLabeler(Model):
 | ...
 | def get_start_transitions(self)

A BIO sequence cannot start with an I-XXX tag. These start transitions are passed to viterbi_decode to enforce this constraint.

Returns

  • start_transitions : torch.Tensor
    The pairwise potentials between a START token and the first token of the sequence.
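The start constraint can be sketched as a plain list of potentials, one per label. bio_start_transitions and the label list are hypothetical names chosen for illustration.

```python
import math
from typing import List


def bio_start_transitions(labels: List[str]) -> List[float]:
    """Hypothetical sketch: a sequence may start with O or any B-XXX tag, but never with I-XXX."""
    return [-math.inf if label.startswith("I-") else 0.0 for label in labels]


starts = bio_start_transitions(["O", "B-ARG0", "I-ARG0"])
print(starts)  # [0.0, 0.0, -inf]
```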

default_predictor

class SemanticRoleLabeler(Model):
 | ...
 | default_predictor = "semantic_role_labeling"

write_to_conll_eval_file

def write_to_conll_eval_file(
    prediction_file: TextIO,
    gold_file: TextIO,
    verb_index: Optional[int],
    sentence: List[str],
    prediction: List[str],
    gold_labels: List[str]
)

Deprecated since version 0.8.4: write_to_conll_eval_file was deprecated in favor of the identical write_bio_formatted_tags_to_file.

Prints predicate argument predictions and gold labels for a single verbal predicate in a sentence to two provided file references.

The CoNLL SRL format is described in the shared task data README.

This function expects IOB2-formatted tags, where the B- tag is used at the beginning of every chunk (i.e. all chunks start with the B- tag).

Parameters

  • prediction_file : TextIO
    A file reference to print predictions to.
  • gold_file : TextIO
    A file reference to print gold labels to.
  • verb_index : Optional[int]
    The index of the verbal predicate in the sentence which the gold labels are the arguments for, or None if the sentence contains no verbal predicate.
  • sentence : List[str]
    The word tokens.
  • prediction : List[str]
    The predicted BIO labels.
  • gold_labels : List[str]
    The gold BIO labels.