class allennlp.models.srl_bert.SrlBert(vocab: allennlp.data.vocabulary.Vocabulary, bert_model: Union[str, pytorch_pretrained_bert.modeling.BertModel], embedding_dropout: float = 0.0, initializer: allennlp.nn.initializers.InitializerApplicator = <allennlp.nn.initializers.InitializerApplicator object>, regularizer: Optional[allennlp.nn.regularizers.regularizer_applicator.RegularizerApplicator] = None, label_smoothing: float = None, ignore_span_metric: bool = False, srl_eval_path: str = '/Users/michael/hack/allenai/allennlp/allennlp/tools/')[source]

Bases: allennlp.models.model.Model

vocab: Vocabulary, required

A Vocabulary, required in order to compute sizes for input/output projections.

bert_model: Union[str, BertModel], required

A string describing the BERT model to load or an already constructed BertModel.

initializer: InitializerApplicator, optional (default=``InitializerApplicator()``)

Used to initialize the model parameters.

regularizer: RegularizerApplicator, optional (default=``None``)

If provided, will be used to calculate the regularization penalty during training.

label_smoothing: float, optional (default = 0.0)

The amount of label smoothing to apply to the labels when computing the cross entropy loss.

ignore_span_metric: ``bool``, optional (default = False)

If True, the span-based metric is not calculated; it is irrelevant when predicting BIO tags for Open Information Extraction.

srl_eval_path: ``str``, optional (default=``DEFAULT_SRL_EVAL_PATH``)

The path to the SRL evaluation script. By default, the copy included with allennlp, located at allennlp/tools/, is used. If None, the script is not used.
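To make the label_smoothing parameter concrete, here is a minimal sketch of one common formulation of label smoothing. The function name `smooth_one_hot` is hypothetical, and this is an illustration of the general technique, not necessarily the exact computation this model performs.

```python
from typing import List


def smooth_one_hot(gold_index: int, num_classes: int, label_smoothing: float) -> List[float]:
    """Return a smoothed target distribution (hypothetical helper, for illustration)."""
    # Spread `label_smoothing` probability mass uniformly over all classes...
    uniform_share = label_smoothing / num_classes
    target = [uniform_share] * num_classes
    # ...and give the remaining mass to the gold label.
    target[gold_index] += 1.0 - label_smoothing
    return target


# With 4 tag classes and a smoothing value of 0.1, the gold class keeps
# most of the mass and every class receives a small uniform share.
target = smooth_one_hot(gold_index=1, num_classes=4, label_smoothing=0.1)
```

The cross entropy is then taken against this softened target instead of a strict one-hot vector, which discourages over-confident predictions.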

decode(self, output_dict: Dict[str, torch.Tensor]) → Dict[str, torch.Tensor][source]

Performs constrained Viterbi decoding on the class probabilities output by forward(). The constraint simply specifies that the output tags must be a valid BIO sequence. We add a "tags" key to the dictionary with the result.

NOTE: First, we decode a BIO sequence on top of the wordpieces. This is important; Viterbi decoding produces low-quality output if you decode on top of word representations directly, because the model gets confused by the ‘missing’ positions (which is sensible, as it is trained to perform tagging on wordpieces, not words).

Secondly, it’s important that the indices we use to recover words from the wordpieces are the start_offsets (i.e. offsets which correspond to using the first wordpiece of words which are tokenized into multiple wordpieces), as otherwise we might get an ill-formed BIO sequence when we select out the word tags from the wordpiece tags. This happens when a word is split into multiple wordpieces and we take the last tag of the word, which might correspond to, e.g., I-V, which would not be allowed as it is not preceded by a B tag.
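The point about start offsets can be illustrated with a small sketch. The tokenization, the tag sequence, and the helper names `select_word_tags` and `is_valid_bio` below are made up for illustration; they are not part of the library.

```python
from typing import List


def select_word_tags(wordpiece_tags: List[str], offsets: List[int]) -> List[str]:
    """Pick one tag per word by indexing into the wordpiece-level tags."""
    return [wordpiece_tags[index] for index in offsets]


def is_valid_bio(tags: List[str]) -> bool:
    """Check that every I-XXX tag follows a B-XXX or I-XXX tag with the same XXX."""
    previous = "O"
    for tag in tags:
        if tag.startswith("I-") and previous[2:] != tag[2:]:
            return False
        previous = tag
    return True


# Hypothetical wordpiece tokenization of "She devoured it".
wordpieces = ["[CLS]", "She", "dev", "##our", "##ed", "it", "[SEP]"]
wordpiece_tags = ["O", "B-ARG0", "B-V", "I-V", "I-V", "B-ARG1", "O"]

start_offsets = [1, 2, 5]  # index of the first wordpiece of each word
end_offsets = [1, 4, 5]    # index of the last wordpiece of each word

from_start = select_word_tags(wordpiece_tags, start_offsets)
from_end = select_word_tags(wordpiece_tags, end_offsets)
```

Selecting with `start_offsets` keeps the B-V tag for the verb, while selecting with `end_offsets` yields an I-V tag with no preceding B-V, which `is_valid_bio` rejects.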

forward(self, tokens: Dict[str, torch.Tensor], verb_indicator: torch.Tensor, metadata: List[Any], tags: torch.LongTensor = None)[source]
tokens: Dict[str, torch.LongTensor], required

The output of TextField.as_array(), which should typically be passed directly to a TextFieldEmbedder. For this model, this must be a SingleIdTokenIndexer which indexes wordpieces from the BERT vocabulary.

verb_indicator: torch.LongTensor, required.

An integer SequenceFeatureField representation of the position of the verb in the sentence. This should have shape (batch_size, num_tokens) and importantly, can be all zeros, in the case that the sentence has no verbal predicate.

tags: torch.LongTensor, optional (default = None)

A torch tensor representing the sequence of integer gold class labels of shape (batch_size, num_tokens).

metadata: List[Dict[str, Any]], optional (default = None)

Metadata containing the original words in the sentence, the verb to compute the frame for, and start offsets for converting wordpieces back to a sequence of words, under ‘words’, ‘verb’ and ‘offsets’ keys, respectively.

An output dictionary consisting of:

logits: torch.FloatTensor

A tensor of shape (batch_size, num_tokens, tag_vocab_size) representing unnormalised log probabilities of the tag classes.

class_probabilities: torch.FloatTensor

A tensor of shape (batch_size, num_tokens, tag_vocab_size) representing a distribution of the tag classes per word.

loss: torch.FloatTensor, optional

A scalar loss to be optimised.
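The relationship between the unnormalised scores and the per-token distribution can be sketched in plain Python, assuming the distribution is obtained with a softmax over the tag dimension. The values below are invented, and the real model operates on batched tensors rather than lists.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Normalise unnormalised log probabilities into a distribution."""
    # Subtract the maximum for numerical stability; the result is unchanged.
    max_logit = max(logits)
    exps = [math.exp(value - max_logit) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


# Invented scores for a single token over three tag classes.
token_logits = [2.0, 0.5, -1.0]
class_probabilities = softmax(token_logits)
```

The probabilities sum to one and preserve the ranking of the scores, so the highest-scoring tag is also the most probable one.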

get_metrics(self, reset: bool = False)[source]

Returns a dictionary of metrics. This method will be called by the Trainer in order to compute and use model metrics for early stopping and model serialization. We return an empty dictionary here rather than raising, as it is not required to implement metrics for a new model. A boolean reset parameter is passed, as frequently a metric accumulator will have some state which should be reset between epochs. This is also compatible with Metric objects. Metrics should be populated during the call to ``forward``, with the Metric handling the accumulation of the metric until this method is called.
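The accumulate-then-reset pattern described here can be sketched with a small stand-alone class. `AccuracyAccumulator` is a hypothetical example written for this illustration and is not one of the library's metric classes.

```python
from typing import List


class AccuracyAccumulator:
    """Hypothetical metric that accumulates tag accuracy across batches."""

    def __init__(self) -> None:
        self.correct = 0
        self.total = 0

    def __call__(self, predicted: List[str], gold: List[str]) -> None:
        # Called once per batch (in the model, during forward) to update state.
        self.correct += sum(p == g for p, g in zip(predicted, gold))
        self.total += len(gold)

    def get_metric(self, reset: bool = False) -> float:
        # Report the value accumulated so far, optionally clearing the state.
        value = self.correct / self.total if self.total else 0.0
        if reset:
            self.correct = 0
            self.total = 0
        return value


accuracy = AccuracyAccumulator()
accuracy(["B-V", "O"], ["B-V", "B-ARG1"])
mid_epoch = accuracy.get_metric(reset=False)  # state is kept
end_of_epoch = accuracy.get_metric(reset=True)  # state is cleared afterwards
after_reset = accuracy.get_metric()
```

Passing `reset=True` at the end of an epoch keeps statistics from one epoch from leaking into the next.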


In the BIO sequence, we cannot start the sequence with an I-XXX tag. This transition sequence is passed to viterbi_decode to specify this constraint.


The pairwise potentials between a START token and the first token of the sequence.
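As a rough sketch of the start constraint, the following builds one potential per label, forbidding I-XXX tags in the first position. The function name `start_potentials` and the label list are assumptions made for this example.

```python
from typing import List


def start_potentials(labels: List[str]) -> List[float]:
    """Potential between a START token and each label (hypothetical helper)."""
    # A sequence may not begin with an I-XXX tag, so those labels get -inf;
    # every other label is unconstrained and gets a potential of 0.
    return [float("-inf") if label.startswith("I-") else 0.0 for label in labels]


labels = ["O", "B-V", "I-V", "B-ARG0", "I-ARG0"]
potentials = start_potentials(labels)
```

Adding these potentials to the first token's scores makes any path that starts with an I-XXX tag score -inf, so a decoder can never select it.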


Generate a matrix of pairwise transition potentials for the BIO labels. The only constraint implemented here is that I-XXX labels must be preceded by either an identical I-XXX tag or a B-XXX tag. In order to achieve this constraint, pairs of labels which do not satisfy this constraint have a pairwise potential of -inf.


A (num_labels, num_labels) matrix of pairwise potentials.
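The constraint matrix and its effect on decoding can be sketched in plain Python. The label set, the scores, and the function names below are invented for illustration, and the small decoder is a textbook Viterbi implementation rather than the library's own viterbi_decode.

```python
from typing import List

NEG_INF = float("-inf")


def pairwise_potentials(labels: List[str]) -> List[List[float]]:
    """matrix[i][j] is the potential of moving from labels[i] to labels[j]."""
    matrix = [[0.0] * len(labels) for _ in labels]
    for i, previous in enumerate(labels):
        for j, current in enumerate(labels):
            # I-XXX may only follow B-XXX or I-XXX with the same XXX.
            if current.startswith("I-") and previous[1:] != current[1:]:
                matrix[i][j] = NEG_INF
    return matrix


def viterbi(scores: List[List[float]], transitions: List[List[float]],
            start: List[float]) -> List[int]:
    """Return the highest-scoring label index path under the constraints."""
    num_labels = len(start)
    best = [start[j] + scores[0][j] for j in range(num_labels)]
    backpointers = []
    for token_scores in scores[1:]:
        new_best, pointers = [], []
        for j in range(num_labels):
            # Best previous label for arriving at label j.
            i_best = max(range(num_labels), key=lambda i: best[i] + transitions[i][j])
            pointers.append(i_best)
            new_best.append(best[i_best] + transitions[i_best][j] + token_scores[j])
        best = new_best
        backpointers.append(pointers)
    # Follow the backpointers from the best final label.
    path = [max(range(num_labels), key=lambda j: best[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


labels = ["O", "B-V", "I-V"]
transitions = pairwise_potentials(labels)
start = [NEG_INF if label.startswith("I-") else 0.0 for label in labels]
# Invented scores in which I-V is the greedy choice for both tokens.
scores = [[0.1, 0.2, 2.0], [0.0, 0.1, 1.5]]
decoded = [labels[index] for index in viterbi(scores, transitions, start)]
```

Although I-V has the highest score at both positions, the -inf potentials rule out starting with it, so the decoder returns the well-formed sequence B-V, I-V.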