


DEFAULT_SRL_EVAL_PATH = os.path.abspath(
    os.path.join(os.path.dirname(os.path.realpath(__file__)), "..", "tools", "srl-e ...


class SrlEvalScorer(Metric):
 | def __init__(
 |     self,
 |     srl_eval_path: str = DEFAULT_SRL_EVAL_PATH,
 |     ignore_classes: List[str] = None
 | ) -> None

This class uses the external script for computing the CoNLL SRL metrics.

AllenNLP ships with the script, but you will need Perl 5.x installed to run it.

Note that this metric is disk-heavy: each call, typically made once per batch, writes two files and then reads them back. You probably don't want to include it in your training loop; instead, compute it on a validation set only.


  • srl_eval_path : str, optional (default = DEFAULT_SRL_EVAL_PATH)
    The path to the script.
  • ignore_classes : List[str], optional (default = None)
    A list of classes to ignore.


class SrlEvalScorer(Metric):
 | ...
 | @overrides
 | def __call__(
 |     self,
 |     batch_verb_indices: List[Optional[int]],
 |     batch_sentences: List[List[str]],
 |     batch_conll_formatted_predicted_tags: List[List[str]],
 |     batch_conll_formatted_gold_tags: List[List[str]]
 | ) -> None


  • batch_verb_indices : List[Optional[int]]
    For each sentence, the index of the verbal predicate whose arguments the gold labels describe, or None if the sentence contains no verbal predicate.
  • batch_sentences : List[List[str]]
    The word tokens for each instance in the batch.
  • batch_conll_formatted_predicted_tags : List[List[str]]
    The predicted CoNLL-formatted SRL tags to score, one list of tags per instance. If your tags are in BIO format, convert them with allennlp.models.semantic_role_labeler.convert_bio_tags_to_conll_format before passing them to the metric.
  • batch_conll_formatted_gold_tags : List[List[str]]
    The gold CoNLL-formatted SRL tags to use as a reference, one list of tags per instance. If your tags are in BIO format, convert them with allennlp.models.semantic_role_labeler.convert_bio_tags_to_conll_format before passing them to the metric.
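
To make the expected argument shapes concrete, here is a minimal sketch of the four parallel lists for a two-sentence batch. The helper `bio_to_conll_sketch` is a hypothetical stand-in written for this example; it is not the library's `convert_bio_tags_to_conll_format`, and the sentences and tags are invented for illustration.

```python
from typing import List, Optional


def bio_to_conll_sketch(bio_tags: List[str]) -> List[str]:
    """Hypothetical helper: rewrite BIO tags in CoNLL bracket notation."""
    conll_tags = []
    for i, tag in enumerate(bio_tags):
        if tag == "O":
            conll_tags.append("*")
            continue
        label = tag[2:]  # strip the "B-" / "I-" prefix
        prev_tag = bio_tags[i - 1] if i > 0 else "O"
        next_tag = bio_tags[i + 1] if i + 1 < len(bio_tags) else "O"
        # A span opens on "B-", or when the previous tag carries another label.
        opens = tag.startswith("B-") or prev_tag[2:] != label
        # A span closes when the next tag starts a new span or carries another label.
        closes = next_tag.startswith("B-") or next_tag[2:] != label
        conll_tags.append(("(" + label if opens else "") + "*" + (")" if closes else ""))
    return conll_tags


# One entry per instance; all four lists must stay aligned by position.
batch_sentences = [["The", "cat", "sat", "."], ["Hello", "!"]]
batch_verb_indices: List[Optional[int]] = [2, None]  # None: no verbal predicate

# Invented BIO tags for illustration only.
batch_bio_predicted = [["B-ARG1", "I-ARG1", "B-V", "O"], ["O", "O"]]
batch_bio_gold = [["B-ARG0", "I-ARG0", "B-V", "O"], ["O", "O"]]

batch_conll_formatted_predicted_tags = [bio_to_conll_sketch(t) for t in batch_bio_predicted]
batch_conll_formatted_gold_tags = [bio_to_conll_sketch(t) for t in batch_bio_gold]

print(batch_conll_formatted_gold_tags[0])
```

With an `SrlEvalScorer` instance in hand, these four lists would be passed to it in the order shown in the signature above.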


class SrlEvalScorer(Metric):
 | ...
 | def get_metric(self, reset: bool = False)


  • A Dict per label containing the following span-based metrics:
    - precision : float
    - recall : float
    - f1-measure : float
  • Additionally, an overall key is included, which provides the precision, recall and f1-measure for all spans.
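
As a rough illustration of how the three values relate, the sketch below derives them from span counts. It is not the scorer's own code: the function `span_prf_sketch`, the nested layout of the result, and the pooling of counts for the overall entry are assumptions made for this example, and the counts are invented.

```python
from typing import Dict, Tuple


def span_prf_sketch(true_positives: int, false_positives: int, false_negatives: int) -> Dict[str, float]:
    """Hypothetical helper: precision, recall and F1 from span counts."""
    predicted = true_positives + false_positives
    gold = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / gold if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"precision": precision, "recall": recall, "f1-measure": f1}


# Invented (true positive, false positive, false negative) counts per label.
per_label_counts: Dict[str, Tuple[int, int, int]] = {
    "ARG0": (8, 2, 2),
    "ARG1": (3, 1, 3),
}

metrics = {label: span_prf_sketch(*counts) for label, counts in per_label_counts.items()}

# Assumption: the overall entry pools the counts of every label before scoring.
pooled = tuple(sum(column) for column in zip(*per_label_counts.values()))
metrics["overall"] = span_prf_sketch(*pooled)

for name, values in metrics.items():
    print(name, values)
```

Pooling the counts weights each label by how many spans it has, so frequent labels dominate the overall score.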


class SrlEvalScorer(Metric):
 | ...
 | def reset(self)