



TAGS_TO_SPANS_FUNCTION_TYPE = Callable[[List[str], Optional[List[str]]], List[TypedStringSpan]]


class SpanBasedF1Measure(Metric):
 | def __init__(
 |     self,
 |     vocabulary: Vocabulary,
 |     tag_namespace: str = "tags",
 |     ignore_classes: List[str] = None,
 |     label_encoding: Optional[str] = "BIO",
 |     tags_to_spans_function: Optional[TAGS_TO_SPANS_FUNCTION_TYPE] = None
 | ) -> None

The CoNLL SRL metrics are based on exact span matching. This metric implements span-based precision and recall metrics for a BIO tagging scheme. It produces precision, recall and F1 measures per tag, as well as overall statistics. Note that this implementation is not exactly the same as the Perl script used to evaluate the CoNLL 2005 data; in particular, it does not consider continuations or reference spans as constituents of the original span. However, it is a close proxy, which can be helpful for judging model performance during training. This metric works properly when the spans are unlabeled (i.e., your labels are simply "B", "I", "O" if using the "BIO" label encoding).
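To make "exact span matching" concrete, here is a simplified, dependency-free sketch of turning BIO tags into labeled spans and scoring predicted spans against gold spans. The names `bio_to_spans`, `span_f1` and `TypedSpan` are hypothetical, not the AllenNLP API. It assumes spans have the shape `(label, (start, end))` with an inclusive `end`, which is how this sketch reads `TypedStringSpan`; under that assumption `bio_to_spans` has the same call signature as `TAGS_TO_SPANS_FUNCTION_TYPE`. Like the metric described above, it does not handle CoNLL continuation or reference spans.

```python
from typing import List, Optional, Tuple

# Hypothetical alias: (label, (start, end)) with an inclusive end index.
TypedSpan = Tuple[str, Tuple[int, int]]


def bio_to_spans(tags: List[str], classes_to_ignore: Optional[List[str]] = None) -> List[TypedSpan]:
    """Convert a BIO tag sequence into labeled spans (simplified sketch)."""
    ignore = set(classes_to_ignore or [])
    spans: List[TypedSpan] = []
    label: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags):
        prefix, _, cls = tag.partition("-")
        # "I-X" extends the open span only when that span also has label X.
        if prefix == "I" and label == cls:
            continue
        # Any other tag closes the currently open span, if there is one.
        if label is not None:
            spans.append((label, (start, i - 1)))
            label = None
        # "B-X" opens a new span; an ill-formed "I-X" is treated like "B-X".
        if prefix in ("B", "I"):
            label, start = cls, i
    if label is not None:
        spans.append((label, (start, len(tags) - 1)))
    return [span for span in spans if span[0] not in ignore]


def span_f1(gold: List[str], predicted: List[str]) -> Tuple[float, float, float]:
    """Exact-match span precision, recall and F1 for a single sequence."""
    gold_spans = set(bio_to_spans(gold))
    pred_spans = set(bio_to_spans(predicted))
    true_positives = len(gold_spans & pred_spans)  # label and both boundaries must match
    precision = true_positives / len(pred_spans) if pred_spans else 0.0
    recall = true_positives / len(gold_spans) if gold_spans else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = ["B-ARG0", "I-ARG0", "O", "B-V", "B-ARG1"]
pred = ["B-ARG0", "O", "O", "B-V", "B-ARG1"]
print(bio_to_spans(gold))  # [('ARG0', (0, 1)), ('V', (3, 3)), ('ARG1', (4, 4))]
# The predicted ARG0 span is (0, 0), so it earns no credit: precision = recall = F1 = 2/3.
print(span_f1(gold, pred))
```

The predicted ARG0 span overlaps the gold one but has a different end index, which is exactly the case exact span matching counts as both a false positive and a false negative.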


class SpanBasedF1Measure(Metric):
 | ...
 | def __call__(
 |     self,
 |     predictions: torch.Tensor,
 |     gold_labels: torch.Tensor,
 |     mask: Optional[torch.BoolTensor] = None,
 |     prediction_map: Optional[torch.Tensor] = None
 | )


  • predictions : torch.Tensor
    A tensor of predictions of shape (batch_size, sequence_length, num_classes).
  • gold_labels : torch.Tensor
    A tensor of integer class labels of shape (batch_size, sequence_length). It must have the same shape as the predictions tensor without the num_classes dimension.
  • mask : torch.BoolTensor, optional (default = None)
    A masking tensor the same size as gold_labels.
  • prediction_map : torch.Tensor, optional (default = None)
    A tensor of size (batch_size, num_classes) which maps the indices of predictions to indices in the label vocabulary. If provided, the output label at each timestep will be vocabulary.get_index_to_token_vocabulary(prediction_map[batch, argmax(predictions[batch, t])]), rather than simply vocabulary.get_index_to_token_vocabulary(argmax(predictions[batch, t])). This is useful when each Instance in the dataset is associated with a different subset of possible labels drawn from a large label space (e.g., FrameNet, where each frame has its own set of possible roles).
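The prediction_map lookup can be illustrated with plain Python lists standing in for tensors. This is a minimal sketch of the decoding rule described above, not the library's implementation; `decode_with_prediction_map` and the `index_to_token` dictionary are hypothetical stand-ins for the metric's internals and the label vocabulary.

```python
from typing import Dict, List


def decode_with_prediction_map(
    predictions: List[List[List[float]]],  # (batch_size, sequence_length, num_classes)
    prediction_map: List[List[int]],  # (batch_size, num_classes)
    index_to_token: Dict[int, str],  # hypothetical global label vocabulary
) -> List[List[str]]:
    decoded = []
    for batch, sequence in enumerate(predictions):
        labels = []
        for scores in sequence:
            # argmax over the num_classes dimension at this timestep
            local_index = max(range(len(scores)), key=scores.__getitem__)
            # translate the instance-local index into a global vocabulary index
            labels.append(index_to_token[prediction_map[batch][local_index]])
        decoded.append(labels)
    return decoded


index_to_token = {0: "O", 1: "B-Agent", 2: "I-Agent", 3: "B-Theme"}
# This instance only allows two labels: local index 0 -> "O", local index 1 -> "B-Theme".
prediction_map = [[0, 3]]
predictions = [[[0.9, 0.1], [0.2, 0.8]]]
print(decode_with_prediction_map(predictions, prediction_map, index_to_token))  # [['O', 'B-Theme']]
```

Without the map, the second timestep's argmax (1) would be read directly as "B-Agent", which is wrong for this instance.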


class SpanBasedF1Measure(Metric):
 | ...
 | def get_metric(self, reset: bool = False)


  • Dict[str, float]
    A dict containing the following span-based metrics for each label:

    • precision : float
    • recall : float
    • f1-measure : float

    Additionally, an overall key is included, which provides the precision, recall and f1-measure for all spans.
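The sketch below shows how per-label counts of true positives, false positives and false negatives could be turned into such a dict. The key format (`precision-{label}`, ..., `f1-measure-overall`) is an assumption made for illustration, and `compute_metrics` and `summarize` are hypothetical helpers. The overall scores here pool counts across all labels (micro-averaging), which is one reading of "for all spans".

```python
from typing import Dict


def compute_metrics(tp: int, fp: int, fn: int) -> Dict[str, float]:
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1-measure": f1}


def summarize(
    true_positives: Dict[str, int],
    false_positives: Dict[str, int],
    false_negatives: Dict[str, int],
) -> Dict[str, float]:
    all_metrics: Dict[str, float] = {}
    labels = set(true_positives) | set(false_positives) | set(false_negatives)
    for label in sorted(labels):
        scores = compute_metrics(
            true_positives.get(label, 0), false_positives.get(label, 0), false_negatives.get(label, 0)
        )
        for name, value in scores.items():
            all_metrics[f"{name}-{label}"] = value
    # Overall scores sum the counts over labels instead of averaging per-label scores.
    overall = compute_metrics(
        sum(true_positives.values()), sum(false_positives.values()), sum(false_negatives.values())
    )
    for name, value in overall.items():
        all_metrics[f"{name}-overall"] = value
    return all_metrics


metrics = summarize({"ARG0": 3, "ARG1": 1}, {"ARG0": 1}, {"ARG1": 1})
# ARG0: precision 3/4, recall 1.0; ARG1: precision 1.0, recall 1/2; overall: 4 of 5 on both sides.
print(metrics["precision-ARG0"], metrics["recall-ARG1"])  # 0.75 0.5
```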


class SpanBasedF1Measure(Metric):
 | ...
 | def reset(self)
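AllenNLP metrics accumulate counts across calls to `__call__` (typically one call per batch), and `reset` clears that accumulated state; passing `reset=True` to `get_metric` returns the current values and then clears the counts. The toy class below illustrates only that lifecycle. `SpanCounter` is hypothetical, it takes already-extracted spans rather than tensors, and it reports only the overall keys, whose names are an assumption.

```python
from collections import Counter
from typing import Dict, List, Tuple

Span = Tuple[str, Tuple[int, int]]  # (label, (start, end)), hypothetical alias


class SpanCounter:
    """Hypothetical accumulator illustrating the __call__ / get_metric / reset lifecycle."""

    def __init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        # Discard every count accumulated since the last reset.
        self._tp: Counter = Counter()
        self._fp: Counter = Counter()
        self._fn: Counter = Counter()

    def __call__(self, predicted: List[Span], gold: List[Span]) -> None:
        # Add this batch's exact-match counts to the running totals.
        pred_set, gold_set = set(predicted), set(gold)
        for label, _ in pred_set & gold_set:
            self._tp[label] += 1
        for label, _ in pred_set - gold_set:
            self._fp[label] += 1
        for label, _ in gold_set - pred_set:
            self._fn[label] += 1

    def get_metric(self, reset: bool = False) -> Dict[str, float]:
        tp, fp, fn = sum(self._tp.values()), sum(self._fp.values()), sum(self._fn.values())
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        if reset:
            self.reset()
        return {"precision-overall": precision, "recall-overall": recall, "f1-measure-overall": f1}


metric = SpanCounter()
metric([("ARG0", (0, 1))], [("ARG0", (0, 1)), ("ARG1", (3, 4))])
print(metric.get_metric(reset=True)["recall-overall"])  # 0.5
print(metric.get_metric())  # all zeros: the counts were cleared by reset=True
```

A common pattern is to call `get_metric()` during an epoch for progress reporting and `get_metric(reset=True)` at the end of the epoch, so that each epoch's scores start from empty counts.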