class SelfAttentiveSpanExtractor(SpanExtractorWithSpanWidthEmbedding):
 | def __init__(
 |     self,
 |     input_dim: int,
 |     num_width_embeddings: int = None,
 |     span_width_embedding_dim: int = None,
 |     bucket_widths: bool = False
 | ) -> None

Computes span representations by generating an unnormalized attention score for each word in the document. Span representations are then computed with respect to these scores by normalizing the attention scores of the words inside each span.

Given the resulting attention distribution for each span, this module weights the vector representations of the words in the span by that distribution and returns a weighted representation of each span.

Registered as a SpanExtractor with name "self_attentive".
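The span-local normalization described above can be sketched in plain Python. This is an illustration only, not the library's implementation: the function name `self_attentive_span` is hypothetical, the per-word scores are passed in as plain numbers (in the module they would come from learned parameters), and span indices are assumed to be inclusive.

```python
import math
from typing import List, Tuple


def self_attentive_span(
    embeddings: List[List[float]],
    scores: List[float],
    span: Tuple[int, int],
) -> List[float]:
    """Attention-weighted sum of the word vectors in one span.

    Hypothetical helper for illustration. `scores` holds one unnormalized
    attention score per word; `span` is (start, end), assumed inclusive.
    """
    start, end = span
    span_scores = scores[start : end + 1]

    # Softmax restricted to the words inside the span. Subtracting the
    # maximum keeps the exponentials numerically stable.
    top = max(span_scores)
    exps = [math.exp(s - top) for s in span_scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of the span's word vectors, one output value per dimension.
    dim = len(embeddings[0])
    return [
        sum(w * embeddings[start + i][d] for i, w in enumerate(weights))
        for d in range(dim)
    ]


# Three words with 2-dimensional vectors and one global score per word.
word_vectors = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
word_scores = [0.0, 0.0, 5.0]
span_vector = self_attentive_span(word_vectors, word_scores, (0, 1))
```

Because the scores are global, the third word's high score has no effect on the span `(0, 1)`: only the words inside a span take part in its normalization.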


Parameters

  • input_dim : int
    The final dimension of the sequence_tensor.
  • num_width_embeddings : int, optional (default = None)
    Specifies the number of buckets to use when representing span width features.
  • span_width_embedding_dim : int, optional (default = None)
    The embedding size for the span_width features.
  • bucket_widths : bool, optional (default = False)
    Whether to bucket the span widths into log-space buckets. If False, the raw span widths are used.
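As a rough illustration of what log-space bucketing of span widths can look like, here is a minimal sketch. The function name `bucket_width`, its defaults, and the exact bucket boundaries are assumptions made for this example; the library's own bucketing scheme may differ.

```python
import math


def bucket_width(
    width: int,
    num_identity_buckets: int = 4,
    num_total_buckets: int = 10,
) -> int:
    """Map a span width to a bucket index (hypothetical helper).

    Small widths keep their own bucket; larger widths share buckets
    whose size grows with powers of two.
    """
    if width <= num_identity_buckets:
        index = width
    else:
        index = int(math.floor(math.log2(width))) + (num_identity_buckets - 1)
    # Clamp so that very wide spans all fall into the last bucket.
    return min(max(index, 0), num_total_buckets - 1)


buckets = [bucket_width(w) for w in (1, 3, 5, 8, 16, 1000)]
```

With bucketing, the number of width embeddings needed stays small even when spans can be very long, since widths beyond the last boundary share a single embedding.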


Returns

  • attended_text_embeddings : torch.FloatTensor
    A tensor of shape (batch_size, num_spans, input_dim), in which each span representation is formed by locally normalizing a global attention over the sequence. The attention distributions of different spans differ only in the set of words over which they are normalized.


class SelfAttentiveSpanExtractor(SpanExtractorWithSpanWidthEmbedding):
 | ...
 | def get_output_dim(self) -> int
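The page does not spell out how the output dimension is computed. A plausible reading, stated here as an assumption, is that the span width embedding (when configured) is concatenated to the attended representation, so its size is added to `input_dim`. The helper `span_output_dim` below is hypothetical and only illustrates that arithmetic.

```python
from typing import Optional


def span_output_dim(
    input_dim: int,
    span_width_embedding_dim: Optional[int] = None,
) -> int:
    """Expected size of each span vector (hypothetical helper).

    Assumes the width embedding, if any, is concatenated to the
    attention-weighted span representation.
    """
    if span_width_embedding_dim is None:
        return input_dim
    return input_dim + span_width_embedding_dim


plain_dim = span_output_dim(768)
with_width_dim = span_output_dim(768, span_width_embedding_dim=64)
```

Downstream layers that consume span representations would typically be sized from `get_output_dim()` rather than from a hard-coded number.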