



An attention module that computes the similarity between an input vector and the rows of a matrix.


class Attention(torch.nn.Module, Registrable):
 | def __init__(self, normalize: bool = True) -> None

An Attention module takes two inputs: a (batched) vector and a matrix, plus an optional mask over the rows of the matrix. It computes the similarity between the vector and each row of the matrix, and then (optionally) applies a softmax over the rows using those similarities.


Inputs:

  • vector: shape (batch_size, embedding_dim)
  • matrix: shape (batch_size, num_rows, embedding_dim)
  • matrix_mask: shape (batch_size, num_rows), an optional boolean mask specifying which rows are just padding.


Output:

  • attention: shape (batch_size, num_rows).


Parameters:

  • normalize : bool, optional (default = True)
    If true, the computed similarities are normalized with a softmax, so the output is a probability distribution over the rows of the matrix. If false, the raw similarity scores are returned instead.
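
To make the effect of normalize concrete, here is a minimal plain-Python sketch that uses nested lists in place of tensors. It is not the library's implementation: the function name attention_sketch, the choice of a dot product as the similarity, and the convention that True in the mask marks a real (non-padding) row are all assumptions made for illustration.

```python
import math


def attention_sketch(vector, matrix, matrix_mask=None, normalize=True):
    """Illustrative only; names and conventions here are assumptions.

    vector:      [batch_size][embedding_dim]
    matrix:      [batch_size][num_rows][embedding_dim]
    matrix_mask: [batch_size][num_rows] of bools, True = real row (assumed)
    returns:     [batch_size][num_rows]
    """
    output = []
    for b, (vec, rows) in enumerate(zip(vector, matrix)):
        # Similarity of the vector with each row (dot product is an assumption).
        scores = [sum(v * r for v, r in zip(vec, row)) for row in rows]
        if not normalize:
            # Without normalization this sketch returns the raw scores.
            output.append(scores)
            continue
        mask = matrix_mask[b] if matrix_mask is not None else [True] * len(rows)
        kept = [s for s, keep in zip(scores, mask) if keep]
        if not kept:
            # Every row is padding: return all zeros in this sketch.
            output.append([0.0] * len(rows))
            continue
        peak = max(kept)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) if keep else 0.0 for s, keep in zip(scores, mask)]
        total = sum(exps)
        output.append([e / total for e in exps])
    return output


vector = [[1.0, 0.0]]
matrix = [[[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]]
mask = [[True, True, False]]  # the third row is padding

weights = attention_sketch(vector, matrix, mask)                 # sums to 1 per example
scores = attention_sketch(vector, matrix, mask, normalize=False)  # raw similarities
```

With normalize left at its default, the padded row receives zero weight and the remaining weights sum to one; with normalize=False the sketch simply returns the dot products.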


class Attention(torch.nn.Module, Registrable):
 | ...
 | def forward(
 |     self,
 |     vector: torch.Tensor,
 |     matrix: torch.Tensor,
 |     matrix_mask: torch.BoolTensor = None
 | ) -> torch.Tensor
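
The tensor shapes that forward expects and returns can be checked with a small PyTorch stand-in. The function dot_product_attention below is hypothetical and only mirrors the argument names of the signature above; the dot-product similarity and the True-means-real-row mask convention are assumptions, not something this page states.

```python
import torch


def dot_product_attention(vector, matrix, matrix_mask=None, normalize=True):
    """Hypothetical stand-in that mirrors the forward() arguments."""
    # (batch, rows, dim) @ (batch, dim, 1) -> (batch, rows, 1) -> (batch, rows)
    similarities = torch.bmm(matrix, vector.unsqueeze(-1)).squeeze(-1)
    if not normalize:
        return similarities
    if matrix_mask is not None:
        # Assumed convention: True marks a real row, False marks padding.
        # Note: an example whose rows are all masked would yield NaN here.
        similarities = similarities.masked_fill(~matrix_mask, float("-inf"))
    return torch.softmax(similarities, dim=-1)


batch_size, num_rows, embedding_dim = 2, 4, 8
vector = torch.randn(batch_size, embedding_dim)
matrix = torch.randn(batch_size, num_rows, embedding_dim)
matrix_mask = torch.tensor(
    [[True, True, True, False],
     [True, True, False, False]]
)

attention = dot_product_attention(vector, matrix, matrix_mask)
# attention has shape (batch_size, num_rows)
```

Filling masked positions with negative infinity before the softmax is one common way to give padded rows exactly zero weight; other masking strategies are possible.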