Skip to content





class ScaledDotProductAttention(DotProductAttention):
 | def __init__(
 |     self,
 |     scaling_factor: Optional[int] = None,
 |     normalize: bool = True
 | ) -> None

Computes attention between two tensors using scaled dot product.

Reference: [Attention Is All You Need (Vaswani et al, 2017)]


Registered as an Attention with name "scaled_dot_product".


  • scaling_factor : int
    The similarity score is scaled down by the scaling_factor.
  • normalize : bool, optional (default = True)
    If true, we normalize the computed similarities with a softmax, to return a probability distribution for your attention. If false, this is just computing a similarity score.