scaled_dot_product_attention

allennlp.modules.attention.scaled_dot_product_attention

ScaledDotProductAttention

@Attention.register("scaled_dot_product")
class ScaledDotProductAttention(DotProductAttention):
 | def __init__(
 |     self,
 |     scaling_factor: Optional[int] = None,
 |     normalize: bool = True
 | ) -> None

Computes attention between two tensors using a scaled dot product.

Reference: [Attention Is All You Need (Vaswani et al., 2017)](https://api.semanticscholar.org/CorpusID:13756489)

Registered as an Attention with name "scaled_dot_product".
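
Because the class is registered under "scaled_dot_product", it can also be obtained through the Attention registry instead of being imported directly. The sketch below assumes AllenNLP's standard Registrable.by_name lookup; the scaling factor of 64 is only an illustrative value.

```python
from allennlp.modules.attention.attention import Attention

# Look up the registered subclass by its registry name, then construct it
# like any other attention module.
attention_cls = Attention.by_name("scaled_dot_product")
attention = attention_cls(scaling_factor=64, normalize=True)
```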

Parameters

  • scaling_factor : int, optional (default = None)
    The computed similarity scores are scaled down by this factor.
  • normalize : bool, optional (default = True)
    If true, the computed similarities are normalized with a softmax, returning a probability distribution over the attended rows. If false, the raw similarity scores are returned.
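
For concreteness, here is a minimal usage sketch of calling the module directly. It follows the base Attention contract of attending a (batch_size, embedding_dim) query vector over a (batch_size, num_rows, embedding_dim) matrix; the tensor sizes and the explicit scaling_factor below are illustrative assumptions, not part of this API.

```python
import torch

from allennlp.modules.attention.scaled_dot_product_attention import (
    ScaledDotProductAttention,
)

batch_size, num_rows, embedding_dim = 2, 5, 16
vector = torch.randn(batch_size, embedding_dim)            # query vector
matrix = torch.randn(batch_size, num_rows, embedding_dim)  # rows to attend over

# Scale the dot-product similarities down by the embedding size (illustrative choice).
attention = ScaledDotProductAttention(scaling_factor=embedding_dim)

weights = attention(vector, matrix)  # shape: (batch_size, num_rows)

# With normalize=True (the default), each batch element gets a softmax
# distribution over the rows, so the weights sum to 1.
print(weights.sum(dim=-1))
```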