scaled_dot_product_attention
allennlp.modules.attention.scaled_dot_product_attention
ScaledDotProductAttention
@Attention.register("scaled_dot_product")
class ScaledDotProductAttention(DotProductAttention):
 | def __init__(
 |     self,
 |     scaling_factor: Optional[int] = None,
 |     normalize: bool = True
 | ) -> None
Computes attention between two tensors using scaled dot product.

Reference: [Attention Is All You Need (Vaswani et al., 2017)](https://api.semanticscholar.org/CorpusID:13756489)

Registered as an `Attention` with name "scaled_dot_product".
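Because of this registration, the class can also be looked up by name through the standard `Registrable` mechanism (and therefore selected in configuration files). The snippet below is a minimal sketch of that lookup; the scaling factor value is only illustrative.

```python
from allennlp.modules.attention import Attention

# "scaled_dot_product" resolves to ScaledDotProductAttention
# because of the @Attention.register decorator above.
attention_class = Attention.by_name("scaled_dot_product")
attention = attention_class(scaling_factor=64)
```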
Parameters

- scaling_factor : `int`, optional (default = `None`)
    The similarity score is scaled down by the `scaling_factor`.
- normalize : `bool`, optional (default = `True`)
    If true, we normalize the computed similarities with a softmax, to return a probability distribution for your attention. If false, this is just computing a similarity score.
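For reference, a minimal usage sketch is shown below. It assumes the standard `Attention` forward contract of a `(batch_size, embedding_dim)` query vector attended over a `(batch_size, num_rows, embedding_dim)` matrix, returning one weight per row; the tensor sizes are arbitrary placeholders.

```python
import torch
from allennlp.modules.attention import ScaledDotProductAttention

batch_size, num_rows, embedding_dim = 2, 5, 16
vector = torch.randn(batch_size, embedding_dim)          # query vector per batch element
matrix = torch.randn(batch_size, num_rows, embedding_dim)  # candidate vectors to attend over

attention = ScaledDotProductAttention(scaling_factor=embedding_dim)

# Shape: (batch_size, num_rows). With normalize=True (the default), each row
# is a softmax-normalized probability distribution over the matrix rows.
weights = attention(vector, matrix)
print(weights.shape)  # torch.Size([2, 5])
```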