multi_head_self_attention
MultiHeadSelfAttention#
class MultiHeadSelfAttention(Seq2SeqEncoder):
| def __init__(
| self,
| num_heads: int,
| input_dim: int,
| attention_dim: int,
| values_dim: int,
| output_projection_dim: int = None,
| attention_dropout_prob: float = 0.1
| ) -> None
This class implements the key-value scaled dot product attention mechanism detailed in the paper [Attention Is All You Need](https://www.semanticscholar.org/paper/Attention-Is-All-You-Need-Vaswani-Shazeer/0737da0767d77606169cbf4187b83e1ab62f6077).
The attention mechanism computes a weighted sum of a projection V of the inputs, with weights given by the scaled, softmax-normalised dot product of Q and K, which are also both linear projections of the input. This procedure is repeated for each attention head, using different parameters.
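The computation for a single head can be sketched in NumPy (a minimal illustration, not the library's implementation; the projection matrices here are random stand-ins for learned parameters):

```python
import numpy as np

rng = np.random.default_rng(0)
timesteps, input_dim, head_dim = 5, 8, 4

x = rng.standard_normal((timesteps, input_dim))
# Q, K, V are all linear projections of the same input (self-attention).
w_q, w_k, w_v = (rng.standard_normal((input_dim, head_dim)) for _ in range(3))
q, k, v = x @ w_q, x @ w_k, x @ w_v

# Scaled dot product of Q and K, normalised with a softmax over the keys.
scores = q @ k.T / np.sqrt(head_dim)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

# The output is a weighted sum of the values.
output = weights @ v
print(output.shape)  # (5, 4)
```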
Parameters

- num_heads : `int`
  The number of attention heads to use.
- input_dim : `int`
  The size of the last dimension of the input tensor.
- attention_dim : `int`, required
  The total dimension of the query and key projections which comprise the dot product attention function. Must be divisible by `num_heads`.
- values_dim : `int`
  The total dimension which the input is projected to for representing the values, which are combined using the attention. Must be divisible by `num_heads`.
- output_projection_dim : `int`, optional (default = `None`)
  The dimensionality of the final output projection. If this is not passed explicitly, the projection has size `input_dim`.
- attention_dropout_prob : `float`, optional (default = `0.1`)
  The dropout probability applied to the normalised attention distributions.
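The divisibility constraints on `attention_dim` and `values_dim` exist because each projection is split evenly across the heads; a sketch of that split with hypothetical dimensions:

```python
import numpy as np

batch, timesteps = 2, 5
num_heads, values_dim = 4, 16       # values_dim must be divisible by num_heads
head_dim = values_dim // num_heads  # each head works on values_dim / num_heads units

values = np.zeros((batch, timesteps, values_dim))
# Split the value projection into per-head chunks: one independent
# attention computation runs per head.
per_head = values.reshape(batch, timesteps, num_heads, head_dim).transpose(0, 2, 1, 3)
print(per_head.shape)  # (2, 4, 5, 4)
```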
get_input_dim#
class MultiHeadSelfAttention(Seq2SeqEncoder):
| ...
| def get_input_dim(self)
get_output_dim#
class MultiHeadSelfAttention(Seq2SeqEncoder):
| ...
| def get_output_dim(self)
is_bidirectional#
class MultiHeadSelfAttention(Seq2SeqEncoder):
| ...
| @overrides
| def is_bidirectional(self)
forward#
class MultiHeadSelfAttention(Seq2SeqEncoder):
| ...
| @overrides
| def forward(
| self,
| inputs: torch.Tensor,
| mask: torch.BoolTensor = None
| ) -> torch.FloatTensor
Parameters

- inputs : `torch.FloatTensor`
  A tensor of shape `(batch_size, timesteps, input_dim)`.
- mask : `torch.BoolTensor`, optional (default = `None`)
  A tensor of shape `(batch_size, timesteps)`.

Returns

- A tensor of shape `(batch_size, timesteps, output_projection_dim)`, where `output_projection_dim = input_dim` by default.
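The mask marks which timesteps are real (`True`) versus padding (`False`); masked keys receive effectively zero attention weight. A NumPy stand-in for this masking step (not the actual `forward` implementation):

```python
import numpy as np

timesteps = 4
scores = np.zeros((timesteps, timesteps))   # uniform scores, for illustration
mask = np.array([True, True, True, False])  # last timestep is padding

# Replace masked positions with a large negative value before the softmax,
# so they contribute ~zero probability.
scores = np.where(mask, scores, -1e9)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
print(weights[0])  # ~ [1/3, 1/3, 1/3, 0]
```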