multi_head_self_attention
allennlp_models.rc.modules.seq2seq_encoders.multi_head_self_attention
MultiHeadSelfAttention#
@Seq2SeqEncoder.register("multi_head_self_attention", exist_ok=True)
class MultiHeadSelfAttention(Seq2SeqEncoder):
| def __init__(
| self,
| num_heads: int,
| input_dim: int,
| attention_dim: int,
| values_dim: int,
| output_projection_dim: int = None,
| attention_dropout_prob: float = 0.1
| ) -> None
This class implements the key-value scaled dot-product attention mechanism described in the paper Attention Is All You Need.
The attention mechanism computes a weighted sum of a projection V of the inputs, where the weights are the scaled, normalised dot products of Q and K, which are also both linear projections of the input. This procedure is repeated for each attention head, using different parameters.
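Conceptually, each head computes something like the following sketch (plain PyTorch for illustration only; the function name is invented here, and the square-root scaling follows the usual Transformer convention rather than this class's exact internals):

```python
import torch
import torch.nn.functional as F


def scaled_dot_product_attention(
    q: torch.Tensor,  # (batch_size, timesteps, head_dim)
    k: torch.Tensor,  # (batch_size, timesteps, head_dim)
    v: torch.Tensor,  # (batch_size, timesteps, head_dim)
) -> torch.Tensor:
    # Dot product of every query with every key, scaled by sqrt(head_dim).
    scores = torch.bmm(q, k.transpose(1, 2)) / (q.size(-1) ** 0.5)
    # Softmax normalises each row into an attention distribution.
    weights = F.softmax(scores, dim=-1)
    # The output is the attention-weighted sum of the value projections.
    return torch.bmm(weights, v)
```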
Parameters¶
- num_heads : `int`
  The number of attention heads to use.
- input_dim : `int`
  The size of the last dimension of the input tensor.
- attention_dim : `int`
  The total dimension of the query and key projections which comprise the dot-product attention function. Must be divisible by `num_heads`.
- values_dim : `int`
  The total dimension which the input is projected to for representing the values, which are combined using the attention. Must be divisible by `num_heads`.
- output_projection_dim : `int`, optional (default = `None`)
  The dimensionality of the final output projection. If this is not passed explicitly, the projection has size `input_dim`.
- attention_dropout_prob : `float`, optional (default = `0.1`)
  The dropout probability applied to the normalised attention distributions.
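A minimal construction sketch, assuming `allennlp-models` is installed; the sizes below are arbitrary examples:

```python
from allennlp_models.rc.modules.seq2seq_encoders.multi_head_self_attention import (
    MultiHeadSelfAttention,
)

encoder = MultiHeadSelfAttention(
    num_heads=4,
    input_dim=64,
    attention_dim=128,  # divisible by num_heads
    values_dim=128,     # divisible by num_heads
)

assert encoder.get_input_dim() == 64
# No output_projection_dim was passed, so the output size falls back to input_dim.
assert encoder.get_output_dim() == 64
```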
get_input_dim#
class MultiHeadSelfAttention(Seq2SeqEncoder):
| ...
| def get_input_dim(self)
get_output_dim#
class MultiHeadSelfAttention(Seq2SeqEncoder):
| ...
| def get_output_dim(self)
is_bidirectional#
class MultiHeadSelfAttention(Seq2SeqEncoder):
| ...
| @overrides
| def is_bidirectional(self)
forward#
class MultiHeadSelfAttention(Seq2SeqEncoder):
| ...
| @overrides
| def forward(
| self,
| inputs: torch.Tensor,
| mask: torch.BoolTensor = None
| ) -> torch.FloatTensor
Parameters¶
- inputs : `torch.FloatTensor`
  A tensor of shape `(batch_size, timesteps, input_dim)`.
- mask : `torch.BoolTensor`, optional (default = `None`)
  A tensor of shape `(batch_size, timesteps)`.
Returns¶
- A tensor of shape `(batch_size, timesteps, output_projection_dim)`, where `output_projection_dim = input_dim` by default.
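A short forward-pass sketch, reusing the `encoder` constructed above (the mask follows the AllenNLP convention of `True` for real tokens and `False` for padding):

```python
import torch

batch_size, timesteps = 2, 7
inputs = torch.randn(batch_size, timesteps, encoder.get_input_dim())

# Mask out the last two timesteps of the second sequence as padding.
mask = torch.ones(batch_size, timesteps, dtype=torch.bool)
mask[1, 5:] = False

outputs = encoder(inputs, mask)
assert outputs.shape == (batch_size, timesteps, encoder.get_output_dim())
```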