allennlp.modules.attention
An attention module that computes the similarity between an input vector and the rows of a matrix.
class allennlp.modules.attention.attention.Attention(normalize: bool = True)

Bases: torch.nn.modules.module.Module, allennlp.common.registrable.Registrable

An Attention takes two inputs: a (batched) vector and a matrix, plus an optional mask on the rows of the matrix. We compute the similarity between the vector and each row in the matrix, and then (optionally) perform a softmax over rows using those computed similarities.

Inputs:
- vector: shape (batch_size, embedding_dim)
- matrix: shape (batch_size, num_rows, embedding_dim)
- matrix_mask: shape (batch_size, num_rows), specifying which rows are just padding

Output:

- attention: shape (batch_size, num_rows)
Parameters:

- normalize : bool, optional (default = True)
  If true, we normalize the computed similarities with a softmax, to return a probability distribution for your attention. If false, this is just computing a similarity score.
forward(self, vector: torch.Tensor, matrix: torch.Tensor, matrix_mask: torch.Tensor = None) -> torch.Tensor

Defines the computation performed at every call. Should be overridden by all subclasses.

Note: although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
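As a concrete illustration (not part of the original docstring), here is a minimal usage sketch of the call contract above, using DotProductAttention (documented below) as the subclass. It assumes the subclasses are re-exported from allennlp.modules.attention, and that a 0/1 float mask is accepted; depending on the AllenNLP version, the mask may instead need to be a BoolTensor.

    import torch
    from allennlp.modules.attention import DotProductAttention

    batch_size, num_rows, embedding_dim = 2, 5, 10
    vector = torch.randn(batch_size, embedding_dim)
    matrix = torch.randn(batch_size, num_rows, embedding_dim)
    # 0/1 mask marking the trailing rows of each instance as padding.
    matrix_mask = torch.tensor([[1, 1, 1, 0, 0],
                                [1, 1, 1, 1, 0]], dtype=torch.float)

    attention = DotProductAttention()  # normalize=True by default
    # Call the module itself rather than .forward(), so registered hooks run.
    weights = attention(vector, matrix, matrix_mask)
    print(weights.shape)  # torch.Size([2, 5]); padding rows get zero weight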
class allennlp.modules.attention.bilinear_attention.BilinearAttention(vector_dim: int, matrix_dim: int, activation: allennlp.nn.activations.Activation = None, normalize: bool = True)

Bases: allennlp.modules.attention.attention.Attention

Computes attention between a vector and a matrix using a bilinear attention function. This function has a matrix of weights W and a bias b, and the similarity between the vector x and the matrix y is computed as x^T W y + b.

Parameters:

- vector_dim : int
  The dimension of the vector, x, described above. This is x.size()[-1] - the length of the vector that will go into the similarity computation. We need this so we can build the weight matrix correctly.
- matrix_dim : int
  The dimension of the matrix, y, described above. This is y.size()[-1] - the length of the vector that will go into the similarity computation. We need this so we can build the weight matrix correctly.
- activation : Activation, optional (default = linear, i.e. no activation)
  An activation function applied after the x^T W y + b calculation. Default is no activation.
- normalize : bool, optional (default = True)
  If true, we normalize the computed similarities with a softmax, to return a probability distribution for your attention. If false, this is just computing a similarity score.
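A minimal usage sketch (sizes are illustrative): note that the bilinear form x^T W y lets the vector and the matrix rows have different dimensions, since W maps between the two spaces.

    import torch
    from allennlp.modules.attention import BilinearAttention

    vector = torch.randn(2, 6)     # (batch_size, vector_dim)
    matrix = torch.randn(2, 5, 8)  # (batch_size, num_rows, matrix_dim)

    attention = BilinearAttention(vector_dim=6, matrix_dim=8)
    weights = attention(vector, matrix)  # shape (2, 5), softmax-normalized by default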
class allennlp.modules.attention.additive_attention.AdditiveAttention(vector_dim: int, matrix_dim: int, normalize: bool = True)

Bases: allennlp.modules.attention.attention.Attention

Computes attention between a vector and a matrix using an additive attention function. This function has two matrices W and U and a vector V. The similarity between the vector x and the matrix y is computed as V tanh(Wx + Uy).

This attention is often referred to as concat or additive attention. It was introduced by Bahdanau et al. (https://arxiv.org/abs/1409.0473).

Parameters:

- vector_dim : int
  The dimension of the vector, x, described above. This is x.size()[-1] - the length of the vector that will go into the similarity computation. We need this so we can build the weight matrix correctly.
- matrix_dim : int
  The dimension of the matrix, y, described above. This is y.size()[-1] - the length of the vector that will go into the similarity computation. We need this so we can build the weight matrix correctly.
- normalize : bool, optional (default = True)
  If true, we normalize the computed similarities with a softmax, to return a probability distribution for your attention. If false, this is just computing a similarity score.
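A minimal usage sketch; as with BilinearAttention, the two inputs may have different dimensions, since W and U project both into a shared space before the tanh.

    import torch
    from allennlp.modules.attention import AdditiveAttention

    vector = torch.randn(2, 6)     # (batch_size, vector_dim)
    matrix = torch.randn(2, 5, 8)  # (batch_size, num_rows, matrix_dim)

    attention = AdditiveAttention(vector_dim=6, matrix_dim=8)
    weights = attention(vector, matrix)  # shape (2, 5)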
class allennlp.modules.attention.cosine_attention.CosineAttention(normalize: bool = True)

Bases: allennlp.modules.attention.attention.Attention

Computes attention between a vector and a matrix using cosine similarity.
class allennlp.modules.attention.dot_product_attention.DotProductAttention(normalize: bool = True)

Bases: allennlp.modules.attention.attention.Attention

Computes attention between a vector and a matrix using dot product.
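Both of these are parameter-free, and both require the vector and the matrix rows to share the same embedding_dim. A minimal sketch contrasting the two:

    import torch
    from allennlp.modules.attention import CosineAttention, DotProductAttention

    vector = torch.randn(2, 8)
    matrix = torch.randn(2, 5, 8)  # rows must match the vector's dimension

    cosine_weights = CosineAttention()(vector, matrix)   # similarities in [-1, 1] before the softmax
    dot_weights = DotProductAttention()(vector, matrix)  # unbounded similarities before the softmax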
class allennlp.modules.attention.legacy_attention.LegacyAttention(similarity_function: allennlp.modules.similarity_functions.similarity_function.SimilarityFunction = None, normalize: bool = True)

Bases: allennlp.modules.attention.attention.Attention

Computes attention between a vector and a matrix using a similarity function. This should be considered deprecated, as it consumes more memory than the specialized attention modules.
class allennlp.modules.attention.linear_attention.LinearAttention(tensor_1_dim: int, tensor_2_dim: int, combination: str = 'x, y', activation: allennlp.nn.activations.Activation = None, normalize: bool = True)

Bases: allennlp.modules.attention.attention.Attention

This Attention module performs a dot product between a vector of weights and some combination of the two input vectors, followed by an (optional) activation function. The combination used is configurable.

If the two vectors are x and y, we allow the following kinds of combinations: x, y, x*y, x+y, x-y, x/y, where each of those binary operations is performed elementwise. You can list as many combinations as you want, comma separated. For example, you might give x,y,x*y as the combination parameter to this class. The computed similarity function would then be w^T [x; y; x*y] + b, where w is a vector of weights, b is a bias parameter, and [;] is vector concatenation.

Note that if you want a bilinear similarity function with a diagonal weight matrix W, where the similarity function is computed as x * w * y + b (with w the diagonal of W), you can accomplish that with this class by using "x*y" for combination.

Parameters:

- tensor_1_dim : int
  The dimension of the first tensor, x, described above. This is x.size()[-1] - the length of the vector that will go into the similarity computation. We need this so we can build weight vectors correctly.
- tensor_2_dim : int
  The dimension of the second tensor, y, described above. This is y.size()[-1] - the length of the vector that will go into the similarity computation. We need this so we can build weight vectors correctly.
- combination : str, optional (default = 'x, y')
  Described above.
- activation : Activation, optional (default = linear, i.e. no activation)
  An activation function applied after the w^T [x; y] + b calculation. Default is no activation.
- normalize : bool, optional (default = True)
  If true, we normalize the computed similarities with a softmax, to return a probability distribution for your attention. If false, this is just computing a similarity score.
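A minimal usage sketch. Activation.by_name(...)() is the usual AllenNLP idiom for constructing an activation by name; note that an elementwise term like x*y in the combination requires tensor_1_dim == tensor_2_dim.

    import torch
    from allennlp.modules.attention import LinearAttention
    from allennlp.nn import Activation

    vector = torch.randn(2, 8)
    matrix = torch.randn(2, 5, 8)

    # Computes tanh(w^T [x; y; x*y] + b), then a softmax over rows.
    attention = LinearAttention(
        tensor_1_dim=8,
        tensor_2_dim=8,
        combination="x,y,x*y",
        activation=Activation.by_name("tanh")(),
    )
    weights = attention(vector, matrix)  # shape (2, 5)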