allennlp.modules.matrix_attention

class allennlp.modules.matrix_attention.matrix_attention.MatrixAttention

Bases: torch.nn.modules.module.Module, allennlp.common.registrable.Registrable

MatrixAttention takes two matrices as input and returns a matrix of attentions. We compute the similarity between each row in each matrix and return unnormalized similarity scores. Because these scores are unnormalized, we don't take a mask as input; it's up to the caller to deal with masking properly when this output is used.
Input:
- matrix_1: (batch_size, num_rows_1, embedding_dim_1)
- matrix_2: (batch_size, num_rows_2, embedding_dim_2)

Output:
- (batch_size, num_rows_1, num_rows_2)
forward(self, matrix_1: torch.Tensor, matrix_2: torch.Tensor) → torch.Tensor

Defines the computation performed at every call. Should be overridden by all subclasses.

Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
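A minimal usage sketch (using the DotProductMatrixAttention subclass documented below, since MatrixAttention itself is abstract). Because the returned scores are unnormalized and unmasked, the caller typically normalizes them with something like allennlp.nn.util.masked_softmax; the shapes and the all-ones mask here are illustrative:

    import torch
    from allennlp.modules.matrix_attention import DotProductMatrixAttention
    from allennlp.nn.util import masked_softmax

    # Two batches of encoded sequences with a shared embedding dim of 8.
    matrix_1 = torch.randn(2, 4, 8)   # (batch_size, num_rows_1, embedding_dim)
    matrix_2 = torch.randn(2, 6, 8)   # (batch_size, num_rows_2, embedding_dim)

    attention = DotProductMatrixAttention()
    similarities = attention(matrix_1, matrix_2)   # (2, 4, 6), unnormalized

    # The module takes no mask, so we apply it ourselves when normalizing
    # over matrix_2's rows (1 = real token, 0 = padding).
    matrix_2_mask = torch.ones(2, 6)
    probabilities = masked_softmax(similarities, matrix_2_mask, dim=-1)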
class allennlp.modules.matrix_attention.bilinear_matrix_attention.BilinearMatrixAttention(matrix_1_dim: int, matrix_2_dim: int, activation: allennlp.nn.activations.Activation = None, use_input_biases: bool = False, label_dim: int = 1)

Bases: allennlp.modules.matrix_attention.matrix_attention.MatrixAttention

Computes attention between two matrices using a bilinear attention function. This function has a matrix of weights W and a bias b, and the similarity between the two matrices X and Y is computed as X W Y^T + b.

Parameters

- matrix_1_dim : int
  The dimension of the matrix X, described above. This is X.size()[-1] - the length of the vector that will go into the similarity computation. We need this so we can build the weight matrix correctly.
- matrix_2_dim : int
  The dimension of the matrix Y, described above. This is Y.size()[-1] - the length of the vector that will go into the similarity computation. We need this so we can build the weight matrix correctly.
- activation : Activation, optional (default = linear, i.e. no activation)
  An activation function applied after the X W Y^T + b calculation. Default is no activation.
- use_input_biases : bool, optional (default = False)
  If True, we add biases to the inputs such that the final computation is equivalent to the original bilinear matrix multiplication plus a projection of both inputs.
- label_dim : int, optional (default = 1)
  The number of output classes. Typically in an attention setting this will be 1, but this parameter allows this class to function as an equivalent to torch.nn.Bilinear for matrices, rather than vectors.
forward(self, matrix_1: torch.Tensor, matrix_2: torch.Tensor) → torch.Tensor

Defines the computation performed at every call. Should be overridden by all subclasses.

Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
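A brief sketch of how the shapes work (values are illustrative): because W has shape (matrix_1_dim, matrix_2_dim), the two inputs do not need to share an embedding dimension:

    import torch
    from allennlp.modules.matrix_attention import BilinearMatrixAttention

    attention = BilinearMatrixAttention(matrix_1_dim=6, matrix_2_dim=4)

    x = torch.randn(2, 5, 6)   # (batch_size, num_rows_1, matrix_1_dim)
    y = torch.randn(2, 7, 4)   # (batch_size, num_rows_2, matrix_2_dim)

    scores = attention(x, y)   # X W Y^T + b, shape (2, 5, 7)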
class allennlp.modules.matrix_attention.cosine_matrix_attention.CosineMatrixAttention

Bases: allennlp.modules.matrix_attention.matrix_attention.MatrixAttention

Computes attention between every entry in matrix_1 and every entry in matrix_2 using cosine similarity.
forward(self, matrix_1: torch.Tensor, matrix_2: torch.Tensor) → torch.Tensor

Defines the computation performed at every call. Should be overridden by all subclasses.

Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
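A minimal sketch, relying only on the definition above: cosine similarity is the dot product of L2-normalized rows, so the output should match a normalized matrix product up to the small epsilon the implementation uses for numerical stability:

    import torch
    from allennlp.modules.matrix_attention import CosineMatrixAttention

    attention = CosineMatrixAttention()   # no learned parameters

    x = torch.randn(2, 4, 8)
    y = torch.randn(2, 6, 8)

    scores = attention(x, y)              # (2, 4, 6), entries in [-1, 1]

    # Equivalent computation by hand:
    expected = torch.matmul(
        torch.nn.functional.normalize(x, dim=-1),
        torch.nn.functional.normalize(y, dim=-1).transpose(1, 2),
    )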
class allennlp.modules.matrix_attention.dot_product_matrix_attention.DotProductMatrixAttention

Bases: allennlp.modules.matrix_attention.matrix_attention.MatrixAttention

Computes attention between every entry in matrix_1 and every entry in matrix_2 using a dot product.
forward(self, matrix_1: torch.Tensor, matrix_2: torch.Tensor) → torch.Tensor

Defines the computation performed at every call. Should be overridden by all subclasses.

Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
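Since each score is a plain dot product, the forward pass is equivalent to a batched matrix multiplication; a minimal sketch:

    import torch
    from allennlp.modules.matrix_attention import DotProductMatrixAttention

    attention = DotProductMatrixAttention()   # no learned parameters

    x = torch.randn(2, 4, 8)
    y = torch.randn(2, 6, 8)   # embedding dims must match for a dot product

    scores = attention(x, y)   # (2, 4, 6)
    assert torch.allclose(scores, torch.bmm(x, y.transpose(1, 2)))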
class allennlp.modules.matrix_attention.linear_matrix_attention.LinearMatrixAttention(tensor_1_dim: int, tensor_2_dim: int, combination: str = 'x,y', activation: allennlp.nn.activations.Activation = None)

Bases: allennlp.modules.matrix_attention.matrix_attention.MatrixAttention

This MatrixAttention takes two matrices as input and returns a matrix of attentions by performing a dot product between a vector of weights and some combination of the two input matrices, followed by an (optional) activation function. The combination used is configurable.

If the two vectors are x and y, we allow the following kinds of combinations: x, y, x*y, x+y, x-y, x/y, where each of those binary operations is performed elementwise. You can list as many combinations as you want, comma separated. For example, you might give x,y,x*y as the combination parameter to this class. The computed similarity function would then be w^T [x; y; x*y] + b, where w is a vector of weights, b is a bias parameter, and [;] is vector concatenation.

Note that if you want a bilinear similarity function with a diagonal weight matrix W, where the similarity function is computed as x * w * y + b (with w the diagonal of W), you can accomplish that with this class by using "x*y" for the combination.
Parameters

- tensor_1_dim : int
  The dimension of the first tensor, x, described above. This is x.size()[-1] - the length of the vector that will go into the similarity computation. We need this so we can build weight vectors correctly.
- tensor_2_dim : int
  The dimension of the second tensor, y, described above. This is y.size()[-1] - the length of the vector that will go into the similarity computation. We need this so we can build weight vectors correctly.
- combination : str, optional (default = "x,y")
  Described above.
- activation : Activation, optional (default = linear, i.e. no activation)
  An activation function applied after the w^T * [x; y] + b calculation. Default is no activation.
forward(self, matrix_1: torch.Tensor, matrix_2: torch.Tensor) → torch.Tensor

Defines the computation performed at every call. Should be overridden by all subclasses.

Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
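A usage sketch with the "x,y,x*y" combination from the example above (shapes are illustrative): the weight vector w then has length tensor_1_dim + tensor_2_dim + tensor_1_dim, the dimension of [x; y; x*y], and the elementwise x*y term requires the two inputs to share an embedding dimension:

    import torch
    from allennlp.modules.matrix_attention import LinearMatrixAttention

    attention = LinearMatrixAttention(
        tensor_1_dim=8, tensor_2_dim=8, combination="x,y,x*y"
    )

    x = torch.randn(2, 4, 8)   # (batch_size, num_rows_1, tensor_1_dim)
    y = torch.randn(2, 6, 8)   # (batch_size, num_rows_2, tensor_2_dim)

    # w^T [x; y; x*y] + b for every pair of rows.
    scores = attention(x, y)   # (2, 4, 6)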
class allennlp.modules.matrix_attention.legacy_matrix_attention.LegacyMatrixAttention(similarity_function: allennlp.modules.similarity_functions.similarity_function.SimilarityFunction = None)

Bases: allennlp.modules.matrix_attention.matrix_attention.MatrixAttention

The legacy implementation of MatrixAttention. It should be considered deprecated, as it uses much more memory than the newer specialized MatrixAttention modules.

Parameters

- similarity_function : SimilarityFunction, optional (default = DotProductSimilarity)
  The similarity function to use when computing the attention.
forward(self, matrix_1: torch.Tensor, matrix_2: torch.Tensor) → torch.Tensor

Defines the computation performed at every call. Should be overridden by all subclasses.

Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
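To make the deprecation concrete, a sketch comparing the legacy module (with its default DotProductSimilarity) against the specialized replacement; the two should produce the same scores up to floating-point rounding, but the legacy version expands both inputs to the full (num_rows_1, num_rows_2) grid before comparing them:

    import torch
    from allennlp.modules.matrix_attention import (
        DotProductMatrixAttention,
        LegacyMatrixAttention,
    )
    from allennlp.modules.similarity_functions import DotProductSimilarity

    x = torch.randn(2, 4, 8)
    y = torch.randn(2, 6, 8)

    legacy = LegacyMatrixAttention(DotProductSimilarity())
    modern = DotProductMatrixAttention()

    assert torch.allclose(legacy(x, y), modern(x, y))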