allennlp.modules.similarity_functions
A SimilarityFunction
takes a pair of tensors with the same shape, and computes a similarity
function on the vectors in the last dimension.
-
class allennlp.modules.similarity_functions.similarity_function.SimilarityFunction
Bases: torch.nn.modules.module.Module, allennlp.common.registrable.Registrable
A SimilarityFunction takes a pair of tensors with the same shape, and computes a similarity function on the vectors in the last dimension. For example, the tensors might both have shape (batch_size, sentence_length, embedding_dim), and we will compute some function of the two vectors of length embedding_dim for each position (batch_size, sentence_length), returning a tensor of shape (batch_size, sentence_length).
The similarity function could be as simple as a dot product, or it could be a more complex, parameterized function.
If you want to compute a similarity between tensors of different sizes, you need to first tile them in the appropriate dimensions to make them the same before you can use these functions. The Attention and MatrixAttention modules do this.
-
default_implementation: str = 'dot_product'
-
forward(self, tensor_1: torch.Tensor, tensor_2: torch.Tensor) → torch.Tensor
Takes two tensors of the same shape, such as (batch_size, length_1, length_2, embedding_dim). Computes a (possibly parameterized) similarity on the final dimension and returns a tensor with one less dimension, such as (batch_size, length_1, length_2).
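As a rough illustration of the shape contract described above, the following plain-Python sketch applies a similarity to each pair of vectors in the last dimension of two nested lists. It deliberately avoids torch and allennlp; the helper names `dot` and `similarity_over_last_dim` are hypothetical and are not part of the library.

```python
from typing import Callable, List

Vector = List[float]


def dot(x: Vector, y: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def similarity_over_last_dim(
    tensor_1: List[List[Vector]],
    tensor_2: List[List[Vector]],
    similarity: Callable[[Vector, Vector], float] = dot,
) -> List[List[float]]:
    """Apply `similarity` to each pair of vectors in the last dimension.

    Inputs have shape (batch_size, sentence_length, embedding_dim);
    the result has shape (batch_size, sentence_length).
    """
    return [
        [similarity(x, y) for x, y in zip(row_1, row_2)]
        for row_1, row_2 in zip(tensor_1, tensor_2)
    ]


# Shape (batch_size=1, sentence_length=2, embedding_dim=3).
tensor_1 = [[[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]]
tensor_2 = [[[1.0, 0.0, 1.0], [2.0, 5.0, 2.0]]]

result = similarity_over_last_dim(tensor_1, tensor_2)
print(result)  # one score per (batch, position) pair
```

The last dimension disappears from the result, which is the "one less dimension" behaviour that forward documents.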
-
-
class allennlp.modules.similarity_functions.bilinear.BilinearSimilarity(tensor_1_dim: int, tensor_2_dim: int, activation: allennlp.nn.activations.Activation = None)
Bases: allennlp.modules.similarity_functions.similarity_function.SimilarityFunction
This similarity function performs a bilinear transformation of the two input vectors. This function has a matrix of weights W and a bias b, and the similarity between two vectors x and y is computed as x^T W y + b.
- Parameters
- tensor_1_dim : int
The dimension of the first tensor, x, described above. This is x.size()[-1], the length of the vector that will go into the similarity computation. We need this so we can build the weight matrix correctly.
- tensor_2_dim : int
The dimension of the second tensor, y, described above. This is y.size()[-1], the length of the vector that will go into the similarity computation. We need this so we can build the weight matrix correctly.
- activation : Activation, optional (default=linear (i.e. no activation))
An activation function applied after the x^T W y + b calculation. Default is no activation.
-
forward(self, tensor_1: torch.Tensor, tensor_2: torch.Tensor) → torch.Tensor
Takes two tensors of the same shape, such as (batch_size, length_1, length_2, embedding_dim). Computes a (possibly parameterized) similarity on the final dimension and returns a tensor with one less dimension, such as (batch_size, length_1, length_2).
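The formula x^T W y + b can be checked by hand on a single pair of vectors. The sketch below is plain Python rather than the library's implementation; the function name `bilinear_similarity` is hypothetical, and the weight shape (tensor_1_dim, tensor_2_dim) is an assumption inferred from the two constructor arguments. No activation is applied.

```python
from typing import List


def bilinear_similarity(
    x: List[float], weight: List[List[float]], y: List[float], bias: float
) -> float:
    """Compute x^T W y + b for one pair of vectors.

    `weight` is assumed to have shape (len(x), len(y)).
    """
    # W y: one entry per row of W.
    wy = [sum(w_ij * y_j for w_ij, y_j in zip(row, y)) for row in weight]
    # x^T (W y) + b.
    return sum(x_i * wy_i for x_i, wy_i in zip(x, wy)) + bias


x = [1.0, 2.0]               # tensor_1_dim = 2
y = [3.0, 4.0, 5.0]          # tensor_2_dim = 3
weight = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 1.0],
]
bias = 0.5

score = bilinear_similarity(x, weight, y, bias)
print(score)
```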
-
class allennlp.modules.similarity_functions.cosine.CosineSimilarity
Bases: allennlp.modules.similarity_functions.similarity_function.SimilarityFunction
This similarity function simply computes the cosine similarity between each pair of vectors. It has no parameters.
-
forward(self, tensor_1: torch.Tensor, tensor_2: torch.Tensor) → torch.Tensor
Takes two tensors of the same shape, such as (batch_size, length_1, length_2, embedding_dim). Computes a (possibly parameterized) similarity on the final dimension and returns a tensor with one less dimension, such as (batch_size, length_1, length_2).
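For reference, cosine similarity between two vectors is their dot product divided by the product of their Euclidean norms. The following is a minimal plain-Python sketch of that definition, not the library's code; `cosine_similarity` is a hypothetical helper, and it assumes neither vector is all zeros.

```python
import math
from typing import List


def cosine_similarity(x: List[float], y: List[float]) -> float:
    """Cosine of the angle between x and y (both assumed nonzero)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


same_direction = cosine_similarity([3.0, 4.0], [3.0, 4.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0])
print(same_direction, orthogonal, opposite)
```

Because the result depends only on direction, rescaling either input by a positive constant leaves the score unchanged.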
-
-
class allennlp.modules.similarity_functions.dot_product.DotProductSimilarity(scale_output: bool = False)
Bases: allennlp.modules.similarity_functions.similarity_function.SimilarityFunction
This similarity function simply computes the dot product between each pair of vectors, with an optional scaling to reduce the variance of the output elements.
- Parameters
- scale_output : bool, optional
If True, we will scale the output by math.sqrt(tensor.size(-1)), to reduce the variance in the result.
-
forward(self, tensor_1: torch.Tensor, tensor_2: torch.Tensor) → torch.Tensor
Takes two tensors of the same shape, such as (batch_size, length_1, length_2, embedding_dim). Computes a (possibly parameterized) similarity on the final dimension and returns a tensor with one less dimension, such as (batch_size, length_1, length_2).
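The effect of scaling can be sketched on a single pair of vectors. The description above does not say whether the sqrt factor multiplies or divides the result; this plain-Python sketch assumes division, since that is the direction that shrinks the spread of the scores, and it may not match what the library does. Check the linked source before relying on it. The helper name `dot_product_similarity` is hypothetical.

```python
import math
from typing import List


def dot_product_similarity(
    x: List[float], y: List[float], scale_output: bool = False
) -> float:
    """Dot product of x and y, optionally scaled by sqrt(len(x))."""
    result = sum(a * b for a, b in zip(x, y))
    if scale_output:
        # ASSUMPTION: divide by sqrt(dim); the source text only says "scale by".
        result /= math.sqrt(len(x))
    return result


x = [1.0, 1.0, 1.0, 1.0]
y = [2.0, 2.0, 2.0, 2.0]

unscaled = dot_product_similarity(x, y)
scaled = dot_product_similarity(x, y, scale_output=True)
print(unscaled, scaled)
```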
-
class allennlp.modules.similarity_functions.linear.LinearSimilarity(tensor_1_dim: int, tensor_2_dim: int, combination: str = 'x, y', activation: allennlp.nn.activations.Activation = None)
Bases: allennlp.modules.similarity_functions.similarity_function.SimilarityFunction
This similarity function performs a dot product between a vector of weights and some combination of the two input vectors, followed by an (optional) activation function. The combination used is configurable.
If the two vectors are x and y, we allow the following kinds of combinations: x, y, x*y, x+y, x-y, x/y, where each of those binary operations is performed elementwise. You can list as many combinations as you want, comma separated. For example, you might give x,y,x*y as the combination parameter to this class. The computed similarity function would then be w^T [x; y; x*y] + b, where w is a vector of weights, b is a bias parameter, and [;] is vector concatenation.
Note that if you want a bilinear similarity function with a diagonal weight matrix W, where the similarity function is computed as x * w * y + b (with w the diagonal of W), you can accomplish that with this class by using "x*y" for combination.
- Parameters
- tensor_1_dim : int
The dimension of the first tensor, x, described above. This is x.size()[-1], the length of the vector that will go into the similarity computation. We need this so we can build weight vectors correctly.
- tensor_2_dim : int
The dimension of the second tensor, y, described above. This is y.size()[-1], the length of the vector that will go into the similarity computation. We need this so we can build weight vectors correctly.
- combination : str, optional (default="x,y")
Described above.
- activation : Activation, optional (default=linear (i.e. no activation))
An activation function applied after the w^T * [x;y] + b calculation. Default is no activation.
-
forward(self, tensor_1: torch.Tensor, tensor_2: torch.Tensor) → torch.Tensor
Takes two tensors of the same shape, such as (batch_size, length_1, length_2, embedding_dim). Computes a (possibly parameterized) similarity on the final dimension and returns a tensor with one less dimension, such as (batch_size, length_1, length_2).
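To make the combination string concrete, here is a small plain-Python sketch of how "x,y,x*y" could be expanded into the concatenated vector [x; y; x*y] and then scored as w^T [x; y; x*y] + b. It is an illustration of the formula only, not the library's parser; `COMBINATIONS`, `combine`, and `linear_similarity` are hypothetical names, and no activation is applied.

```python
from typing import Callable, Dict, List

Vector = List[float]

# The elementwise combinations listed in the class description.
COMBINATIONS: Dict[str, Callable[[Vector, Vector], Vector]] = {
    "x": lambda x, y: list(x),
    "y": lambda x, y: list(y),
    "x*y": lambda x, y: [a * b for a, b in zip(x, y)],
    "x+y": lambda x, y: [a + b for a, b in zip(x, y)],
    "x-y": lambda x, y: [a - b for a, b in zip(x, y)],
    "x/y": lambda x, y: [a / b for a, b in zip(x, y)],
}


def combine(combination: str, x: Vector, y: Vector) -> Vector:
    """Concatenate the requested pieces, e.g. "x,y,x*y" -> [x; y; x*y]."""
    combined: Vector = []
    for piece in combination.split(","):
        combined.extend(COMBINATIONS[piece.strip()](x, y))
    return combined


def linear_similarity(
    combination: str, x: Vector, y: Vector, weights: Vector, bias: float
) -> float:
    """Compute w^T [combined vector] + b."""
    combined = combine(combination, x, y)
    return sum(w * c for w, c in zip(weights, combined)) + bias


x = [1.0, 2.0]
y = [3.0, 4.0]

combined = combine("x,y,x*y", x, y)
score = linear_similarity("x,y,x*y", x, y, weights=[1.0] * 6, bias=0.5)

# With "x*y" alone, the weights act as the diagonal of a bilinear matrix W.
diagonal_score = linear_similarity("x*y", x, y, weights=[2.0, 0.5], bias=0.0)
print(combined, score, diagonal_score)
```

Note that the weight vector must be as long as the concatenated vector, which is why the class needs both tensor_1_dim and tensor_2_dim up front.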
-
class allennlp.modules.similarity_functions.multiheaded.MultiHeadedSimilarity(num_heads: int, tensor_1_dim: int, tensor_1_projected_dim: int = None, tensor_2_dim: int = None, tensor_2_projected_dim: int = None, internal_similarity: allennlp.modules.similarity_functions.similarity_function.SimilarityFunction = DotProductSimilarity())
Bases: allennlp.modules.similarity_functions.similarity_function.SimilarityFunction
This similarity function uses multiple “heads” to compute similarity. That is, we take the input tensors and project them into a number of new tensors, and compute similarities on each of the projected tensors individually. The result here has one more dimension than a typical similarity function.
For example, say we have two input tensors, both of shape (batch_size, sequence_length, 100), and that we want 5 similarity heads. We'll project these tensors with a 100x100 matrix, then split the resultant tensors to have shape (batch_size, sequence_length, 5, 20). Then we call a wrapped similarity function on the result (by default just a dot product), giving a tensor of shape (batch_size, sequence_length, 5).
- Parameters
- num_heads : int
The number of similarity heads to compute.
- tensor_1_dim : int
The dimension of the first tensor described above. This is tensor.size()[-1], the length of the vector before the multi-headed projection. We need this so we can build the weight matrix correctly.
- tensor_1_projected_dim : int, optional
The dimension of the first tensor after the multi-headed projection, before we split into multiple heads. This number must be divisible evenly by num_heads. If not given, we default to tensor_1_dim.
- tensor_2_dim : int, optional
The dimension of the second tensor described above. This is tensor.size()[-1], the length of the vector before the multi-headed projection. We need this so we can build the weight matrix correctly. If not given, we default to tensor_1_dim.
- tensor_2_projected_dim : int, optional
The dimension of the second tensor after the multi-headed projection, before we split into multiple heads. This number must be divisible evenly by num_heads. If not given, we default to tensor_2_dim.
- internal_similarity : SimilarityFunction, optional
The SimilarityFunction to call on the projected, multi-headed tensors. The default is to use a dot product.
-
forward(self, tensor_1: torch.Tensor, tensor_2: torch.Tensor) → torch.Tensor
Takes two tensors of the same shape, such as (batch_size, length_1, length_2, embedding_dim). Computes a (possibly parameterized) similarity on the final dimension and returns a tensor with one less dimension, such as (batch_size, length_1, length_2).
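The head-splitting step in the 100-dimensional, 5-head example from the class description can be sketched for a single pair of vectors as follows. This is plain Python, not the library's code: it skips the learned projection (equivalently, assumes an identity projection), assumes heads are contiguous chunks of the vector, and uses a dot product as the wrapped similarity. `split_heads` and `multi_headed_similarity` are hypothetical names.

```python
from typing import List

Vector = List[float]


def split_heads(vector: Vector, num_heads: int) -> List[Vector]:
    """Split a vector into `num_heads` contiguous chunks of equal length."""
    if len(vector) % num_heads != 0:
        raise ValueError("vector length must be divisible by num_heads")
    head_dim = len(vector) // num_heads
    return [vector[i * head_dim:(i + 1) * head_dim] for i in range(num_heads)]


def multi_headed_similarity(x: Vector, y: Vector, num_heads: int) -> List[float]:
    """One dot-product score per head.

    ASSUMPTION: the projection step is omitted, so x and y are used as-is.
    """
    return [
        sum(a * b for a, b in zip(head_x, head_y))
        for head_x, head_y in zip(split_heads(x, num_heads), split_heads(y, num_heads))
    ]


x = [1.0] * 100
y = [2.0] * 100

heads = split_heads(x, 5)                      # 5 chunks of length 20
scores = multi_headed_similarity(x, y, 5)      # 5 scores, one per head
print(len(heads), len(heads[0]), scores)
```

Each input vector yields num_heads scores instead of one, which is why the result has one more dimension than a typical similarity function.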