allennlp.modules.similarity_functions

A SimilarityFunction takes a pair of tensors with the same shape and computes a similarity between the vectors in the last dimension.

class allennlp.modules.similarity_functions.similarity_function.SimilarityFunction[source]

Bases: torch.nn.modules.module.Module, allennlp.common.registrable.Registrable

A SimilarityFunction takes a pair of tensors with the same shape and computes a similarity between the vectors in the last dimension. For example, the tensors might both have shape (batch_size, sentence_length, embedding_dim), and we will compute some function of the two vectors of length embedding_dim at each of the (batch_size, sentence_length) positions, returning a tensor of shape (batch_size, sentence_length).

The similarity function could be as simple as a dot product, or it could be a more complex, parameterized function.

If you want to compute a similarity between tensors of different sizes, you need to first tile them in the appropriate dimensions so that they have the same shape before you can use these functions. The Attention and MatrixAttention modules do this.

default_implementation: str = 'dot_product'

forward(self, tensor_1: torch.Tensor, tensor_2: torch.Tensor) → torch.Tensor[source]

Takes two tensors of the same shape, such as (batch_size, length_1, length_2, embedding_dim). Computes a (possibly parameterized) similarity on the final dimension and returns a tensor with one less dimension, such as (batch_size, length_1, length_2).
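
For example, to score every vector in one sequence against every vector in another, you can tile the two tensors to a common shape and then apply the default dot-product implementation. A minimal sketch, assuming allennlp and torch are installed; the tensor names and sizes are illustrative:

    import torch
    from allennlp.modules.similarity_functions.dot_product import DotProductSimilarity

    batch_size, len_1, len_2, dim = 2, 4, 5, 7
    sentence_1 = torch.randn(batch_size, len_1, dim)
    sentence_2 = torch.randn(batch_size, len_2, dim)

    # Tile both tensors to (batch_size, len_1, len_2, dim) so that the
    # similarity can be computed over the last dimension.
    tiled_1 = sentence_1.unsqueeze(2).expand(batch_size, len_1, len_2, dim)
    tiled_2 = sentence_2.unsqueeze(1).expand(batch_size, len_1, len_2, dim)

    similarity = DotProductSimilarity()
    scores = similarity(tiled_1, tiled_2)  # shape: (batch_size, len_1, len_2)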

class allennlp.modules.similarity_functions.bilinear.BilinearSimilarity(tensor_1_dim: int, tensor_2_dim: int, activation: allennlp.nn.activations.Activation = None)[source]

Bases: allennlp.modules.similarity_functions.similarity_function.SimilarityFunction

This similarity function performs a bilinear transformation of the two input vectors. This function has a matrix of weights W and a bias b, and the similarity between two vectors x and y is computed as x^T W y + b.

Parameters
tensor_1_dim : int

The dimension of the first tensor, x, described above. This is x.size()[-1] - the length of the vector that will go into the similarity computation. We need this so we can build the weight matrix correctly.

tensor_2_dim : int

The dimension of the second tensor, y, described above. This is y.size()[-1] - the length of the vector that will go into the similarity computation. We need this so we can build the weight matrix correctly.

activation : Activation, optional (default=linear, i.e. no activation)

An activation function applied after the x^T W y + b calculation. Default is no activation.

forward(self, tensor_1: torch.Tensor, tensor_2: torch.Tensor) → torch.Tensor[source]

Takes two tensors of the same shape, such as (batch_size, length_1, length_2, embedding_dim). Computes a (possibly parameterized) similarity on the final dimension and returns a tensor with one less dimension, such as (batch_size, length_1, length_2).

reset_parameters(self)[source]
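
A minimal usage sketch for BilinearSimilarity, assuming allennlp and torch are installed; the shapes below are illustrative:

    import torch
    from allennlp.modules.similarity_functions.bilinear import BilinearSimilarity

    # The similarity of x and y is computed as x^T W y + b over the last dimension.
    similarity = BilinearSimilarity(tensor_1_dim=6, tensor_2_dim=6)
    x = torch.randn(2, 3, 6)
    y = torch.randn(2, 3, 6)
    scores = similarity(x, y)  # shape: (2, 3), one score per position
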
class allennlp.modules.similarity_functions.cosine.CosineSimilarity[source]

Bases: allennlp.modules.similarity_functions.similarity_function.SimilarityFunction

This similarity function simply computes the cosine similarity between each pair of vectors. It has no parameters.

forward(self, tensor_1: torch.Tensor, tensor_2: torch.Tensor) → torch.Tensor[source]

Takes two tensors of the same shape, such as (batch_size, length_1, length_2, embedding_dim). Computes a (possibly parameterized) similarity on the final dimension and returns a tensor with one less dimension, such as (batch_size, length_1, length_2).
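
A minimal usage sketch for CosineSimilarity (shapes are illustrative):

    import torch
    from allennlp.modules.similarity_functions.cosine import CosineSimilarity

    similarity = CosineSimilarity()
    x = torch.randn(2, 5, 8)
    y = torch.randn(2, 5, 8)
    scores = similarity(x, y)  # shape: (2, 5), values in [-1, 1]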

class allennlp.modules.similarity_functions.dot_product.DotProductSimilarity(scale_output: bool = False)[source]

Bases: allennlp.modules.similarity_functions.similarity_function.SimilarityFunction

This similarity function simply computes the dot product between each pair of vectors, with an optional scaling to reduce the variance of the output elements.

Parameters
scale_output : bool, optional

If True, we will scale the output by math.sqrt(tensor.size(-1)), to reduce the variance in the result.

forward(self, tensor_1: torch.Tensor, tensor_2: torch.Tensor) → torch.Tensor[source]

Takes two tensors of the same shape, such as (batch_size, length_1, length_2, embedding_dim). Computes a (possibly parameterized) similarity on the final dimension and returns a tensor with one less dimension, such as (batch_size, length_1, length_2).
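
A minimal usage sketch for DotProductSimilarity with and without output scaling (shapes are illustrative):

    import torch
    from allennlp.modules.similarity_functions.dot_product import DotProductSimilarity

    x = torch.randn(2, 5, 16)
    y = torch.randn(2, 5, 16)

    unscaled = DotProductSimilarity()(x, y)                 # plain dot product, shape (2, 5)
    scaled = DotProductSimilarity(scale_output=True)(x, y)  # output scaled by sqrt(16), shape (2, 5)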

class allennlp.modules.similarity_functions.linear.LinearSimilarity(tensor_1_dim: int, tensor_2_dim: int, combination: str = 'x, y', activation: allennlp.nn.activations.Activation = None)[source]

Bases: allennlp.modules.similarity_functions.similarity_function.SimilarityFunction

This similarity function performs a dot product between a vector of weights and some combination of the two input vectors, followed by an (optional) activation function. The combination used is configurable.

If the two vectors are x and y, we allow the following kinds of combinations: x, y, x*y, x+y, x-y, x/y, where each of those binary operations is performed elementwise. You can list as many combinations as you want, comma separated. For example, you might give x,y,x*y as the combination parameter to this class. The computed similarity function would then be w^T [x; y; x*y] + b, where w is a vector of weights, b is a bias parameter, and [;] is vector concatenation.

Note that if you want a bilinear similarity function with a diagonal weight matrix W, where the similarity function is computed as x * w * y + b (with w the diagonal of W), you can accomplish that with this class by using “x*y” for combination.

Parameters
tensor_1_dim : int

The dimension of the first tensor, x, described above. This is x.size()[-1] - the length of the vector that will go into the similarity computation. We need this so we can build weight vectors correctly.

tensor_2_dim : int

The dimension of the second tensor, y, described above. This is y.size()[-1] - the length of the vector that will go into the similarity computation. We need this so we can build weight vectors correctly.

combination : str, optional (default="x,y")

Described above.

activation : Activation, optional (default=linear, i.e. no activation)

An activation function applied after the w^T * [x;y] + b calculation. Default is no activation.

forward(self, tensor_1: torch.Tensor, tensor_2: torch.Tensor) → torch.Tensor[source]

Takes two tensors of the same shape, such as (batch_size, length_1, length_2, embedding_dim). Computes a (possibly parameterized) similarity on the final dimension and returns a tensor with one less dimension, such as (batch_size, length_1, length_2).

reset_parameters(self)[source]
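
A minimal usage sketch for LinearSimilarity with a richer combination, assuming allennlp and torch are installed (shapes are illustrative):

    import torch
    from allennlp.modules.similarity_functions.linear import LinearSimilarity

    # Computes w^T [x; y; x*y] + b, so the weight vector has length 8 + 8 + 8 = 24.
    similarity = LinearSimilarity(tensor_1_dim=8, tensor_2_dim=8, combination='x,y,x*y')
    x = torch.randn(2, 5, 8)
    y = torch.randn(2, 5, 8)
    scores = similarity(x, y)  # shape: (2, 5)
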
class allennlp.modules.similarity_functions.multiheaded.MultiHeadedSimilarity(num_heads: int, tensor_1_dim: int, tensor_1_projected_dim: int = None, tensor_2_dim: int = None, tensor_2_projected_dim: int = None, internal_similarity: allennlp.modules.similarity_functions.similarity_function.SimilarityFunction = DotProductSimilarity())[source]

Bases: allennlp.modules.similarity_functions.similarity_function.SimilarityFunction

This similarity function uses multiple “heads” to compute similarity. That is, we take the input tensors and project them into a number of new tensors, and compute similarities on each of the projected tensors individually. The result here has one more dimension than a typical similarity function.

For example, say we have two input tensors, both of shape (batch_size, sequence_length, 100), and that we want 5 similarity heads. We’ll project these tensors with a 100x100 matrix, then split the resultant tensors to have shape (batch_size, sequence_length, 5, 20). Then we call a wrapped similarity function on the result (by default just a dot product), giving a tensor of shape (batch_size, sequence_length, 5).

Parameters
num_heads : int

The number of similarity heads to compute.

tensor_1_dim : int

The dimension of the first tensor described above. This is tensor.size()[-1] - the length of the vector before the multi-headed projection. We need this so we can build the weight matrix correctly.

tensor_1_projected_dim : int, optional

The dimension of the first tensor after the multi-headed projection, before we split into multiple heads. This number must be divisible evenly by num_heads. If not given, we default to tensor_1_dim.

tensor_2_dim : int, optional

The dimension of the second tensor described above. This is tensor.size()[-1] - the length of the vector before the multi-headed projection. We need this so we can build the weight matrix correctly. If not given, we default to tensor_1_dim.

tensor_2_projected_dim : int, optional

The dimension of the second tensor after the multi-headed projection, before we split into multiple heads. This number must be divisible evenly by num_heads. If not given, we default to tensor_2_dim.

internal_similarity : SimilarityFunction, optional

The SimilarityFunction to call on the projected, multi-headed tensors. The default is to use a dot product.

forward(self, tensor_1: torch.Tensor, tensor_2: torch.Tensor) → torch.Tensor[source]

Takes two tensors of the same shape, such as (batch_size, length_1, length_2, embedding_dim). Computes a (possibly parameterized) similarity on the final dimension and returns a tensor with one less dimension, such as (batch_size, length_1, length_2).

reset_parameters(self)[source]
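
A minimal usage sketch for MultiHeadedSimilarity, mirroring the 100-dimensional, 5-head example above (shapes are illustrative):

    import torch
    from allennlp.modules.similarity_functions.multiheaded import MultiHeadedSimilarity

    # Both inputs are projected with a 100x100 matrix and split into 5 heads of 20 dimensions each.
    similarity = MultiHeadedSimilarity(num_heads=5, tensor_1_dim=100)
    x = torch.randn(2, 7, 100)
    y = torch.randn(2, 7, 100)
    scores = similarity(x, y)  # shape: (2, 7, 5), one score per head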