allennlp.modules.seq2vec_encoders

Modules that transform a sequence of input vectors into a single output vector. Some are just basic wrappers around existing PyTorch modules; others are AllenNLP modules.

The available Seq2Vec encoders are:

class allennlp.modules.seq2vec_encoders.bert_pooler.BertPooler(pretrained_model: Union[str, pytorch_pretrained_bert.modeling.BertModel], requires_grad: bool = True, dropout: float = 0.0)[source]

Bases: allennlp.modules.seq2vec_encoders.seq2vec_encoder.Seq2VecEncoder

The pooling layer at the end of the BERT model. This returns an embedding for the [CLS] token, after passing it through a non-linear tanh activation; the non-linear layer is also part of the BERT model. If you want to use the pretrained BERT model to build a classifier and you want to use the AllenNLP token-indexer -> token-embedder -> seq2vec encoder setup, this is the Seq2VecEncoder to use. (For example, if you want to experiment with other embedding / encoding combinations.)

If you just want to train a BERT classifier, it’s simpler to just use the BertForClassification model.
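
As a rough sketch of that setup (the model name, dropout value, and shapes below are illustrative, not prescribed by this class):

    import torch
    from allennlp.modules.seq2vec_encoders.bert_pooler import BertPooler

    # Illustrative: build the pooler from a pretrained model name; this loads the BERT weights.
    pooler = BertPooler(pretrained_model="bert-base-uncased", dropout=0.1)

    # Wordpiece embeddings, e.g. produced by a BERT token embedder earlier in the pipeline.
    embeddings = torch.randn(2, 12, pooler.get_input_dim())   # (batch_size, num_wordpieces, 768)
    cls_vector = pooler(embeddings)                            # (batch_size, pooler.get_output_dim()) == (2, 768)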

Parameters
pretrained_model: Union[str, BertModel]

The pretrained BERT model to use. If this is a string, we will call BertModel.from_pretrained(pretrained_model) and use that.

requires_grad: bool, optional (default = True)

If True, the weights of the pooler will be updated during training. Otherwise they will not.

dropout: float, optional (default = 0.0)

The amount of dropout to apply after pooling.

forward(self, tokens: torch.Tensor, mask: torch.Tensor = None)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.

get_input_dim(self) → int[source]

Returns the dimension of the vector input for each element in the sequence input to a Seq2VecEncoder. This is not the shape of the input tensor, but the last element of that shape.

get_output_dim(self) → int[source]

Returns the dimension of the final vector output by this Seq2VecEncoder. This is not the shape of the returned tensor, but the last element of that shape.

class allennlp.modules.seq2vec_encoders.cnn_encoder.CnnEncoder(embedding_dim: int, num_filters: int, ngram_filter_sizes: Tuple[int, ...] = (2, 3, 4, 5), conv_layer_activation: allennlp.nn.activations.Activation = None, output_dim: Optional[int] = None)[source]

Bases: allennlp.modules.seq2vec_encoders.seq2vec_encoder.Seq2VecEncoder

A CnnEncoder is a combination of multiple convolution layers and max pooling layers. As a Seq2VecEncoder, the input to this module is of shape (batch_size, num_tokens, input_dim), and the output is of shape (batch_size, output_dim).

The CNN has one convolution layer for each ngram filter size. Each convolution produces a vector of size num_filters at every position it is applied, and a layer with filter width ngram_size is applied num_tokens - ngram_size + 1 times. The corresponding max-pooling layer aggregates all of these outputs from the convolution layer and keeps the maximum.

This operation is repeated for every ngram size passed, and consequently the dimensionality of the output after maxpooling is len(ngram_filter_sizes) * num_filters. This then gets (optionally) projected down to a lower dimensional output, specified by output_dim.

That projection is a single fully connected layer. For more details, refer to “A Sensitivity Analysis of (and Practitioners’ Guide to) Convolutional Neural Networks for Sentence Classification”, Zhang and Wallace 2016, particularly Figure 1.
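
For instance, a small sketch (dimensions chosen arbitrarily) showing how the pooled output width follows from len(ngram_filter_sizes) * num_filters when no output_dim is given:

    import torch
    from allennlp.modules.seq2vec_encoders.cnn_encoder import CnnEncoder

    encoder = CnnEncoder(embedding_dim=50, num_filters=16)   # default ngram_filter_sizes = (2, 3, 4, 5)

    tokens = torch.randn(8, 20, 50)   # (batch_size, num_tokens, embedding_dim)
    mask = torch.ones(8, 20)          # no padding in this toy batch
    vector = encoder(tokens, mask)    # (batch_size, output_dim) == (8, 64)

    assert encoder.get_output_dim() == 4 * 16   # len(ngram_filter_sizes) * num_filters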

Parameters
embedding_dim: int

This is the input dimension to the encoder. We need this because we can’t do shape inference in pytorch, and we need to know what size filters to construct in the CNN.

num_filters: int

This is the output dim for each convolutional layer, which is the number of “filters” learned by that layer.

ngram_filter_sizes: Tuple[int, ...], optional (default = (2, 3, 4, 5))

This specifies both the number of convolutional layers we will create and their sizes. The default of (2, 3, 4, 5) will have four convolutional layers, corresponding to encoding ngrams of size 2 to 5 with some number of filters.

conv_layer_activation: Activation, optional (default = torch.nn.ReLU)

Activation to use after the convolution layers.

output_dim: Optional[int], optional (default = None)

After doing convolutions and pooling, we’ll project the collected features into a vector of this size. If this value is None, we will just return the result of the max pooling, giving an output of shape len(ngram_filter_sizes) * num_filters.

forward(self, tokens: torch.Tensor, mask: torch.Tensor)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.

get_input_dim(self) → int[source]

Returns the dimension of the vector input for each element in the sequence input to a Seq2VecEncoder. This is not the shape of the input tensor, but the last element of that shape.

get_output_dim(self) → int[source]

Returns the dimension of the final vector output by this Seq2VecEncoder. This is not the shape of the returned tensor, but the last element of that shape.

class allennlp.modules.seq2vec_encoders.pytorch_seq2vec_wrapper.PytorchSeq2VecWrapper(module: torch.nn.modules.rnn.RNNBase)[source]

Bases: allennlp.modules.seq2vec_encoders.seq2vec_encoder.Seq2VecEncoder

Pytorch’s RNNs have two outputs: the hidden state for every time step, and the hidden state at the last time step for every layer. We just want the second one as a single output. This wrapper pulls out that output, and adds a get_output_dim() method, which is useful if you want to, e.g., define a linear + softmax layer on top of this to get some distribution over a set of labels. The linear layer needs to know its input dimension before it is called, and you can get that from get_output_dim.

Also, there are lots of ways you could imagine going from an RNN hidden state at every timestep to a single vector - you could take the last vector at all layers in the stack, do some kind of pooling, take the last vector of the top layer in a stack, or many other options. We just take the final hidden state vector, or in the case of a bidirectional RNN cell, we concatenate the forward and backward final states together. TODO(mattg): allow for other ways of wrapping RNNs.

In order to be wrapped with this wrapper, a class must have the following members:

  • self.input_size: int

  • self.hidden_size: int

  • def forward(inputs: PackedSequence, hidden_state: torch.Tensor) -> Tuple[PackedSequence, torch.Tensor].

  • self.bidirectional: bool (optional)

This is what pytorch’s RNNs look like - just make sure your class looks like one of those, and it should work.

Note that we require you to pass sequence lengths when you call this module, to avoid subtle bugs around masking. If you already have a PackedSequence you can pass None as the second parameter.
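
As a sketch (the hyperparameters are arbitrary), wrapping a bidirectional LSTM looks like the following; the LSTM is constructed with batch_first=True, matching the batch-first tensors AllenNLP passes around:

    import torch
    from allennlp.modules.seq2vec_encoders.pytorch_seq2vec_wrapper import PytorchSeq2VecWrapper

    lstm = torch.nn.LSTM(input_size=100, hidden_size=200,
                         batch_first=True, bidirectional=True)
    encoder = PytorchSeq2VecWrapper(lstm)

    inputs = torch.randn(4, 12, 100)   # (batch_size, sequence_length, input_size)
    mask = torch.ones(4, 12)           # all twelve positions are real tokens
    vector = encoder(inputs, mask)     # (batch_size, 2 * hidden_size) == (4, 400)

    assert encoder.get_output_dim() == 400   # forward and backward final states, concatenated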

forward(self, inputs: torch.Tensor, mask: torch.Tensor, hidden_state: torch.Tensor = None) → torch.Tensor[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.

get_input_dim(self) → int[source]

Returns the dimension of the vector input for each element in the sequence input to a Seq2VecEncoder. This is not the shape of the input tensor, but the last element of that shape.

get_output_dim(self) → int[source]

Returns the dimension of the final vector output by this Seq2VecEncoder. This is not the shape of the returned tensor, but the last element of that shape.

class allennlp.modules.seq2vec_encoders.seq2vec_encoder.Seq2VecEncoder(stateful: bool = False)[source]

Bases: allennlp.modules.encoder_base._EncoderBase, allennlp.common.registrable.Registrable

A Seq2VecEncoder is a Module that takes as input a sequence of vectors and returns a single vector. Input shape: (batch_size, sequence_length, input_dim); output shape: (batch_size, output_dim).

We add two methods to the basic Module API: get_input_dim() and get_output_dim(). You might need these if you want to construct a Linear layer using the output of this encoder, or to raise sensible errors for mismatched input dimensions.
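
For example (a sketch with arbitrary sizes; any Seq2VecEncoder could stand in for the CnnEncoder here):

    import torch
    from allennlp.modules.seq2vec_encoders.cnn_encoder import CnnEncoder

    encoder = CnnEncoder(embedding_dim=300, num_filters=32)
    classifier = torch.nn.Linear(encoder.get_output_dim(), 5)   # size the layer from the encoder, not by hand

    tokens = torch.randn(16, 25, encoder.get_input_dim())       # (batch_size, sequence_length, input_dim)
    mask = torch.ones(16, 25)
    logits = classifier(encoder(tokens, mask))                  # (batch_size, 5)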

get_input_dim(self) → int[source]

Returns the dimension of the vector input for each element in the sequence input to a Seq2VecEncoder. This is not the shape of the input tensor, but the last element of that shape.

get_output_dim(self) → int[source]

Returns the dimension of the final vector output by this Seq2VecEncoder. This is not the shape of the returned tensor, but the last element of that shape.

class allennlp.modules.seq2vec_encoders.boe_encoder.BagOfEmbeddingsEncoder(embedding_dim: int, averaged: bool = False)[source]

Bases: allennlp.modules.seq2vec_encoders.seq2vec_encoder.Seq2VecEncoder

A BagOfEmbeddingsEncoder is a simple Seq2VecEncoder which sums the embeddings of a sequence across the time dimension. The input to this module is of shape (batch_size, num_tokens, embedding_dim), and the output is of shape (batch_size, embedding_dim).

Parameters
embedding_dim: int

This is the input dimension to the encoder.

averaged: bool, optional (default = False)

If True, this module will average the embeddings across time, rather than simply summing them (i.e., we will divide the summed embeddings by the length of the sentence).
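
A small sketch (toy values) of the difference the averaged flag makes:

    import torch
    from allennlp.modules.seq2vec_encoders.boe_encoder import BagOfEmbeddingsEncoder

    tokens = torch.randn(2, 6, 10)             # (batch_size, num_tokens, embedding_dim)
    mask = torch.tensor([[1, 1, 1, 1, 0, 0],   # first sentence has two padding positions
                         [1, 1, 1, 1, 1, 1]])

    summed = BagOfEmbeddingsEncoder(embedding_dim=10)(tokens, mask)
    averaged = BagOfEmbeddingsEncoder(embedding_dim=10, averaged=True)(tokens, mask)

    # Both outputs have shape (2, 10); the averaged one divides each row's sum by
    # that sentence's unmasked length, i.e. by 4 and 6 respectively.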

forward(self, tokens: torch.Tensor, mask: torch.Tensor = None)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.

get_input_dim(self) → int[source]

Returns the dimension of the vector input for each element in the sequence input to a Seq2VecEncoder. This is not the shape of the input tensor, but the last element of that shape.

get_output_dim(self) → int[source]

Returns the dimension of the final vector output by this Seq2VecEncoder. This is not the shape of the returned tensor, but the last element of that shape.

class allennlp.modules.seq2vec_encoders.cnn_highway_encoder.CnnHighwayEncoder(embedding_dim: int, filters: Sequence[Sequence[int]], num_highway: int, projection_dim: int, activation: str = 'relu', projection_location: str = 'after_highway', do_layer_norm: bool = False)[source]

Bases: allennlp.modules.seq2vec_encoders.seq2vec_encoder.Seq2VecEncoder

The character CNN + highway encoder from Kim et al., “Character-Aware Neural Language Models” (https://arxiv.org/abs/1508.06615), with an optional projection.

Parameters
embedding_dim: int

The dimension of the initial character embedding.

filters: Sequence[Sequence[int]]

A sequence of pairs (filter_width, num_filters).

num_highway: int

The number of highway layers.

projection_dim: int

The output dimension of the projection layer.

activation: str, optional (default = ‘relu’)

The activation function for the convolutional layers.

projection_location: str, optional (default = ‘after_highway’)

Where to apply the projection layer. Valid values are ‘after_highway’, ‘after_cnn’, and None.

forward(self, inputs: torch.Tensor, mask: torch.Tensor) → Dict[str, torch.Tensor][source]

Compute context-insensitive token embeddings for ELMo representations.

Parameters
inputs:

Shape (batch_size, num_characters, embedding_dim). Character embeddings representing the current batch.

mask:

Shape (batch_size, num_characters). Currently unused; the mask for characters is implicit. See TokenCharactersEncoder.forward.

Returns
encoding:

Shape (batch_size, projection_dim) tensor with context-insensitive token representations.
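
A sketch of the shapes documented above (the filter configuration is made up, merely in the spirit of the ELMo character CNN):

    import torch
    from allennlp.modules.seq2vec_encoders.cnn_highway_encoder import CnnHighwayEncoder

    encoder = CnnHighwayEncoder(
        embedding_dim=16,
        filters=[(1, 32), (2, 32), (3, 64)],   # (filter_width, num_filters) pairs
        num_highway=2,
        projection_dim=128,
    )

    characters = torch.randn(4, 50, 16)    # (batch_size, num_characters, embedding_dim)
    mask = torch.ones(4, 50)
    encoding = encoder(characters, mask)   # the (batch_size, projection_dim) == (4, 128) encoding described above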

get_input_dim(self) → int[source]

Returns the dimension of the vector input for each element in the sequence input to a Seq2VecEncoder. This is not the shape of the input tensor, but the last element of that shape.

get_output_dim(self) → int[source]

Returns the dimension of the final vector output by this Seq2VecEncoder. This is not the shape of the returned tensor, but the last element of that shape.