PytorchTransformer( self, input_dim: int, num_layers: int, feedforward_hidden_dim: int = 2048, num_attention_heads: int = 8, positional_encoding: Optional[str] = None, positional_embedding_size: int = 512, dropout_prob: float = 0.1, activation: str = 'relu', ) -> None
Implements a stacked self-attention encoder similar to the Transformer architecture in [Attention is all you Need](https://www.semanticscholar.org/paper/Attention-Is-All-You-Need-Vaswani-Shazeer/0737da0767d77606169cbf4187b83e1ab62f6077).
This class adapts the Transformer from torch.nn for use in AllenNLP. Optionally, it adds positional encodings.
Registered as a Seq2SeqEncoder with name "pytorch_transformer".
- input_dim :
int, required. The input dimension of the encoder.
- feedforward_hidden_dim :
int, optional, (default = 2048) The middle dimension of the FeedForward network. The input and output dimensions are fixed to ensure sizes match up for the self attention layers.
- num_layers :
int, required. The number of stacked self attention -> feedforward -> layer normalisation blocks.
- num_attention_heads :
int, optional, (default = 8) The number of attention heads to use per layer.
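- positional_encoding :
Optional[str], optional, (default = None) The kind of positional encoding to add to the input tensor, such as sinusoidal frequencies. With the default of None, no positional information is added. Adding one is strongly recommended: without it, the self attention layers have no notion of absolute or relative position (they only compute pairwise similarity between element vectors), and position can be an important feature for many tasks.
- positional_embedding_size :
int, optional, (default = 512) The size of the positional embedding table. Relevant only when the positional encoding is a learned embedding.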
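- dropout_prob :
float, optional, (default = 0.1) The dropout probability for the feedforward network.
- activation :
str, optional, (default = 'relu') The activation function used in the feedforward network.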
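To make the positional-encoding parameter more concrete, here is a minimal dependency-free sketch of sinusoidal position features, following the formulation in the paper linked above. The function name `sinusoidal_positions` is hypothetical, and the layout used by the library itself may differ (for example, it could concatenate the sine and cosine halves instead of interleaving them).

```python
import math
from typing import List


def sinusoidal_positions(seq_len: int, dim: int) -> List[List[float]]:
    """Return a seq_len x dim table of sinusoidal position features.

    Hypothetical helper for illustration; not part of AllenNLP.
    """
    if dim % 2 != 0:
        raise ValueError("this sketch assumes an even feature dimension")
    table = []
    for pos in range(seq_len):
        row = []
        # Each (sin, cos) pair shares one wavelength; wavelengths grow
        # geometrically with the feature index.
        for i in range(0, dim, 2):
            angle = pos / (10000 ** (i / dim))
            row.append(math.sin(angle))
            row.append(math.cos(angle))
        table.append(row)
    return table


# One row per position; each row would be added to that position's input vector.
table = sinusoidal_positions(seq_len=4, dim=8)
```

Because every position gets a distinct pattern of values, adding these rows to the inputs gives the attention layers a signal they can use to tell positions apart.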
PytorchTransformer.forward(self, inputs: torch.Tensor, mask: torch.BoolTensor)
Defines the computation performed at every call.
Should be overridden by all subclasses.
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
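The calling convention can be illustrated with plain torch.nn, which this class adapts. This is a rough sketch, not the AllenNLP implementation: the sizes are made up, and it assumes inputs shaped (batch, sequence, features) and a mask in which True marks a real token. Under those assumptions the input has to be permuted and the mask inverted, because torch.nn.TransformerEncoder by default expects (sequence, batch, features) and treats True in src_key_padding_mask as padding.

```python
import torch
from torch import nn

# Hypothetical sizes, chosen only for illustration.
batch_size, seq_len, input_dim = 2, 5, 16

layer = nn.TransformerEncoderLayer(
    d_model=input_dim,
    nhead=4,
    dim_feedforward=32,
    dropout=0.1,
    activation="relu",
)
encoder = nn.TransformerEncoder(layer, num_layers=2)
encoder.eval()  # disable dropout for this demonstration

inputs = torch.randn(batch_size, seq_len, input_dim)
# Assumed convention: True marks a real token, False marks padding.
mask = torch.tensor(
    [
        [True, True, True, True, True],
        [True, True, True, False, False],
    ]
)

with torch.no_grad():
    # Call the module instance (not .forward) so registered hooks run.
    outputs = encoder(
        inputs.permute(1, 0, 2),        # (seq, batch, features)
        src_key_padding_mask=~mask,     # torch.nn: True means "ignore"
    ).permute(1, 0, 2)                  # back to (batch, seq, features)

print(outputs.shape)
```

The output keeps the shape of the input, so the last element of the shape is the same before and after encoding in this sketch.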
PytorchTransformer.get_input_dim(self) -> int
Returns the dimension of the vector input for each element in the sequence input to a Seq2SeqEncoder. This is not the shape of the input tensor, but the last element of that shape.
PytorchTransformer.get_output_dim(self) -> int
Returns the dimension of each vector in the sequence output by this Seq2SeqEncoder. This is not the shape of the returned tensor, but the last element of that shape.
PytorchTransformer.is_bidirectional(self)
Returns True if this encoder is bidirectional. If so, we assume the forward direction
of the encoder is the first half of the final dimension, and the backward direction is the