pytorch_transformer_wrapper
allennlp.modules.seq2seq_encoders.pytorch_transformer_wrapper
PytorchTransformer¶
@Seq2SeqEncoder.register("pytorch_transformer")
class PytorchTransformer(Seq2SeqEncoder):
| def __init__(
| self,
| input_dim: int,
| num_layers: int,
| feedforward_hidden_dim: int = 2048,
| num_attention_heads: int = 8,
| positional_encoding: Optional[str] = None,
| positional_embedding_size: int = 512,
| dropout_prob: float = 0.1,
| activation: str = "relu"
| ) -> None
Implements a stacked self-attention encoder similar to the Transformer architecture in Attention Is All You Need.
This class adapts the Transformer from torch.nn for use in AllenNLP. Optionally, it adds positional encodings.
Registered as a Seq2SeqEncoder with name "pytorch_transformer".
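For illustration, a minimal construction sketch follows. It assumes AllenNLP and PyTorch are installed; the concrete sizes (input_dim=256, num_layers=6) are arbitrary, and the Params/from_params variant relies on AllenNLP's standard registry mechanism rather than anything specific to this class.

```python
from allennlp.common import Params
from allennlp.modules.seq2seq_encoders import Seq2SeqEncoder
from allennlp.modules.seq2seq_encoders.pytorch_transformer_wrapper import PytorchTransformer

# Direct construction: a 6-layer encoder over 256-dimensional inputs with
# sinusoidal positional encodings; all other arguments keep their defaults.
encoder = PytorchTransformer(
    input_dim=256,
    num_layers=6,
    positional_encoding="sinusoidal",
)

# Equivalent construction through the registry, as it would appear in a
# configuration file under the registered name "pytorch_transformer".
encoder_from_config = Seq2SeqEncoder.from_params(
    Params({"type": "pytorch_transformer", "input_dim": 256, "num_layers": 6})
)
```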
Parameters¶

- input_dim : int
  The input dimension of the encoder.
- num_layers : int
  The number of stacked self attention -> feedforward -> layer normalisation blocks.
- feedforward_hidden_dim : int
  The middle dimension of the FeedForward network. The input and output dimensions are fixed to ensure sizes match up for the self attention layers.
- num_attention_heads : int
  The number of attention heads to use per layer.
- positional_encoding : str, optional (default = None)
  Specifies the type of positional encodings to use. Your options are:
  - None to have no positional encodings.
  - "sinusoidal" to have sinusoidal encodings, as described in https://api.semanticscholar.org/CorpusID:13756489.
  - "embedding" to treat positional encodings as learnable parameters.
  Without positional encoding, the self attention layers have no idea of absolute or relative position (as they are just computing pairwise similarity between vectors of elements), which can be important features for many tasks. (An illustrative sketch of the sinusoidal variant follows this list.)
- positional_embedding_size : int, optional (default = 512)
  The number of positional embeddings.
- dropout_prob : float, optional (default = 0.1)
  The dropout probability for the feedforward network.
- activation : str, optional (default = "relu")
  The activation function of intermediate layers. Must be either "relu" or "gelu".
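To make the "sinusoidal" option concrete, here is a generic sketch of the fixed sinusoidal scheme from Attention Is All You Need, where PE(pos, 2i) = sin(pos / 10000^(2i/dim)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/dim)). This is an illustration only, assuming an even encoding dimension; the exact channel layout used internally by AllenNLP may differ.

```python
import math

import torch


def sinusoidal_positional_encoding(sequence_length: int, dim: int) -> torch.Tensor:
    """Fixed sinusoidal encodings of shape (sequence_length, dim); dim is assumed even."""
    positions = torch.arange(sequence_length, dtype=torch.float).unsqueeze(1)
    # Geometric frequency schedule 1 / 10000^(2i / dim), one frequency per (sin, cos) pair.
    frequencies = torch.exp(
        torch.arange(0, dim, 2, dtype=torch.float) * (-math.log(10000.0) / dim)
    )
    encoding = torch.zeros(sequence_length, dim)
    encoding[:, 0::2] = torch.sin(positions * frequencies)  # even channels
    encoding[:, 1::2] = torch.cos(positions * frequencies)  # odd channels
    return encoding


# These fixed encodings would be added to the (batch, sequence_length, input_dim) inputs
# before the first self-attention layer; the "embedding" option instead learns the
# positional vectors as parameters, with positional_embedding_size distinct positions.
```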
get_input_dim¶
class PytorchTransformer(Seq2SeqEncoder):
| ...
| def get_input_dim(self) -> int
get_output_dim¶
class PytorchTransformer(Seq2SeqEncoder):
| ...
| def get_output_dim(self) -> int
is_bidirectional¶
class PytorchTransformer(Seq2SeqEncoder):
| ...
| def is_bidirectional(self)
forward¶
class PytorchTransformer(Seq2SeqEncoder):
| ...
| def forward(self, inputs: torch.Tensor, mask: torch.BoolTensor)
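A minimal usage sketch for forward, assuming the usual AllenNLP Seq2SeqEncoder shapes: inputs of shape (batch_size, sequence_length, input_dim) and a boolean mask over the batch and sequence dimensions, with True marking real tokens and False marking padding. The sizes below are arbitrary.

```python
import torch

from allennlp.modules.seq2seq_encoders.pytorch_transformer_wrapper import PytorchTransformer

encoder = PytorchTransformer(input_dim=64, num_layers=2)

batch_size, sequence_length = 4, 10
inputs = torch.randn(batch_size, sequence_length, encoder.get_input_dim())

# True = real token, False = padding; here the last two positions of every sequence are padding.
mask = torch.ones(batch_size, sequence_length, dtype=torch.bool)
mask[:, -2:] = False

outputs = encoder(inputs, mask)
# The output keeps the sequence shape, with get_output_dim() equal to input_dim.
assert outputs.shape == (batch_size, sequence_length, encoder.get_output_dim())
```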