transformer_embeddings

allennlp.modules.transformer.transformer_embeddings


Embeddings

class Embeddings(TransformerModule,  FromParams):
 | def __init__(
 |     self,
 |     embeddings: torch.nn.ModuleDict,
 |     embedding_size: int,
 |     dropout: float,
 |     layer_norm_eps: float = 1e-12
 | )

General class for embeddings for any modality.

Parameters

  • embeddings : torch.nn.ModuleDict
    Named embedding layers, e.g. "word_embeddings", "position_embeddings", etc. Each embedding layer is expected to take its own input; the output of one is not passed to another. All the layers should have the same embedding_dim/out_features.
  • embedding_size : int
    The embedding_dim of all the embedding layers.
  • dropout : float
    The probability of an element being zeroed.
  • layer_norm_eps : float, optional (default = 1e-12)
    The epsilon used by the layer normalization layer.

forward

class Embeddings(TransformerModule,  FromParams):
 | ...
 | def forward(self, *inputs) -> torch.Tensor
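
A minimal usage sketch (not from the source): it assumes forward is given one input tensor per named embedding layer, in the order the layers appear in the ModuleDict, and returns a single combined tensor of shape batch_size x seq_len x embedding_size. The layer names and sizes below are illustrative.

import torch
from allennlp.modules.transformer.transformer_embeddings import Embeddings

# Two named embedding layers sharing the same embedding_dim (names are illustrative).
layers = torch.nn.ModuleDict(
    {
        "word_embeddings": torch.nn.Embedding(100, 32),
        "position_embeddings": torch.nn.Embedding(16, 32),
    }
)
module = Embeddings(embeddings=layers, embedding_size=32, dropout=0.1)

word_ids = torch.randint(0, 100, (2, 16))        # batch_size x seq_len
position_ids = torch.arange(16).expand(2, 16)    # batch_size x seq_len

# One positional input per embedding layer, in ModuleDict order (assumption).
embedded = module(word_ids, position_ids)        # shape: 2 x 16 x 32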

ImageFeatureEmbeddings

class ImageFeatureEmbeddings(Embeddings):
 | def __init__(
 |     self,
 |     feature_size: int,
 |     embedding_size: int,
 |     dropout: float = 0.0
 | )

Embedding module for image features.

Parameters

  • feature_size : int
    Number of image features.
  • embedding_size : int
    The embedding_dim of all the embedding layers.
  • dropout : float, optional (default = 0.0)
    The probability of an element being zeroed.
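
A hedged usage sketch (not from the source): in ViLBERT-style models this module is typically assumed to combine a projection of per-region image features with a projection of 4-dimensional box coordinates, so the forward call below passes two inputs. The two-input convention and all sizes are assumptions, not guaranteed by the signature above.

import torch
from allennlp.modules.transformer.transformer_embeddings import ImageFeatureEmbeddings

embedder = ImageFeatureEmbeddings(feature_size=1024, embedding_size=256, dropout=0.1)

batch_size, num_regions = 2, 36
features = torch.randn(batch_size, num_regions, 1024)   # per-region image features
locations = torch.randn(batch_size, num_regions, 4)     # box coordinates (assumed second input)

embedded = embedder(features, locations)                 # shape: 2 x 36 x 256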

TransformerEmbeddings

class TransformerEmbeddings(Embeddings):
 | def __init__(
 |     self,
 |     vocab_size: int,
 |     embedding_size: int,
 |     pad_token_id: int = 0,
 |     max_position_embeddings: int = 512,
 |     position_pad_token_id: Optional[int] = None,
 |     type_vocab_size: int = 2,
 |     dropout: float = 0.1,
 |     layer_norm_eps: float = 1e-12,
 |     output_size: Optional[int] = None
 | )

Construct the embeddings from word, position and token_type embeddings. Details are in the paper BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (Devlin et al., 2019).

Parameters

  • vocab_size : int
    The size of the input vocab.
  • embedding_size : int
    The embedding_dim of all the embedding layers.
  • pad_token_id : int, optional (default = 0)
    The token id of the <pad> token.
  • max_position_embeddings : int, optional (default = 512)
    The maximum number of positions.
  • position_pad_token_id : Optional[int], optional (default = None)
    The padding index for the position embeddings, if any.
  • type_vocab_size : int, optional (default = 2)
    The size of the input token_type vocab.
  • dropout : float, optional (default = 0.1)
    The probability of an element being zeroed.
  • layer_norm_eps : float, optional (default = 1e-12)
    The epsilon used by the layer normalization layer.
  • output_size : int, optional (default = None)
    Optionally apply a linear transform after the dropout, projecting to output_size.

forward

class TransformerEmbeddings(Embeddings):
 | ...
 | def forward(
 |     self,
 |     input_ids: torch.Tensor,
 |     token_type_ids: Optional[torch.Tensor] = None,
 |     position_ids: Optional[torch.Tensor] = None,
 |     attention_mask: Optional[torch.Tensor] = None
 | ) -> torch.Tensor

Parameters

  • input_ids : torch.Tensor
    Shape batch_size x seq_len
  • token_type_ids : torch.Tensor, optional
    Shape batch_size x seq_len
  • position_ids : torch.Tensor, optional
    Shape batch_size x seq_len
  • attention_mask : torch.Tensor, optional
    Shape batch_size x seq_len. This parameter is ignored, but it is here for compatibility.