transformer_embeddings
allennlp.modules.transformer.transformer_embeddings
Embeddings¶
class Embeddings(TransformerModule, FromParams):
| def __init__(
| self,
| embeddings: torch.nn.ModuleDict,
| embedding_size: int,
| dropout: float,
| layer_norm_eps: float = 1e-12
| )
General class for embeddings for any modality.
Parameters¶
- embeddings : `torch.nn.ModuleDict`
    Named embedding layers, e.g. `"word_embeddings"`, `"position_embeddings"`, etc.
    All the embedding layers are expected to have different inputs; the output of one
    will not be passed to another. All the layers should have the same
    `embedding_dim` / `out_features`.
- embedding_size : `int`
    The `embedding_dim` of all the embedding layers.
- dropout : `float`
    The probability of an element to be zeroed.
forward¶
class Embeddings(TransformerModule, FromParams):
| ...
| def forward(self, *inputs) -> torch.Tensor
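The base class takes one input tensor per registered embedding layer, in order. Below is a minimal sketch, assuming the per-layer outputs are combined (summed) and then passed through layer normalization and dropout; the layer names and sizes are illustrative, not prescribed by the API.

```python
import torch

from allennlp.modules.transformer.transformer_embeddings import Embeddings

# Two named embedding layers with the same embedding_dim (16).
layers = torch.nn.ModuleDict(
    {
        "word_embeddings": torch.nn.Embedding(100, 16),
        "position_embeddings": torch.nn.Embedding(20, 16),
    }
)
embedder = Embeddings(embeddings=layers, embedding_size=16, dropout=0.1)

token_ids = torch.randint(0, 100, (2, 8))     # batch_size x seq_len
position_ids = torch.arange(8).expand(2, 8)   # batch_size x seq_len

# One positional argument per registered layer; the result is assumed to be a
# single batch_size x seq_len x 16 tensor.
output = embedder(token_ids, position_ids)
print(output.shape)  # expected: torch.Size([2, 8, 16])
```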
ImageFeatureEmbeddings¶
class ImageFeatureEmbeddings(Embeddings):
| def __init__(
| self,
| feature_size: int,
| embedding_size: int,
| dropout: float = 0.0
| )
Embedding module for image features.
Parameters¶
- feature_size : `int`
    Number of image features.
- embedding_size : `int`
    The `embedding_dim` of all the embedding layers.
- dropout : `float`, optional (default = `0.0`)
    The probability of an element to be zeroed.
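A usage sketch, assuming (as in the ViLBERT-style setup this module is used in) that the inherited `forward(*inputs)` takes the region features plus a second tensor of 4-dimensional box coordinates, one input per registered embedding layer; the sizes here are illustrative.

```python
import torch

from allennlp.modules.transformer.transformer_embeddings import ImageFeatureEmbeddings

embedder = ImageFeatureEmbeddings(feature_size=1024, embedding_size=768, dropout=0.1)

batch_size, num_regions = 2, 36
features = torch.randn(batch_size, num_regions, 1024)  # region features
# Assumption: box coordinates (4 values per region) are embedded alongside
# the features by the inherited forward(*inputs).
locations = torch.rand(batch_size, num_regions, 4)

embedded = embedder(features, locations)
print(embedded.shape)  # expected: torch.Size([2, 36, 768])
```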
TransformerEmbeddings¶
class TransformerEmbeddings(Embeddings):
| def __init__(
| self,
| vocab_size: int,
| embedding_size: int,
| pad_token_id: int = 0,
| max_position_embeddings: int = 512,
| position_pad_token_id: Optional[int] = None,
| type_vocab_size: int = 2,
| dropout: float = 0.1,
| layer_norm_eps: float = 1e-12,
| output_size: Optional[int] = None
| )
Construct the embeddings from word, position, and token_type embeddings. Details are in the paper: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Devlin et al., 2019.
Parameters¶
- vocab_size : `int`
    The size of the input vocab.
- embedding_size : `int`
    The `embedding_dim` of all the embedding layers.
- pad_token_id : `int`, optional (default = `0`)
    The token id of the `<pad>` token.
- max_position_embeddings : `int`, optional (default = `512`)
    The maximum number of positions.
- type_vocab_size : `int`, optional (default = `2`)
    The size of the input token_type vocab.
- dropout : `float`, optional (default = `0.1`)
    The probability of an element to be zeroed.
- output_size : `int`, optional (default = `None`)
    Optionally apply a linear transform after the dropout, projecting to `output_size`.
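A construction sketch with BERT-base-like sizes (the values are illustrative, not prescribed by this module); the second instance shows the optional projection enabled via `output_size`.

```python
from allennlp.modules.transformer.transformer_embeddings import TransformerEmbeddings

# BERT-base-like configuration (illustrative values).
embedder = TransformerEmbeddings(
    vocab_size=30522,
    embedding_size=768,
    pad_token_id=0,
    max_position_embeddings=512,
    type_vocab_size=2,
    dropout=0.1,
)

# With output_size set, the dropped-out embeddings are passed through an
# extra linear transform, here projecting from 768 down to 256 dimensions.
projected = TransformerEmbeddings(
    vocab_size=30522,
    embedding_size=768,
    output_size=256,
)
```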
forward¶
class TransformerEmbeddings(Embeddings):
| ...
| def forward(
| self,
| input_ids: torch.Tensor,
| token_type_ids: Optional[torch.Tensor] = None,
| position_ids: Optional[torch.Tensor] = None,
| attention_mask: Optional[torch.Tensor] = None
| ) -> torch.Tensor
Parameters¶
- input_ids : `torch.Tensor`
    Shape `batch_size x seq_len`
- attention_mask : `torch.Tensor`
    Shape `batch_size x seq_len`. This parameter is ignored, but it is here for compatibility.
- token_type_ids : `torch.Tensor`, optional
    Shape `batch_size x seq_len`
- position_ids : `torch.Tensor`, optional
    Shape `batch_size x seq_len`
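A forward-pass sketch, assuming that omitted `token_type_ids` and `position_ids` fall back to all-zeros and `0..seq_len-1` respectively; the sizes are illustrative.

```python
import torch

from allennlp.modules.transformer.transformer_embeddings import TransformerEmbeddings

embedder = TransformerEmbeddings(vocab_size=30522, embedding_size=768)

batch_size, seq_len = 2, 10
input_ids = torch.randint(0, 30522, (batch_size, seq_len))
token_type_ids = torch.zeros(batch_size, seq_len, dtype=torch.long)  # optional

# attention_mask is accepted but ignored by this module (see above).
embeddings = embedder(input_ids, token_type_ids=token_type_ids)
print(embeddings.shape)  # expected: torch.Size([2, 10, 768])
```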