allennlp.modules.token_embedders.pretrained_transformer_embedder
PretrainedTransformerEmbedder¶
@TokenEmbedder.register("pretrained_transformer")
class PretrainedTransformerEmbedder(TokenEmbedder):
| def __init__(
| self,
| model_name: str,
| *,
| max_length: int = None,
| sub_module: str = None,
| train_parameters: bool = True,
| eval_mode: bool = False,
| last_layer_only: bool = True,
| override_weights_file: Optional[str] = None,
| override_weights_strip_prefix: Optional[str] = None,
| reinit_modules: Optional[Union[int, Tuple[int, ...], Tuple[str, ...]]] = None,
| load_weights: bool = True,
| gradient_checkpointing: Optional[bool] = None,
| tokenizer_kwargs: Optional[Dict[str, Any]] = None,
| transformer_kwargs: Optional[Dict[str, Any]] = None
| ) -> None
Uses a pretrained model from `transformers` as a `TokenEmbedder`.

Registered as a `TokenEmbedder` with name "pretrained_transformer".
Parameters¶
- model_name : `str`
    The name of the `transformers` model to use. Should be the same as the corresponding `PretrainedTransformerIndexer`.
- max_length : `int`, optional (default = `None`)
    If positive, folds input token IDs into multiple segments of this length, passes them through the transformer model independently, and concatenates the final representations. Should be set to the same value as the `max_length` option on the `PretrainedTransformerIndexer`.
- sub_module : `str`, optional (default = `None`)
    The name of a submodule of the transformer to be used as the embedder. Some transformers naturally act as embedders, such as BERT. However, other models consist of an encoder and a decoder, in which case we just want to use the encoder.
- train_parameters : `bool`, optional (default = `True`)
    If this is `True`, the transformer weights get updated during training. If this is `False`, the transformer weights are not updated during training.
- eval_mode : `bool`, optional (default = `False`)
    If this is `True`, the model is always set to evaluation mode (e.g., the dropout is disabled and the batch normalization layer statistics are not updated). If this is `False`, such dropout and batch normalization layers are only set to evaluation mode when the model is evaluating on development or test data.
- last_layer_only : `bool`, optional (default = `True`)
    When `True` (the default), only the final layer of the pretrained transformer is taken for the embeddings. But if set to `False`, a scalar mix of all of the layers is used.
- override_weights_file : `Optional[str]`, optional (default = `None`)
    If set, this specifies a file from which to load alternate weights that override the weights from huggingface. The file is expected to contain a PyTorch `state_dict`, created with `torch.save()`.
- override_weights_strip_prefix : `Optional[str]`, optional (default = `None`)
    If set, strip the given prefix from the state dict when loading it.
- reinit_modules : `Optional[Union[int, Tuple[int, ...], Tuple[str, ...]]]`, optional (default = `None`)
    If this is an integer, the last `reinit_modules` layers of the transformer will be re-initialized. If this is a tuple of integers, the layers indexed by `reinit_modules` will be re-initialized. Note, because the module structure of the transformer `model_name` can differ, we cannot guarantee that providing an integer or tuple of integers will work. If this fails, you can instead provide a tuple of strings, which will be treated as regexes and any module with a name matching a regex will be re-initialized. Re-initializing the last few layers of a pretrained transformer can reduce the instability of fine-tuning on small datasets and may improve performance (https://arxiv.org/abs/2006.05987v3). Has no effect if `load_weights` is `False` or `override_weights_file` is not `None`.
- load_weights : `bool`, optional (default = `True`)
    Whether to load the pretrained weights. If you're loading your model/predictor from an AllenNLP archive it usually makes sense to set this to `False` (via the `overrides` parameter) to avoid unnecessarily caching and loading the original pretrained weights, since the archive will already contain all of the weights needed.
- gradient_checkpointing : `bool`, optional (default = `None`)
    Enable or disable gradient checkpointing.
- tokenizer_kwargs : `Dict[str, Any]`, optional (default = `None`)
    Dictionary with additional arguments for `AutoTokenizer.from_pretrained`.
- transformer_kwargs : `Dict[str, Any]`, optional (default = `None`)
    Dictionary with additional arguments for `AutoModel.from_pretrained`.
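A sketch of constructing the embedder directly; the model name and argument values are illustrative, not required defaults:

```python
from allennlp.modules.token_embedders import PretrainedTransformerEmbedder

# All arguments other than model_name are keyword-only.
embedder = PretrainedTransformerEmbedder(
    "bert-base-uncased",    # example model name
    max_length=512,         # should match the PretrainedTransformerIndexer's max_length
    train_parameters=True,  # fine-tune the transformer weights
    last_layer_only=True,   # use only the final layer, no scalar mix
)
print(embedder.get_output_dim())  # the transformer's hidden size, e.g. 768 for BERT-base
```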
authorized_missing_keys¶
class PretrainedTransformerEmbedder(TokenEmbedder):
| ...
| authorized_missing_keys = [r"position_ids$"]
train¶
class PretrainedTransformerEmbedder(TokenEmbedder):
| ...
| def train(self, mode: bool = True)
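This override exists to honor `eval_mode`: when it is `True`, the wrapped transformer stays in evaluation mode even while the surrounding model trains. A rough sketch of that behavior, assuming the transformer is held in a child module named `transformer_model` (not necessarily the exact implementation):

```python
def train(self, mode: bool = True):
    self.training = mode
    for name, module in self.named_children():
        # Pin the transformer itself to eval() when eval_mode is set, so dropout
        # stays disabled and batch-norm statistics are frozen; everything else
        # follows the requested mode.
        if self.eval_mode and name == "transformer_model":
            module.eval()
        else:
            module.train(mode)
    return self
```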
get_output_dim¶
class PretrainedTransformerEmbedder(TokenEmbedder):
| ...
| def get_output_dim(self)
forward¶
class PretrainedTransformerEmbedder(TokenEmbedder):
| ...
| def forward(
| self,
| token_ids: torch.LongTensor,
| mask: torch.BoolTensor,
| type_ids: Optional[torch.LongTensor] = None,
| segment_concat_mask: Optional[torch.BoolTensor] = None
| ) -> torch.Tensor
Parameters¶
- token_ids : `torch.LongTensor`
    Shape: `[batch_size, num_wordpieces if max_length is None else num_segment_concat_wordpieces]`. `num_segment_concat_wordpieces` is `num_wordpieces` plus special tokens inserted in the middle, e.g. the length of: "[CLS] A B C [SEP] [CLS] D E F [SEP]" (see indexer logic).
- mask : `torch.BoolTensor`
    Shape: `[batch_size, num_wordpieces]`.
- type_ids : `Optional[torch.LongTensor]`
    Shape: `[batch_size, num_wordpieces if max_length is None else num_segment_concat_wordpieces]`.
- segment_concat_mask : `Optional[torch.BoolTensor]`
    Shape: `[batch_size, num_segment_concat_wordpieces]`.
Returns¶
`torch.Tensor`
    Shape: `[batch_size, num_wordpieces, embedding_size]`.
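A hedged end-to-end sketch of a forward call with `max_length` unset; the model name and token IDs are arbitrary placeholders (real IDs would come from the matching indexer):

```python
import torch
from allennlp.modules.token_embedders import PretrainedTransformerEmbedder

embedder = PretrainedTransformerEmbedder("bert-base-uncased")  # example model

batch_size, num_wordpieces = 2, 7
token_ids = torch.randint(0, 1000, (batch_size, num_wordpieces))  # placeholder wordpiece IDs
mask = torch.ones(batch_size, num_wordpieces, dtype=torch.bool)   # no padding in this toy batch

embeddings = embedder(token_ids, mask)
print(embeddings.shape)  # [batch_size, num_wordpieces, embedding_size], e.g. (2, 7, 768)
```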