pretrained_transformer_mismatched_embedder

allennlp.modules.token_embedders.pretrained_transformer_mismatched_embedder


PretrainedTransformerMismatchedEmbedder

@TokenEmbedder.register("pretrained_transformer_mismatched")
class PretrainedTransformerMismatchedEmbedder(TokenEmbedder):
 | def __init__(
 |     self,
 |     model_name: str,
 |     max_length: int = None,
 |     train_parameters: bool = True,
 |     last_layer_only: bool = True,
 |     gradient_checkpointing: Optional[bool] = None,
 |     tokenizer_kwargs: Optional[Dict[str, Any]] = None,
 |     transformer_kwargs: Optional[Dict[str, Any]] = None,
 |     sub_token_mode: Optional[str] = "avg"
 | ) -> None

Use this embedder to embed wordpieces given by PretrainedTransformerMismatchedIndexer and to get word-level representations.

Registered as a TokenEmbedder with name "pretrained_transformer_mismatched".
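As a sketch, this embedder is usually selected through an AllenNLP configuration file rather than constructed directly. The model name and `max_length` below are illustrative; whatever values are used must match the paired `pretrained_transformer_mismatched` indexer:

```jsonnet
// Illustrative fragment of a model config; "tokens" is the indexer/embedder
// namespace and "bert-base-uncased" is an example model name.
"text_field_embedder": {
  "token_embedders": {
    "tokens": {
      "type": "pretrained_transformer_mismatched",
      "model_name": "bert-base-uncased",
      "max_length": 512
    }
  }
}
```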

Parameters

  • model_name : str
    The name of the transformers model to use. Should be the same as the corresponding PretrainedTransformerMismatchedIndexer.
  • max_length : int, optional (default = None)
    If positive, folds input token IDs into multiple segments of this length, passes them through the transformer model independently, and concatenates the final representations. Should be set to the same value as the max_length option on the PretrainedTransformerMismatchedIndexer.
  • train_parameters : bool, optional (default = True)
    If this is True, the transformer weights get updated during training.
  • last_layer_only : bool, optional (default = True)
    When True (the default), only the final layer of the pretrained transformer is used for the embeddings. If set to False, a scalar mix of all of the layers is used instead.
  • gradient_checkpointing : bool, optional (default = None)
    Enable or disable gradient checkpointing.
  • tokenizer_kwargs : Dict[str, Any], optional (default = None)
    Dictionary with additional arguments for AutoTokenizer.from_pretrained.
  • transformer_kwargs : Dict[str, Any], optional (default = None)
    Dictionary with additional arguments for AutoModel.from_pretrained.
  • sub_token_mode : Optional[str], optional (default = "avg")
    If sub_token_mode is set to "first", the first sub-token representation is returned as the word-level representation. If sub_token_mode is set to "avg" (the default), the average of all sub-token representations is returned. If an invalid sub_token_mode is provided, a ConfigurationError is thrown.
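The two sub_token_mode behaviors can be illustrated with a minimal pure-Python sketch. This is not the AllenNLP implementation (which operates on batched tensors and raises ConfigurationError); embeddings and offsets here are hypothetical:

```python
def pool_subtokens(wordpiece_embeddings, offsets, mode="avg"):
    """Pool per-wordpiece vectors into per-word vectors.

    wordpiece_embeddings: [num_wordpieces][dim] nested lists.
    offsets: [num_orig_tokens][2], inclusive (start, end) wordpiece spans.
    """
    pooled = []
    for start, end in offsets:
        span = wordpiece_embeddings[start:end + 1]  # inclusive span, as in the docs
        if mode == "first":
            pooled.append(span[0])
        elif mode == "avg":
            dim = len(span[0])
            pooled.append([sum(v[d] for v in span) / len(span) for d in range(dim)])
        else:
            # The real embedder raises ConfigurationError here.
            raise ValueError(f"invalid sub_token_mode: {mode}")
    return pooled

# Two original tokens; the second one was split into two wordpieces (1..2).
embs = [[1.0, 0.0], [2.0, 2.0], [4.0, 0.0]]
offsets = [[0, 0], [1, 2]]
print(pool_subtokens(embs, offsets))           # → [[1.0, 0.0], [3.0, 1.0]]
print(pool_subtokens(embs, offsets, "first"))  # → [[1.0, 0.0], [2.0, 2.0]]
```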

get_output_dim

class PretrainedTransformerMismatchedEmbedder(TokenEmbedder):
 | ...
 | @overrides
 | def get_output_dim(self)

forward

class PretrainedTransformerMismatchedEmbedder(TokenEmbedder):
 | ...
 | @overrides
 | def forward(
 |     self,
 |     token_ids: torch.LongTensor,
 |     mask: torch.BoolTensor,
 |     offsets: torch.LongTensor,
 |     wordpiece_mask: torch.BoolTensor,
 |     type_ids: Optional[torch.LongTensor] = None,
 |     segment_concat_mask: Optional[torch.BoolTensor] = None
 | ) -> torch.Tensor

Parameters

  • token_ids : torch.LongTensor
    Shape: [batch_size, num_wordpieces] (for the exception, see PretrainedTransformerEmbedder).
  • mask : torch.BoolTensor
    Shape: [batch_size, num_orig_tokens].
  • offsets : torch.LongTensor
    Shape: [batch_size, num_orig_tokens, 2]. Maps indices of the original tokens, i.e. those given as input to the indexer, to spans in token_ids. token_ids[i][offsets[i][j][0]:offsets[i][j][1] + 1] corresponds to the j-th original token in the i-th batch element.
  • wordpiece_mask : torch.BoolTensor
    Shape: [batch_size, num_wordpieces].
  • type_ids : Optional[torch.LongTensor]
    Shape: [batch_size, num_wordpieces].
  • segment_concat_mask : Optional[torch.BoolTensor]
    See PretrainedTransformerEmbedder.

Returns

  • torch.Tensor
    Shape: [batch_size, num_orig_tokens, embedding_size].
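The offsets indexing contract described under forward can be checked with a tiny plain-Python example (the wordpiece IDs below are hypothetical, not real tokenizer output):

```python
# One batch element: [CLS] plus two wordpieces of a single original token plus [SEP].
token_ids = [[101, 7592, 2088, 102]]
offsets = [[[1, 2]]]  # the original token spans wordpiece positions 1..2 (inclusive)

i, j = 0, 0
span = token_ids[i][offsets[i][j][0]:offsets[i][j][1] + 1]
print(span)  # → [7592, 2088], the wordpieces of the j-th original token
```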