class PretrainedTransformerBackbone(Backbone):
 | def __init__(
 |     self,
 |     vocab: Vocabulary,
 |     model_name: str,
 |     *,
 |     max_length: int = None,
 |     sub_module: str = None,
 |     train_parameters: bool = True,
 |     last_layer_only: bool = True,
 |     override_weights_file: Optional[str] = None,
 |     override_weights_strip_prefix: Optional[str] = None,
 |     tokenizer_kwargs: Optional[Dict[str, Any]] = None,
 |     transformer_kwargs: Optional[Dict[str, Any]] = None,
 |     output_token_strings: bool = True,
 |     vocab_namespace: str = "tags"
 | ) -> None

Uses a pretrained model from transformers as a Backbone.

This class passes most of its arguments to a PretrainedTransformerEmbedder, which it uses to implement the underlying encoding logic (we duplicate the arguments here instead of taking an Embedder as a constructor argument just to simplify the user-facing API).

Registered as a Backbone with name "pretrained_transformer".
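
Because the class is registered under the name "pretrained_transformer", it can be selected by that name in a configuration. The snippet below is a minimal sketch written as a plain Python dict, assuming the usual "type" key for picking a registered class. The model name "bert-base-uncased" is only an illustrative choice, and the rest of a full configuration is omitted.

```python
# Illustrative model name (an assumption, not a requirement of the class).
MODEL_NAME = "bert-base-uncased"

# Sketch of the backbone section of a configuration, expressed as a dict.
# "type" selects the registered Backbone; the other key is a constructor argument.
backbone_config = {
    "type": "pretrained_transformer",
    "model_name": MODEL_NAME,
}

print(backbone_config)
```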


  • vocab : Vocabulary
    Necessary for converting input ids to strings in make_output_human_readable. If you set output_token_strings to False, or if you never call make_output_human_readable, then this will not be used and can be safely set to None.
  • model_name : str
    The name of the transformers model to use. Should be the same as the corresponding PretrainedTransformerIndexer.
  • max_length : int, optional (default = None)
    If positive, folds the input token IDs into multiple segments of this length, passes them through the transformer model independently, and concatenates the final representations. Should be set to the same value as the max_length option on the PretrainedTransformerIndexer.
  • sub_module : str, optional (default = None)
    The name of a submodule of the transformer to be used as the embedder. Some transformers, such as BERT, naturally act as embedders. Other models consist of an encoder and a decoder, in which case we just want to use the encoder.
  • train_parameters : bool, optional (default = True)
    If this is True, the transformer weights get updated during training.
  • last_layer_only : bool, optional (default = True)
    When True (the default), only the final layer of the pretrained transformer is taken for the embeddings. But if set to False, a scalar mix of all of the layers is used.
  • tokenizer_kwargs : Dict[str, Any], optional (default = None)
    Dictionary with additional arguments for AutoTokenizer.from_pretrained.
  • transformer_kwargs : Dict[str, Any], optional (default = None)
    Dictionary with additional arguments for AutoModel.from_pretrained.
  • output_token_strings : bool, optional (default = True)
    If True, we will add the input token ids to the output dictionary in forward (with key "token_ids"), and convert them to strings in make_output_human_readable (with key "tokens"). This is necessary for certain demo functionality, and it adds only a trivial amount of computation if you are not using a demo.
  • vocab_namespace : str, optional (default = "tags")
    The namespace to use in conjunction with the Vocabulary above. We use a somewhat confusing default of "tags" here, to match what is done in PretrainedTransformerIndexer.
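
The folding behaviour described for max_length can be pictured with a small pure-Python sketch. This is not the library's implementation: the helper names fold, encode_folded and toy_encoder are hypothetical, the "encoder" is a stand-in that returns one number per token, and details such as special tokens and padding are ignored.

```python
from typing import Callable, List, Optional


def fold(token_ids: List[int], max_length: Optional[int]) -> List[List[int]]:
    """Split a flat list of token IDs into consecutive segments of at most max_length."""
    if max_length is None or max_length <= 0:
        # No folding requested: treat the whole input as a single segment.
        return [token_ids]
    return [token_ids[i:i + max_length] for i in range(0, len(token_ids), max_length)]


def encode_folded(
    token_ids: List[int],
    max_length: Optional[int],
    encode_segment: Callable[[List[int]], List[float]],
) -> List[float]:
    """Encode each segment independently and concatenate the per-token results."""
    outputs: List[float] = []
    for segment in fold(token_ids, max_length):
        outputs.extend(encode_segment(segment))
    return outputs


def toy_encoder(segment: List[int]) -> List[float]:
    # Hypothetical stand-in for the transformer: the value depends on the position
    # inside the segment, so it is visible that segments are encoded independently.
    return [float(token_id) + 0.1 * position for position, token_id in enumerate(segment)]


ids = list(range(10, 17))  # seven made-up token IDs
segments = fold(ids, 3)
encoded = encode_folded(ids, 3, toy_encoder)
print(segments)
print(len(encoded))
```

Each segment restarts its position count at zero, which mirrors the point that segments pass through the model independently before their representations are joined back into one sequence.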


class PretrainedTransformerBackbone(Backbone):
 | ...
 | def forward(self, text: TextFieldTensors) -> Dict[str, torch.Tensor]


class PretrainedTransformerBackbone(Backbone):
 | ...
 | def make_output_human_readable(
 |     self,
 |     output_dict: Dict[str, torch.Tensor]
 | ) -> Dict[str, torch.Tensor]
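
As described for output_token_strings, forward stores the input token ids under "token_ids" and make_output_human_readable adds their string forms under "tokens". The sketch below illustrates that conversion in plain Python under stated assumptions: index_to_token is a hypothetical lookup table standing in for a Vocabulary namespace, the ids are made up for illustration, and nested lists replace tensors.

```python
from typing import Any, Dict

# Hypothetical index-to-token table standing in for a Vocabulary namespace.
index_to_token = {0: "[PAD]", 101: "[CLS]", 102: "[SEP]", 7592: "hello", 2088: "world"}


def make_readable(output_dict: Dict[str, Any]) -> Dict[str, Any]:
    """Add a "tokens" entry holding the string form of every id in "token_ids"."""
    token_ids = output_dict.get("token_ids")
    if token_ids is None:
        # Nothing to convert, e.g. when token ids were not added to the output.
        return output_dict
    output_dict["tokens"] = [
        [index_to_token[token_id] for token_id in row] for row in token_ids
    ]
    return output_dict


# One batch entry with four made-up token ids.
output = make_readable({"token_ids": [[101, 7592, 2088, 102]]})
print(output["tokens"])
```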