allennlp.modules.token_embedders¶
A TokenEmbedder is a Module that embeds one-hot-encoded tokens as vectors.
class allennlp.modules.token_embedders.token_embedder.TokenEmbedder[source]¶
Bases: torch.nn.modules.module.Module, allennlp.common.registrable.Registrable
A TokenEmbedder is a Module that takes as input a tensor with integer ids that have been output from a TokenIndexer and outputs a vector per token in the input. The input typically has shape (batch_size, num_tokens) or (batch_size, num_tokens, num_characters), and the output is of shape (batch_size, num_tokens, output_dim). The simplest TokenEmbedder is just an embedding layer, but for character-level input it could also be some kind of character encoder.

We add a single method to the basic Module API: get_output_dim(). This lets us more easily compute output dimensions for the TextFieldEmbedder, which we might need when defining model parameters such as LSTMs or linear layers, which need to know their input dimension before the layers are called.
default_implementation: str = 'embedding'¶
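As a quick illustration of this contract, here is a minimal sketch (not part of the library) of a custom TokenEmbedder that wraps torch.nn.Embedding; the registered name "tiny_embedding" is hypothetical and chosen only for this example.

    import torch
    from allennlp.modules.token_embedders import TokenEmbedder


    @TokenEmbedder.register("tiny_embedding")  # hypothetical name for this sketch
    class TinyEmbedding(TokenEmbedder):
        def __init__(self, num_embeddings: int, embedding_dim: int) -> None:
            super().__init__()
            self._embedding = torch.nn.Embedding(num_embeddings, embedding_dim)

        def get_output_dim(self) -> int:
            # The one method TokenEmbedder adds on top of the torch.nn.Module API.
            return self._embedding.embedding_dim

        def forward(self, token_ids: torch.LongTensor) -> torch.Tensor:
            # (batch_size, num_tokens) -> (batch_size, num_tokens, embedding_dim)
            return self._embedding(token_ids)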
class allennlp.modules.token_embedders.embedding.Embedding(num_embeddings: int, embedding_dim: int, projection_dim: int = None, weight: torch.FloatTensor = None, padding_index: int = None, trainable: bool = True, max_norm: float = None, norm_type: float = 2.0, scale_grad_by_freq: bool = False, sparse: bool = False, vocab_namespace: str = None, pretrained_file: str = None)[source]¶
Bases: allennlp.modules.token_embedders.token_embedder.TokenEmbedder
A more featureful embedding module than the default in Pytorch. Adds the ability to:

- embed higher-order inputs
- pre-specify the weight matrix
- use a non-trainable embedding
- project the resultant embeddings to some other dimension (which only makes sense with non-trainable embeddings)
- build all of this easily from_params
Note that if you are using our data API and are trying to embed a TextField, you should use a TextFieldEmbedder instead of using this directly.

Parameters
- num_embeddings : int
  Size of the dictionary of embeddings (vocabulary size).
- embedding_dim : int
  The size of each embedding vector.
- projection_dim : int, optional (default = None)
  If given, we add a projection layer after the embedding layer. This really only makes sense if trainable is False.
- weight : torch.FloatTensor, optional (default = None)
  A pre-initialised weight matrix for the embedding lookup, allowing the use of pretrained vectors.
- padding_index : int, optional (default = None)
  If given, pads the output with zeros whenever it encounters this index.
- trainable : bool, optional (default = True)
  Whether or not to optimize the embedding parameters.
- max_norm : float, optional (default = None)
  If given, will renormalize the embeddings to always have a norm less than this.
- norm_type : float, optional (default = 2)
  The p of the p-norm to compute for the max_norm option.
- scale_grad_by_freq : bool, optional (default = False)
  If given, this will scale gradients by the frequency of the words in the mini-batch.
- sparse : bool, optional (default = False)
  Whether or not the Pytorch backend should use a sparse representation of the embedding weight.
- vocab_namespace : str, optional (default = None)
  In the case of fine-tuning or transfer learning, the model's embedding matrix needs to be extended according to the size of the extended vocabulary. To know how much to extend the embedding matrix, it is necessary to know which vocab_namespace was used to construct it in the original training. We store the vocab_namespace used during the original training as an attribute, so that it can be retrieved during fine-tuning.
- pretrained_file : str, optional (default = None)
  Used to keep track of the source of the weights and to load more embeddings at test time. It does not load the weights from this pretrained_file. For that purpose, use Embedding.from_params.

Returns
- An Embedding module.
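As a brief sketch of direct construction, the following wires together the pre-specified-weight, non-trainable, and projection options; the weight matrix here is random and merely stands in for pretrained vectors loaded elsewhere.

    import torch
    from allennlp.modules.token_embedders import Embedding

    # Stand-in for a pretrained weight matrix (e.g. loaded from a GloVe file elsewhere).
    pretrained_weight = torch.randn(10000, 300)

    embedder = Embedding(
        num_embeddings=10000,
        embedding_dim=300,
        weight=pretrained_weight,
        trainable=False,      # freeze the pretrained vectors
        projection_dim=100,   # project the 300d vectors down to 100d
    )

    token_ids = torch.randint(0, 10000, (32, 40))   # (batch_size, num_tokens)
    vectors = embedder(token_ids)                   # (32, 40, 100)
    assert embedder.get_output_dim() == 100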
extend_vocab(self, extended_vocab: allennlp.data.vocabulary.Vocabulary, vocab_namespace: str = None, extension_pretrained_file: str = None, model_path: str = None)[source]¶
Extends the embedding matrix according to the extended vocabulary. If extension_pretrained_file is available, it will be used to initialize the embeddings of the new words in the extended vocabulary; otherwise we check whether the _pretrained_file attribute is already available. If neither is available, the new embeddings are initialized with Xavier uniform.
Parameters
- extended_vocab : Vocabulary
  Vocabulary extended from the original vocabulary used to construct this Embedding.
- vocab_namespace : str, optional (default = None)
  If you know which vocab_namespace should be used for the extension, you can pass it here. If not passed, it will check whether the vocab_namespace used at the time of Embedding construction is available. If so, that namespace will be used; otherwise extend_vocab will be a no-op.
- extension_pretrained_file : str, optional (default = None)
  A file containing pretrained embeddings can be specified here. It can be the path to a local file or a URL of a (cached) remote file. Check format details in from_params of the Embedding class.
- model_path : str, optional (default = None)
  Path traversing the model attributes up to this embedding module, e.g. "_text_field_embedder.token_embedder_tokens". This is only used to give a helpful error message when extend_vocab is implicitly called by fine-tune or any other command.
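A small sketch of the typical call, assuming the "tokens" namespace; since no pretrained file is given here, the new rows are initialized with Xavier uniform as described above.

    from allennlp.data import Vocabulary
    from allennlp.modules.token_embedders import Embedding

    vocab = Vocabulary()
    for word in ["the", "cat"]:
        vocab.add_token_to_namespace(word, namespace="tokens")
    embedder = Embedding(num_embeddings=vocab.get_vocab_size("tokens"), embedding_dim=50)

    # Later (e.g. during fine-tuning) the vocabulary grows...
    vocab.add_token_to_namespace("aardvark", namespace="tokens")
    # ...and the embedding matrix is grown to match.
    embedder.extend_vocab(vocab, vocab_namespace="tokens")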
forward(self, inputs)[source]¶
Defines the computation performed at every call. Should be overridden by all subclasses.

Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance instead of calling this method directly, since the former takes care of running the registered hooks while the latter silently ignores them.
classmethod from_params(vocab: allennlp.data.vocabulary.Vocabulary, params: allennlp.common.params.Params) → 'Embedding'[source]¶
We need the vocabulary here to know how many items we need to embed, and we look for a vocab_namespace key in the parameter dictionary to know which vocabulary to use. If you know beforehand exactly how many embeddings you need, or aren't using a vocabulary mapping for the things getting embedded here, then you can pass in the num_embeddings key directly, and the vocabulary will be ignored.

In the configuration file, a file containing pretrained embeddings can be specified using the parameter "pretrained_file". It can be the path to a local file or a URL of a (cached) remote file. Two formats are supported:

- hdf5 file - containing an embedding matrix in the form of a torch.Tensor;
- text file - a utf-8 encoded text file with space-separated fields: [word] [dim 1] [dim 2] ...

The text file can optionally be compressed with gzip, bz2, lzma or zip. You can even select a single file inside an archive containing multiple files using the URI "(archive_uri)#file_path_inside_the_archive", where archive_uri can be a file system path or a URL. For example: "(https://nlp.stanford.edu/data/glove.twitter.27B.zip)#glove.twitter.27B.200d.txt"
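As a sketch of how these parameters fit together when calling from_params directly: the GloVe URI is the example from this docstring, the tiny vocabulary exists only to make the call self-contained, and the pretrained file would be downloaded and cached on first use.

    from allennlp.common import Params
    from allennlp.data import Vocabulary
    from allennlp.modules.token_embedders import Embedding

    vocab = Vocabulary()
    for word in ["happy", "sad"]:
        vocab.add_token_to_namespace(word, namespace="tokens")

    params = Params({
        "embedding_dim": 200,
        "vocab_namespace": "tokens",
        "trainable": False,
        "pretrained_file":
            "(https://nlp.stanford.edu/data/glove.twitter.27B.zip)#glove.twitter.27B.200d.txt",
    })
    embedding = Embedding.from_params(vocab, params)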
class allennlp.modules.token_embedders.embedding.EmbeddingsFileURI(main_file_uri, path_inside_archive)[source]¶
Bases: tuple

property main_file_uri¶
Alias for field number 0

property path_inside_archive¶
Alias for field number 1
class allennlp.modules.token_embedders.embedding.EmbeddingsTextFile(file_uri: str, encoding: str = 'utf-8', cache_dir: str = None)[source]¶
Bases: collections.abc.Iterator, typing.Generic
Utility class for opening embeddings text files. Handles various compression formats, as well as context management.
Parameters
- file_uri : str
  It can be:
  - a file system path or a URL of a (possibly compressed) text file or a zip/tar archive containing a single file;
  - a URI of the form (archive_path_or_url)#file_path_inside_archive if the text file is contained in a multi-file archive.
- encoding : str
- cache_dir : str
DEFAULT_ENCODING = 'utf-8'¶
allennlp.modules.token_embedders.embedding.format_embeddings_file_uri(main_file_path_or_url: str, path_inside_archive: Union[str, NoneType] = None) → str[source]¶
allennlp.modules.token_embedders.embedding.parse_embeddings_file_uri(uri: str) → 'EmbeddingsFileURI'[source]¶
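A short illustration of the two URI helpers; the paths are placeholders.

    from allennlp.modules.token_embedders.embedding import (
        format_embeddings_file_uri,
        parse_embeddings_file_uri,
    )

    uri = format_embeddings_file_uri("/path/to/embeddings.zip", "glove.6B.100d.txt")
    # uri == "(/path/to/embeddings.zip)#glove.6B.100d.txt"

    parsed = parse_embeddings_file_uri(uri)
    assert parsed.main_file_uri == "/path/to/embeddings.zip"
    assert parsed.path_inside_archive == "glove.6B.100d.txt"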
class allennlp.modules.token_embedders.token_characters_encoder.TokenCharactersEncoder(embedding: allennlp.modules.token_embedders.embedding.Embedding, encoder: allennlp.modules.seq2vec_encoders.seq2vec_encoder.Seq2VecEncoder, dropout: float = 0.0)[source]¶
Bases: allennlp.modules.token_embedders.token_embedder.TokenEmbedder
A TokenCharactersEncoder takes the output of a TokenCharactersIndexer, which is a tensor of shape (batch_size, num_tokens, num_characters), embeds the characters, runs a token-level encoder, and returns the result, which is a tensor of shape (batch_size, num_tokens, encoding_dim). We also optionally apply dropout after the token-level encoder.

We take the embedding and encoding modules as input, so this class is itself quite simple.
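A minimal sketch of wiring the two modules together; the character vocabulary size and CNN hyperparameters below are arbitrary choices for illustration, not defaults.

    import torch
    from allennlp.modules.seq2vec_encoders import CnnEncoder
    from allennlp.modules.token_embedders import Embedding, TokenCharactersEncoder

    char_embedding = Embedding(num_embeddings=100, embedding_dim=16)
    char_encoder = CnnEncoder(embedding_dim=16, num_filters=32, ngram_filter_sizes=(3,))
    token_encoder = TokenCharactersEncoder(char_embedding, char_encoder, dropout=0.2)

    # (batch_size, num_tokens, num_characters) character ids in, one vector per token out.
    character_ids = torch.randint(1, 100, (8, 20, 12))
    token_vectors = token_encoder(character_ids)    # (8, 20, 32)
    assert token_encoder.get_output_dim() == 32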
forward(self, token_characters: torch.Tensor) → torch.Tensor[source]¶
Defines the computation performed at every call. Should be overridden by all subclasses.

Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance instead of calling this method directly, since the former takes care of running the registered hooks while the latter silently ignores them.
classmethod from_params(vocab: allennlp.data.vocabulary.Vocabulary, params: allennlp.common.params.Params) → 'TokenCharactersEncoder'[source]¶
This is the automatic implementation of from_params. Any class that subclasses FromParams (or Registrable, which itself subclasses FromParams) gets this implementation for free. If you want your class to be instantiated from params in the "obvious" way (pop off parameters and hand them to your constructor with the same names), this provides that functionality.

If you need more complex logic in your from_params method, you'll have to implement your own method that overrides this one.
class allennlp.modules.token_embedders.elmo_token_embedder.ElmoTokenEmbedder(options_file: str, weight_file: str, do_layer_norm: bool = False, dropout: float = 0.5, requires_grad: bool = False, projection_dim: int = None, vocab_to_cache: List[str] = None, scalar_mix_parameters: List[float] = None)[source]¶
Bases: allennlp.modules.token_embedders.token_embedder.TokenEmbedder
Compute a single layer of ELMo representations.
This class serves as a convenience when you only want to use one layer of ELMo representations at the input of your network. It’s essentially a wrapper around Elmo(num_output_representations=1, …)
Parameters
- options_file : str, required.
  An ELMo JSON options file.
- weight_file : str, required.
  An ELMo hdf5 weight file.
- do_layer_norm : bool, optional.
  Should we apply layer normalization (passed to ScalarMix)?
- dropout : float, optional (default = 0.5).
  The dropout value to be applied to the ELMo representations.
- requires_grad : bool, optional.
  If True, compute gradient of ELMo parameters for fine tuning.
- projection_dim : int, optional.
  If given, we will project the ELMo embedding down to this dimension. We recommend that you try using ELMo with a lot of dropout and no projection first, but we have found a few cases where projection helps (particularly where there is very limited training data).
- vocab_to_cache : List[str], optional.
  A list of words to pre-compute and cache character convolutions for. If you use this option, the ElmoTokenEmbedder expects that you pass word indices of shape (batch_size, timesteps) to forward, instead of character indices. If you use this option and pass a word which wasn't pre-cached, this will break.
- scalar_mix_parameters : List[float], optional (default = None)
  If not None, use these scalar mix parameters to weight the representations produced by different layers. These mixing weights are not updated during training. The mixing weights here should be the unnormalized (i.e., pre-softmax) weights. So, if you wanted to use only the 1st layer of a 2-layer ELMo, you can set this to [-9e10, 1, -9e10].
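A usage sketch: the options and weight paths are placeholders for an ELMo model you have downloaded, and batch_to_ids is the helper from allennlp.modules.elmo that produces the (batch_size, timesteps, 50) character ids expected by forward.

    from allennlp.modules.elmo import batch_to_ids
    from allennlp.modules.token_embedders import ElmoTokenEmbedder

    elmo_embedder = ElmoTokenEmbedder(
        options_file="/path/to/elmo_options.json",   # placeholder path
        weight_file="/path/to/elmo_weights.hdf5",    # placeholder path
        dropout=0.5,
    )

    character_ids = batch_to_ids([["The", "cat", "sat", "."], ["Hello", "world"]])
    embeddings = elmo_embedder(character_ids)   # (2, 4, elmo_embedder.get_output_dim())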
forward(self, inputs: torch.Tensor, word_inputs: torch.Tensor = None) → torch.Tensor[source]¶

Parameters
- inputs : torch.Tensor
  Shape (batch_size, timesteps, 50) of character ids representing the current batch.
- word_inputs : torch.Tensor, optional.
  If you passed a cached vocab, you can in addition pass a tensor of shape (batch_size, timesteps), which represents word ids that have been pre-cached.

Returns
- The ELMo representations for the input sequence, shape (batch_size, timesteps, embedding_dim).
classmethod from_params(vocab: allennlp.data.vocabulary.Vocabulary, params: allennlp.common.params.Params) → 'ElmoTokenEmbedder'[source]¶
This is the automatic implementation of from_params. Any class that subclasses FromParams (or Registrable, which itself subclasses FromParams) gets this implementation for free. If you want your class to be instantiated from params in the "obvious" way (pop off parameters and hand them to your constructor with the same names), this provides that functionality.

If you need more complex logic in your from_params method, you'll have to implement your own method that overrides this one.
class allennlp.modules.token_embedders.elmo_token_embedder_multilang.ElmoTokenEmbedderMultiLang(options_files: Dict[str, str], weight_files: Dict[str, str], do_layer_norm: bool = False, dropout: float = 0.5, requires_grad: bool = False, projection_dim: int = None, vocab_to_cache: List[str] = None, scalar_mix_parameters: List[float] = None, aligning_files: Dict[str, str] = None)[source]¶
Bases: allennlp.modules.token_embedders.token_embedder.TokenEmbedder
A multilingual ELMo embedder - extending ElmoTokenEmbedder for multiple languages. Each language has different weights for the ELMo model and an alignment matrix.
Parameters
- options_files : Dict[str, str], required.
  A dictionary of language identifier to an ELMo JSON options file.
- weight_files : Dict[str, str], required.
  A dictionary of language identifier to an ELMo hdf5 weight file.
- do_layer_norm : bool, optional.
  Should we apply layer normalization (passed to ScalarMix)?
- dropout : float, optional.
  The dropout value to be applied to the ELMo representations.
- requires_grad : bool, optional.
  If True, compute gradient of ELMo parameters for fine tuning.
- projection_dim : int, optional.
  If given, we will project the ELMo embedding down to this dimension. We recommend that you try using ELMo with a lot of dropout and no projection first, but we have found a few cases where projection helps (particularly where there is very limited training data).
- vocab_to_cache : List[str], optional.
  A list of words to pre-compute and cache character convolutions for. If you use this option, the ElmoTokenEmbedder expects that you pass word indices of shape (batch_size, timesteps) to forward, instead of character indices. If you use this option and pass a word which wasn't pre-cached, this will break.
- scalar_mix_parameters : List[float], optional (default = None).
  If not None, use these scalar mix parameters to weight the representations produced by different layers. These mixing weights are not updated during training.
- aligning_files : Dict[str, str], optional (default = {}).
  A dictionary of language identifier to a pth file with an alignment matrix.
forward(self, inputs: torch.Tensor, lang: str, word_inputs: torch.Tensor = None) → torch.Tensor[source]¶

Parameters
- inputs : torch.Tensor
  Shape (batch_size, timesteps, 50) of character ids representing the current batch.
- lang : str, required.
  The language of the ELMo embedder to use.
- word_inputs : torch.Tensor, optional.
  If you passed a cached vocab, you can in addition pass a tensor of shape (batch_size, timesteps), which represents word ids that have been pre-cached.

Returns
- The ELMo representations for the given language for the input sequence, shape (batch_size, timesteps, embedding_dim).
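A sketch of the per-language usage; all file paths are placeholders for per-language ELMo options, weights, and alignment matrices.

    from allennlp.modules.elmo import batch_to_ids
    from allennlp.modules.token_embedders.elmo_token_embedder_multilang import (
        ElmoTokenEmbedderMultiLang,
    )

    embedder = ElmoTokenEmbedderMultiLang(
        options_files={"en": "/path/to/en_options.json", "fr": "/path/to/fr_options.json"},
        weight_files={"en": "/path/to/en_weights.hdf5", "fr": "/path/to/fr_weights.hdf5"},
        aligning_files={"en": "/path/to/en_align.pth", "fr": "/path/to/fr_align.pth"},
    )

    character_ids = batch_to_ids([["Bonjour", "le", "monde"]])
    french_embeddings = embedder(character_ids, lang="fr")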
classmethod from_params(vocab: allennlp.data.vocabulary.Vocabulary, params: allennlp.common.params.Params) → 'ElmoTokenEmbedderMultiLang'[source]¶
This is the automatic implementation of from_params. Any class that subclasses FromParams (or Registrable, which itself subclasses FromParams) gets this implementation for free. If you want your class to be instantiated from params in the "obvious" way (pop off parameters and hand them to your constructor with the same names), this provides that functionality.

If you need more complex logic in your from_params method, you'll have to implement your own method that overrides this one.
class allennlp.modules.token_embedders.openai_transformer_embedder.OpenaiTransformerEmbedder(transformer: allennlp.modules.openai_transformer.OpenaiTransformer, top_layer_only: bool = False)[source]¶
Bases: allennlp.modules.token_embedders.token_embedder.TokenEmbedder
Takes a byte-pair representation of a batch of sentences (as produced by the OpenaiTransformerBytePairIndexer) and generates a ScalarMix of the corresponding contextual embeddings.

Parameters
- transformer : OpenaiTransformer, required.
  The OpenaiTransformer module used for the embeddings.
- top_layer_only : bool, optional (default = False)
  If True, then only return the top layer instead of applying the scalar mix.
forward(self, inputs: torch.Tensor, offsets: torch.Tensor = None) → torch.Tensor[source]¶

Parameters
- inputs : torch.Tensor, required
  A (batch_size, num_timesteps) tensor representing the byte-pair encodings for the current batch.
- offsets : torch.Tensor, required
  A (batch_size, max_sequence_length) tensor representing the word offsets for the current batch.

Returns
- torch.Tensor
  An embedding representation of the input sequence having shape (batch_size, sequence_length, embedding_dim).
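A construction sketch under stated assumptions: the OpenaiTransformer module is built here with a model_path argument, mirroring how AllenNLP configuration files typically construct it, and the archive path is a placeholder for the released transformer LM weights; verify both against your AllenNLP version. Inputs to forward should come from the OpenaiTransformerBytePairIndexer.

    from allennlp.modules.openai_transformer import OpenaiTransformer
    from allennlp.modules.token_embedders.openai_transformer_embedder import (
        OpenaiTransformerEmbedder,
    )

    # Assumption: OpenaiTransformer loads its weights when given model_path.
    transformer = OpenaiTransformer(model_path="/path/to/openai-transformer-lm.tar.gz")
    embedder = OpenaiTransformerEmbedder(transformer, top_layer_only=False)
    # embedder.get_output_dim() is the transformer's embedding dimension.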
A TokenEmbedder which uses one of the BERT models (https://github.com/google-research/bert) to produce embeddings.
At its core it uses Hugging Face’s PyTorch implementation (https://github.com/huggingface/pytorch-pretrained-BERT), so thanks to them!
class allennlp.modules.token_embedders.bert_token_embedder.BertEmbedder(bert_model: pytorch_pretrained_bert.modeling.BertModel, top_layer_only: bool = False, max_pieces: int = 512, num_start_tokens: int = 1, num_end_tokens: int = 1, scalar_mix_parameters: List[float] = None)[source]¶
Bases: allennlp.modules.token_embedders.token_embedder.TokenEmbedder
A TokenEmbedder that produces BERT embeddings for your tokens. Should be paired with a BertIndexer, which produces wordpiece ids.

Most likely you want to use PretrainedBertEmbedder for one of the named pretrained models, not this base class.

Parameters
- bert_model : BertModel
  The BERT model being wrapped.
- top_layer_only : bool, optional (default = False)
  If True, then only return the top layer instead of applying the scalar mix.
- max_pieces : int, optional (default = 512)
  The BERT embedder uses positional embeddings and so has a corresponding maximum length for its input ids. Assuming the inputs are windowed and padded appropriately by this length, the embedder will split them into a large batch, feed them into BERT, and recombine the output as if it was a longer sequence.
- num_start_tokens : int, optional (default = 1)
  The number of starting special tokens input to BERT (usually 1, i.e., [CLS]).
- num_end_tokens : int, optional (default = 1)
  The number of ending tokens input to BERT (usually 1, i.e., [SEP]).
- scalar_mix_parameters : List[float], optional (default = None)
  If not None, use these scalar mix parameters to weight the representations produced by different layers. These mixing weights are not updated during training.
forward(self, input_ids: torch.LongTensor, offsets: torch.LongTensor = None, token_type_ids: torch.LongTensor = None) → torch.Tensor[source]¶

Parameters
- input_ids : torch.LongTensor
  The (batch_size, ..., max_sequence_length) tensor of wordpiece ids.
- offsets : torch.LongTensor, optional
  The BERT embeddings are one per wordpiece. However it's possible/likely you might want one per original token. In that case, offsets represents the indices of the desired wordpiece for each original token. Depending on how your token indexer is configured, this could be the position of the last wordpiece for each token, or it could be the position of the first wordpiece for each token.
  For example, if you had the sentence "Definitely not", and if the corresponding wordpieces were ["Def", "##in", "##ite", "##ly", "not"], then the input_ids would be 5 wordpiece ids, and the "last wordpiece" offsets would be [3, 4]. If offsets are provided, the returned tensor will contain only the wordpiece embeddings at those positions, and (in particular) will contain one embedding per token. If offsets are not provided, the entire tensor of wordpiece embeddings will be returned.
- token_type_ids : torch.LongTensor, optional
  If an input consists of two sentences (as in the BERT paper), tokens from the first sentence should have type 0 and tokens from the second sentence should have type 1. If you don't provide this (the default BertIndexer doesn't), then it's assumed to be all 0s.
class allennlp.modules.token_embedders.bert_token_embedder.PretrainedBertEmbedder(pretrained_model: str, requires_grad: bool = False, top_layer_only: bool = False, scalar_mix_parameters: List[float] = None)[source]¶
Bases: allennlp.modules.token_embedders.bert_token_embedder.BertEmbedder
Parameters
- pretrained_model : str
  Either the name of the pretrained model to use (e.g. 'bert-base-uncased'), or the path to the .tar.gz file with the model weights.
  If the name is a key in the list of pretrained models at https://github.com/huggingface/pytorch-pretrained-BERT/blob/master/pytorch_pretrained_bert/modeling.py#L41 the corresponding path will be used; otherwise it will be interpreted as a path or URL.
- requires_grad : bool, optional (default = False)
  If True, compute gradient of BERT parameters for fine tuning.
- top_layer_only : bool, optional (default = False)
  If True, then only return the top layer instead of applying the scalar mix.
- scalar_mix_parameters : List[float], optional (default = None)
  If not None, use these scalar mix parameters to weight the representations produced by different layers. These mixing weights are not updated during training.
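A short sketch of the typical usage with a named model; the weights are fetched through pytorch-pretrained-bert on first use, and the embedder should be paired with the matching BertIndexer so that input_ids and offsets line up with forward().

    from allennlp.modules.token_embedders.bert_token_embedder import PretrainedBertEmbedder

    bert_embedder = PretrainedBertEmbedder(
        pretrained_model="bert-base-uncased",
        requires_grad=False,   # keep BERT frozen
        top_layer_only=True,   # skip the scalar mix over layers
    )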
class allennlp.modules.token_embedders.bert_token_embedder.PretrainedBertModel[source]¶
Bases: object
In some instances you may want to load the same BERT model twice (e.g. to use as a token embedder and also as a pooling layer). This factory provides a cache so that you don’t actually have to load the model twice.
class allennlp.modules.token_embedders.language_model_token_embedder.LanguageModelTokenEmbedder(archive_file: str, dropout: float = None, bos_eos_tokens: Tuple[str, str] = ('<S>', '</S>'), remove_bos_eos: bool = True, requires_grad: bool = False)[source]¶
Bases: allennlp.modules.token_embedders.token_embedder.TokenEmbedder
Compute a single layer of representations from an (optionally bidirectional) language model. This is done by computing a learned scalar average of the layers from the LM. Typically the LM's weights will be fixed, but they can be fine-tuned by setting requires_grad.

Parameters
- archive_file : str, required
  An archive file, typically model.tar.gz, from a LanguageModel. The contextualizer used by the LM must satisfy two requirements:
  1. It must have a num_layers field.
  2. It must take a boolean return_all_layers parameter in its constructor.
  See BidirectionalLanguageModelTransformer for their definitions.
- dropout : float, optional.
  The dropout value to be applied to the representations.
- bos_eos_tokens : Tuple[str, str], optional (default = ("<S>", "</S>"))
  These will be indexed and placed around the indexed tokens. Necessary if the language model was trained with them, but they were injected external to an indexer.
- remove_bos_eos : bool, optional (default = True)
  Typically the provided token indexes will be augmented with begin-sentence and end-sentence tokens. (Alternatively, you can pass bos_eos_tokens.) If this flag is True, the corresponding embeddings will be removed from the return values.
  Warning: This only removes a single start and single end token!
- requires_grad : bool, optional (default = False)
  If True, compute gradient of bidirectional language model parameters for fine tuning.
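A construction sketch; the archive path is a placeholder for a model.tar.gz produced by training a LanguageModel.

    from allennlp.modules.token_embedders.language_model_token_embedder import (
        LanguageModelTokenEmbedder,
    )

    lm_embedder = LanguageModelTokenEmbedder(
        archive_file="/path/to/language_model/model.tar.gz",   # placeholder path
        dropout=0.2,
        bos_eos_tokens=("<S>", "</S>"),
        remove_bos_eos=True,
        requires_grad=False,
    )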
forward(self, inputs: torch.Tensor) → Dict[str, torch.Tensor][source]¶

Parameters
- inputs : torch.Tensor
  Shape (batch_size, timesteps, ...) of token ids representing the current batch. These must have been produced using the same indexer the LM was trained on.

Returns
- The bidirectional language model representations for the input sequence, shape (batch_size, timesteps, embedding_dim).
class allennlp.modules.token_embedders.bag_of_word_counts_token_embedder.BagOfWordCountsTokenEmbedder(vocab: allennlp.data.vocabulary.Vocabulary, vocab_namespace: str, projection_dim: int = None, ignore_oov: bool = False)[source]¶
Bases: allennlp.modules.token_embedders.token_embedder.TokenEmbedder
Represents a sequence of tokens as a bag of (discrete) word ids, as it was done in the pre-neural days.
Each sequence gets a vector of length vocabulary size, where the i’th entry in the vector corresponds to number of times the i’th token in the vocabulary appears in the sequence.
By default, we ignore padding tokens.
Parameters
- vocab : Vocabulary
- vocab_namespace : str
  Namespace of the vocabulary to embed.
- projection_dim : int, optional (default = None)
  If specified, will project the resulting bag-of-words representation to the specified dimension.
- ignore_oov : bool, optional (default = False)
  If true, we ignore the OOV token.
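A minimal sketch; the tiny vocabulary exists only to make the shapes concrete (ids 0 and 1 are the default padding and OOV tokens).

    import torch
    from allennlp.data import Vocabulary
    from allennlp.modules.token_embedders.bag_of_word_counts_token_embedder import (
        BagOfWordCountsTokenEmbedder,
    )

    vocab = Vocabulary()
    for word in ["the", "cat", "sat"]:
        vocab.add_token_to_namespace(word, namespace="tokens")

    embedder = BagOfWordCountsTokenEmbedder(vocab, vocab_namespace="tokens")

    token_ids = torch.tensor([[2, 3, 4, 2, 0]])   # one padded document of word ids
    bow_vectors = embedder(token_ids)             # (batch_size, vocab_size); the entry for id 2 is 2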
forward(self, inputs: torch.Tensor) → torch.Tensor[source]¶

Parameters
- inputs : torch.Tensor
  Shape (batch_size, timesteps, sequence_length) of word ids representing the current batch.

Returns
- The bag-of-words representations for the input sequence, shape (batch_size, vocab_size).
class allennlp.modules.token_embedders.pass_through_token_embedder.PassThroughTokenEmbedder(hidden_dim: int)[source]¶
Bases: allennlp.modules.token_embedders.token_embedder.TokenEmbedder
Assumes that the input is already vectorized in some way, and just returns it.
Parameters
- hidden_dim : int, required.
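A tiny sketch: the module simply hands back its already-vectorized input.

    import torch
    from allennlp.modules.token_embedders.pass_through_token_embedder import (
        PassThroughTokenEmbedder,
    )

    embedder = PassThroughTokenEmbedder(hidden_dim=10)
    features = torch.randn(4, 7, 10)   # tokens that are already vectors
    assert torch.equal(embedder(features), features)
    assert embedder.get_output_dim() == 10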
forward(self, inputs: torch.Tensor) → torch.Tensor[source]¶
Defines the computation performed at every call. Should be overridden by all subclasses.

Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance instead of calling this method directly, since the former takes care of running the registered hooks while the latter silently ignores them.
class allennlp.modules.token_embedders.pretrained_transformer_embedder.PretrainedTransformerEmbedder(model_name: str)[source]¶
Bases: allennlp.modules.token_embedders.token_embedder.TokenEmbedder

Uses a pretrained model from pytorch-transformers as a TokenEmbedder.
forward
(self, token_ids: torch.LongTensor) → torch.Tensor[source]¶ Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
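A usage sketch; the model name is whatever pytorch-transformers accepts, and real token ids should come from the matching PretrainedTransformerIndexer (the random ids below are stand-ins).

    import torch
    from allennlp.modules.token_embedders.pretrained_transformer_embedder import (
        PretrainedTransformerEmbedder,
    )

    embedder = PretrainedTransformerEmbedder(model_name="bert-base-uncased")
    token_ids = torch.randint(1000, 2000, (2, 8))   # stand-in wordpiece ids
    embeddings = embedder(token_ids)                # (2, 8, embedder.get_output_dim())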