language_model
allennlp_models.lm.modules.token_embedders.language_model
LanguageModelTokenEmbedder#
@TokenEmbedder.register("language_model_token_embedder")
class LanguageModelTokenEmbedder(TokenEmbedder):
| def __init__(
|     self,
|     archive_file: str,
|     dropout: float = None,
|     bos_eos_tokens: Tuple[str, str] = ("<S>", "</S>"),
|     remove_bos_eos: bool = True,
|     requires_grad: bool = False
| ) -> None
Compute a single layer of representations from an (optionally bidirectional)
language model. This is done by computing a learned scalar
average of the layers from the LM. Typically the LM's weights
will be fixed, but they can be fine-tuned by setting requires_grad.
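The learned scalar average can be pictured as follows. This is a hedged sketch, not the embedder's internal code: per-layer representations are weighted by softmax-normalized learned parameters and scaled by a learned global factor.

```python
import torch
from typing import List

# Hedged sketch of a learned scalar average over LM layers (illustrative
# only, not the embedder's actual implementation). Each entry of `layers`
# is a (batch_size, timesteps, dim) tensor from one LM layer.
def scalar_average(layers: List[torch.Tensor],
                   weights: torch.Tensor,
                   gamma: torch.Tensor) -> torch.Tensor:
    normalized = torch.softmax(weights, dim=0)  # one learned weight per layer
    return gamma * sum(w * layer for w, layer in zip(normalized, layers))

layers = [torch.randn(2, 5, 16) for _ in range(3)]
weights = torch.zeros(3, requires_grad=True)  # learned per-layer weights
gamma = torch.ones(1, requires_grad=True)     # learned global scale
mixed = scalar_average(layers, weights, gamma)  # shape (2, 5, 16)
```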
Parameters¶
- archive_file : str
  An archive file, typically model.tar.gz, from a LanguageModel. The contextualizer used by the LM must satisfy two requirements:
    1. It must have a num_layers field.
    2. It must take a boolean return_all_layers parameter in its constructor.
  See BidirectionalLanguageModelTransformer for their definitions.
- dropout : float, optional (default = None)
  The dropout value to be applied to the representations.
- bos_eos_tokens : Tuple[str, str], optional (default = ("<S>", "</S>"))
  These will be indexed and placed around the indexed tokens. Necessary if the language model was trained with them but they were injected externally to an indexer.
- remove_bos_eos : bool, optional (default = True)
  Typically the provided token indexes will be augmented with begin-sentence and end-sentence tokens. (Alternatively, you can pass bos_eos_tokens.) If this flag is True, the corresponding embeddings will be removed from the return values. Warning: this only removes a single start token and a single end token!
- requires_grad : bool, optional (default = False)
  If True, compute gradients of the bidirectional language model parameters for fine-tuning.
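A minimal construction sketch (the archive path below is hypothetical; the arguments mirror the constructor signature above):

```python
from allennlp_models.lm.modules.token_embedders.language_model import (
    LanguageModelTokenEmbedder,
)

# Hypothetical archive path; point this at a trained LanguageModel's
# model.tar.gz. The defaults are spelled out here for clarity.
embedder = LanguageModelTokenEmbedder(
    archive_file="/path/to/language_model/model.tar.gz",
    dropout=0.2,
    bos_eos_tokens=("<S>", "</S>"),
    remove_bos_eos=True,
    requires_grad=False,  # keep the LM weights frozen
)
```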
get_output_dim#
class LanguageModelTokenEmbedder(TokenEmbedder):
| ...
| def get_output_dim(self) -> int
forward#
class LanguageModelTokenEmbedder(TokenEmbedder):
| ...
| def forward(self, tokens: torch.Tensor) -> Dict[str, torch.Tensor]
Parameters¶
- tokens : torch.Tensor
  Shape (batch_size, timesteps, ...) of token ids representing the current batch. These must have been produced using the same indexer the LM was trained on.
Returns¶
- The bidirectional language model representations for the input sequence, shape (batch_size, timesteps, embedding_dim).
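A hedged usage sketch, assuming the embedder constructed above and that the output matches the Returns description:

```python
import torch

# Dummy token ids of shape (batch_size, timesteps); real ids must be
# produced by the same indexer the LM was trained on.
token_ids = torch.randint(low=2, high=100, size=(8, 20))

representations = embedder(token_ids)

# Per the Returns section: (batch_size, timesteps, embedding_dim).
assert representations.shape == (8, 20, embedder.get_output_dim())
```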