next_token_lm

allennlp_models.lm.models.next_token_lm

NextTokenLM#

@Model.register("next_token_lm")
class NextTokenLM(Model):
 | def __init__(
 |     self,
 |     vocab: Vocabulary,
 |     text_field_embedder: TextFieldEmbedder,
 |     language_model_head: LanguageModelHead,
 |     contextualizer: Seq2SeqEncoder = None,
 |     target_namespace: str = "bert",
 |     dropout: float = 0.0,
 |     initializer: InitializerApplicator = None,
 |     n_best: int = 5,
 |     beam_search_generator: BeamSearchGenerator = None,
 |     **kwargs
 | ) -> None

The NextTokenLM embeds some input tokens, contextualizes them, then predicts the next token, computing a loss against a known target.

If a BeamSearchGenerator is given, this model will predict a sequence of next tokens instead of a single one.

Note

This was developed for use in a demo, not for training. You definitely don't want to train a language model using this code; it would be incredibly inefficient. It does, however, compute correct gradients of the loss, so you can use it for interesting visualizations of the gradients of a pretrained model, and it appears to be fast enough to sample from, at least one word at a time.

Parameters

  • vocab : Vocabulary
  • text_field_embedder : TextFieldEmbedder
    Used to embed the indexed tokens we get in forward.
  • language_model_head : LanguageModelHead
    The torch.nn.Module that goes from the hidden states output by the contextualizer to logits over some output vocabulary.
  • contextualizer : Seq2SeqEncoder, optional (default = None)
    Used to "contextualize" the embeddings. This is optional because the contextualization might actually be done in the text field embedder.
  • target_namespace : str, optional (default = 'bert')
    Namespace to use to convert predicted token ids to strings in Model.make_output_human_readable.
  • dropout : float, optional (default = 0.0)
    If specified, dropout is applied to the contextualized embeddings before computation of the softmax. The contextualized embeddings themselves are returned without dropout.
  • initializer : InitializerApplicator, optional (default = None)
    If provided, will be used to initialize the model parameters.
  • n_best : int, optional (default = 5)
    The number of best tokens to predict. If beam_search_generator is given, this option is ignored.
  • beam_search_generator : BeamSearchGenerator, optional (default = None)
    An optional BeamSearchGenerator. If given, the model will predict sequences of next tokens instead of just a single next token.
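
As an illustration of how these pieces fit together, here is a minimal construction sketch in Python. The GPT-2 embedder, GPT-2 language-model head, and the "gpt2" namespace are illustrative assumptions, not requirements; any compatible TextFieldEmbedder / LanguageModelHead pair is wired up the same way.

 | from allennlp.data import Vocabulary
 | from allennlp.modules.text_field_embedders import BasicTextFieldEmbedder
 | from allennlp.modules.token_embedders import PretrainedTransformerEmbedder
 | from allennlp_models.lm.models import NextTokenLM
 | from allennlp_models.lm.modules.language_model_heads import Gpt2LanguageModelHead
 |
 | # Illustrative choice: a GPT-2 embedder paired with the matching LM head.
 | embedder = BasicTextFieldEmbedder({"tokens": PretrainedTransformerEmbedder("gpt2")})
 | head = Gpt2LanguageModelHead("gpt2")
 |
 | # The transformer supplies its own token ids, so an empty Vocabulary is enough to
 | # start with; "gpt2" is the namespace used to turn predicted ids back into strings.
 | vocab = Vocabulary()
 | model = NextTokenLM(
 |     vocab=vocab,
 |     text_field_embedder=embedder,
 |     language_model_head=head,
 |     target_namespace="gpt2",
 |     n_best=5,
 | )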

forward#

class NextTokenLM(Model):
 | ...
 | def forward(
 |     self,
 |     tokens: TextFieldTensors,
 |     target_ids: TextFieldTensors = None
 | ) -> Dict[str, torch.Tensor]

Run a forward pass of the model, returning an output tensor dictionary with the following fields:

  • "probabilities": a tensor of shape (batch_size, n_best) representing the probabilities of the predicted tokens, where n_best is either self._n_best or beam_size if using beam search.
  • "top_indices": a tensor of shape (batch_size, n_best, num_predicted_tokens) containing the IDs of the predicted tokens, where num_predicted_tokens is just 1 unless using beam search, in which case it depends on the parameters of the beam search.
  • "token_ids": a tensor of shape (batch_size, num_input_tokens) containing the IDs of the input tokens.
  • "loss" (optional): the loss of the batch, only given if target_ids is not None.

get_metrics#

class NextTokenLM(Model):
 | ...
 | def get_metrics(self, reset: bool = False)
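
Metrics are only accumulated when forward is called with target_ids. Continuing the sketch above with a hypothetical single-token target; the "perplexity" key shown in the final comment reflects the metric these LM models track and is stated here as an assumption:

 | # The target must be indexed with the same indexers as the input so that its
 | # ids line up with the language-model head's output vocabulary.
 | target_field = TextField(tokenizer.tokenize(" library")[:1], indexers)
 | batch = Batch([Instance({"tokens": text_field, "target_ids": target_field})])
 | batch.index_instances(vocab)
 |
 | output = model(**batch.as_tensor_dict())  # now also contains "loss"
 | metrics = model.get_metrics(reset=True)
 | print(metrics)  # e.g. {"perplexity": ...}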

make_output_human_readable#

class NextTokenLM(Model):
 | ...
 | def make_output_human_readable(
 |     self,
 |     output_dict: Dict[str, torch.Tensor]
 | ) -> Dict[str, torch.Tensor]

Collects token strings from indices, adding two fields to the output_dict:

  • "top_tokens": a list (for each instance in the batch) of lists (for each of the n best predictions) of lists of strings (for each token in each prediction).
  • "tokens": a list of list (for each instance in the batch) of strings (for each input token in the instance).

default_predictor#

class NextTokenLM(Model):
 | ...
 | default_predictor = "next_token_lm"
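
Because the default predictor is registered under the same name, a packaged archive of this model can be queried through the generic Predictor API. A sketch, assuming a local archive path (placeholder) and the {"sentence": ...} JSON input format used by the AllenNLP demo:

 | from allennlp.models.archival import load_archive
 | from allennlp.predictors import Predictor
 |
 | import allennlp_models.lm  # noqa: F401 -- registers the model, reader, and predictor
 |
 | # Placeholder path: point this at a trained, archived NextTokenLM model.
 | archive = load_archive("/path/to/next_token_lm.tar.gz")
 | predictor = Predictor.from_archive(archive, "next_token_lm")
 |
 | result = predictor.predict_json({"sentence": "AllenNLP is a"})
 | print(result["top_tokens"], result["probabilities"])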