allennlp.models.next_token_lm
class allennlp.models.next_token_lm.NextTokenLM(vocab: allennlp.data.vocabulary.Vocabulary, text_field_embedder: allennlp.modules.text_field_embedders.text_field_embedder.TextFieldEmbedder, language_model_head: allennlp.modules.language_model_heads.language_model_head.LanguageModelHead, contextualizer: allennlp.modules.seq2seq_encoders.seq2seq_encoder.Seq2SeqEncoder = None, target_namespace: str = 'bert', dropout: float = 0.0, initializer: allennlp.nn.initializers.InitializerApplicator = None)

Bases: allennlp.models.model.Model
The NextTokenLM embeds some input tokens, contextualizes them, then predicts the next word, computing a loss against a known target.

NOTE: This was developed for use in a demo, not for training. You definitely don't want to train a language model using this code; it would be incredibly inefficient. It does compute correct gradients of the loss, however, so you can use it for interesting visualization of the gradients of a pretrained model, and it appears to be fast enough to sample from, at least for one word at a time. If you want to sample many tokens at a time, you would want to re-use some intermediate computation, so you would either need to modify this code or use something else.
Parameters
- vocab : Vocabulary
- text_field_embedder : TextFieldEmbedder
  Used to embed the indexed tokens we get in forward.
- language_model_head : LanguageModelHead
  The torch.nn.Module that goes from the hidden states output by the contextualizer to logits over some output vocabulary.
- contextualizer : Seq2SeqEncoder, optional (default=None)
  Used to "contextualize" the embeddings. This is optional because the contextualization might actually be done in the text field embedder.
- target_namespace : str, optional (default='bert')
  Namespace to use to convert predicted token ids to strings in Model.decode.
- dropout : float, optional (default=0.0)
  If specified, dropout is applied to the contextualized embeddings before computation of the softmax. The contextualized embeddings themselves are returned without dropout.
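A minimal construction sketch based on the signature above. The vocab, embedder, and lm_head variables are assumed to be pre-built Vocabulary, TextFieldEmbedder, and LanguageModelHead instances (not shown here), so treat this as illustrative rather than a complete recipe:

    from allennlp.models.next_token_lm import NextTokenLM

    # Assumption: `vocab`, `embedder`, and `lm_head` were constructed elsewhere,
    # e.g. a BERT-based TextFieldEmbedder with a matching language-model head.
    model = NextTokenLM(
        vocab=vocab,
        text_field_embedder=embedder,
        language_model_head=lm_head,
        contextualizer=None,      # optional; the embedder may already contextualize
        target_namespace="bert",  # namespace used to map predicted ids back to strings
        dropout=0.0,
    )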
decode(self, output_dict: Dict[str, torch.Tensor]) → Dict[str, torch.Tensor]

Takes the result of forward() and runs inference / decoding / whatever post-processing you need to do for your model. The intent is that model.forward() should produce potentials or probabilities, and then model.decode() can take those results and run some kind of beam search or constrained inference or whatever is necessary. This does not handle all possible decoding use cases, but it at least handles simple kinds of decoding. This method modifies the input dictionary, and also returns the same dictionary.
By default in the base class we do nothing. If your model has some special decoding step, override this method.
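As a hedged sketch of the intended flow (the model and token_dict variables are illustrative, not part of the documented API): run forward() to get raw outputs, then pass them through decode() to convert predicted token ids into strings.

    import torch

    model.eval()
    with torch.no_grad():
        output_dict = model(tokens=token_dict)   # raw probabilities / predicted token ids
        output_dict = model.decode(output_dict)  # same dict, with predicted ids converted to strings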
forward(self, tokens: Dict[str, torch.LongTensor], target_ids: Dict[str, torch.LongTensor] = None) → Dict[str, torch.Tensor]

Defines the forward pass of the model. In addition, to facilitate easy training, this method is designed to compute a loss function defined by a user.
The input comprises everything required to perform a training update, including labels - you define the signature here! It is up to the user to ensure that inference can be performed without the presence of these labels. Hence, any inputs not available at inference time should only be used inside a conditional block.
The intended sketch of this method is as follows:
    def forward(self, input1, input2, targets=None):
        ....
        ....
        output1 = self.layer1(input1)
        output2 = self.layer2(input2)
        output_dict = {"output1": output1, "output2": output2}
        if targets is not None:
            # Function returning a scalar torch.Tensor, defined by the user.
            loss = self._compute_loss(output1, output2, targets)
            output_dict["loss"] = loss
        return output_dict
Parameters
- inputs :
  Tensors comprising everything needed to perform a training update, including labels, which should be optional (i.e. have a default value of None). At inference time, simply pass the relevant inputs, not including the labels.
Returns
- output_dict : Dict[str, torch.Tensor]
  The outputs from the model. In order to train a model using the Trainer api, you must provide a "loss" key pointing to a scalar torch.Tensor representing the loss to be optimized.
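The labels-optional contract can be illustrated with a short hedged sketch; token_dict and target_dict stand in for real TextField tensor dicts and are not part of the documented API:

    # With targets: the returned dict contains "loss", which can be optimized
    # (or used for gradient visualization, as noted in the class description).
    train_out = model(tokens=token_dict, target_ids=target_dict)
    train_out["loss"].backward()

    # Without targets (inference): no "loss" key, only the model outputs.
    infer_out = model(tokens=token_dict)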
get_metrics(self, reset: bool = False)

Returns a dictionary of metrics. This method will be called by allennlp.training.Trainer in order to compute and use model metrics for early stopping and model serialization. We return an empty dictionary here rather than raising, as it is not required to implement metrics for a new model. A boolean reset parameter is passed, as frequently a metric accumulator will have some state which should be reset between epochs. Metrics should be populated during the call to forward(), with the Metric handling the accumulation of the metric until this method is called.
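A hedged sketch of that accumulation pattern, using the Average metric from allennlp.training.metrics purely as an illustrative choice (any Metric follows the same call / get_metric contract):

    from typing import Dict

    import torch
    from allennlp.data.vocabulary import Vocabulary
    from allennlp.models.model import Model
    from allennlp.training.metrics import Average  # illustrative metric choice

    class TinyMetricModel(Model):
        # Minimal sketch: the metric accumulates state in forward(), and
        # get_metrics() reports it and resets the accumulator when asked.
        def __init__(self, vocab: Vocabulary) -> None:
            super().__init__(vocab)
            self._avg_loss = Average()
            self._projection = torch.nn.Linear(2, 1)

        def forward(self, features: torch.Tensor) -> Dict[str, torch.Tensor]:
            loss = self._projection(features).mean()
            self._avg_loss(loss.item())  # populate the metric during forward()
            return {"loss": loss}

        def get_metrics(self, reset: bool = False) -> Dict[str, float]:
            return {"average_loss": self._avg_loss.get_metric(reset=reset)}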