next_token_lm
allennlp_models.lm.models.next_token_lm
NextTokenLM#
@Model.register("next_token_lm")
class NextTokenLM(Model):
| def __init__(
| self,
| vocab: Vocabulary,
| text_field_embedder: TextFieldEmbedder,
| language_model_head: LanguageModelHead,
| contextualizer: Seq2SeqEncoder = None,
| target_namespace: str = "bert",
| dropout: float = 0.0,
| initializer: InitializerApplicator = None,
| n_best: int = 5,
| beam_search_generator: BeamSearchGenerator = None,
| **kwargs
| ) -> None
The NextTokenLM embeds some input tokens, contextualizes them, then predicts the next word, computing a loss against a known target.
If a beam_search_generator is given, this model will predict a sequence of next tokens.
Note
This was developed for use in a demo, not for training. You definitely don't want to train a language model using this code; it would be incredibly inefficient. It does compute correct gradients of the loss, however, so you can use it for interesting visualization of the gradients of a pretrained model, and it appears to be fast enough to sample from, at least for one word at a time.
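The flow described above (contextualized states in, next-token distribution and loss out) can be sketched in plain Python. This is a toy illustration, not the AllenNLP implementation: `next_token_step`, `head_weights`, and the use of the last position's hidden state to predict the next token are all assumptions made for the example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def next_token_step(hidden_states, head_weights, target_id=None):
    """Toy stand-in for the model's flow (hypothetical helper).

    hidden_states: per-token vectors, assumed already embedded and contextualized
    head_weights:  one weight vector per vocabulary entry (hypothetical LM head)
    """
    # Assumption: only the final position is used to predict the next token.
    final_state = hidden_states[-1]
    logits = [sum(w * h for w, h in zip(row, final_state)) for row in head_weights]
    probs = softmax(logits)
    output = {"probabilities": probs}
    if target_id is not None:
        # Cross-entropy against the known target token.
        output["loss"] = -math.log(probs[target_id])
    return output


hidden = [[0.1, 0.3], [0.5, -0.2]]           # two input tokens, hidden size 2
head = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # vocabulary of three tokens
out = next_token_step(hidden, head, target_id=0)
```

Because the loss is an ordinary function of the inputs, a real tensor library could differentiate it, which is what makes the gradient visualizations mentioned in the note possible.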
Parameters
- vocab : Vocabulary
- text_field_embedder : TextFieldEmbedder
  Used to embed the indexed tokens we get in forward.
- language_model_head : LanguageModelHead
  The torch.nn.Module that goes from the hidden states output by the contextualizer to logits over some output vocabulary.
- contextualizer : Seq2SeqEncoder, optional (default = None)
  Used to "contextualize" the embeddings. This is optional because the contextualization might actually be done in the text field embedder.
- target_namespace : str, optional (default = 'bert')
  Namespace to use to convert predicted token ids to strings in Model.make_output_human_readable.
- dropout : float, optional (default = 0.0)
  If specified, dropout is applied to the contextualized embeddings before computation of the softmax. The contextualized embeddings themselves are returned without dropout.
- n_best : int, optional (default = 5)
  The number of best tokens to predict. If beam_search_generator is given, this option is ignored.
- beam_search_generator : BeamSearchGenerator, optional (default = None)
  An optional BeamSearchGenerator. If given, the model will predict sequences of next tokens instead of just a single next token.
forward#
class NextTokenLM(Model):
| ...
| def forward(
| self,
| tokens: TextFieldTensors,
| target_ids: TextFieldTensors = None
| ) -> Dict[str, torch.Tensor]
Run a forward pass of the model, returning an output tensor dictionary with the following fields:
- "probabilities": a tensor of shape (batch_size, n_best) representing the probabilities of the predicted tokens, where n_best is either self._n_best or beam_size if using beam search.
- "top_indices": a tensor of shape (batch_size, n_best, num_predicted_tokens) containing the IDs of the predicted tokens, where num_predicted_tokens is just 1 unless using beam search, in which case it depends on the parameters of the beam search.
- "token_ids": a tensor of shape (batch_size, num_input_tokens) containing the IDs of the input tokens.
- "loss" (optional): the loss of the batch, only given if target_ids is not None.
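To make the output shapes concrete, here is a minimal sketch of the case without beam search, using nested lists in place of tensors. The helper `top_n_best` and the sample probabilities are hypothetical; the real model works on torch tensors and its internals may differ.

```python
def top_n_best(batch_probs, n_best):
    """Pick the n_best most probable token ids per instance (no beam search)."""
    probabilities, top_indices = [], []
    for probs in batch_probs:
        # Token ids ranked by descending probability, truncated to n_best.
        ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:n_best]
        probabilities.append([probs[i] for i in ranked])
        # Without beam search each prediction is a sequence of length 1,
        # so the last dimension (num_predicted_tokens) has size 1.
        top_indices.append([[i] for i in ranked])
    return {"probabilities": probabilities, "top_indices": top_indices}


# batch_size = 2, vocabulary of three tokens
batch_probs = [
    [0.1, 0.6, 0.3],
    [0.5, 0.2, 0.3],
]
out = top_n_best(batch_probs, n_best=2)
# out["probabilities"] has shape (2, 2); out["top_indices"] has shape (2, 2, 1)
```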
get_metrics#
class NextTokenLM(Model):
| ...
| def get_metrics(self, reset: bool = False)
make_output_human_readable#
class NextTokenLM(Model):
| ...
| @overrides
| def make_output_human_readable(
| self,
| output_dict: Dict[str, torch.Tensor]
| ) -> Dict[str, torch.Tensor]
Collects token strings from indices, adding two fields to the output_dict:
- "top_tokens": a list (for each instance in the batch) of lists (for each of the n best predictions) of lists of strings (for each token in each prediction).
- "tokens": a list of lists (for each instance in the batch) of strings (for each input token in the instance).
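The nesting of the two added fields can be illustrated with a small sketch. Here a plain dict, `index_to_token`, stands in for the vocabulary lookup in the target namespace, and `to_human_readable` is a hypothetical helper operating on nested lists rather than tensors; neither is part of the library.

```python
def to_human_readable(output_dict, index_to_token):
    """Add "top_tokens" and "tokens" string fields to a copy of output_dict."""
    top_tokens = [
        # instance -> n best predictions -> tokens in each prediction
        [[index_to_token[i] for i in prediction] for prediction in instance]
        for instance in output_dict["top_indices"]
    ]
    tokens = [
        # instance -> input tokens
        [index_to_token[i] for i in instance]
        for instance in output_dict["token_ids"]
    ]
    return {**output_dict, "top_tokens": top_tokens, "tokens": tokens}


# Hypothetical vocabulary mapping, for illustration only.
index_to_token = {0: "the", 1: "cat", 2: "sat", 3: "down"}
output_dict = {
    "top_indices": [[[2], [3]]],  # one instance, two predictions, one token each
    "token_ids": [[0, 1]],        # one instance, two input tokens
}
readable = to_human_readable(output_dict, index_to_token)
```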
default_predictor#
class NextTokenLM(Model):
| ...
| default_predictor = "next_token_lm"