allennlp.models.reading_comprehension
Reading comprehension is loosely defined as follows: given a question and a passage of text that contains the answer, answer the question.
These submodules contain models for things that are predominantly focused on reading comprehension.
- class allennlp.models.reading_comprehension.bidaf.BidirectionalAttentionFlow(vocab: allennlp.data.vocabulary.Vocabulary, text_field_embedder: allennlp.modules.text_field_embedders.text_field_embedder.TextFieldEmbedder, num_highway_layers: int, phrase_layer: allennlp.modules.seq2seq_encoders.seq2seq_encoder.Seq2SeqEncoder, similarity_function: allennlp.modules.similarity_functions.similarity_function.SimilarityFunction, modeling_layer: allennlp.modules.seq2seq_encoders.seq2seq_encoder.Seq2SeqEncoder, span_end_encoder: allennlp.modules.seq2seq_encoders.seq2seq_encoder.Seq2SeqEncoder, dropout: float = 0.2, mask_lstms: bool = True, initializer: allennlp.nn.initializers.InitializerApplicator = <allennlp.nn.initializers.InitializerApplicator object>, regularizer: Optional[allennlp.nn.regularizers.regularizer_applicator.RegularizerApplicator] = None)[source]
Bases: allennlp.models.model.Model
This class implements Minjoon Seo’s Bidirectional Attention Flow model for answering reading comprehension questions (ICLR 2017).
The basic layout is pretty simple: encode words as a combination of word embeddings and a character-level encoder, pass the word representations through a bi-LSTM/GRU, use a matrix of attentions to put question information into the passage word representations (this is the only part that is at all non-standard), pass this through another few layers of bi-LSTMs/GRUs, and do a softmax over span start and span end.
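The attention step in the middle is the distinctive part, so here is a small self-contained sketch of it in PyTorch. This is illustrative only: the tensor sizes are toy values, and the dot-product similarity below stands in for the configurable SimilarityFunction the model actually uses.

    import torch

    # Toy encoded representations: batch of 1, passage of 5 tokens, question of 3 tokens.
    batch_size, passage_len, question_len, dim = 1, 5, 3, 4
    passage = torch.randn(batch_size, passage_len, dim)
    question = torch.randn(batch_size, question_len, dim)

    # Similarity between every passage token and every question token
    # (a plain dot product here, just for illustration).
    similarity = torch.bmm(passage, question.transpose(1, 2))  # (1, 5, 3)

    # Passage-to-question attention: for each passage token, a distribution over
    # question tokens, used to build a question-aware passage representation.
    p2q_attention = torch.softmax(similarity, dim=-1)
    question_aware_passage = torch.bmm(p2q_attention, question)  # (1, 5, 4)

    # Question-to-passage attention: which passage tokens matter most overall,
    # giving a single summary vector of the passage per batch element.
    q2p_attention = torch.softmax(similarity.max(dim=-1).values, dim=-1)  # (1, 5)
    passage_summary = torch.bmm(q2p_attention.unsqueeze(1), passage)  # (1, 1, 4)

    # The merged representation concatenates the passage encoding, the
    # question-aware passage, and elementwise products with both attended vectors.
    merged = torch.cat([passage,
                        question_aware_passage,
                        passage * question_aware_passage,
                        passage * passage_summary.expand_as(passage)], dim=-1)  # (1, 5, 16)

The merged tensor is what then flows into the modeling layer and the span predictions described below.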
- Parameters
- vocab : Vocabulary
- text_field_embedder : TextFieldEmbedder
Used to embed the question and passage TextFields we get as input to the model.
- num_highway_layers : int
The number of highway layers to use in between embedding the input and passing it through the phrase layer.
- phrase_layer : Seq2SeqEncoder
The encoder (with its own internal stacking) that we will use in between embedding tokens and doing the bidirectional attention.
- similarity_function : SimilarityFunction
The similarity function that we will use when comparing encoded passage and question representations.
- modeling_layer : Seq2SeqEncoder
The encoder (with its own internal stacking) that we will use in between the bidirectional attention and predicting span start and end.
- span_end_encoder : Seq2SeqEncoder
The encoder that we will use to incorporate span start predictions into the passage state before predicting span end.
- dropout : float, optional (default=0.2)
If greater than 0, we will apply dropout with this probability after all encoders (pytorch LSTMs do not apply dropout to their last layer).
- mask_lstms : bool, optional (default=True)
If False, we will skip passing the mask to the LSTM layers. This gives a ~2x speedup, with only a slight performance decrease, if any. We haven't experimented much with this yet, but have confirmed that we still get very similar performance with much faster training times. We still use the mask for all softmaxes, but avoid the shuffling that's required when using masking with pytorch LSTMs.
- initializer : InitializerApplicator, optional (default=InitializerApplicator())
Used to initialize the model parameters.
- regularizer : RegularizerApplicator, optional (default=None)
If provided, will be used to calculate the regularization penalty during training.
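These constructor arguments are what appear under the "model" section of a training configuration. A rough sketch written as a Python dict follows; the registered name "bidaf" refers to this class, but every size and nested option below is a placeholder chosen for illustration, not a recommended setting.

    # Illustrative "model" section of a training config, written as a Python dict.
    # All dimensions and nested encoder/embedder options are placeholders.
    bidaf_model_config = {
        "type": "bidaf",
        "text_field_embedder": {"tokens": {"type": "embedding", "embedding_dim": 100}},
        "num_highway_layers": 2,
        "phrase_layer": {"type": "lstm", "bidirectional": True, "input_size": 100, "hidden_size": 100},
        "similarity_function": {"type": "linear", "combination": "x,y,x*y", "tensor_1_dim": 200, "tensor_2_dim": 200},
        "modeling_layer": {"type": "lstm", "bidirectional": True, "input_size": 800, "hidden_size": 100, "num_layers": 2},
        "span_end_encoder": {"type": "lstm", "bidirectional": True, "input_size": 1400, "hidden_size": 100},
        "dropout": 0.2,
    }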
- forward(self, question: Dict[str, torch.LongTensor], passage: Dict[str, torch.LongTensor], span_start: torch.IntTensor = None, span_end: torch.IntTensor = None, metadata: List[Dict[str, Any]] = None) → Dict[str, torch.Tensor][source]
- Parameters
- question : Dict[str, torch.LongTensor]
From a TextField.
- passage : Dict[str, torch.LongTensor]
From a TextField. The model assumes that this passage contains the answer to the question, and predicts the beginning and ending positions of the answer within the passage.
- span_start : torch.IntTensor, optional
From an IndexField. This is one of the things we are trying to predict - the beginning position of the answer within the passage. This is an inclusive token index. If this is given, we will compute a loss that gets included in the output dictionary.
- span_end : torch.IntTensor, optional
From an IndexField. This is one of the things we are trying to predict - the ending position of the answer within the passage. This is an inclusive token index. If this is given, we will compute a loss that gets included in the output dictionary.
- metadata : List[Dict[str, Any]], optional
If present, this should contain the question tokens, passage tokens, original passage text, and token offsets into the passage for each instance in the batch. The length of this list should be the batch size, and each dictionary should have the keys question_tokens, passage_tokens, original_passage, and token_offsets.
- Returns
- An output dictionary consisting of:
- span_start_logits : torch.FloatTensor
A tensor of shape (batch_size, passage_length) representing unnormalized log probabilities of the span start position.
- span_start_probs : torch.FloatTensor
The result of softmax(span_start_logits).
- span_end_logits : torch.FloatTensor
A tensor of shape (batch_size, passage_length) representing unnormalized log probabilities of the span end position (inclusive).
- span_end_probs : torch.FloatTensor
The result of softmax(span_end_logits).
- best_span : torch.IntTensor
The result of a constrained inference over span_start_logits and span_end_logits to find the most probable span. Shape is (batch_size, 2) and each offset is a token index.
- loss : torch.FloatTensor, optional
A scalar loss to be optimised.
- best_span_str : List[str]
If sufficient metadata was provided for the instances in the batch, we also return the string from the original passage that the model thinks is the best answer to the question.
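The best_span_str value is recovered from best_span using the character offsets carried in metadata. A rough sketch of that bookkeeping, with invented values for one instance:

    # Suppose the model predicted token span (3, 5), and metadata carried the original
    # passage text plus (start_char, end_char) offsets for each token.
    original_passage = "The quick brown fox jumps over the lazy dog"
    token_offsets = [(0, 3), (4, 9), (10, 15), (16, 19), (20, 25),
                     (26, 30), (31, 34), (35, 39), (40, 43)]
    best_span = (3, 5)  # inclusive token indices

    start_char = token_offsets[best_span[0]][0]
    end_char = token_offsets[best_span[1]][1]
    best_span_str = original_passage[start_char:end_char]
    print(best_span_str)  # -> "fox jumps over"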
- static get_best_span(span_start_logits: torch.Tensor, span_end_logits: torch.Tensor) → torch.Tensor[source]
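This is the constrained decoding that produces best_span above: choose the (start, end) pair with start <= end that maximizes span_start_logits[start] + span_end_logits[end]. One way to do it in a single pass over the passage, keeping a running best start, is sketched below; this is an illustrative re-implementation, not necessarily identical to the library's version.

    import torch

    def get_best_span_sketch(span_start_logits: torch.Tensor,
                             span_end_logits: torch.Tensor) -> torch.Tensor:
        """Return a (batch_size, 2) tensor of (start, end) indices with start <= end."""
        batch_size, passage_length = span_start_logits.size()
        best_spans = span_start_logits.new_zeros((batch_size, 2), dtype=torch.long)
        for b in range(batch_size):
            best_score = float("-inf")
            best_start_so_far = 0
            for end in range(passage_length):
                # The best start for this end position is the best start seen so far.
                if span_start_logits[b, end] > span_start_logits[b, best_start_so_far]:
                    best_start_so_far = end
                score = span_start_logits[b, best_start_so_far] + span_end_logits[b, end]
                if score > best_score:
                    best_score = score
                    best_spans[b, 0] = best_start_so_far
                    best_spans[b, 1] = end
        return best_spans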
- get_metrics(self, reset: bool = False) → Dict[str, float][source]
Returns a dictionary of metrics. This method will be called by allennlp.training.Trainer in order to compute and use model metrics for early stopping and model serialization. We return an empty dictionary here rather than raising, as it is not required to implement metrics for a new model. A boolean reset parameter is passed, as frequently a metric accumulator will have some state which should be reset between epochs. Metrics should be populated during the call to forward(), with the Metric handling the accumulation of the metric until this method is called.
- class allennlp.models.reading_comprehension.bidaf_ensemble.BidafEnsemble(submodels: List[allennlp.models.reading_comprehension.bidaf.BidirectionalAttentionFlow])[source]
Bases: allennlp.models.ensemble.Ensemble
This class ensembles the output from multiple BiDAF models.
It combines results from the submodels by averaging the start and end span probabilities.
- forward(self, question: Dict[str, torch.LongTensor], passage: Dict[str, torch.LongTensor], span_start: torch.IntTensor = None, span_end: torch.IntTensor = None, metadata: List[Dict[str, Any]] = None) → Dict[str, torch.Tensor][source]
The forward method runs each of the submodels, then selects the best span from the subresults. The best span is determined by averaging the probabilities for the start and end of the spans.
- Parameters
- question : Dict[str, torch.LongTensor]
From a TextField.
- passage : Dict[str, torch.LongTensor]
From a TextField. The model assumes that this passage contains the answer to the question, and predicts the beginning and ending positions of the answer within the passage.
- span_start : torch.IntTensor, optional
From an IndexField. This is one of the things we are trying to predict - the beginning position of the answer within the passage. This is an inclusive token index. If this is given, we will compute a loss that gets included in the output dictionary.
- span_end : torch.IntTensor, optional
From an IndexField. This is one of the things we are trying to predict - the ending position of the answer within the passage. This is an inclusive token index. If this is given, we will compute a loss that gets included in the output dictionary.
- metadata : List[Dict[str, Any]], optional
If present, this should contain the question ID, original passage text, and token offsets into the passage for each instance in the batch. We use this for computing official metrics using the official SQuAD evaluation script. The length of this list should be the batch size, and each dictionary should have the keys id, original_passage, and token_offsets. If you only want the best span string and don't care about official metrics, you can omit the id key.
- Returns
- An output dictionary consisting of:
- best_span : torch.IntTensor
The result of a constrained inference over span_start_logits and span_end_logits to find the most probable span. Shape is (batch_size, 2) and each offset is a token index.
- best_span_str : List[str]
If sufficient metadata was provided for the instances in the batch, we also return the string from the original passage that the model thinks is the best answer to the question.
- classmethod from_params(vocab: allennlp.data.vocabulary.Vocabulary, params: allennlp.common.params.Params) → 'BidafEnsemble'[source]
This is the automatic implementation of from_params. Any class that subclasses FromParams (or Registrable, which itself subclasses FromParams) gets this implementation for free. If you want your class to be instantiated from params in the "obvious" way – pop off parameters and hand them to your constructor with the same names – this provides that functionality.
If you need more complex logic in your from_params method, you'll have to implement your own method that overrides this one.
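A toy illustration of that behaviour is sketched below. The ToyFeedForward class is made up purely for the example; only Params and FromParams come from the library, and this assumes the usual automatic from_params machinery applies to the annotated constructor arguments.

    from allennlp.common import Params
    from allennlp.common.from_params import FromParams

    class ToyFeedForward(FromParams):  # hypothetical class, just to show the mechanics
        def __init__(self, input_dim: int, hidden_dim: int, dropout: float = 0.0) -> None:
            self.input_dim = input_dim
            self.hidden_dim = hidden_dim
            self.dropout = dropout

    # from_params pops each key off the Params object and passes it to __init__ by name;
    # keys with defaults that are missing from the params (here "dropout") use the default.
    toy = ToyFeedForward.from_params(Params({"input_dim": 10, "hidden_dim": 5}))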
- get_metrics(self, reset: bool = False) → Dict[str, float][source]
Returns a dictionary of metrics. This method will be called by allennlp.training.Trainer in order to compute and use model metrics for early stopping and model serialization. We return an empty dictionary here rather than raising, as it is not required to implement metrics for a new model. A boolean reset parameter is passed, as frequently a metric accumulator will have some state which should be reset between epochs. Metrics should be populated during the call to forward(), with the Metric handling the accumulation of the metric until this method is called.
-
- allennlp.models.reading_comprehension.bidaf_ensemble.ensemble(subresults: List[Dict[str, torch.Tensor]]) → torch.Tensor[source]
Identifies the best prediction given the results from the submodels.
- Parameters
- subresults : List[Dict[str, torch.Tensor]]
Results of each submodel.
- Returns
- The index of the best submodel.
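The combination rule described for this ensemble (averaging the submodels' span probabilities) can be sketched as follows. This is illustrative only, written against the output keys documented for BidirectionalAttentionFlow.forward(), and is not the library's exact implementation.

    def average_span_probs(subresults):
        """Average span_start_probs / span_end_probs across submodels (illustrative)."""
        count = len(subresults)
        span_start_probs = sum(result["span_start_probs"] for result in subresults) / count
        span_end_probs = sum(result["span_end_probs"] for result in subresults) / count
        # The averaged distributions can then be passed (as log values) to get_best_span,
        # since that function accepts either logits or log probabilities.
        return span_start_probs.log(), span_end_probs.log()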
- class allennlp.models.reading_comprehension.dialog_qa.DialogQA(vocab: allennlp.data.vocabulary.Vocabulary, text_field_embedder: allennlp.modules.text_field_embedders.text_field_embedder.TextFieldEmbedder, phrase_layer: allennlp.modules.seq2seq_encoders.seq2seq_encoder.Seq2SeqEncoder, residual_encoder: allennlp.modules.seq2seq_encoders.seq2seq_encoder.Seq2SeqEncoder, span_start_encoder: allennlp.modules.seq2seq_encoders.seq2seq_encoder.Seq2SeqEncoder, span_end_encoder: allennlp.modules.seq2seq_encoders.seq2seq_encoder.Seq2SeqEncoder, initializer: allennlp.nn.initializers.InitializerApplicator, dropout: float = 0.2, num_context_answers: int = 0, marker_embedding_dim: int = 10, max_span_length: int = 30, max_turn_length: int = 12)[source]
Bases: allennlp.models.model.Model
This class implements a modified version of BiDAF (with self-attention and a residual layer, from the Clark and Gardner ACL 2017 paper), as used in the Question Answering in Context (EMNLP 2018) paper [https://arxiv.org/pdf/1808.07036.pdf].
In this set-up, a single instance is a dialog, i.e. a list of question-answer pairs.
- Parameters
- vocab : Vocabulary
- text_field_embedder : TextFieldEmbedder
Used to embed the question and passage TextFields we get as input to the model.
- phrase_layer : Seq2SeqEncoder
The encoder (with its own internal stacking) that we will use in between embedding tokens and doing the bidirectional attention.
- span_start_encoder : Seq2SeqEncoder
The encoder that we will use to incorporate span start predictions into the passage state before predicting span end.
- span_end_encoder : Seq2SeqEncoder
The encoder that we will use to incorporate span end predictions into the passage state.
- dropout : float, optional (default=0.2)
If greater than 0, we will apply dropout with this probability after all encoders (pytorch LSTMs do not apply dropout to their last layer).
- num_context_answers : int, optional (default=0)
If greater than 0, the model will consider previous question answering context.
- max_span_length : int, optional (default=30)
Maximum token length of the output span.
- max_turn_length : int, optional (default=12)
Maximum length of an interaction.
- decode(self, output_dict: Dict[str, torch.Tensor]) → Dict[str, Any][source]
Takes the result of forward() and runs inference / decoding / whatever post-processing you need to do for your model. The intent is that model.forward() should produce potentials or probabilities, and then model.decode() can take those results and run some kind of beam search or constrained inference or whatever is necessary. This does not handle all possible decoding use cases, but it at least handles simple kinds of decoding. This method modifies the input dictionary, and also returns the same dictionary.
By default in the base class we do nothing. If your model has some special decoding step, override this method.
- forward(self, question: Dict[str, torch.LongTensor], passage: Dict[str, torch.LongTensor], span_start: torch.IntTensor = None, span_end: torch.IntTensor = None, p1_answer_marker: torch.IntTensor = None, p2_answer_marker: torch.IntTensor = None, p3_answer_marker: torch.IntTensor = None, yesno_list: torch.IntTensor = None, followup_list: torch.IntTensor = None, metadata: List[Dict[str, Any]] = None) → Dict[str, torch.Tensor][source]
- Parameters
- question : Dict[str, torch.LongTensor]
From a TextField.
- passage : Dict[str, torch.LongTensor]
From a TextField. The model assumes that this passage contains the answer to the question, and predicts the beginning and ending positions of the answer within the passage.
- span_start : torch.IntTensor, optional
From an IndexField. This is one of the things we are trying to predict - the beginning position of the answer within the passage. This is an inclusive token index. If this is given, we will compute a loss that gets included in the output dictionary.
- span_end : torch.IntTensor, optional
From an IndexField. This is one of the things we are trying to predict - the ending position of the answer within the passage. This is an inclusive token index. If this is given, we will compute a loss that gets included in the output dictionary.
- p1_answer_marker : torch.IntTensor, optional
This is one of the inputs, but only when num_context_answers > 0. It is a tensor of shape [batch_size, max_qa_count, max_passage_length]. Most passage tokens are assigned 'O', except the passage tokens that belong to the previous answer in the dialog, which are assigned labels such as <1_start>, <1_in>, <1_end>. For more details, look into dataset_readers/util/make_reading_comprehension_instance_quac.
- p2_answer_marker : torch.IntTensor, optional
This is one of the inputs, but only when num_context_answers > 1. It is similar to p1_answer_marker, but marks the answer from two turns back in the passage.
- p3_answer_marker : torch.IntTensor, optional
This is one of the inputs, but only when num_context_answers > 2. It is similar to p1_answer_marker, but marks the answer from three turns back in the passage.
- yesno_list : torch.IntTensor, optional
This is one of the outputs that we are trying to predict. Three-way classification (yes / no / not a yes-no question).
- followup_list : torch.IntTensor, optional
This is one of the outputs that we are trying to predict. Three-way classification (follow up / maybe follow up / don't follow up).
- metadata : List[Dict[str, Any]], optional
If present, this should contain the question ID, original passage text, and token offsets into the passage for each instance in the batch. We use this for computing official metrics using the official SQuAD evaluation script. The length of this list should be the batch size, and each dictionary should have the keys id, original_passage, and token_offsets. If you only want the best span string and don't care about official metrics, you can omit the id key.
- Returns
- An output dictionary consisting of the following entries.
- Each entry is a nested list: the outer list iterates over dialogs, the inner list over the questions in each dialog.
- qid : List[List[str]]
A list of lists of question ids.
- followup : List[List[int]]
A list of lists of continuation marker prediction indices (y: yes, m: maybe follow up, n: don't follow up).
- yesno : List[List[int]]
A list of lists of affirmation marker prediction indices (y: yes, x: not a yes/no question, n: no).
- best_span_str : List[List[str]]
If sufficient metadata was provided for the instances in the batch, we also return the string from the original passage that the model thinks is the best answer to the question.
- loss : torch.FloatTensor, optional
A scalar loss to be optimised.
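For example, a batch containing one dialog with two questions might decode to something shaped like the following; every value here is invented purely to show the nesting.

    # Outer lists iterate over dialogs in the batch, inner lists over questions in each dialog.
    example_output = {
        "qid": [["dialog0_q#0", "dialog0_q#1"]],          # hypothetical question ids
        "followup": [[0, 2]],                              # e.g. indices meaning (yes, don't follow up)
        "yesno": [[1, 1]],                                 # e.g. indices meaning (not a yes/no question, ...)
        "best_span_str": [["in 1997", "at the national championships"]],
    }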
- get_metrics(self, reset: bool = False) → Dict[str, float][source]
Returns a dictionary of metrics. This method will be called by allennlp.training.Trainer in order to compute and use model metrics for early stopping and model serialization. We return an empty dictionary here rather than raising, as it is not required to implement metrics for a new model. A boolean reset parameter is passed, as frequently a metric accumulator will have some state which should be reset between epochs. Metrics should be populated during the call to forward(), with the Metric handling the accumulation of the metric until this method is called.
- class allennlp.models.reading_comprehension.qanet.QaNet(vocab: allennlp.data.vocabulary.Vocabulary, text_field_embedder: allennlp.modules.text_field_embedders.text_field_embedder.TextFieldEmbedder, num_highway_layers: int, phrase_layer: allennlp.modules.seq2seq_encoders.seq2seq_encoder.Seq2SeqEncoder, matrix_attention_layer: allennlp.modules.matrix_attention.matrix_attention.MatrixAttention, modeling_layer: allennlp.modules.seq2seq_encoders.seq2seq_encoder.Seq2SeqEncoder, dropout_prob: float = 0.1, initializer: allennlp.nn.initializers.InitializerApplicator = <allennlp.nn.initializers.InitializerApplicator object>, regularizer: Optional[allennlp.nn.regularizers.regularizer_applicator.RegularizerApplicator] = None)[source]
Bases: allennlp.models.model.Model
This class implements Adams Wei Yu's QANet model for machine reading comprehension, published at ICLR 2018.
The overall architecture of QANet is very similar to BiDAF. The main difference is that QANet replaces the RNN encoder with CNN + self-attention. There are also some minor differences in the modeling layer and output layer.
- Parameters
- vocab : Vocabulary
- text_field_embedder : TextFieldEmbedder
Used to embed the question and passage TextFields we get as input to the model.
- num_highway_layers : int
The number of highway layers to use in between embedding the input and passing it through the phrase layer.
- phrase_layer : Seq2SeqEncoder
The encoder (with its own internal stacking) that we will use in between embedding tokens and doing the passage-question attention.
- matrix_attention_layer : MatrixAttention
The matrix attention function that we will use when comparing encoded passage and question representations.
- modeling_layer : Seq2SeqEncoder
The encoder (with its own internal stacking) that we will use in between the bidirectional attention and predicting span start and end.
- dropout_prob : float, optional (default=0.1)
If greater than 0, we will apply dropout with this probability between layers.
- initializer : InitializerApplicator, optional (default=InitializerApplicator())
Used to initialize the model parameters.
- regularizer : RegularizerApplicator, optional (default=None)
If provided, will be used to calculate the regularization penalty during training.
- forward(self, question: Dict[str, torch.LongTensor], passage: Dict[str, torch.LongTensor], span_start: torch.IntTensor = None, span_end: torch.IntTensor = None, metadata: List[Dict[str, Any]] = None) → Dict[str, torch.Tensor][source]
- Parameters
- question : Dict[str, torch.LongTensor]
From a TextField.
- passage : Dict[str, torch.LongTensor]
From a TextField. The model assumes that this passage contains the answer to the question, and predicts the beginning and ending positions of the answer within the passage.
- span_start : torch.IntTensor, optional
From an IndexField. This is one of the things we are trying to predict - the beginning position of the answer within the passage. This is an inclusive token index. If this is given, we will compute a loss that gets included in the output dictionary.
- span_end : torch.IntTensor, optional
From an IndexField. This is one of the things we are trying to predict - the ending position of the answer within the passage. This is an inclusive token index. If this is given, we will compute a loss that gets included in the output dictionary.
- metadata : List[Dict[str, Any]], optional
If present, this should contain the question tokens, passage tokens, original passage text, and token offsets into the passage for each instance in the batch. The length of this list should be the batch size, and each dictionary should have the keys question_tokens, passage_tokens, original_passage, and token_offsets.
- Returns
- An output dictionary consisting of:
- span_start_logits : torch.FloatTensor
A tensor of shape (batch_size, passage_length) representing unnormalized log probabilities of the span start position.
- span_start_probs : torch.FloatTensor
The result of softmax(span_start_logits).
- span_end_logits : torch.FloatTensor
A tensor of shape (batch_size, passage_length) representing unnormalized log probabilities of the span end position (inclusive).
- span_end_probs : torch.FloatTensor
The result of softmax(span_end_logits).
- best_span : torch.IntTensor
The result of a constrained inference over span_start_logits and span_end_logits to find the most probable span. Shape is (batch_size, 2) and each offset is a token index.
- loss : torch.FloatTensor, optional
A scalar loss to be optimised.
- best_span_str : List[str]
If sufficient metadata was provided for the instances in the batch, we also return the string from the original passage that the model thinks is the best answer to the question.
- get_metrics(self, reset: bool = False) → Dict[str, float][source]
Returns a dictionary of metrics. This method will be called by allennlp.training.Trainer in order to compute and use model metrics for early stopping and model serialization. We return an empty dictionary here rather than raising, as it is not required to implement metrics for a new model. A boolean reset parameter is passed, as frequently a metric accumulator will have some state which should be reset between epochs. Metrics should be populated during the call to forward(), with the Metric handling the accumulation of the metric until this method is called.
- class allennlp.models.reading_comprehension.naqanet.NumericallyAugmentedQaNet(vocab: allennlp.data.vocabulary.Vocabulary, text_field_embedder: allennlp.modules.text_field_embedders.text_field_embedder.TextFieldEmbedder, num_highway_layers: int, phrase_layer: allennlp.modules.seq2seq_encoders.seq2seq_encoder.Seq2SeqEncoder, matrix_attention_layer: allennlp.modules.matrix_attention.matrix_attention.MatrixAttention, modeling_layer: allennlp.modules.seq2seq_encoders.seq2seq_encoder.Seq2SeqEncoder, dropout_prob: float = 0.1, initializer: allennlp.nn.initializers.InitializerApplicator = <allennlp.nn.initializers.InitializerApplicator object>, regularizer: Optional[allennlp.nn.regularizers.regularizer_applicator.RegularizerApplicator] = None, answering_abilities: List[str] = None)[source]
Bases: allennlp.models.model.Model
This class augments the QANet model with some rudimentary numerical reasoning abilities, as published in the original DROP paper.
The main idea here is that instead of just predicting a passage span after doing all of the QANet modeling stuff, we add several different “answer abilities”: predicting a span from the question, predicting a count, or predicting an arithmetic expression. Near the end of the QANet model, we have a variable that predicts what kind of answer type we need, and each branch has separate modeling logic to predict that answer type. We then marginalize over all possible ways of getting to the right answer through each of these answer types.
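That marginalization amounts to a log-sum-exp over per-ability log-likelihoods weighted by the predicted answer-ability distribution. A minimal sketch of the combination follows; the tensor names are illustrative, and in the real model the per-ability log-likelihoods would come from the span, count, and arithmetic heads.

    import torch

    batch_size, num_abilities = 2, 4
    # Log probability of choosing each answer ability (e.g. passage span, question span,
    # arithmetic expression, count), from the answer-ability classifier.
    answer_ability_log_probs = torch.log_softmax(torch.randn(batch_size, num_abilities), dim=-1)
    # Log likelihood of the gold answer under each ability's own head (random stand-ins here).
    log_likelihood_per_ability = torch.randn(batch_size, num_abilities)

    # Marginal log likelihood over all ways of producing the gold answer:
    # log sum_a p(ability = a) * p(answer | ability = a)
    marginal_log_likelihood = torch.logsumexp(
        answer_ability_log_probs + log_likelihood_per_ability, dim=-1)
    loss = -marginal_log_likelihood.mean()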
- forward(self, question: Dict[str, torch.LongTensor], passage: Dict[str, torch.LongTensor], number_indices: torch.LongTensor, answer_as_passage_spans: torch.LongTensor = None, answer_as_question_spans: torch.LongTensor = None, answer_as_add_sub_expressions: torch.LongTensor = None, answer_as_counts: torch.LongTensor = None, metadata: List[Dict[str, Any]] = None) → Dict[str, torch.Tensor][source]
Defines the forward pass of the model. In addition, to facilitate easy training, this method is designed to compute a loss function defined by a user.
The input is comprised of everything required to perform a training update, including labels - you define the signature here! It is down to the user to ensure that inference can be performed without the presence of these labels. Hence, any inputs not available at inference time should only be used inside a conditional block.
The intended sketch of this method is as follows:

    def forward(self, input1, input2, targets=None):
        ....
        ....
        output1 = self.layer1(input1)
        output2 = self.layer2(input2)
        output_dict = {"output1": output1, "output2": output2}
        if targets is not None:
            # Function returning a scalar torch.Tensor, defined by the user.
            loss = self._compute_loss(output1, output2, targets)
            output_dict["loss"] = loss
        return output_dict
- Parameters
- inputs :
Tensors comprising everything needed to perform a training update, including labels, which should be optional (i.e. have a default value of None). At inference time, simply pass the relevant inputs, not including the labels.
- Returns
- output_dict : Dict[str, torch.Tensor]
The outputs from the model. In order to train a model using the Trainer API, you must provide a "loss" key pointing to a scalar torch.Tensor representing the loss to be optimized.
- get_metrics(self, reset: bool = False) → Dict[str, float][source]
Returns a dictionary of metrics. This method will be called by allennlp.training.Trainer in order to compute and use model metrics for early stopping and model serialization. We return an empty dictionary here rather than raising, as it is not required to implement metrics for a new model. A boolean reset parameter is passed, as frequently a metric accumulator will have some state which should be reset between epochs. Metrics should be populated during the call to forward(), with the Metric handling the accumulation of the metric until this method is called.
-
- allennlp.models.reading_comprehension.util.get_best_span(span_start_logits: torch.Tensor, span_end_logits: torch.Tensor) → torch.Tensor[source]
This acts the same as the static method BidirectionalAttentionFlow.get_best_span() in allennlp/models/reading_comprehension/bidaf.py. We keep it here so that users can directly import this function without the class.
We call the inputs "logits" - they could either be unnormalized logits or normalized log probabilities. A log_softmax operation is a constant shifting of the entire logit vector, so taking an argmax over either one gives the same result.
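A quick, purely illustrative check of that claim:

    import torch

    logits = torch.tensor([[2.0, 0.5, 1.2, 3.1]])
    log_probs = torch.log_softmax(logits, dim=-1)

    # log_softmax subtracts the same constant (the logsumexp of the row) from every entry,
    # so the ordering - and therefore the argmax - is unchanged.
    assert torch.equal(logits.argmax(dim=-1), log_probs.argmax(dim=-1))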