decomposable_attention
allennlp_models.pair_classification.models.decomposable_attention
DecomposableAttention#
@Model.register("decomposable_attention")
class DecomposableAttention(Model):
| def __init__(
| self,
| vocab: Vocabulary,
| text_field_embedder: TextFieldEmbedder,
| attend_feedforward: FeedForward,
| matrix_attention: MatrixAttention,
| compare_feedforward: FeedForward,
| aggregate_feedforward: FeedForward,
| premise_encoder: Optional[Seq2SeqEncoder] = None,
| hypothesis_encoder: Optional[Seq2SeqEncoder] = None,
| initializer: InitializerApplicator = InitializerApplicator(),
| **kwargs
| ) -> None
This `Model` implements the Decomposable Attention model described in
*A Decomposable Attention Model for Natural Language Inference* (Parikh et al., 2016),
with some optional enhancements before the decomposable attention actually happens.
Parikh's original model allowed for computing an "intra-sentence" attention before
doing the decomposable entailment step. We generalize this to any `Seq2SeqEncoder`
that can be applied to the premise and/or the hypothesis before computing entailment.
The basic outline of this model is to get an embedded representation of each word in the premise and hypothesis, align words between the two, compare the aligned phrases, and make a final entailment decision based on this aggregated comparison. Each step in this process uses a feedforward network to modify the representation.
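The attend/align/compare/aggregate pipeline above can be sketched in NumPy. This is a minimal illustration of the data flow only: the feedforward networks are omitted, a plain dot product stands in for the configurable `matrix_attention`, and all shapes are toy values, not the model's real dimensions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
p_len, h_len, dim = 4, 5, 8            # toy premise/hypothesis lengths and embedding dim
premise = rng.normal(size=(p_len, dim))
hypothesis = rng.normal(size=(h_len, dim))

# Attend: similarity matrix between every premise/hypothesis word pair.
# (The real model first passes each side through `attend_feedforward`.)
similarity = premise @ hypothesis.T                            # (p_len, h_len)

# Align: soft alignment of each word to the other sentence.
aligned_hypothesis = softmax(similarity, axis=1) @ hypothesis  # (p_len, dim)
aligned_premise = softmax(similarity.T, axis=1) @ premise      # (h_len, dim)

# Compare: pair each word with the phrase it aligned to.
# (The real model feeds these pairs through `compare_feedforward`.)
compared_premise = np.concatenate([premise, aligned_hypothesis], axis=1)
compared_hypothesis = np.concatenate([hypothesis, aligned_premise], axis=1)

# Aggregate: sum over words on each side, concatenate, then classify.
# (The real model applies `aggregate_feedforward` here to get label logits.)
aggregated = np.concatenate([compared_premise.sum(axis=0),
                             compared_hypothesis.sum(axis=0)])
print(aggregated.shape)  # (32,), i.e. 4 * dim
```

Note that the summed, concatenated vector has a fixed size regardless of sentence length, which is what lets a plain feedforward network produce the final entailment logits.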
Registered as a `Model` with name "decomposable_attention".
Parameters#

- vocab : `Vocabulary`
- text_field_embedder : `TextFieldEmbedder`
  Used to embed the `premise` and `hypothesis` `TextFields` we get as input to the model.
- attend_feedforward : `FeedForward`
  This feedforward network is applied to the encoded sentence representations before the similarity matrix is computed between words in the premise and words in the hypothesis.
- matrix_attention : `MatrixAttention`
  This is the attention function used when computing the similarity matrix between words in the premise and words in the hypothesis.
- compare_feedforward : `FeedForward`
  This feedforward network is applied to the aligned premise and hypothesis representations, individually.
- aggregate_feedforward : `FeedForward`
  This final feedforward network is applied to the concatenated, summed result of the `compare_feedforward` network, and its output is used as the entailment class logits.
- premise_encoder : `Seq2SeqEncoder`, optional (default = `None`)
  After embedding the premise, we can optionally apply an encoder. If this is `None`, we do nothing.
- hypothesis_encoder : `Seq2SeqEncoder`, optional (default = `None`)
  After embedding the hypothesis, we can optionally apply an encoder. If this is `None`, we use the `premise_encoder` for the encoding (doing nothing if `premise_encoder` is also `None`).
- initializer : `InitializerApplicator`, optional (default = `InitializerApplicator()`)
  Used to initialize the model parameters.
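Since the model is registered as "decomposable_attention", its constructor is typically populated from a training config rather than called directly. A hypothetical config fragment might look like the following; all dimensions and sub-module choices here are illustrative assumptions, not the published settings (note that `compare_feedforward` and `aggregate_feedforward` take the concatenation of two vectors, hence the doubled `input_dim`):

```json
{
  "model": {
    "type": "decomposable_attention",
    "text_field_embedder": {
      "token_embedders": {
        "tokens": {"type": "embedding", "embedding_dim": 200}
      }
    },
    "attend_feedforward": {
      "input_dim": 200, "num_layers": 2,
      "hidden_dims": 200, "activations": "relu", "dropout": 0.2
    },
    "matrix_attention": {"type": "dot_product"},
    "compare_feedforward": {
      "input_dim": 400, "num_layers": 2,
      "hidden_dims": 200, "activations": "relu", "dropout": 0.2
    },
    "aggregate_feedforward": {
      "input_dim": 400, "num_layers": 2,
      "hidden_dims": [200, 3], "activations": ["relu", "linear"]
    }
  }
}
```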
forward#
class DecomposableAttention(Model):
| ...
| def forward(
| self,
| premise: TextFieldTensors,
| hypothesis: TextFieldTensors,
| label: torch.IntTensor = None,
| metadata: List[Dict[str, Any]] = None
| ) -> Dict[str, torch.Tensor]
Parameters#

- premise : `TextFieldTensors`
  From a `TextField`.
- hypothesis : `TextFieldTensors`
  From a `TextField`.
- label : `torch.IntTensor`, optional (default = `None`)
  From a `LabelField`.
- metadata : `List[Dict[str, Any]]`, optional (default = `None`)
  Metadata containing the original tokenization of the premise and hypothesis with 'premise_tokens' and 'hypothesis_tokens' keys respectively.
Returns#

An output dictionary consisting of:

- label_logits : `torch.FloatTensor`
  A tensor of shape `(batch_size, num_labels)` representing unnormalised log probabilities of the entailment label.
- label_probs : `torch.FloatTensor`
  A tensor of shape `(batch_size, num_labels)` representing probabilities of the entailment label.
- loss : `torch.FloatTensor`, optional
  A scalar loss to be optimised.
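The relationship between `label_logits` and `label_probs` is just a softmax over the label dimension. A small NumPy sketch with made-up logits (the label count and values are illustrative):

```python
import numpy as np

# Toy batch of entailment logits with shape (batch_size, num_labels).
label_logits = np.array([[2.0, 0.5, -1.0],
                         [0.0, 0.0,  0.0]])

# label_probs is the row-wise softmax of label_logits.
exp = np.exp(label_logits - label_logits.max(axis=-1, keepdims=True))
label_probs = exp / exp.sum(axis=-1, keepdims=True)

print(label_probs.sum(axis=-1))  # each row sums to 1.0
```

Identical logits (the second row) yield a uniform distribution over the labels, which is a handy sanity check when debugging model outputs.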
get_metrics#
class DecomposableAttention(Model):
| ...
| def get_metrics(self, reset: bool = False) -> Dict[str, float]
make_output_human_readable#
class DecomposableAttention(Model):
| ...
| def make_output_human_readable(
| self,
| output_dict: Dict[str, torch.Tensor]
| ) -> Dict[str, torch.Tensor]
Does a simple argmax over the probabilities, converts the index to a string label, and
adds a `"label"` key to the dictionary with the result.
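The argmax-and-lookup step can be sketched as follows. The label vocabulary here is a hypothetical stand-in; in the real model the index-to-string mapping comes from the model's `Vocabulary` under the labels namespace.

```python
import numpy as np

# Hypothetical label vocabulary (assumed for illustration).
index_to_label = {0: "entailment", 1: "contradiction", 2: "neutral"}

# Toy label_probs for a batch of two pairs, shape (batch_size, num_labels).
label_probs = np.array([[0.1, 0.7, 0.2],
                        [0.8, 0.1, 0.1]])

# Argmax over probabilities, then map each index to its string label.
predictions = [index_to_label[i] for i in label_probs.argmax(axis=-1)]
print(predictions)  # ['contradiction', 'entailment']
```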
default_predictor#
class DecomposableAttention(Model):
| ...
| default_predictor = "textual_entailment"