class allennlp.models.biattentive_classification_network.BiattentiveClassificationNetwork(vocab: allennlp.data.vocabulary.Vocabulary, text_field_embedder: allennlp.modules.text_field_embedders.text_field_embedder.TextFieldEmbedder, embedding_dropout: float, pre_encode_feedforward: allennlp.modules.feedforward.FeedForward, encoder: allennlp.modules.seq2seq_encoders.seq2seq_encoder.Seq2SeqEncoder, integrator: allennlp.modules.seq2seq_encoders.seq2seq_encoder.Seq2SeqEncoder, integrator_dropout: float, output_layer: Union[allennlp.modules.feedforward.FeedForward, allennlp.modules.maxout.Maxout], elmo: allennlp.modules.elmo.Elmo, use_input_elmo: bool = False, use_integrator_output_elmo: bool = False, initializer: allennlp.nn.initializers.InitializerApplicator = <allennlp.nn.initializers.InitializerApplicator object>, regularizer: Optional[allennlp.nn.regularizers.regularizer_applicator.RegularizerApplicator] = None)

Bases: allennlp.models.model.Model

This class implements the Biattentive Classification Network model described in section 5 of Learned in Translation: Contextualized Word Vectors (NIPS 2017) for text classification. We assume we’re given a piece of text, and we predict some output label.

At a high level, the model starts by embedding the tokens and running them through a feed-forward neural net (pre_encode_feedforward). Then, we encode these representations with a Seq2SeqEncoder (encoder). We run biattention on the encoder output representations (self-attention in this case, since the two representations that typically go into biattention are identical) and get out an attentive vector representation of the text. We combine this text representation with the encoder outputs computed earlier, and then run this through yet another Seq2SeqEncoder (the integrator). Lastly, we take the output of the integrator and max, min, mean, and self-attention pool to create a final representation, which is passed through a maxout network or some feed-forward layers to output a classification (output_layer).
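The attention and pooling steps described above can be sketched in plain Python. This is only an illustration on nested lists, not the model's actual tensor code: it omits padding masks and the integrator encoder itself, the function names are hypothetical, the combination of encoder outputs with the attentive representation (concatenating the vector, the difference, and the elementwise product) is an assumption not stated on this page, and the self-attention scores, which the real model would learn, are passed in by hand.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_biattention(encoded: List[Vector]) -> List[Vector]:
    """Attend each token over all tokens and build an integrator input."""
    integrator_input = []
    for query in encoded:
        # Dot-product attention logits between this token and every token.
        logits = [sum(q * k for q, k in zip(query, key)) for key in encoded]
        weights = softmax(logits)
        # Attention-weighted sum of the encoder outputs.
        context = [
            sum(w * token[d] for w, token in zip(weights, encoded))
            for d in range(len(query))
        ]
        # ASSUMPTION: combine by concatenating [x, x - c, x * c].
        difference = [x - c for x, c in zip(query, context)]
        product = [x * c for x, c in zip(query, context)]
        integrator_input.append(query + difference + product)
    return integrator_input


def pool_sequence(integrated: List[Vector], attention_scores: Vector) -> Vector:
    """Concatenate max, min, mean, and attention-weighted pools over tokens."""
    dim = len(integrated[0])
    columns = [[token[d] for token in integrated] for d in range(dim)]
    max_pool = [max(column) for column in columns]
    min_pool = [min(column) for column in columns]
    mean_pool = [sum(column) / len(column) for column in columns]
    # ASSUMPTION: scores are supplied by the caller instead of being learned.
    weights = softmax(attention_scores)
    attentive_pool = [
        sum(w * value for w, value in zip(weights, column)) for column in columns
    ]
    return max_pool + min_pool + mean_pool + attentive_pool
```

With a sequence of length T and hidden size d, `self_biattention` returns T vectors of size 3d and `pool_sequence` returns a single vector four times the size of its input vectors, which is the kind of fixed-size representation the output layer consumes.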

vocab : Vocabulary, required

A Vocabulary, required in order to compute sizes for input/output projections.

text_field_embedder : TextFieldEmbedder, required

Used to embed the tokens TextField we get as input to the model.


embedding_dropout : float

The amount of dropout to apply on the embeddings.

pre_encode_feedforward : FeedForward

A feedforward network that is run on the embedded tokens before they are passed to the encoder.

encoder : Seq2SeqEncoder

The encoder to use on the tokens.

integrator : Seq2SeqEncoder

The encoder to use when integrating the attentive text encoding with the token encodings.

integrator_dropout : float

The amount of dropout to apply on integrator output.

output_layer : Union[Maxout, FeedForward]

The maxout or feed forward network that takes the final representations and produces a classification prediction.

elmo : Elmo, optional (default=None)

If provided, will be used to concatenate pretrained ELMo representations to either the integrator output (use_integrator_output_elmo) or the input (use_input_elmo).

use_input_elmo : bool (default=False)

If true, concatenate pretrained ELMo representations to the input vectors.

use_integrator_output_elmo : bool (default=False)

If true, concatenate pretrained ELMo representations to the integrator output.

initializer : InitializerApplicator, optional (default=InitializerApplicator())

Used to initialize the model parameters.

regularizer : RegularizerApplicator, optional (default=None)

If provided, will be used to calculate the regularization penalty during training.

decode(self, output_dict: Dict[str, torch.Tensor]) → Dict[str, torch.Tensor]

Does a simple argmax over the class probabilities, converts indices to string labels, and adds a "label" key to the dictionary with the result.
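The argmax-and-lookup step can be illustrated without tensors. The sketch below is an assumption-laden stand-in: `decode_labels` and the `index_to_label` mapping are hypothetical names, whereas the real method reads the mapping from the model's Vocabulary and operates on a torch.Tensor.

```python
from typing import Dict, List


def decode_labels(
    class_probabilities: List[List[float]], index_to_label: Dict[int, str]
) -> List[str]:
    """Pick the most probable class per instance and map it to its string label."""
    labels = []
    for row in class_probabilities:
        # Index of the largest probability in this row (argmax).
        best_index = max(range(len(row)), key=row.__getitem__)
        labels.append(index_to_label[best_index])
    return labels
```

For a batch of two instances with probabilities `[[0.1, 0.9], [0.7, 0.3]]` and a mapping `{0: "neg", 1: "pos"}`, the first instance is decoded as "pos" and the second as "neg".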

forward(self, tokens: Dict[str, torch.LongTensor], label: torch.LongTensor = None) → Dict[str, torch.Tensor]

tokens : Dict[str, torch.LongTensor], required

The output of TextField.as_array().

label : torch.LongTensor, optional (default=None)

A variable representing the label for each instance in the batch.

An output dictionary consisting of:

class_probabilities : torch.FloatTensor

A tensor of shape (batch_size, num_classes) representing a distribution over the label classes for each instance.

loss : torch.FloatTensor, optional

A scalar loss to be optimised.
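To make the shape of the output dictionary concrete, here is a minimal sketch in plain Python. It assumes the distribution is a softmax over per-class logits, that the loss is mean cross-entropy, and that the distribution is stored under the key "class_probabilities"; none of these details are confirmed on this page, and `build_output` is a hypothetical helper, not part of the library.

```python
import math
from typing import Dict, List, Optional


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one row of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def build_output(
    logits: List[List[float]], label: Optional[List[int]] = None
) -> Dict[str, object]:
    """Return class probabilities and, when gold labels are given, a scalar loss."""
    class_probabilities = [softmax(row) for row in logits]
    output: Dict[str, object] = {"class_probabilities": class_probabilities}
    if label is not None:
        # ASSUMPTION: mean negative log-likelihood of the gold class.
        losses = [-math.log(p[g]) for p, g in zip(class_probabilities, label)]
        output["loss"] = sum(losses) / len(losses)
    return output
```

The "loss" entry appears only when labels are supplied, which matches its "optional" marking: at prediction time there is no gold label to compare against.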

classmethod from_params(vocab: allennlp.data.vocabulary.Vocabulary, params: allennlp.common.params.Params) → 'BiattentiveClassificationNetwork'

This is the automatic implementation of from_params. Any class that subclasses FromParams (or Registrable, which itself subclasses FromParams) gets this implementation for free. If you want your class to be instantiated from params in the “obvious” way – pop off parameters and hand them to your constructor with the same names – this provides that functionality.

If you need more complex logic in your from_params method, you’ll have to implement your own method that overrides this one.
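The "pop off parameters and hand them to your constructor" idea can be sketched with the standard library alone. This is a simplified illustration, not AllenNLP's implementation: `construct_from_params` and `ToyClassifier` are hypothetical, a plain dict stands in for Params, and nested objects are not constructed recursively.

```python
import inspect
from typing import Any, Dict


class ToyClassifier:
    """Hypothetical stand-in for a model with a few constructor arguments."""

    def __init__(
        self,
        embedding_dropout: float,
        integrator_dropout: float,
        use_input_elmo: bool = False,
    ) -> None:
        self.embedding_dropout = embedding_dropout
        self.integrator_dropout = integrator_dropout
        self.use_input_elmo = use_input_elmo


def construct_from_params(cls: type, params: Dict[str, Any]) -> Any:
    """Match dict keys to constructor argument names and build the object."""
    remaining = dict(params)  # Copy so the caller's dict is left untouched.
    kwargs = {}
    for name, parameter in inspect.signature(cls).parameters.items():
        if name in remaining:
            kwargs[name] = remaining.pop(name)
        elif parameter.default is inspect.Parameter.empty:
            raise KeyError(f"missing required parameter: {name}")
    if remaining:
        # Leftover keys usually indicate a typo in the configuration.
        raise ValueError(f"unexpected parameters: {sorted(remaining)}")
    return cls(**kwargs)
```

Rejecting leftover keys is a deliberate choice in this sketch: a misspelled configuration key fails loudly rather than silently falling back to a default.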

get_metrics(self, reset: bool = False) → Dict[str, float]

Returns a dictionary of metrics. This method will be called by allennlp.training.Trainer in order to compute and use model metrics for early stopping and model serialization. We return an empty dictionary here rather than raising, as it is not required to implement metrics for a new model. A boolean reset parameter is passed, as frequently a metric accumulator will have some state which should be reset between epochs. This is also compatible with Metric objects: metrics should be populated during the call to forward, with the Metric handling the accumulation of the metric until this method is called.
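The accumulate-then-reset pattern described here can be shown with a small stand-alone class. `AccuracyAccumulator` is a hypothetical illustration written for this page, not the library's own metric class; it only mirrors the idea of updating state on every forward call and optionally clearing it when the value is read.

```python
from typing import List


class AccuracyAccumulator:
    """Tracks running accuracy across batches until it is reset."""

    def __init__(self) -> None:
        self.correct = 0
        self.total = 0

    def __call__(self, predictions: List[int], gold: List[int]) -> None:
        # Called once per batch, for example from inside forward.
        self.correct += sum(1 for p, g in zip(predictions, gold) if p == g)
        self.total += len(gold)

    def get_metric(self, reset: bool = False) -> float:
        # Report the accumulated value; clear state if asked (e.g. at epoch end).
        value = self.correct / self.total if self.total else 0.0
        if reset:
            self.reset()
        return value

    def reset(self) -> None:
        self.correct = 0
        self.total = 0
```

A model's get_metrics could then return something like `{"accuracy": accumulator.get_metric(reset)}`, passing the reset flag straight through so that each epoch starts from a clean count.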