allennlp.modules.seq2seq_decoders
Modules that transform a sequence of encoded vectors into a sequence of output vectors.
The available Seq2Seq decoders are listed below.
class allennlp.modules.seq2seq_decoders.seq_decoder.SeqDecoder(target_embedder: allennlp.modules.token_embedders.embedding.Embedding)
Bases: torch.nn.modules.module.Module, allennlp.common.registrable.Registrable
A SeqDecoder is an abstract class representing the entire decoder (embedding and neural network) of a Seq2Seq architecture. It is meant to be used with allennlp.models.encoder_decoders.composed_seq2seq.ComposedSeq2Seq.
An implementation of this abstract class ideally uses a decoder neural net (allennlp.modules.seq2seq_decoders.decoder_net.DecoderNet) for decoding.
The default_implementation, allennlp.modules.seq2seq_decoders.auto_regressive_seq_decoder.AutoRegressiveSeqDecoder, covers most use cases; you are more likely to use it than to write a new implementation. (A sketch of registering a custom implementation follows this class's method descriptions.)
Parameters
- target_embedder : Embedding
  Embedder for target tokens. Needed in the base class to enable weight tying.
default_implementation: str = 'auto_regressive_seq_decoder'
forward(self, encoder_out: Dict[str, torch.LongTensor], target_tokens: Optional[Dict[str, torch.LongTensor]] = None) → Dict[str, torch.Tensor]
Decodes from encoded states to a sequence of outputs, and also computes the loss if target_tokens are given.
Parameters
- encoder_out : Dict[str, torch.LongTensor], required
  Dictionary with the encoded state, ideally containing the encoded vectors and the source mask.
- target_tokens : Dict[str, torch.LongTensor], optional
  The output of TextField.as_array() applied on the target TextField.
get_metrics(self, reset: bool = False) → Dict[str, float]
The decoder is responsible for computing metrics using the target tokens.
get_output_dim(self) → int
The dimension of each timestep of the hidden state in the layer before the final softmax. Needed to check whether the model is compatible with embedding-final layer weight tying.
post_process(self, output_dict: Dict[str, torch.Tensor]) → Dict[str, torch.Tensor]
Post-processing for converting raw outputs to predictions during inference. Composing models such as allennlp.models.encoder_decoders.composed_seq2seq.ComposedSeq2Seq can call this method when decode is called.
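Because SeqDecoder is a Registrable torch Module, new implementations can be registered under a name and selected from configuration. The following is a minimal, hypothetical sketch (the class, the registered name "my_seq_decoder", and the attribute name target_embedder are illustrative assumptions, not part of the library):

    # Hypothetical sketch: a custom SeqDecoder registered under an illustrative name.
    from typing import Dict, Optional

    import torch

    from allennlp.modules.seq2seq_decoders.seq_decoder import SeqDecoder
    from allennlp.modules.token_embedders import Embedding


    @SeqDecoder.register("my_seq_decoder")  # illustrative name, not in the library
    class MySeqDecoder(SeqDecoder):
        def __init__(self, target_embedder: Embedding) -> None:
            super().__init__(target_embedder)

        def get_output_dim(self) -> int:
            # Hidden size of the layer before the final softmax; here we simply
            # reuse the target embedding size so weight tying would be possible.
            return self.target_embedder.get_output_dim()

        def get_metrics(self, reset: bool = False) -> Dict[str, float]:
            return {}

        def forward(self,
                    encoder_out: Dict[str, torch.LongTensor],
                    target_tokens: Optional[Dict[str, torch.LongTensor]] = None) -> Dict[str, torch.Tensor]:
            # A real implementation decodes from `encoder_out` and computes a loss
            # when `target_tokens` are provided.
            raise NotImplementedError

        def post_process(self, output_dict: Dict[str, torch.Tensor]) -> Dict[str, torch.Tensor]:
            return output_dict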
class allennlp.modules.seq2seq_decoders.auto_regressive_seq_decoder.AutoRegressiveSeqDecoder(vocab: allennlp.data.vocabulary.Vocabulary, decoder_net: allennlp.modules.seq2seq_decoders.decoder_net.DecoderNet, max_decoding_steps: int, target_embedder: allennlp.modules.token_embedders.embedding.Embedding, target_namespace: str = 'tokens', tie_output_embedding: bool = False, scheduled_sampling_ratio: float = 0, label_smoothing_ratio: Optional[float] = None, beam_size: int = 4, tensor_based_metric: allennlp.training.metrics.metric.Metric = None, token_based_metric: allennlp.training.metrics.metric.Metric = None)
Bases: allennlp.modules.seq2seq_decoders.seq_decoder.SeqDecoder
An autoregressive decoder that can be used for most seq2seq tasks. (A construction sketch follows the parameter list below.)
Parameters
- vocab : Vocabulary, required
  Vocabulary containing the source and target vocabularies. They may be under the same namespace (tokens), or the target tokens can have a different namespace, in which case it needs to be specified as target_namespace.
- decoder_net : DecoderNet, required
  Module that contains the implementation of the neural network for decoding output elements.
- max_decoding_steps : int
  Maximum length of decoded sequences.
- target_embedder : Embedding
  Embedder for target tokens.
- target_namespace : str, optional (default = 'tokens')
  If the target side vocabulary is different from the source side's, you need to specify the target's namespace here. If not, we'll assume it is "tokens", which is also the default choice for the source side, and this might cause them to share vocabularies.
- beam_size : int, optional (default = 4)
  Width of the beam for beam search.
- tensor_based_metric : Metric, optional (default = None)
  A metric to track on validation data that takes raw tensors when it is called. This metric must accept two arguments when called: a batched tensor of predicted token indices, and a batched tensor of gold token indices.
- token_based_metric : Metric, optional (default = None)
  A metric to track on validation data that takes lists of lists of tokens as input. This metric must accept two arguments when called, both of type List[List[str]]. The first is a predicted sequence for each item in the batch and the second is a gold sequence for each item in the batch.
- scheduled_sampling_ratio : float, optional (default = 0)
  Defines the ratio between teacher-forced training and using the model's own output. If it is zero (teacher forcing only) and the decoder_net supports parallel decoding, we get the output predictions in a single forward pass of the decoder_net.
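A minimal construction sketch (values are illustrative). It assumes the package-level imports below re-export these classes (otherwise import from the full module paths above) and that the vocabulary already contains the start/end symbols in the target namespace:

    from allennlp.data import Vocabulary
    from allennlp.modules.attention import DotProductAttention
    from allennlp.modules.token_embedders import Embedding
    from allennlp.modules.seq2seq_decoders import AutoRegressiveSeqDecoder, LstmCellDecoderNet

    vocab = Vocabulary()  # in practice, built from your dataset

    target_embedder = Embedding(num_embeddings=vocab.get_vocab_size("tokens"),
                                embedding_dim=128)

    decoder_net = LstmCellDecoderNet(decoding_dim=256,  # typically the encoder output size when attention is used
                                     target_embedding_dim=128,
                                     attention=DotProductAttention())

    decoder = AutoRegressiveSeqDecoder(vocab=vocab,
                                       decoder_net=decoder_net,
                                       max_decoding_steps=50,
                                       target_embedder=target_embedder,
                                       beam_size=4)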
forward(self, encoder_out: Dict[str, torch.LongTensor], target_tokens: Dict[str, torch.LongTensor] = None) → Dict[str, torch.Tensor]
Decodes from encoded states to a sequence of outputs, and also computes the loss if target_tokens are given.
Parameters
- encoder_out : Dict[str, torch.LongTensor], required
  Dictionary with the encoded state, ideally containing the encoded vectors and the source mask.
- target_tokens : Dict[str, torch.LongTensor], optional
  The output of TextField.as_array() applied on the target TextField.
get_metrics(self, reset: bool = False) → Dict[str, float]
The decoder is responsible for computing metrics using the target tokens.
get_output_dim(self)
The dimension of each timestep of the hidden state in the layer before the final softmax. Needed to check whether the model is compatible with embedding-final layer weight tying.
post_process(self, output_dict: Dict[str, torch.Tensor]) → Dict[str, torch.Tensor]
This method trims the output predictions to the first end symbol, replaces indices with corresponding tokens, and adds a field called predicted_tokens to the output_dict.
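For example, at inference time (hedged sketch; in ComposedSeq2Seq this call happens inside the model's own decoding step, so you normally would not call it by hand):

    output_dict = decoder(encoder_out)               # forward pass without target_tokens
    output_dict = decoder.post_process(output_dict)  # adds "predicted_tokens"
    predictions = output_dict["predicted_tokens"]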
take_step(self, last_predictions: torch.Tensor, state: Dict[str, torch.Tensor]) → Tuple[torch.Tensor, Dict[str, torch.Tensor]]
Take a decoding step. This is called by the beam search class. (A minimal usage sketch follows the Notes below.)
Parameters
- last_predictions : torch.Tensor
  A tensor of shape (group_size,), which gives the indices of the predictions during the last time step.
- state : Dict[str, torch.Tensor]
  A dictionary of tensors that contain the current state information needed to predict the next step, which includes the encoder outputs, the source mask, and the decoder hidden state and context. Each of these tensors has shape (group_size, *), where * can be any other number of dimensions.
Returns
- Tuple[torch.Tensor, Dict[str, torch.Tensor]]
  A tuple of (log_probabilities, updated_state), where log_probabilities is a tensor of shape (group_size, num_classes) containing the predicted log probability of each class for the next step, for each item in the group, while updated_state is a dictionary of tensors containing the encoder outputs, source mask, and updated decoder hidden state and context.
Notes
We treat the inputs as a batch, even though group_size is not necessarily equal to batch_size, since the group may contain multiple states for each source sentence in the batch.
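The beam search class referred to above is presumably allennlp.nn.beam_search.BeamSearch; a hedged sketch of how it could drive take_step (start_index, end_index, batch_size, and the contents of state are assumptions, not shown here):

    import torch
    from allennlp.nn.beam_search import BeamSearch

    beam_search = BeamSearch(end_index=end_index, max_steps=50, beam_size=4)

    # `state` must hold everything take_step needs (encoder outputs, source mask,
    # decoder hidden state and context), each tensor with a leading batch dimension.
    start_predictions = torch.full((batch_size,), fill_value=start_index, dtype=torch.long)
    all_top_k_predictions, log_probabilities = beam_search.search(
        start_predictions, state, decoder.take_step)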
class allennlp.modules.seq2seq_decoders.decoder_net.DecoderNet(decoding_dim: int, target_embedding_dim: int, decodes_parallel: bool)
Bases: torch.nn.modules.module.Module, allennlp.common.registrable.Registrable
This class abstracts the neural architectures for decoding the encoded states and embedded previous step prediction vectors into a new sequence of output vectors.
Implementations of DecoderNet are used by implementations of allennlp.modules.seq2seq_decoders.seq_decoder.SeqDecoder, such as allennlp.modules.seq2seq_decoders.auto_regressive_seq_decoder.AutoRegressiveSeqDecoder. The outputs of this module will likely be used by the SeqDecoder to apply the final output feedforward layer and softmax. (A minimal subclass sketch follows the method descriptions below.)
Parameters
- decoding_dim : int, required
  Defines the dimensionality of the output vectors.
- target_embedding_dim : int, required
  Defines the dimensionality of the target embeddings. Since this model takes its output on a previous step as input for the following step, this is also an input dimensionality.
- decodes_parallel : bool, required
  Defines whether the decoder generates multiple next-step predictions in a single forward pass.
forward(self, previous_state: Dict[str, torch.Tensor], encoder_outputs: torch.Tensor, source_mask: torch.Tensor, previous_steps_predictions: torch.Tensor, previous_steps_mask: Optional[torch.Tensor] = None) → Tuple[Dict[str, torch.Tensor], torch.Tensor]
Performs a decoding step and returns a dictionary with the decoder hidden state (or cache) and the decoder output. The decoder output is a 3d tensor of shape (group_size, steps_count, decoder_output_dim) if self.decodes_parallel is True, else it is a 2d tensor of shape (group_size, decoder_output_dim).
Parameters
- previous_steps_predictions : torch.Tensor, required
  Embeddings of predictions on previous steps. Shape: (group_size, steps_count, decoder_output_dim)
- encoder_outputs : torch.Tensor, required
  Vectors of all encoder outputs. Shape: (group_size, max_input_sequence_length, encoder_output_dim)
- source_mask : torch.Tensor, required
  This tensor contains the mask for each input sequence. Shape: (group_size, max_input_sequence_length)
- previous_state : Dict[str, torch.Tensor], required
  Previous state of the decoder.
Returns
- Tuple[Dict[str, torch.Tensor], torch.Tensor]
  Tuple of the new decoder state and the decoder output. The output should be used to generate output sequence elements.
get_output_dim(self) → int
Returns the dimension of each vector in the sequence output by this DecoderNet. This is not the shape of the returned tensor, but the last element of that shape.
init_decoder_state(self, encoder_out: Dict[str, torch.LongTensor]) → Dict[str, torch.Tensor]
Initialize the decoding state to be passed to the first decoding time step.
Parameters
- encoder_out : Dict[str, torch.LongTensor], required
  Dictionary with the encoded state, ideally containing the encoder outputs and the source mask.
Returns
- Dict[str, torch.Tensor]
  The initial decoder state.
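A minimal subclass sketch of the DecoderNet contract (hypothetical; the registered name, the state key "decoder_hidden", and the encoder_out key names "encoder_outputs" / "source_mask" are illustrative assumptions):

    from typing import Dict, Optional, Tuple

    import torch

    from allennlp.modules.seq2seq_decoders.decoder_net import DecoderNet


    @DecoderNet.register("gru_cell")  # illustrative name, not in the library
    class GruCellDecoderNet(DecoderNet):
        def __init__(self, decoding_dim: int, target_embedding_dim: int) -> None:
            super().__init__(decoding_dim=decoding_dim,
                             target_embedding_dim=target_embedding_dim,
                             decodes_parallel=False)
            self._dim = decoding_dim
            self._gru_cell = torch.nn.GRUCell(target_embedding_dim, decoding_dim)

        def get_output_dim(self) -> int:
            return self._dim

        def init_decoder_state(self,
                               encoder_out: Dict[str, torch.LongTensor]) -> Dict[str, torch.Tensor]:
            # Start from a zero hidden state; the key names here are illustrative.
            batch_size = encoder_out["source_mask"].size(0)
            zeros = encoder_out["encoder_outputs"].new_zeros(batch_size, self._dim)
            return {"decoder_hidden": zeros}

        def forward(self,
                    previous_state: Dict[str, torch.Tensor],
                    encoder_outputs: torch.Tensor,
                    source_mask: torch.Tensor,
                    previous_steps_predictions: torch.Tensor,
                    previous_steps_mask: Optional[torch.Tensor] = None
                    ) -> Tuple[Dict[str, torch.Tensor], torch.Tensor]:
            # Single-step decoding: use only the embedding of the most recent prediction.
            last_prediction_embedding = previous_steps_predictions[:, -1]
            hidden = self._gru_cell(last_prediction_embedding, previous_state["decoder_hidden"])
            return {"decoder_hidden": hidden}, hidden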
class allennlp.modules.seq2seq_decoders.lstm_cell_decoder_net.LstmCellDecoderNet(decoding_dim: int, target_embedding_dim: int, attention: Optional[allennlp.modules.attention.attention.Attention] = None, bidirectional_input: bool = False)
Bases: allennlp.modules.seq2seq_decoders.decoder_net.DecoderNet
This decoder net implements a simple decoding network with an LSTMCell and attention. (A minimal usage sketch follows the method descriptions below.)
Parameters
- decoding_dim : int, required
  Defines the dimensionality of the output vectors.
- target_embedding_dim : int, required
  Defines the dimensionality of the input target embeddings. Since this model takes its output on a previous step as input for the following step, this is also an input dimensionality.
- attention : Attention, optional (default = None)
  If you want to use attention to get a dynamic summary of the encoder outputs at each step of decoding, this is the function used to compute similarity between the decoder hidden state and encoder outputs.
forward(self, previous_state: Dict[str, torch.Tensor], encoder_outputs: torch.Tensor, source_mask: torch.Tensor, previous_steps_predictions: torch.Tensor, previous_steps_mask: Optional[torch.Tensor] = None) → Tuple[Dict[str, torch.Tensor], torch.Tensor]
Performs a decoding step and returns a dictionary with the decoder hidden state (or cache) and the decoder output. The decoder output is a 3d tensor of shape (group_size, steps_count, decoder_output_dim) if self.decodes_parallel is True, else it is a 2d tensor of shape (group_size, decoder_output_dim).
Parameters
- previous_steps_predictions : torch.Tensor, required
  Embeddings of predictions on previous steps. Shape: (group_size, steps_count, decoder_output_dim)
- encoder_outputs : torch.Tensor, required
  Vectors of all encoder outputs. Shape: (group_size, max_input_sequence_length, encoder_output_dim)
- source_mask : torch.Tensor, required
  This tensor contains the mask for each input sequence. Shape: (group_size, max_input_sequence_length)
- previous_state : Dict[str, torch.Tensor], required
  Previous state of the decoder.
Returns
- Tuple[Dict[str, torch.Tensor], torch.Tensor]
  Tuple of the new decoder state and the decoder output. The output should be used to generate output sequence elements.
init_decoder_state(self, encoder_out: Dict[str, torch.LongTensor]) → Dict[str, torch.Tensor]
Initialize the decoding state to be passed to the first decoding time step.
Parameters
- encoder_out : Dict[str, torch.LongTensor], required
  Dictionary with the encoded state, ideally containing the encoder outputs and the source mask.
Returns
- Dict[str, torch.Tensor]
  The initial decoder state.
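A shape-checking sketch with random tensors (hedged; the encoder_out key names "encoder_outputs" and "source_mask" are assumptions based on the descriptions above, and decoding_dim is set equal to the encoder output size):

    import torch
    from allennlp.modules.attention import DotProductAttention
    from allennlp.modules.seq2seq_decoders import LstmCellDecoderNet

    group_size, src_len, encoder_dim, target_dim = 2, 7, 256, 128
    decoder_net = LstmCellDecoderNet(decoding_dim=encoder_dim,
                                     target_embedding_dim=target_dim,
                                     attention=DotProductAttention())

    encoder_out = {"encoder_outputs": torch.randn(group_size, src_len, encoder_dim),
                   "source_mask": torch.ones(group_size, src_len, dtype=torch.long)}

    state = decoder_net.init_decoder_state(encoder_out)
    previous_steps_predictions = torch.randn(group_size, 1, target_dim)  # one embedded step

    state, output = decoder_net(previous_state=state,
                                encoder_outputs=encoder_out["encoder_outputs"],
                                source_mask=encoder_out["source_mask"],
                                previous_steps_predictions=previous_steps_predictions)
    print(output.shape)  # expected: (group_size, decoding_dim), since decodes_parallel is False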
class allennlp.modules.seq2seq_decoders.stacked_self_attention_decoder_net.Decoder(layer: torch.nn.modules.module.Module, num_layers: int)
Bases: torch.nn.modules.module.Module
An N-layer transformer decoder with masking. Code taken from http://nlp.seas.harvard.edu/2018/04/03/attention.html
forward(self, x: torch.Tensor, memory: torch.Tensor, src_mask: torch.Tensor, tgt_mask: torch.Tensor) → torch.Tensor
Defines the computation performed at every call. Should be overridden by all subclasses.
Note: although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
class allennlp.modules.seq2seq_decoders.stacked_self_attention_decoder_net.DecoderLayer(size: int, self_attn: allennlp.modules.seq2seq_encoders.bidirectional_language_model_transformer.MultiHeadedAttention, src_attn: allennlp.modules.seq2seq_encoders.bidirectional_language_model_transformer.MultiHeadedAttention, feed_forward, dropout: float)
Bases: torch.nn.modules.module.Module
A single layer of a transformer decoder. Code taken from http://nlp.seas.harvard.edu/2018/04/03/attention.html
class allennlp.modules.seq2seq_decoders.stacked_self_attention_decoder_net.StackedSelfAttentionDecoderNet(decoding_dim: int, target_embedding_dim: int, feedforward_hidden_dim: int, num_layers: int, num_attention_heads: int, use_positional_encoding: bool = True, positional_encoding_max_steps: int = 5000, dropout_prob: float = 0.1, residual_dropout_prob: float = 0.2, attention_dropout_prob: float = 0.1)
Bases: allennlp.modules.seq2seq_decoders.decoder_net.DecoderNet
A stacked self-attention decoder implementation. (A construction sketch follows the parameter list below.)
Parameters
- decoding_dim : int, required
  Defines the dimensionality of the output vectors.
- target_embedding_dim : int, required
  Defines the dimensionality of the input target embeddings. Since this model takes its output on a previous step as input for the following step, this is also an input dimensionality.
- feedforward_hidden_dim : int, required
  The middle dimension of the FeedForward network. The input and output dimensions are fixed to ensure sizes match up for the self attention layers.
- num_layers : int, required
  The number of stacked self attention -> feedforward -> layer normalisation blocks.
- num_attention_heads : int, required
  The number of attention heads to use per layer.
- use_positional_encoding : bool, optional (default = True)
  Whether to add sinusoidal frequencies to the input tensor. This is strongly recommended, as without this feature, the self attention layers have no idea of absolute or relative position (as they are just computing pairwise similarity between vectors of elements), which can be important features for many tasks.
- dropout_prob : float, optional (default = 0.1)
  The dropout probability for the feedforward network.
- residual_dropout_prob : float, optional (default = 0.2)
  The dropout probability for the residual connections.
- attention_dropout_prob : float, optional (default = 0.1)
  The dropout probability for the attention distributions in each attention layer.
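A construction sketch (illustrative values; assuming the package re-exports the class as in the earlier examples). Because this net decodes in parallel, an AutoRegressiveSeqDecoder with scheduled_sampling_ratio = 0 can compute training-time predictions in a single forward pass, as described above:

    from allennlp.modules.seq2seq_decoders import StackedSelfAttentionDecoderNet

    decoder_net = StackedSelfAttentionDecoderNet(decoding_dim=256,
                                                 target_embedding_dim=256,
                                                 feedforward_hidden_dim=512,
                                                 num_layers=4,
                                                 num_attention_heads=8)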
forward(self, previous_state: Dict[str, torch.Tensor], encoder_outputs: torch.Tensor, source_mask: torch.Tensor, previous_steps_predictions: torch.Tensor, previous_steps_mask: Optional[torch.Tensor] = None) → Tuple[Dict[str, torch.Tensor], torch.Tensor]
Performs a decoding step and returns a dictionary with the decoder hidden state (or cache) and the decoder output. The decoder output is a 3d tensor of shape (group_size, steps_count, decoder_output_dim) if self.decodes_parallel is True, else it is a 2d tensor of shape (group_size, decoder_output_dim).
Parameters
- previous_steps_predictions : torch.Tensor, required
  Embeddings of predictions on previous steps. Shape: (group_size, steps_count, decoder_output_dim)
- encoder_outputs : torch.Tensor, required
  Vectors of all encoder outputs. Shape: (group_size, max_input_sequence_length, encoder_output_dim)
- source_mask : torch.Tensor, required
  This tensor contains the mask for each input sequence. Shape: (group_size, max_input_sequence_length)
- previous_state : Dict[str, torch.Tensor], required
  Previous state of the decoder.
Returns
- Tuple[Dict[str, torch.Tensor], torch.Tensor]
  Tuple of the new decoder state and the decoder output. The output should be used to generate output sequence elements.
init_decoder_state(self, encoder_out: Dict[str, torch.LongTensor]) → Dict[str, torch.Tensor]
Initialize the decoding state to be passed to the first decoding time step.
Parameters
- encoder_out : Dict[str, torch.LongTensor], required
  Dictionary with the encoded state, ideally containing the encoder outputs and the source mask.
Returns
- Dict[str, torch.Tensor]
  The initial decoder state.