allennlp.modules.seq2seq_decoders
Modules that transform a sequence of encoded vectors into a sequence of output vectors.
The available Seq2Seq decoders are listed below.
class allennlp.modules.seq2seq_decoders.seq_decoder.SeqDecoder(target_embedder: allennlp.modules.token_embedders.embedding.Embedding)
Bases: torch.nn.modules.module.Module, allennlp.common.registrable.Registrable
A SeqDecoder is an abstract class representing the entire decoder (embedding and neural network) of a Seq2Seq architecture. It is meant to be used with allennlp.models.encoder_decoders.composed_seq2seq.ComposedSeq2Seq.
An implementation of this abstract class ideally uses a decoder neural net (allennlp.modules.seq2seq_decoders.decoder_net.DecoderNet) for decoding.
The default_implementation, allennlp.modules.seq2seq_decoders.auto_regressive_seq_decoder.AutoRegressiveSeqDecoder, covers most use cases; you are more likely to use it than to write a new implementation. (A sketch of registering a custom implementation follows this class's method descriptions.)
Parameters
- target_embedder : Embedding
  Embedder for target tokens. Needed in the base class to enable weight tying.
default_implementation: str = 'auto_regressive_seq_decoder'
forward(self, encoder_out: Dict[str, torch.LongTensor], target_tokens: Optional[Dict[str, torch.LongTensor]] = None) → Dict[str, torch.Tensor]
Decodes from encoded states to a sequence of outputs, and also computes the loss if target_tokens are given.
Parameters
- encoder_out : Dict[str, torch.LongTensor], required
  Dictionary with the encoded state, ideally containing the encoded vectors and the source mask.
- target_tokens : Dict[str, torch.LongTensor], optional
  The output of TextField.as_array() applied on the target TextField.
get_metrics(self, reset: bool = False) → Dict[str, float]
The decoder is responsible for computing metrics using the target tokens.
get_output_dim(self) → int
The dimension of each timestep of the hidden state in the layer before the final softmax. Needed to check whether the model is compatible with embedding-final layer weight tying.
post_process(self, output_dict: Dict[str, torch.Tensor]) → Dict[str, torch.Tensor]
Post-processing for converting raw outputs to predictions during inference. Composing models such as allennlp.models.encoder_decoders.composed_seq2seq.ComposedSeq2Seq can call this method when decode is called.
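Because SeqDecoder is a Registrable torch Module, new implementations can be registered under a name and selected from configuration. The following is a minimal, hypothetical sketch (the class, the registered name "my_seq_decoder", and the attribute name target_embedder are illustrative assumptions, not part of the library):

    # Hypothetical sketch: a custom SeqDecoder registered under an illustrative name.
    from typing import Dict, Optional

    import torch

    from allennlp.modules.seq2seq_decoders.seq_decoder import SeqDecoder
    from allennlp.modules.token_embedders import Embedding


    @SeqDecoder.register("my_seq_decoder")  # illustrative name, not in the library
    class MySeqDecoder(SeqDecoder):
        def __init__(self, target_embedder: Embedding) -> None:
            super().__init__(target_embedder)

        def get_output_dim(self) -> int:
            # Hidden size of the layer before the final softmax; here we simply
            # reuse the target embedding size so weight tying would be possible.
            return self.target_embedder.get_output_dim()

        def get_metrics(self, reset: bool = False) -> Dict[str, float]:
            return {}

        def forward(self,
                    encoder_out: Dict[str, torch.LongTensor],
                    target_tokens: Optional[Dict[str, torch.LongTensor]] = None) -> Dict[str, torch.Tensor]:
            # A real implementation decodes from `encoder_out` and computes a loss
            # when `target_tokens` are provided.
            raise NotImplementedError

        def post_process(self, output_dict: Dict[str, torch.Tensor]) -> Dict[str, torch.Tensor]:
            return output_dict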
class allennlp.modules.seq2seq_decoders.auto_regressive_seq_decoder.AutoRegressiveSeqDecoder(vocab: allennlp.data.vocabulary.Vocabulary, decoder_net: allennlp.modules.seq2seq_decoders.decoder_net.DecoderNet, max_decoding_steps: int, target_embedder: allennlp.modules.token_embedders.embedding.Embedding, target_namespace: str = 'tokens', tie_output_embedding: bool = False, scheduled_sampling_ratio: float = 0, label_smoothing_ratio: Optional[float] = None, beam_size: int = 4, tensor_based_metric: allennlp.training.metrics.metric.Metric = None, token_based_metric: allennlp.training.metrics.metric.Metric = None)
Bases: allennlp.modules.seq2seq_decoders.seq_decoder.SeqDecoder
An autoregressive decoder that can be used for most seq2seq tasks. (A construction sketch follows the parameter list below.)
Parameters
- vocab : Vocabulary, required
  Vocabulary containing the source and target vocabularies. They may be under the same namespace (tokens), or the target tokens can have a different namespace, in which case it needs to be specified as target_namespace.
- decoder_net : DecoderNet, required
  Module that contains the implementation of the neural network for decoding output elements.
- max_decoding_steps : int
  Maximum length of decoded sequences.
- target_embedder : Embedding
  Embedder for target tokens.
- target_namespace : str, optional (default = 'tokens')
  If the target side vocabulary is different from the source side's, you need to specify the target's namespace here. If not, we'll assume it is "tokens", which is also the default choice for the source side, and this might cause them to share vocabularies.
- beam_size : int, optional (default = 4)
  Width of the beam for beam search.
- tensor_based_metric : Metric, optional (default = None)
  A metric to track on validation data that takes raw tensors when it is called. This metric must accept two arguments when called: a batched tensor of predicted token indices, and a batched tensor of gold token indices.
- token_based_metric : Metric, optional (default = None)
  A metric to track on validation data that takes lists of lists of tokens as input. This metric must accept two arguments when called, both of type List[List[str]]. The first is a predicted sequence for each item in the batch and the second is a gold sequence for each item in the batch.
- scheduled_sampling_ratio : float, optional (default = 0)
  Defines the ratio between teacher-forced training and using the model's own output. If it is zero (teacher forcing only) and the decoder_net supports parallel decoding, we get the output predictions in a single forward pass of the decoder_net.
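A minimal construction sketch (values are illustrative). It assumes the package-level imports below re-export these classes (otherwise import from the full module paths above) and that the vocabulary already contains the start/end symbols in the target namespace:

    from allennlp.data import Vocabulary
    from allennlp.modules.attention import DotProductAttention
    from allennlp.modules.token_embedders import Embedding
    from allennlp.modules.seq2seq_decoders import AutoRegressiveSeqDecoder, LstmCellDecoderNet

    vocab = Vocabulary()  # in practice, built from your dataset

    target_embedder = Embedding(num_embeddings=vocab.get_vocab_size("tokens"),
                                embedding_dim=128)

    decoder_net = LstmCellDecoderNet(decoding_dim=256,  # typically the encoder output size when attention is used
                                     target_embedding_dim=128,
                                     attention=DotProductAttention())

    decoder = AutoRegressiveSeqDecoder(vocab=vocab,
                                       decoder_net=decoder_net,
                                       max_decoding_steps=50,
                                       target_embedder=target_embedder,
                                       beam_size=4)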
forward(self, encoder_out: Dict[str, torch.LongTensor], target_tokens: Dict[str, torch.LongTensor] = None) → Dict[str, torch.Tensor]
Decodes from encoded states to a sequence of outputs, and also computes the loss if target_tokens are given.
Parameters
- encoder_out : Dict[str, torch.LongTensor], required
  Dictionary with the encoded state, ideally containing the encoded vectors and the source mask.
- target_tokens : Dict[str, torch.LongTensor], optional
  The output of TextField.as_array() applied on the target TextField.
get_metrics(self, reset: bool = False) → Dict[str, float]
The decoder is responsible for computing metrics using the target tokens.
get_output_dim(self)
The dimension of each timestep of the hidden state in the layer before the final softmax. Needed to check whether the model is compatible with embedding-final layer weight tying.
post_process(self, output_dict: Dict[str, torch.Tensor]) → Dict[str, torch.Tensor]
This method trims the output predictions to the first end symbol, replaces indices with corresponding tokens, and adds a field called predicted_tokens to the output_dict.
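For example, at inference time (hedged sketch; in ComposedSeq2Seq this call happens inside the model's own decoding step, so you normally would not call it by hand):

    output_dict = decoder(encoder_out)               # forward pass without target_tokens
    output_dict = decoder.post_process(output_dict)  # adds "predicted_tokens"
    predictions = output_dict["predicted_tokens"]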
take_step(self, last_predictions: torch.Tensor, state: Dict[str, torch.Tensor]) → Tuple[torch.Tensor, Dict[str, torch.Tensor]]
Take a decoding step. This is called by the beam search class. (A minimal usage sketch follows the Notes below.)
Parameters
- last_predictions : torch.Tensor
  A tensor of shape (group_size,), which gives the indices of the predictions during the last time step.
- state : Dict[str, torch.Tensor]
  A dictionary of tensors that contain the current state information needed to predict the next step, which includes the encoder outputs, the source mask, and the decoder hidden state and context. Each of these tensors has shape (group_size, *), where * can be any other number of dimensions.
Returns
- Tuple[torch.Tensor, Dict[str, torch.Tensor]]
  A tuple of (log_probabilities, updated_state), where log_probabilities is a tensor of shape (group_size, num_classes) containing the predicted log probability of each class for the next step, for each item in the group, while updated_state is a dictionary of tensors containing the encoder outputs, source mask, and updated decoder hidden state and context.
Notes
We treat the inputs as a batch, even though group_size is not necessarily equal to batch_size, since the group may contain multiple states for each source sentence in the batch.
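The beam search class referred to above is presumably allennlp.nn.beam_search.BeamSearch; a hedged sketch of how it could drive take_step (start_index, end_index, batch_size, and the contents of state are assumptions, not shown here):

    import torch
    from allennlp.nn.beam_search import BeamSearch

    beam_search = BeamSearch(end_index=end_index, max_steps=50, beam_size=4)

    # `state` must hold everything take_step needs (encoder outputs, source mask,
    # decoder hidden state and context), each tensor with a leading batch dimension.
    start_predictions = torch.full((batch_size,), fill_value=start_index, dtype=torch.long)
    all_top_k_predictions, log_probabilities = beam_search.search(
        start_predictions, state, decoder.take_step)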
class allennlp.modules.seq2seq_decoders.decoder_net.DecoderNet(decoding_dim: int, target_embedding_dim: int, decodes_parallel: bool)
Bases: torch.nn.modules.module.Module, allennlp.common.registrable.Registrable
This class abstracts the neural architectures for decoding the encoded states and embedded previous step prediction vectors into a new sequence of output vectors.
Implementations of DecoderNet are used by implementations of allennlp.modules.seq2seq_decoders.seq_decoder.SeqDecoder, such as allennlp.modules.seq2seq_decoders.auto_regressive_seq_decoder.AutoRegressiveSeqDecoder. The outputs of this module will likely be used by the SeqDecoder to apply the final output feedforward layer and softmax. (A minimal subclass sketch follows the method descriptions below.)
Parameters
- decoding_dim : int, required
  Defines the dimensionality of the output vectors.
- target_embedding_dim : int, required
  Defines the dimensionality of the target embeddings. Since this model takes its output on a previous step as input for the following step, this is also an input dimensionality.
- decodes_parallel : bool, required
  Defines whether the decoder generates multiple next-step predictions in a single forward pass.
forward(self, previous_state: Dict[str, torch.Tensor], encoder_outputs: torch.Tensor, source_mask: torch.Tensor, previous_steps_predictions: torch.Tensor, previous_steps_mask: Optional[torch.Tensor] = None) → Tuple[Dict[str, torch.Tensor], torch.Tensor]
Performs a decoding step and returns a dictionary with the decoder hidden state (or cache) and the decoder output. The decoder output is a 3d tensor of shape (group_size, steps_count, decoder_output_dim) if self.decodes_parallel is True, else it is a 2d tensor of shape (group_size, decoder_output_dim).
Parameters
- previous_steps_predictions : torch.Tensor, required
  Embeddings of predictions on previous steps. Shape: (group_size, steps_count, decoder_output_dim)
- encoder_outputs : torch.Tensor, required
  Vectors of all encoder outputs. Shape: (group_size, max_input_sequence_length, encoder_output_dim)
- source_mask : torch.Tensor, required
  This tensor contains the mask for each input sequence. Shape: (group_size, max_input_sequence_length)
- previous_state : Dict[str, torch.Tensor], required
  Previous state of the decoder.
Returns
- Tuple[Dict[str, torch.Tensor], torch.Tensor]
  Tuple of the new decoder state and the decoder output. The output should be used to generate output sequence elements.
get_output_dim(self) → int
Returns the dimension of each vector in the sequence output by this DecoderNet. This is not the shape of the returned tensor, but the last element of that shape.
init_decoder_state(self, encoder_out: Dict[str, torch.LongTensor]) → Dict[str, torch.Tensor]
Initialize the decoding state to be passed to the first decoding time step.
Parameters
- encoder_out : Dict[str, torch.LongTensor], required
  Dictionary with the encoded state, ideally containing the encoder outputs and the source mask.
Returns
- Dict[str, torch.Tensor]
  The initial decoder state.
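A minimal subclass sketch of the DecoderNet contract (hypothetical; the registered name, the state key "decoder_hidden", and the encoder_out key names "encoder_outputs" / "source_mask" are illustrative assumptions):

    from typing import Dict, Optional, Tuple

    import torch

    from allennlp.modules.seq2seq_decoders.decoder_net import DecoderNet


    @DecoderNet.register("gru_cell")  # illustrative name, not in the library
    class GruCellDecoderNet(DecoderNet):
        def __init__(self, decoding_dim: int, target_embedding_dim: int) -> None:
            super().__init__(decoding_dim=decoding_dim,
                             target_embedding_dim=target_embedding_dim,
                             decodes_parallel=False)
            self._dim = decoding_dim
            self._gru_cell = torch.nn.GRUCell(target_embedding_dim, decoding_dim)

        def get_output_dim(self) -> int:
            return self._dim

        def init_decoder_state(self,
                               encoder_out: Dict[str, torch.LongTensor]) -> Dict[str, torch.Tensor]:
            # Start from a zero hidden state; the key names here are illustrative.
            batch_size = encoder_out["source_mask"].size(0)
            zeros = encoder_out["encoder_outputs"].new_zeros(batch_size, self._dim)
            return {"decoder_hidden": zeros}

        def forward(self,
                    previous_state: Dict[str, torch.Tensor],
                    encoder_outputs: torch.Tensor,
                    source_mask: torch.Tensor,
                    previous_steps_predictions: torch.Tensor,
                    previous_steps_mask: Optional[torch.Tensor] = None
                    ) -> Tuple[Dict[str, torch.Tensor], torch.Tensor]:
            # Single-step decoding: use only the embedding of the most recent prediction.
            last_prediction_embedding = previous_steps_predictions[:, -1]
            hidden = self._gru_cell(last_prediction_embedding, previous_state["decoder_hidden"])
            return {"decoder_hidden": hidden}, hidden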
class allennlp.modules.seq2seq_decoders.lstm_cell_decoder_net.LstmCellDecoderNet(decoding_dim: int, target_embedding_dim: int, attention: Optional[allennlp.modules.attention.attention.Attention] = None, bidirectional_input: bool = False)
Bases: allennlp.modules.seq2seq_decoders.decoder_net.DecoderNet
This decoder net implements a simple decoding network with an LSTMCell and attention. (A minimal usage sketch follows the method descriptions below.)
Parameters
- decoding_dim : int, required
  Defines the dimensionality of the output vectors.
- target_embedding_dim : int, required
  Defines the dimensionality of the input target embeddings. Since this model takes its output on a previous step as input for the following step, this is also an input dimensionality.
- attention : Attention, optional (default = None)
  If you want to use attention to get a dynamic summary of the encoder outputs at each step of decoding, this is the function used to compute similarity between the decoder hidden state and encoder outputs.
forward(self, previous_state: Dict[str, torch.Tensor], encoder_outputs: torch.Tensor, source_mask: torch.Tensor, previous_steps_predictions: torch.Tensor, previous_steps_mask: Optional[torch.Tensor] = None) → Tuple[Dict[str, torch.Tensor], torch.Tensor]
Performs a decoding step and returns a dictionary with the decoder hidden state (or cache) and the decoder output. The decoder output is a 3d tensor of shape (group_size, steps_count, decoder_output_dim) if self.decodes_parallel is True, else it is a 2d tensor of shape (group_size, decoder_output_dim).
Parameters
- previous_steps_predictions : torch.Tensor, required
  Embeddings of predictions on previous steps. Shape: (group_size, steps_count, decoder_output_dim)
- encoder_outputs : torch.Tensor, required
  Vectors of all encoder outputs. Shape: (group_size, max_input_sequence_length, encoder_output_dim)
- source_mask : torch.Tensor, required
  This tensor contains the mask for each input sequence. Shape: (group_size, max_input_sequence_length)
- previous_state : Dict[str, torch.Tensor], required
  Previous state of the decoder.
Returns
- Tuple[Dict[str, torch.Tensor], torch.Tensor]
  Tuple of the new decoder state and the decoder output. The output should be used to generate output sequence elements.
init_decoder_state(self, encoder_out: Dict[str, torch.LongTensor]) → Dict[str, torch.Tensor]
Initialize the decoding state to be passed to the first decoding time step.
Parameters
- encoder_out : Dict[str, torch.LongTensor], required
  Dictionary with the encoded state, ideally containing the encoder outputs and the source mask.
Returns
- Dict[str, torch.Tensor]
  The initial decoder state.
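A shape-checking sketch with random tensors (hedged; the encoder_out key names "encoder_outputs" and "source_mask" are assumptions based on the descriptions above, and decoding_dim is set equal to the encoder output size):

    import torch
    from allennlp.modules.attention import DotProductAttention
    from allennlp.modules.seq2seq_decoders import LstmCellDecoderNet

    group_size, src_len, encoder_dim, target_dim = 2, 7, 256, 128
    decoder_net = LstmCellDecoderNet(decoding_dim=encoder_dim,
                                     target_embedding_dim=target_dim,
                                     attention=DotProductAttention())

    encoder_out = {"encoder_outputs": torch.randn(group_size, src_len, encoder_dim),
                   "source_mask": torch.ones(group_size, src_len, dtype=torch.long)}

    state = decoder_net.init_decoder_state(encoder_out)
    previous_steps_predictions = torch.randn(group_size, 1, target_dim)  # one embedded step

    state, output = decoder_net(previous_state=state,
                                encoder_outputs=encoder_out["encoder_outputs"],
                                source_mask=encoder_out["source_mask"],
                                previous_steps_predictions=previous_steps_predictions)
    print(output.shape)  # expected: (group_size, decoding_dim), since decodes_parallel is False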
class allennlp.modules.seq2seq_decoders.stacked_self_attention_decoder_net.Decoder(layer: torch.nn.modules.module.Module, num_layers: int)
Bases: torch.nn.modules.module.Module
An N-layer transformer decoder with masking. Code taken from http://nlp.seas.harvard.edu/2018/04/03/attention.html
forward(self, x: torch.Tensor, memory: torch.Tensor, src_mask: torch.Tensor, tgt_mask: torch.Tensor) → torch.Tensor
Defines the computation performed at every call. Should be overridden by all subclasses.
Note: although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
class allennlp.modules.seq2seq_decoders.stacked_self_attention_decoder_net.DecoderLayer(size: int, self_attn: allennlp.modules.seq2seq_encoders.bidirectional_language_model_transformer.MultiHeadedAttention, src_attn: allennlp.modules.seq2seq_encoders.bidirectional_language_model_transformer.MultiHeadedAttention, feed_forward, dropout: float)
Bases: torch.nn.modules.module.Module
A single layer of a transformer decoder. Code taken from http://nlp.seas.harvard.edu/2018/04/03/attention.html
class allennlp.modules.seq2seq_decoders.stacked_self_attention_decoder_net.StackedSelfAttentionDecoderNet(decoding_dim: int, target_embedding_dim: int, feedforward_hidden_dim: int, num_layers: int, num_attention_heads: int, use_positional_encoding: bool = True, positional_encoding_max_steps: int = 5000, dropout_prob: float = 0.1, residual_dropout_prob: float = 0.2, attention_dropout_prob: float = 0.1)
Bases: allennlp.modules.seq2seq_decoders.decoder_net.DecoderNet
A stacked self-attention decoder implementation. (A construction sketch follows the parameter list below.)
Parameters
- decoding_dim : int, required
  Defines the dimensionality of the output vectors.
- target_embedding_dim : int, required
  Defines the dimensionality of the input target embeddings. Since this model takes its output on a previous step as input for the following step, this is also an input dimensionality.
- feedforward_hidden_dim : int, required
  The middle dimension of the FeedForward network. The input and output dimensions are fixed to ensure sizes match up for the self attention layers.
- num_layers : int, required
  The number of stacked self attention -> feedforward -> layer normalisation blocks.
- num_attention_heads : int, required
  The number of attention heads to use per layer.
- use_positional_encoding : bool, optional (default = True)
  Whether to add sinusoidal frequencies to the input tensor. This is strongly recommended, as without this feature, the self attention layers have no idea of absolute or relative position (as they are just computing pairwise similarity between vectors of elements), which can be important features for many tasks.
- dropout_prob : float, optional (default = 0.1)
  The dropout probability for the feedforward network.
- residual_dropout_prob : float, optional (default = 0.2)
  The dropout probability for the residual connections.
- attention_dropout_prob : float, optional (default = 0.1)
  The dropout probability for the attention distributions in each attention layer.
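A construction sketch (illustrative values; assuming the package re-exports the class as in the earlier examples). Because this net decodes in parallel, an AutoRegressiveSeqDecoder with scheduled_sampling_ratio = 0 can compute training-time predictions in a single forward pass, as described above:

    from allennlp.modules.seq2seq_decoders import StackedSelfAttentionDecoderNet

    decoder_net = StackedSelfAttentionDecoderNet(decoding_dim=256,
                                                 target_embedding_dim=256,
                                                 feedforward_hidden_dim=512,
                                                 num_layers=4,
                                                 num_attention_heads=8)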
forward(self, previous_state: Dict[str, torch.Tensor], encoder_outputs: torch.Tensor, source_mask: torch.Tensor, previous_steps_predictions: torch.Tensor, previous_steps_mask: Optional[torch.Tensor] = None) → Tuple[Dict[str, torch.Tensor], torch.Tensor]
Performs a decoding step and returns a dictionary with the decoder hidden state (or cache) and the decoder output. The decoder output is a 3d tensor of shape (group_size, steps_count, decoder_output_dim) if self.decodes_parallel is True, else it is a 2d tensor of shape (group_size, decoder_output_dim).
Parameters
- previous_steps_predictions : torch.Tensor, required
  Embeddings of predictions on previous steps. Shape: (group_size, steps_count, decoder_output_dim)
- encoder_outputs : torch.Tensor, required
  Vectors of all encoder outputs. Shape: (group_size, max_input_sequence_length, encoder_output_dim)
- source_mask : torch.Tensor, required
  This tensor contains the mask for each input sequence. Shape: (group_size, max_input_sequence_length)
- previous_state : Dict[str, torch.Tensor], required
  Previous state of the decoder.
Returns
- Tuple[Dict[str, torch.Tensor], torch.Tensor]
  Tuple of the new decoder state and the decoder output. The output should be used to generate output sequence elements.
init_decoder_state(self, encoder_out: Dict[str, torch.LongTensor]) → Dict[str, torch.Tensor]
Initialize the decoding state to be passed to the first decoding time step.
Parameters
- encoder_out : Dict[str, torch.LongTensor], required
  Dictionary with the encoded state, ideally containing the encoder outputs and the source mask.
Returns
- Dict[str, torch.Tensor]
  The initial decoder state.