An LSTM with recurrent dropout and optional highway connections between layers. Based on the PyText version (which was itself based on an earlier AllenNLP version).


class AugmentedLSTMCell(torch.nn.Module):
 | def __init__(
 |     self,
 |     embed_dim: int,
 |     lstm_dim: int,
 |     use_highway: bool = True,
 |     use_bias: bool = True
 | )

AugmentedLSTMCell implements an AugmentedLSTM cell.


  • embed_dim : int
    The number of expected features in the input.
  • lstm_dim : int
    Number of features in the hidden state of the LSTM.
  • use_highway : bool, optional (default = True)
    If True we append a highway network to the outputs of the LSTM.
  • use_bias : bool, optional (default = True)
    If True we use a bias in our LSTM calculations, otherwise we don't.


  • input_linearity : nn.Module
    Fused weight matrix which computes a linear function over the input.
  • state_linearity : nn.Module
    Fused weight matrix which computes a linear function over the states.
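
To illustrate what "fused" means here, the following is a minimal sketch (not the library's implementation) of one LSTM step in which a single linear layer per source produces all gate pre-activations at once, and the result is then split with `chunk`. It assumes a plain four-gate LSTM without the highway projection; the gate ordering and all sizes are illustrative assumptions.

```python
import torch

# Illustrative sizes (assumptions, not taken from the library).
bsize, embed_dim, lstm_dim = 2, 4, 3

# One fused layer per source: each emits pre-activations for all four gates.
input_linearity = torch.nn.Linear(embed_dim, 4 * lstm_dim)
state_linearity = torch.nn.Linear(lstm_dim, 4 * lstm_dim)

x = torch.randn(bsize, embed_dim)  # input at one timestep
h = torch.zeros(bsize, lstm_dim)   # previous hidden state
c = torch.zeros(bsize, lstm_dim)   # previous cell state

# One matrix multiply per source, then split into per-gate blocks.
fused = input_linearity(x) + state_linearity(h)
i_gate, f_gate, g_cand, o_gate = fused.chunk(4, dim=-1)  # ordering is an assumption

# Standard LSTM state update using the split blocks.
c_new = torch.sigmoid(f_gate) * c + torch.sigmoid(i_gate) * torch.tanh(g_cand)
h_new = torch.sigmoid(o_gate) * torch.tanh(c_new)
```

With `use_highway=True`, a cell would need additional fused blocks for the highway gate and the projected input, which this sketch omits.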


class AugmentedLSTMCell(torch.nn.Module):
 | ...
 | def reset_parameters(self)

Use sensible default initializations for parameters.


class AugmentedLSTMCell(torch.nn.Module):
 | ...
 | def forward(
 |     self,
 |     x: torch.Tensor,
 |     states: Tuple[torch.Tensor, torch.Tensor],
 |     variational_dropout_mask: Optional[torch.BoolTensor] = None
 | ) -> Tuple[torch.Tensor, torch.Tensor]


DO NOT USE THIS LAYER DIRECTLY; use the AugmentedLstm class instead.


  • x : torch.Tensor
    Input tensor of shape (bsize x input_dim).
  • states : Tuple[torch.Tensor, torch.Tensor]
    Tuple of tensors containing the hidden state and the cell state of each element in the batch. Each of these tensors has shape (bsize x nhid).


  • Tuple[torch.Tensor, torch.Tensor]
    Returned states. Shape of each state is (bsize x nhid).


class AugmentedLstm(torch.nn.Module):
 | def __init__(
 |     self,
 |     input_size: int,
 |     hidden_size: int,
 |     go_forward: bool = True,
 |     recurrent_dropout_probability: float = 0.0,
 |     use_highway: bool = True,
 |     use_input_projection_bias: bool = True
 | )

AugmentedLstm implements a one-layer, single-direction AugmentedLSTM layer. AugmentedLSTM is an LSTM that can optionally append a highway network to the output layer. The recurrent_dropout_probability argument controls how much variational dropout is applied.


  • input_size : int
    The number of expected features in the input.
  • hidden_size : int
    Number of features in the hidden state of the LSTM.
  • go_forward : bool
    Whether to compute features left to right (forward) or right to left (backward).
  • recurrent_dropout_probability : float
    Variational dropout probability to use. Defaults to 0.0.
  • use_highway : bool
    If True we append a highway network to the outputs of the LSTM.
  • use_input_projection_bias : bool
    If True we use a bias in our LSTM calculations, otherwise we don't.


  • cell : AugmentedLSTMCell
    AugmentedLSTMCell that is applied at every timestep.
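
Variational (recurrent) dropout samples one dropout mask per sequence and reuses it at every timestep, instead of resampling at each step. The sketch below shows the idea in plain Python. The helper names `make_variational_mask` and `apply_mask` are hypothetical, and the inverted-dropout rescaling by 1 / (1 - p) is an assumption rather than something stated above.

```python
import random
from typing import List

Matrix = List[List[float]]

def make_variational_mask(bsize: int, nhid: int, p: float, rng: random.Random) -> Matrix:
    """Sample one mask per batch element; kept units are rescaled by 1 / (1 - p)."""
    keep = 1.0 - p
    return [
        [1.0 / keep if rng.random() < keep else 0.0 for _ in range(nhid)]
        for _ in range(bsize)
    ]

def apply_mask(hidden: Matrix, mask: Matrix) -> Matrix:
    """Elementwise product of a (bsize x nhid) hidden state with the mask."""
    return [[h * m for h, m in zip(h_row, m_row)] for h_row, m_row in zip(hidden, mask)]

rng = random.Random(0)
mask = make_variational_mask(bsize=2, nhid=4, p=0.5, rng=rng)  # sampled once

# Three timesteps of dummy hidden states, each of shape (bsize x nhid).
hidden_per_step = [[[1.0] * 4 for _ in range(2)] for _ in range(3)]

# The same mask is reused at every timestep, so a dropped unit stays
# dropped for the whole sequence.
dropped_per_step = [apply_mask(h_t, mask) for h_t in hidden_per_step]
```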


class AugmentedLstm(torch.nn.Module):
 | ...
 | def forward(
 |     self,
 |     inputs: PackedSequence,
 |     states: Optional[Tuple[torch.Tensor, torch.Tensor]] = None
 | ) -> Tuple[PackedSequence, Tuple[torch.Tensor, torch.Tensor]]

Warning: in a regular model, it is usually better to use the BiAugmentedLstm class.

Given an input batch of sequential data such as word embeddings, produces a single-layer, unidirectional AugmentedLSTM representation of the sequential input and new state tensors.


  • inputs : PackedSequence
    bsize sequences of shape (len, input_dim) each, in PackedSequence format
  • states : Tuple[torch.Tensor, torch.Tensor]
    Tuple of tensors containing the initial hidden state and the initial cell state of each element in the batch. Each of these tensors has shape (1 x bsize x nhid). Defaults to None.


  • Tuple[PackedSequence, Tuple[torch.Tensor, torch.Tensor]]
    AugmentedLSTM representation of the input and the state of the LSTM at t = seq_len. Shape of representation is (bsize x seq_len x representation_dim). Shape of each state is (1 x bsize x nhid).
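
Because `forward` takes a `PackedSequence`, a padded batch has to be packed first. The following sketch uses only the `torch.nn.utils.rnn` utilities. The sizes are arbitrary, and the commented-out call to `AugmentedLstm` is shown for orientation only and is not executed here.

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

bsize, max_len, input_dim = 3, 4, 5  # illustrative sizes

# Padded batch of shape (bsize, max_len, input_dim) and the true lengths,
# sorted in decreasing order as pack_padded_sequence expects by default.
padded = torch.randn(bsize, max_len, input_dim)
lengths = torch.tensor([4, 2, 1])

packed = pack_padded_sequence(padded, lengths, batch_first=True)

# Hypothetical usage, following the signatures documented above:
#   layer = AugmentedLstm(input_size=input_dim, hidden_size=8)
#   output, (h_n, c_n) = layer(packed)

# A PackedSequence can be converted back to a padded tensor plus lengths.
unpacked, out_lengths = pad_packed_sequence(packed, batch_first=True)
```

Only the 4 + 2 + 1 = 7 real timesteps are stored in `packed.data`; padding positions are dropped.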


class BiAugmentedLstm(torch.nn.Module):
 | def __init__(
 |     self,
 |     input_size: int,
 |     hidden_size: int,
 |     num_layers: int = 1,
 |     bias: bool = True,
 |     recurrent_dropout_probability: float = 0.0,
 |     bidirectional: bool = False,
 |     padding_value: float = 0.0,
 |     use_highway: bool = True
 | ) -> None

BiAugmentedLstm implements a generic AugmentedLSTM representation layer. BiAugmentedLstm is an LSTM that can optionally append a highway network to the output layer. The recurrent_dropout_probability argument controls how much variational dropout is applied.


  • input_size : int
    The dimension of the inputs to the LSTM.
  • hidden_size : int
    The dimension of the outputs of the LSTM.
  • num_layers : int
    Number of recurrent layers. E.g., setting num_layers=2 stacks two LSTMs, with the second LSTM taking the outputs of the first as its input and computing the final result. Defaults to 1.
  • bias : bool
    If True we use a bias in our LSTM calculations, otherwise we don't.
  • recurrent_dropout_probability : float, optional (default = 0.0)
    Variational dropout probability to use.
  • bidirectional : bool
    If True, becomes a bidirectional LSTM. Defaults to False.
  • padding_value : float, optional (default = 0.0)
    Value for the padded elements.
  • use_highway : bool, optional (default = True)
    Whether or not to use highway connections between layers. This effectively reparameterises the normal output of an LSTM as:
    gate = sigmoid(W_x1 * x_t + W_h * h_t)
    output = gate * h_t  + (1 - gate) * (W_x2 * x_t)
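
As a sanity check on the two formulas above, here is a small plain-Python sketch of the highway mix for one timestep. It assumes the linear terms have already been computed: `gate_pre` stands for W_x1 * x_t + W_h * h_t and `x_proj` stands for W_x2 * x_t. Both names and the helper `highway_output` are hypothetical.

```python
import math
from typing import List

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def highway_output(h_t: List[float], gate_pre: List[float], x_proj: List[float]) -> List[float]:
    """Elementwise output = gate * h_t + (1 - gate) * x_proj, with gate = sigmoid(gate_pre)."""
    out = []
    for h, g, xp in zip(h_t, gate_pre, x_proj):
        gate = sigmoid(g)
        out.append(gate * h + (1.0 - gate) * xp)
    return out

# A pre-activation of 0 gives gate = 0.5, so the output is the average
# of the LSTM output and the projected input.
print(highway_output([1.0, -1.0], [0.0, 0.0], [3.0, 1.0]))  # [2.0, 0.0]
```

A large positive pre-activation drives the gate towards 1, so the output approaches h_t; a large negative one passes the projected input through instead.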


  • output_accumulator : PackedSequence
    The outputs of the LSTM for each timestep. A tensor of shape (batch_size, max_timesteps, hidden_size) where for a given batch element, all outputs past the sequence length for that batch are zero tensors.


class BiAugmentedLstm(torch.nn.Module):
 | ...
 | def forward(
 |     self,
 |     inputs: torch.Tensor,
 |     states: Optional[Tuple[torch.Tensor, torch.Tensor]] = None
 | ) -> Tuple[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]

Given an input batch of sequential data such as word embeddings, produces an AugmentedLSTM representation of the sequential input and new state tensors.


  • inputs : PackedSequence
    A tensor of shape (batch_size, num_timesteps, input_size) to apply the LSTM over.
  • states : Tuple[torch.Tensor, torch.Tensor]
    Tuple of tensors containing the initial hidden state and the initial cell state of each element in the batch. Each of these tensors has shape (bsize x num_layers x num_directions * nhid). Defaults to None.


  • Tuple[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]
    AugmentedLSTM representation of the input and the state of the LSTM at t = seq_len. Shape of representation is (bsize x seq_len x representation_dim). Shape of each state is (bsize x num_layers * num_directions x nhid).