augmented_lstm

allennlp.modules.augmented_lstm

An LSTM with recurrent dropout and the option to use highway connections between layers. Based on the PyText version (which was itself based on a previous AllenNLP version).

AugmentedLSTMCell

class AugmentedLSTMCell(torch.nn.Module):
 | def __init__(
 |     self,
 |     embed_dim: int,
 |     lstm_dim: int,
 |     use_highway: bool = True,
 |     use_bias: bool = True
 | )

AugmentedLSTMCell implements an AugmentedLSTM cell.

Parameters

embed_dim : int
The number of expected features in the input.
lstm_dim : int
Number of features in the hidden state of the LSTM.
use_highway : bool, optional (default = True)
If True we append a highway network to the outputs of the LSTM.
use_bias : bool, optional (default = True)
If True we use a bias in our LSTM calculations, otherwise we don't.

Attributes

input_linearity : nn.Module
Fused weight matrix which computes a linear function over the input.
state_linearity : nn.Module
Fused weight matrix which computes a linear function over the states.
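A fused weight matrix computes the pre-activations of every gate with one matrix product. It is equivalent to stacking the per-gate weight matrices row-wise and splitting the result into lstm_dim-sized blocks afterwards. The pure-Python sketch below illustrates that equivalence. The helper names, the toy sizes and the assumption of four equally sized gate blocks are ours. The sketch does not model any extra blocks the real module may add when use_highway is True.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(weight: Matrix, x: Vector, bias: Vector) -> Vector:
    """Compute weight @ x + bias for a row-major weight matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def fused_gate_projection(weight: Matrix, x: Vector, bias: Vector, lstm_dim: int) -> List[Vector]:
    """Apply one fused linear map, then split the result into lstm_dim-sized gate blocks."""
    projected = matvec(weight, x, bias)
    assert len(projected) % lstm_dim == 0, "output size must be a multiple of lstm_dim"
    return [projected[i : i + lstm_dim] for i in range(0, len(projected), lstm_dim)]


# Hypothetical sizes: embed_dim = 2, lstm_dim = 1, four gate blocks (no highway).
weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
bias = [0.0, 0.5, 0.0, 0.0]
gates = fused_gate_projection(weight, [3.0, 4.0], bias, lstm_dim=1)
print(gates)  # [[3.0], [4.5], [7.0], [2.0]]
```

Each block matches what a separate, unfused linear map with the corresponding rows would produce. Fusing them simply trades several small products for one larger one.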

reset_parameters

class AugmentedLSTMCell(torch.nn.Module):
 | ...
 | def reset_parameters(self)

Use sensible default initializations for parameters.
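One widely used "sensible default" for LSTMs is to start the forget-gate bias at 1.0, so the cell initially tends to keep its memory. The sketch below applies that idea to a fused bias vector. Two points are assumptions, not details stated in this documentation: whether reset_parameters does exactly this, and where the forget-gate block sits (here assumed to be the second of four blocks).

```python
from typing import List


def init_fused_bias(lstm_dim: int, num_blocks: int = 4, forget_block: int = 1) -> List[float]:
    """Zero a fused bias vector, then set one gate block (assumed to be the forget gate) to 1.0."""
    bias = [0.0] * (num_blocks * lstm_dim)
    start = forget_block * lstm_dim
    for i in range(start, start + lstm_dim):
        bias[i] = 1.0
    return bias


print(init_fused_bias(lstm_dim=2))  # [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
```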

forward

class AugmentedLSTMCell(torch.nn.Module):
 | ...
 | def forward(
 |     self,
 |     x: torch.Tensor,
 |     states: Tuple[torch.Tensor, torch.Tensor],
 |     variational_dropout_mask: Optional[torch.BoolTensor] = None
 | ) -> Tuple[torch.Tensor, torch.Tensor]

Warning

Do not use this layer directly; use the AugmentedLstm class instead.

Parameters

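x : torch.Tensor
Input tensor of shape (bsize x input_dim).
states : Tuple[torch.Tensor, torch.Tensor]
Tuple of tensors containing the hidden state and the cell state of each element in the batch. Each of these tensors has a dimension of (bsize x nhid).
variational_dropout_mask : torch.BoolTensor, optional (default = None)
Mask used for variational (recurrent) dropout. If None, no such dropout is applied.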

Returns

Tuple[torch.Tensor, torch.Tensor]
The new hidden state and cell (memory) state. Each has shape (bsize x nhid).
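At its core, the step that forward performs follows the standard LSTM update. The pure-Python sketch below performs one step for a single batch element. It takes the gate pre-activations (the sum of the input and state projections) as already computed. The gate order (input, forget, candidate, output) and the helper names are assumptions, and the highway connection is left out.

```python
import math
from typing import List, Tuple

Vector = List[float]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def lstm_cell_step(pre_activations: List[Vector], memory: Vector) -> Tuple[Vector, Vector]:
    """One LSTM update from pre-activations already summed over the input and state projections.

    pre_activations holds four blocks, assumed ordered (input, forget, candidate, output).
    Returns (hidden_state, memory_state), each the same length as memory.
    """
    input_pre, forget_pre, candidate_pre, output_pre = pre_activations
    # c_t = sigmoid(i) * tanh(g) + sigmoid(f) * c_{t-1}
    new_memory = [
        sigmoid(i) * math.tanh(g) + sigmoid(f) * c
        for i, f, g, c in zip(input_pre, forget_pre, candidate_pre, memory)
    ]
    # h_t = sigmoid(o) * tanh(c_t)
    new_hidden = [sigmoid(o) * math.tanh(c) for o, c in zip(output_pre, new_memory)]
    return new_hidden, new_memory


# With all pre-activations at zero, every gate is 0.5 and the candidate is tanh(0) = 0,
# so the new memory is half of the old one.
hidden, memory = lstm_cell_step([[0.0, 0.0]] * 4, memory=[2.0, -4.0])
print(memory)  # [1.0, -2.0]
```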

AugmentedLstm

class AugmentedLstm(torch.nn.Module):
 | def __init__(
 |     self,
 |     input_size: int,
 |     hidden_size: int,
 |     go_forward: bool = True,
 |     recurrent_dropout_probability: float = 0.0,
 |     use_highway: bool = True,
 |     use_input_projection_bias: bool = True
 | )

AugmentedLstm implements a one-layer, unidirectional AugmentedLSTM layer. An AugmentedLSTM is an LSTM that can optionally append a highway network to its output layer. The recurrent_dropout_probability parameter controls the amount of variational dropout applied.

Parameters

input_size : int
The number of expected features in the input.
hidden_size : int
Number of features in the hidden state of the LSTM.
go_forward : bool, optional (default = True)
Whether to compute features left to right (forward) or right to left (backward).
recurrent_dropout_probability : float, optional (default = 0.0)
Variational dropout probability to use.
use_highway : bool, optional (default = True)
If True we append a highway network to the outputs of the LSTM.
use_input_projection_bias : bool, optional (default = True)
If True we use a bias in our LSTM calculations, otherwise we don't.

Attributes

cell : AugmentedLSTMCell
AugmentedLSTMCell that is applied at every timestep.
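Variational (recurrent) dropout samples a single dropout mask per sequence and reuses it at every timestep, instead of drawing a fresh mask at each step. Below is a minimal pure-Python sketch, assuming inverted-dropout scaling by 1 / (1 - p). The function names are hypothetical, and the exact masking inside AugmentedLstm may differ.

```python
import random
from typing import List


def variational_dropout_mask(size: int, p: float, rng: random.Random) -> List[float]:
    """Sample one inverted-dropout mask: each unit is kept with probability 1 - p, scaled by 1 / (1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    keep_scale = 1.0 / (1.0 - p)
    return [keep_scale if rng.random() >= p else 0.0 for _ in range(size)]


def apply_over_time(hidden_states: List[List[float]], mask: List[float]) -> List[List[float]]:
    """Apply the SAME mask to the hidden state at every timestep (the 'variational' part)."""
    return [[h * m for h, m in zip(step, mask)] for step in hidden_states]


rng = random.Random(0)
mask = variational_dropout_mask(size=4, p=0.5, rng=rng)
outputs = apply_over_time([[1.0, 1.0, 1.0, 1.0]] * 3, mask)
# Every timestep has the same units dropped, because the mask was sampled once.
assert all(step == outputs[0] for step in outputs)
```

Reusing one mask means each hidden unit is either present or absent for the whole sequence. That is the behavior that separates variational dropout from ordinary per-step dropout.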

forward

class AugmentedLstm(torch.nn.Module):
 | ...
 | def forward(
 |     self,
 |     inputs: PackedSequence,
 |     states: Optional[Tuple[torch.Tensor, torch.Tensor]] = None
 | ) -> Tuple[PackedSequence, Tuple[torch.Tensor, torch.Tensor]]

Warning: In a regular model, it is better to use the BiAugmentedLstm class.

Given an input batch of sequential data such as word embeddings, produces a single-layer, unidirectional AugmentedLSTM representation of the sequential input and new state tensors.

Parameters

inputs : PackedSequence
A batch of bsize sequences, each of shape (len, input_dim), in PackedSequence format.
states : Tuple[torch.Tensor, torch.Tensor]
Tuple of tensors containing the initial hidden state and the cell state of each element in the batch. Each of these tensors has a dimension of (1 x bsize x nhid). Defaults to None.

Returns

Tuple[PackedSequence, Tuple[torch.Tensor, torch.Tensor]]
The AugmentedLSTM representation of the input, and the state of the LSTM at t = seq_len. The representation has shape (bsize x seq_len x representation_dim); each state has shape (1 x bsize x nhid).
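Because the inputs arrive as a PackedSequence, sequences in a batch can have different lengths. At a given timestep, only the sequences that are still active need updating. The sketch below computes how many sequences are active at each step, for lengths sorted in decreasing order. This is the same quantity PackedSequence stores in its batch_sizes field. It also shows the order in which timesteps are visited for each value of go_forward. It is an illustration with hypothetical helper names, not the layer's implementation.

```python
from typing import List


def batch_sizes_from_lengths(lengths: List[int]) -> List[int]:
    """Number of sequences still active at each timestep, for lengths sorted in decreasing order."""
    if any(a < b for a, b in zip(lengths, lengths[1:])):
        raise ValueError("lengths must be sorted in decreasing order")
    max_len = lengths[0] if lengths else 0
    return [sum(1 for n in lengths if n > t) for t in range(max_len)]


def timestep_order(max_len: int, go_forward: bool) -> List[int]:
    """Left-to-right indices when go_forward is True, right-to-left otherwise."""
    steps = list(range(max_len))
    return steps if go_forward else steps[::-1]


lengths = [4, 2, 1]
print(batch_sizes_from_lengths(lengths))    # [3, 2, 1, 1]
print(timestep_order(4, go_forward=False))  # [3, 2, 1, 0]
```

When going backward, a shorter sequence stays inactive until the loop reaches its last real timestep. From there it starts from its initial state.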

BiAugmentedLstm

class BiAugmentedLstm(torch.nn.Module):
 | def __init__(
 |     self,
 |     input_size: int,
 |     hidden_size: int,
 |     num_layers: int = 1,
 |     bias: bool = True,
 |     recurrent_dropout_probability: float = 0.0,
 |     bidirectional: bool = False,
 |     padding_value: float = 0.0,
 |     use_highway: bool = True
 | ) -> None

BiAugmentedLstm implements a generic AugmentedLSTM representation layer. It is an LSTM that can optionally append a highway network to its output layer. The recurrent_dropout_probability parameter controls the amount of variational dropout applied.

Parameters

input_size : int
The dimension of the inputs to the LSTM.
hidden_size : int
The dimension of the outputs of the LSTM.
num_layers : int
Number of recurrent layers. E.g. setting num_layers=2 stacks two LSTMs, with the second LSTM taking the outputs of the first LSTM as its input and computing the final result. Defaults to 1.
bias : bool
If True we use a bias in our LSTM calculations, otherwise we don't. Defaults to True.
recurrent_dropout_probability : float, optional (default = 0.0)
Variational dropout probability to use.
bidirectional : bool
If True, becomes a bidirectional LSTM. Defaults to False.
padding_value : float, optional (default = 0.0)
Value for the padded elements.
use_highway : bool, optional (default = True)
Whether or not to use highway connections between layers. This effectively involves reparameterising the normal output of an LSTM as:
```
gate = sigmoid(W_x1 * x_t + W_h * h_t)
output = gate * h_t  + (1 - gate) * (W_x2 * x_t)
```
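The highway formula above mixes the LSTM output h_t with a linear projection of the input, using a learned gate. The pure-Python sketch below applies it element-wise. It assumes the two linear terms, W_x1 * x_t + W_h * h_t and W_x2 * x_t, have already been computed. The function name is hypothetical.

```python
import math
from typing import List

Vector = List[float]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def highway_output(gate_pre: Vector, h_t: Vector, x_proj: Vector) -> Vector:
    """output = gate * h_t + (1 - gate) * (W_x2 * x_t), with gate = sigmoid(W_x1 * x_t + W_h * h_t).

    gate_pre is W_x1 * x_t + W_h * h_t and x_proj is W_x2 * x_t, both assumed precomputed.
    """
    output = []
    for pre, h, xp in zip(gate_pre, h_t, x_proj):
        gate = sigmoid(pre)
        output.append(gate * h + (1.0 - gate) * xp)
    return output


# A gate pre-activation of 0 gives gate = 0.5, i.e. an even mix of h_t and the projected input.
print(highway_output([0.0], h_t=[2.0], x_proj=[4.0]))  # [3.0]
```

A strongly positive gate pre-activation passes h_t through almost unchanged. A strongly negative one lets the projected input bypass the recurrence.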

Returns

output_accumulator : PackedSequence
The outputs of the LSTM for each timestep. When unpacked, this is a tensor of shape (batch_size, max_timesteps, hidden_size) in which, for a given batch element, all outputs past that element's sequence length are zero tensors.
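As the num_layers description notes, each stacked layer consumes the previous layer's outputs. The sketch below computes the input feature size each layer would see. It assumes the first layer reads input_size features, and that every later layer reads hidden_size * num_directions features, because the forward and backward outputs are concatenated. The helper is hypothetical, not part of the module.

```python
from typing import List


def layer_input_sizes(input_size: int, hidden_size: int, num_layers: int, bidirectional: bool) -> List[int]:
    """Input feature size seen by each stacked layer.

    The first layer reads the raw inputs; every later layer reads the previous layer's output,
    which is assumed to concatenate the forward and backward directions when bidirectional.
    """
    num_directions = 2 if bidirectional else 1
    sizes = []
    current = input_size
    for _ in range(num_layers):
        sizes.append(current)
        current = hidden_size * num_directions
    return sizes


print(layer_input_sizes(input_size=300, hidden_size=128, num_layers=3, bidirectional=True))
# [300, 256, 256]
```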

forward

class BiAugmentedLstm(torch.nn.Module):
 | ...
 | def forward(
 |     self,
 |     inputs: torch.Tensor,
 |     states: Optional[Tuple[torch.Tensor, torch.Tensor]] = None
 | ) -> Tuple[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]

Given an input batch of sequential data such as word embeddings, produces an AugmentedLSTM representation of the sequential input and new state tensors.

Parameters

inputs : PackedSequence
The sequences to apply the LSTM over. When padded, they form a tensor of shape (batch_size, num_timesteps, input_size).
states : Tuple[torch.Tensor, torch.Tensor]
Tuple of tensors containing the initial hidden state and the cell state of each element in the batch. Each of these tensors has a dimension of (bsize x num_layers x num_directions * nhid). Defaults to None.

Returns

Tuple[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]
The AugmentedLSTM representation of the input, and the state of the LSTM at t = seq_len. The representation has shape (bsize x seq_len x representation_dim); each state has shape (bsize x num_layers * num_directions x nhid).
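With bidirectional=True, one direction reads the sequence left to right and the other right to left. The two outputs for each timestep are then combined into a representation that grows with num_directions. The sketch below shows only the time alignment and the per-timestep concatenation. A running sum stands in as a toy for an AugmentedLstm layer, and the helper names are hypothetical. It is not the module's code.

```python
from typing import List


def running_sum(xs: List[float]) -> List[float]:
    """Toy stand-in for a recurrent layer: the state at step t is the sum of inputs seen so far."""
    out, total = [], 0.0
    for x in xs:
        total += x
        out.append(total)
    return out


def bidirectional(xs: List[float]) -> List[List[float]]:
    """Run the toy layer left-to-right and right-to-left, then pair the outputs per timestep."""
    forward = running_sum(xs)
    backward = running_sum(xs[::-1])[::-1]  # re-reverse so index t lines up with input t
    return [[f, b] for f, b in zip(forward, backward)]


print(bidirectional([1.0, 2.0, 3.0]))  # [[1.0, 6.0], [3.0, 5.0], [6.0, 3.0]]
```

At each timestep, the forward half summarizes the inputs up to t and the backward half summarizes the inputs from t to the end.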