augmented_lstm
allennlp.modules.augmented_lstm
An LSTM with recurrent dropout and the option to use highway connections between layers. Based on the PyText version (which was itself based on a previous AllenNLP version).
AugmentedLSTMCell#
class AugmentedLSTMCell(torch.nn.Module):
| def __init__(
| self,
| embed_dim: int,
| lstm_dim: int,
| use_highway: bool = True,
| use_bias: bool = True
| )
AugmentedLSTMCell
implements an AugmentedLSTM cell.
Parameters
- embed_dim : int
  The number of expected features in the input.
- lstm_dim : int
  Number of features in the hidden state of the LSTM.
- use_highway : bool, optional (default = True)
  If True, we append a highway network to the outputs of the LSTM.
- use_bias : bool, optional (default = True)
  If True, we use a bias in our LSTM calculations; otherwise we don't.
Attributes
- input_linearity : nn.Module
  Fused weight matrix which computes a linear function over the input.
- state_linearity : nn.Module
  Fused weight matrix which computes a linear function over the states.
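To illustrate what "fused" means for the two linearities above, here is a minimal sketch in plain PyTorch. It is not the library's implementation: the gate count (four), the gate ordering, and all variable names are assumptions made for illustration, and the highway variant would need additional projections.

```python
import torch

torch.manual_seed(0)
batch_size, embed_dim, lstm_dim = 2, 3, 4

# Assumption: the four LSTM gates share one fused projection each for the
# input and for the previous hidden state.
input_linearity = torch.nn.Linear(embed_dim, 4 * lstm_dim, bias=True)
state_linearity = torch.nn.Linear(lstm_dim, 4 * lstm_dim, bias=False)

x = torch.randn(batch_size, embed_dim)
h_prev = torch.zeros(batch_size, lstm_dim)
c_prev = torch.zeros(batch_size, lstm_dim)

# One matrix multiply per source, then split into per-gate pre-activations.
fused = input_linearity(x) + state_linearity(h_prev)  # (batch_size, 4 * lstm_dim)
i, f, g, o = torch.chunk(fused, 4, dim=1)             # assumed gate ordering

# Standard LSTM state update.
c_next = torch.sigmoid(f) * c_prev + torch.sigmoid(i) * torch.tanh(g)
h_next = torch.sigmoid(o) * torch.tanh(c_next)
```

Fusing the projections replaces several small matrix multiplications with one larger one per timestep, which is the usual motivation for this layout.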
reset_parameters#
class AugmentedLSTMCell(torch.nn.Module):
| ...
| def reset_parameters(self)
Use sensible default initializations for parameters.
forward#
class AugmentedLSTMCell(torch.nn.Module):
| ...
| def forward(
| self,
| x: torch.Tensor,
| states=Tuple[torch.Tensor, torch.Tensor],
| variational_dropout_mask: Optional[torch.BoolTensor] = None
| ) -> Tuple[torch.Tensor, torch.Tensor]
Warning
DO NOT USE THIS LAYER DIRECTLY; use the AugmentedLstm class instead.
Parameters
- x : torch.Tensor
  Input tensor of shape (bsize x input_dim).
- states : Tuple[torch.Tensor, torch.Tensor]
  Tuple of tensors containing the hidden state and the cell state of each element in the batch. Each of these tensors has shape (bsize x nhid). Defaults to None.
Returns
Tuple[torch.Tensor, torch.Tensor]
Returned states. Shape of each state is (bsize x nhid).
AugmentedLstm#
class AugmentedLstm(torch.nn.Module):
| def __init__(
| self,
| input_size: int,
| hidden_size: int,
| go_forward: bool = True,
| recurrent_dropout_probability: float = 0.0,
| use_highway: bool = True,
| use_input_projection_bias: bool = True
| )
AugmentedLstm
implements a one-layer, single-direction
AugmentedLSTM layer. AugmentedLSTM is an LSTM which optionally
appends a highway network to the output layer. The
recurrent_dropout_probability parameter controls the level of variational dropout applied.
Parameters
- input_size : int
  The number of expected features in the input.
- hidden_size : int
  Number of features in the hidden state of the LSTM. Required; the signature above gives no default.
- go_forward : bool
  Whether to compute features left to right (forward) or right to left (backward).
- recurrent_dropout_probability : float
  Variational dropout probability to use. Defaults to 0.0.
- use_highway : bool
  If True, we append a highway network to the outputs of the LSTM.
- use_input_projection_bias : bool
  If True, we use a bias in our LSTM calculations; otherwise we don't.
Attributes
- cell : AugmentedLSTMCell
  AugmentedLSTMCell that is applied at every timestep.
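The variational (recurrent) dropout mentioned above differs from ordinary dropout in that a single mask is sampled per sequence and reused at every timestep. The following is a minimal sketch of that idea in plain PyTorch; the variable names and the inverted-dropout scaling are assumptions for illustration and are not taken from the library source.

```python
import torch

torch.manual_seed(0)
batch_size, hidden_dim, seq_len = 2, 4, 3
p = 0.5  # hypothetical recurrent dropout probability

# Sample ONE keep/drop mask per batch element and scale the kept units by
# 1 / (1 - p) so the expected activation is unchanged (inverted dropout).
keep = torch.bernoulli(torch.full((batch_size, hidden_dim), 1 - p))
mask = keep / (1 - p)

dropped = []
for t in range(seq_len):
    # Stand-in for the hidden state produced at timestep t.
    h_t = torch.randn(batch_size, hidden_dim)
    # The same mask is applied at every timestep, so the same hidden units
    # are zeroed for the whole sequence.
    dropped.append(h_t * mask)
```

Ordinary dropout would instead draw a fresh mask inside the loop, zeroing a different set of units at each step.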
forward#
class AugmentedLstm(torch.nn.Module):
| ...
| def forward(
| self,
| inputs: PackedSequence,
| states: Optional[Tuple[torch.Tensor, torch.Tensor]] = None
| ) -> Tuple[PackedSequence, Tuple[torch.Tensor, torch.Tensor]]
Warning: In a regular model, it is better to use the BiAugmentedLstm class.
Given an input batch of sequential data such as word embeddings, produces a single layer unidirectional AugmentedLSTM representation of the sequential input and new state tensors.
Parameters
- inputs : PackedSequence
  bsize sequences of shape (len, input_dim) each, in PackedSequence format.
- states : Tuple[torch.Tensor, torch.Tensor]
  Tuple of tensors containing the initial hidden state and the cell state of each element in the batch. Each of these tensors has shape (1 x bsize x nhid). Defaults to None.
Returns
Tuple[PackedSequence, Tuple[torch.Tensor, torch.Tensor]]
AugmentedLSTM representation of the input and the state of the LSTM at t = seq_len. Shape of representation is (bsize x seq_len x representation_dim). Shape of each state is (1 x bsize x nhid).
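Because the inputs are expected in PackedSequence format, it may help to see how such an object is built from a padded batch. The sketch below uses only the standard `torch.nn.utils.rnn` utilities; the tensor sizes and names are illustrative assumptions, and the call to the LSTM itself is deliberately left out.

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)
batch_size, max_len, input_dim = 3, 5, 8

# A padded batch of shape (bsize, max_len, input_dim) with the true lengths,
# sorted in decreasing order as pack_padded_sequence expects by default.
padded = torch.randn(batch_size, max_len, input_dim)
lengths = torch.tensor([5, 3, 2])

# Pack: only the sum(lengths) real timesteps are stored.
packed = pack_padded_sequence(padded, lengths, batch_first=True)

# Unpack back to a padded tensor; positions past each length are filled
# with padding_value.
unpacked, unpacked_lengths = pad_packed_sequence(
    packed, batch_first=True, padding_value=0.0
)
```

An object like `packed` is what would be passed as `inputs`, and the returned PackedSequence can be unpacked the same way to obtain a (bsize x seq_len x representation_dim) tensor.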
BiAugmentedLstm#
class BiAugmentedLstm(torch.nn.Module):
| def __init__(
| self,
| input_size: int,
| hidden_size: int,
| num_layers: int = 1,
| bias: bool = True,
| recurrent_dropout_probability: float = 0.0,
| bidirectional: bool = False,
| padding_value: float = 0.0,
| use_highway: bool = True
| ) -> None
BiAugmentedLstm
implements a generic AugmentedLSTM representation layer.
BiAugmentedLstm is an LSTM which optionally appends a highway network to the output layer.
The recurrent_dropout_probability parameter controls the level of variational dropout applied.
Parameters
- input_size : int
  The dimension of the inputs to the LSTM.
- hidden_size : int
  The dimension of the outputs of the LSTM.
- num_layers : int
  Number of recurrent layers. E.g. setting num_layers=2 would mean stacking two LSTMs together to form a stacked LSTM, with the second LSTM taking in the outputs of the first LSTM and computing the final result. Defaults to 1.
- bias : bool
  If True, we use a bias in our LSTM calculations; otherwise we don't.
- recurrent_dropout_probability : float, optional (default = 0.0)
  Variational dropout probability to use.
- bidirectional : bool
  If True, becomes a bidirectional LSTM. Defaults to False, per the signature above.
- padding_value : float, optional (default = 0.0)
  Value for the padded elements.
- use_highway : bool, optional (default = True)
  Whether or not to use highway connections between layers. This effectively involves reparameterising the normal output of an LSTM as:

      gate = sigmoid(W_x1 * x_t + W_h * h_t)
      output = gate * h_t + (1 - gate) * (W_x2 * x_t)
Returns
- output_accumulator : PackedSequence
  The outputs of the LSTM for each timestep. A tensor of shape (batch_size, max_timesteps, hidden_size) where for a given batch element, all outputs past the sequence length for that batch are zero tensors.
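The highway reparameterisation described under use_highway can be written out directly. The sketch below evaluates the two formulas for a single timestep in plain PyTorch; the three `Linear` layers are hypothetical stand-ins for W_x1, W_h and W_x2, and this is not how the library organises its weights internally.

```python
import torch

torch.manual_seed(0)
batch_size, input_dim, hidden_dim = 2, 3, 4

# Hypothetical weights standing in for W_x1, W_h and W_x2 in the formulas.
w_x1 = torch.nn.Linear(input_dim, hidden_dim, bias=False)
w_h = torch.nn.Linear(hidden_dim, hidden_dim, bias=False)
w_x2 = torch.nn.Linear(input_dim, hidden_dim, bias=False)

x_t = torch.randn(batch_size, input_dim)   # input at timestep t
h_t = torch.randn(batch_size, hidden_dim)  # ordinary LSTM output at timestep t

# gate = sigmoid(W_x1 * x_t + W_h * h_t)
gate = torch.sigmoid(w_x1(x_t) + w_h(h_t))

# output = gate * h_t + (1 - gate) * (W_x2 * x_t)
output = gate * h_t + (1 - gate) * w_x2(x_t)
```

When the gate is close to 1 the output is the ordinary LSTM output; when it is close to 0 the layer passes through a linear projection of the input instead.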
forward#
class BiAugmentedLstm(torch.nn.Module):
| ...
| def forward(
| self,
| inputs: torch.Tensor,
| states: Optional[Tuple[torch.Tensor, torch.Tensor]] = None
| ) -> Tuple[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]
Given an input batch of sequential data such as word embeddings, produces an AugmentedLSTM representation of the sequential input and new state tensors.
Parameters
- inputs : PackedSequence
  A tensor of shape (batch_size, num_timesteps, input_size) to apply the LSTM over.
- states : Tuple[torch.Tensor, torch.Tensor]
  Tuple of tensors containing the initial hidden state and the cell state of each element in the batch. Each of these tensors has shape (bsize x num_layers x num_directions * nhid). Defaults to None.
Returns
Tuple[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]
AugmentedLSTM representation of the input and the state of the LSTM at t = seq_len. Shape of representation is (bsize x seq_len x representation_dim). Shape of each state is (bsize x num_layers * num_directions x nhid).