allennlp.state_machines.transition_functions

This module contains TransitionFunctions for state-machine-based decoders. The TransitionFunction parameterizes transitions between States. These TransitionFunctions are all PyTorch Modules that have trainable parameters. The BasicTransitionFunction is simply an LSTM decoder with attention over an input utterance, and the other classes typically subclass this and add functionality to it.
class allennlp.state_machines.transition_functions.transition_function.TransitionFunction

Bases: torch.nn.modules.module.Module, typing.Generic

A TransitionFunction is a module that assigns scores to state transitions in a transition-based decoder. The TransitionFunction takes a State and outputs a ranked list of next states, ordered by the state’s score.

The intention with this class is that a model will implement a subclass of TransitionFunction that defines how exactly you want to handle the input, what computations get done at each step of decoding, and how states are scored. This subclass then gets passed to a DecoderTrainer to have its parameters trained.
take_step(self, state: ~StateType, max_actions: int = None, allowed_actions: List[Set] = None) → List[~StateType]

The main method in the TransitionFunction API. This function defines the computation done at each step of decoding and returns a ranked list of next states. (A usage sketch is given below, after this class entry.)

The input state is grouped, to allow for efficient computation, but the output states should all have a group_size of 1, to make things easier on the decoding algorithm. They will get regrouped later as needed.

Because of the way we handle grouping in the decoder states, constructing a new state is actually a relatively expensive operation. If you know a priori that only some of the states will be needed (either because you have a set of gold action sequences, or you have a fixed beam size), passing that information into this function will keep us from constructing more states than we need, which will greatly speed up your computation.

IMPORTANT: This method must return states already sorted by their score, otherwise BeamSearch and other methods will break. For efficiency, we do not perform an additional sort in those methods.

ALSO IMPORTANT: When allowed_actions is given and max_actions is not, we assume you want to evaluate all possible states and do not need any sorting (e.g., this is true for maximum marginal likelihood training that does not use a beam search). In this case, we may skip the sorting step for efficiency reasons.

Parameters
- state : State
  The current state of the decoder, which we will take a step from. We may be grouping together computation for several states here. Because we can have several states for each instance in the original batch being evaluated at the same time, we use group_size for this kind of batching, and batch_size for the original batch in model.forward.
- max_actions : int, optional
  If you know that you will only need a certain number of states out of this (e.g., in a beam search), you can pass in the max number of actions that you need, and we will only construct that many states (for each batch instance - not for each group instance!). This can save a whole lot of computation if you have an action space that’s much larger than your beam size.
- allowed_actions : List[Set], optional
  If the DecoderTrainer has constraints on which actions need to be evaluated (e.g., maximum marginal likelihood only needs to evaluate action sequences in a given set), you can pass those constraints here, to avoid constructing state objects unnecessarily. If there are no constraints from the trainer, passing a value of None here will allow all actions to be considered.
  This is a list because it is batched - every instance in the batch has a set of allowed actions. Note that the size of this list is the group_size in the State, not the batch_size of model.forward. The training algorithm needs to convert from the batched allowed action sequences that it has to a grouped allowed action sequence list.
Returns
- next_states : List[State]
  A list of next states, ordered by score.
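To make this contract concrete, here is a minimal, hedged sketch of a beam-search-style loop driving take_step. The names initial_state, transition_function, beam_size, and max_steps are hypothetical stand-ins for a model’s own setup; combine_states and is_finished come from the State API in allennlp.state_machines.states:

    # Hypothetical setup: `initial_state`, `transition_function`, `beam_size`,
    # and `max_steps` are assumed to exist; this only illustrates the
    # take_step contract.
    finished_states = []
    states = [initial_state]
    for _ in range(max_steps):
        if not states:
            break
        # Group the current hypotheses into one batched state for efficiency.
        grouped_state = states[0].combine_states(states)
        states = []
        # take_step returns next states already sorted by score, each with
        # group_size 1.
        for next_state in transition_function.take_step(grouped_state, max_actions=beam_size):
            if next_state.is_finished():
                finished_states.append(next_state)
            else:
                states.append(next_state)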
class allennlp.state_machines.transition_functions.basic_transition_function.BasicTransitionFunction(encoder_output_dim: int, action_embedding_dim: int, input_attention: allennlp.modules.attention.attention.Attention, activation: allennlp.nn.activations.Activation = ReLU(), add_action_bias: bool = True, dropout: float = 0.0, num_layers: int = 1)

Bases: allennlp.state_machines.transition_functions.transition_function.TransitionFunction

This is a typical transition function for a state-based decoder. We use an LSTM to track decoder state, and at every timestep we compute an attention over the input question/utterance to help in selecting the action. All actions have an embedding, and we use a dot product between a predicted action embedding and the allowed actions to compute a distribution over actions at each timestep.

We allow the first action to be predicted separately from everything else. This is optional, and is because that’s how the original WikiTableQuestions semantic parser was written. The intuition is that maybe you want to predict the type of your output program outside of the typical LSTM decoder (or maybe Jayant just didn’t realize this could be treated as another action…).
Parameters
- encoder_output_dim : int
- action_embedding_dim : int
- input_attention : Attention
- activation : Activation, optional (default=relu)
  The activation that gets applied to the decoder LSTM input and to the action query.
- add_action_bias : bool, optional (default=True)
  If True, there has been a bias dimension added to the embedding of each action, which gets used when predicting the next action. We add a dimension of ones to our predicted action vector in this case to account for that.
- dropout : float, optional (default=0.0)
- num_layers : int, optional (default=1)
  The number of layers in the decoder LSTM.
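For concreteness, a hedged construction sketch using the constructor signature above; DotProductAttention stands in for any Attention implementation, and the dimensions are arbitrary:

    from allennlp.modules.attention import DotProductAttention
    from allennlp.state_machines.transition_functions.basic_transition_function import (
        BasicTransitionFunction,
    )

    # Arbitrary example dimensions; match these to your encoder and action
    # embeddings in a real model.
    transition_function = BasicTransitionFunction(
        encoder_output_dim=200,
        action_embedding_dim=100,
        input_attention=DotProductAttention(),
        dropout=0.2,
    )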
attend_on_question(self, query: torch.Tensor, encoder_outputs: torch.Tensor, encoder_output_mask: torch.Tensor) → Tuple[torch.Tensor, torch.Tensor]

Given a query (which is typically the decoder hidden state), compute an attention over the output of the question encoder, and return a weighted sum of the question representations given this attention. We also return the attention weights themselves.

This is a simple computation, but we have it as a separate method so that the forward method on the main parser module can call it on the initial hidden state, to simplify the logic in take_step.
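A shape-level sketch of a call; the tensor shapes here are our assumption based on typical AllenNLP attention usage, not something stated above:

    import torch

    group_size, num_question_tokens, encoder_output_dim = 4, 12, 200
    query = torch.randn(group_size, encoder_output_dim)  # e.g., decoder hidden state
    encoder_outputs = torch.randn(group_size, num_question_tokens, encoder_output_dim)
    encoder_output_mask = torch.ones(group_size, num_question_tokens)

    # Returns the attended question representation and the attention weights,
    # which are normalized over the question tokens.
    attended_question, attention_weights = transition_function.attend_on_question(
        query, encoder_outputs, encoder_output_mask
    )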
take_step(self, state: allennlp.state_machines.states.grammar_based_state.GrammarBasedState, max_actions: int = None, allowed_actions: List[Set[int]] = None) → List[allennlp.state_machines.states.grammar_based_state.GrammarBasedState]

The main method in the TransitionFunction API. This function defines the computation done at each step of decoding and returns a ranked list of next states.

The input state is grouped, to allow for efficient computation, but the output states should all have a group_size of 1, to make things easier on the decoding algorithm. They will get regrouped later as needed.

Because of the way we handle grouping in the decoder states, constructing a new state is actually a relatively expensive operation. If you know a priori that only some of the states will be needed (either because you have a set of gold action sequences, or you have a fixed beam size), passing that information into this function will keep us from constructing more states than we need, which will greatly speed up your computation.

IMPORTANT: This method must return states already sorted by their score, otherwise BeamSearch and other methods will break. For efficiency, we do not perform an additional sort in those methods.

ALSO IMPORTANT: When allowed_actions is given and max_actions is not, we assume you want to evaluate all possible states and do not need any sorting (e.g., this is true for maximum marginal likelihood training that does not use a beam search). In this case, we may skip the sorting step for efficiency reasons. (A sketch of this allowed_actions path follows this entry.)

Parameters
- state : State
  The current state of the decoder, which we will take a step from. We may be grouping together computation for several states here. Because we can have several states for each instance in the original batch being evaluated at the same time, we use group_size for this kind of batching, and batch_size for the original batch in model.forward.
- max_actions : int, optional
  If you know that you will only need a certain number of states out of this (e.g., in a beam search), you can pass in the max number of actions that you need, and we will only construct that many states (for each batch instance - not for each group instance!). This can save a whole lot of computation if you have an action space that’s much larger than your beam size.
- allowed_actions : List[Set], optional
  If the DecoderTrainer has constraints on which actions need to be evaluated (e.g., maximum marginal likelihood only needs to evaluate action sequences in a given set), you can pass those constraints here, to avoid constructing state objects unnecessarily. If there are no constraints from the trainer, passing a value of None here will allow all actions to be considered.
  This is a list because it is batched - every instance in the batch has a set of allowed actions. Note that the size of this list is the group_size in the State, not the batch_size of model.forward. The training algorithm needs to convert from the batched allowed action sequences that it has to a grouped allowed action sequence list.
Returns
- next_states : List[State]
  A list of next states, ordered by score.
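To illustrate the allowed_actions path (e.g., for maximum marginal likelihood training), a hedged sketch; the grouped state and action indices are hypothetical:

    # One set of allowed action indices per group element (not per batch
    # element); the indices here are made up.
    allowed_actions = [{0, 3, 7}, {1, 2}]

    # With allowed_actions given and max_actions omitted, the returned states
    # are not guaranteed to be sorted (see ALSO IMPORTANT above).
    next_states = transition_function.take_step(
        grouped_state, allowed_actions=allowed_actions
    )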
class allennlp.state_machines.transition_functions.linking_transition_function.LinkingTransitionFunction(encoder_output_dim: int, action_embedding_dim: int, input_attention: allennlp.modules.attention.attention.Attention, activation: allennlp.nn.activations.Activation = ReLU(), add_action_bias: bool = True, mixture_feedforward: allennlp.modules.feedforward.FeedForward = None, dropout: float = 0.0, num_layers: int = 1)

Bases: allennlp.state_machines.transition_functions.basic_transition_function.BasicTransitionFunction

This transition function adds the ability to consider linked actions to the BasicTransitionFunction (which is just an LSTM decoder with attention). These actions are potentially unseen at training time, so we need to handle them without requiring the action to have an embedding. Instead, we rely on a linking score between each action and the words in the question/utterance, and use these scores, along with the attention, to do something similar to a copy mechanism when producing these actions.

When both linked and global (embedded) actions are available, we need some way to compare the scores for these two sets of actions. The original WikiTableQuestions semantic parser just concatenated the logits together before doing a joint softmax, but this is quite brittle, because the logits might have quite different scales. So we have the option here of predicting a mixture probability between two independently normalized distributions. (A sketch of this mixture is given after this class entry.)
Parameters
- encoder_output_dim : int
- action_embedding_dim : int
- input_attention : Attention
- activation : Activation, optional (default=relu)
  The activation that gets applied to the decoder LSTM input and to the action query.
- add_action_bias : bool, optional (default=True)
  If True, there has been a bias dimension added to the embedding of each action, which gets used when predicting the next action. We add a dimension of ones to our predicted action vector in this case to account for that.
- mixture_feedforward : FeedForward, optional (default=None)
  If given, we’ll use this to compute a mixture probability between global actions and linked actions given the hidden state at every timestep of decoding, instead of concatenating the logits for both (where the logits may not be compatible with each other).
- dropout : float, optional (default=0.0)
- num_layers : int, optional (default=1)
  The number of layers in the decoder LSTM.
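A hedged sketch of the mixture idea described above (illustrative only, not the library’s exact implementation, which may work in log space); all names and dimensions here are made up:

    import torch

    group_size, hidden_dim = 4, 200
    num_linked, num_global = 7, 50
    hidden_state = torch.randn(group_size, hidden_dim)
    linked_logits = torch.randn(group_size, num_linked)   # from linking scores, no embeddings
    global_logits = torch.randn(group_size, num_global)   # from action embeddings
    mixture_feedforward = torch.nn.Linear(hidden_dim, 1)  # stand-in for a FeedForward

    # Predict a per-state mixture probability, then weight two independently
    # normalized distributions instead of concatenating incompatible logits.
    p_linked = torch.sigmoid(mixture_feedforward(hidden_state))    # (group_size, 1)
    linked_probs = torch.softmax(linked_logits, dim=-1) * p_linked
    global_probs = torch.softmax(global_logits, dim=-1) * (1 - p_linked)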
class allennlp.state_machines.transition_functions.coverage_transition_function.CoverageTransitionFunction(encoder_output_dim: int, action_embedding_dim: int, input_attention: allennlp.modules.attention.attention.Attention, activation: allennlp.nn.activations.Activation = ReLU(), add_action_bias: bool = True, dropout: float = 0.0)

Bases: allennlp.state_machines.transition_functions.basic_transition_function.BasicTransitionFunction

Adds a coverage penalty to the BasicTransitionFunction (which is just an LSTM decoder with attention). This coverage penalty is on the output action sequence: it requires an externally-computed agenda of actions that are expected to be produced during decoding, and encourages the model to select actions on that agenda.

The way that we encourage the model to select actions on the agenda is that we add the embeddings for actions on the agenda (that are available at this decoding step and haven’t yet been taken) to the predicted action embedding. We weight that addition by a learned multiplier that gets initialized to 1. (A sketch of this boost is given after this class entry.)
Parameters
- encoder_output_dim : int
- action_embedding_dim : int
- input_attention : Attention
- activation : Activation, optional (default=relu)
  The activation that gets applied to the decoder LSTM input and to the action query.
- add_action_bias : bool, optional (default=True)
  If True, there has been a bias dimension added to the embedding of each action, which gets used when predicting the next action. We add a dimension of ones to our predicted action vector in this case to account for that.
- dropout : float, optional (default=0.0)
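A hedged sketch of the agenda boost described above; the tensors and the multiplier’s name are made up for illustration:

    import torch

    embedding_dim, num_agenda_actions = 100, 3
    predicted_action_embedding = torch.randn(embedding_dim)
    # Embeddings of agenda actions that are available now and not yet taken.
    agenda_embeddings = torch.randn(num_agenda_actions, embedding_dim)
    # Learned scalar multiplier, initialized to 1.
    checklist_multiplier = torch.nn.Parameter(torch.ones(1))

    # Bias the prediction toward agenda actions by adding their embeddings,
    # weighted by the learned multiplier.
    boosted_embedding = (
        predicted_action_embedding + checklist_multiplier * agenda_embeddings.sum(dim=0)
    )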
class allennlp.state_machines.transition_functions.linking_coverage_transition_function.LinkingCoverageTransitionFunction(encoder_output_dim: int, action_embedding_dim: int, input_attention: allennlp.modules.attention.attention.Attention, activation: allennlp.nn.activations.Activation = ReLU(), add_action_bias: bool = True, mixture_feedforward: allennlp.modules.feedforward.FeedForward = None, dropout: float = 0.0)

Bases: allennlp.state_machines.transition_functions.coverage_transition_function.CoverageTransitionFunction

Combines both linking and coverage on top of the BasicTransitionFunction (which is just an LSTM decoder with attention). This adds the ability to consider linked actions in addition to global (embedded) actions, and it adds a coverage penalty over the output action sequence, combining the LinkingTransitionFunction with the CoverageTransitionFunction.

The one thing that’s unique to this class is how the coverage penalty interacts with linked actions. Instead of boosting the action’s embedding, as we do in the CoverageTransitionFunction, we boost the action’s logit directly (as there is no action embedding for linked actions). (A sketch of this difference is given after this class entry.)

Parameters
- encoder_output_dim : int
- action_embedding_dim : int
- input_attention : Attention
- activation : Activation, optional (default=relu)
  The activation that gets applied to the decoder LSTM input and to the action query.
- add_action_bias : bool, optional (default=True)
  If True, there has been a bias dimension added to the embedding of each action, which gets used when predicting the next action. We add a dimension of ones to our predicted action vector in this case to account for that.
- dropout : float, optional (default=0.0)
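A hedged sketch of that difference: the coverage bonus is added to the linked actions’ logits rather than to an embedding (names and values are made up):

    import torch

    num_linked_actions = 7
    linked_logits = torch.randn(num_linked_actions)
    # 1.0 for agenda actions that are available and not yet taken, else 0.0.
    on_agenda = torch.tensor([1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0])
    checklist_multiplier = torch.nn.Parameter(torch.ones(1))

    # Linked actions have no embedding to boost, so add the coverage bonus to
    # their logits directly.
    linked_logits = linked_logits + checklist_multiplier * on_agenda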