simple_seq2seq
allennlp_models.generation.models.simple_seq2seq
SimpleSeq2Seq#
@Model.register("simple_seq2seq")
class SimpleSeq2Seq(Model):
| def __init__(
| self,
| vocab: Vocabulary,
| source_embedder: TextFieldEmbedder,
| encoder: Seq2SeqEncoder,
| max_decoding_steps: int,
| attention: Attention = None,
| beam_size: int = None,
| target_namespace: str = "tokens",
| target_embedding_dim: int = None,
| scheduled_sampling_ratio: float = 0.0,
| use_bleu: bool = True,
| bleu_ngram_weights: Iterable[float] = (0.25, 0.25, 0.25, 0.25),
| target_pretrain_file: str = None,
| target_decoder_layers: int = 1
| ) -> None
This `SimpleSeq2Seq` class is a `Model` which takes a sequence, encodes it, and then uses the encoded representations to decode another sequence. You can use this as the basis for a neural machine translation system, an abstractive summarization system, or any other common seq2seq problem. The model here is simple, but should be a decent starting place for implementing recent models for these tasks. (A minimal construction sketch follows the parameter list below.)
Parameters

- vocab : `Vocabulary`
    Vocabulary containing source and target vocabularies. They may be under the same namespace (`tokens`) or the target tokens can have a different namespace, in which case it needs to be specified as `target_namespace`.
- source_embedder : `TextFieldEmbedder`
    Embedder for source side sequences.
- encoder : `Seq2SeqEncoder`
    The encoder of the "encoder/decoder" model.
- max_decoding_steps : `int`
    Maximum length of decoded sequences.
- target_namespace : `str`, optional (default = `'tokens'`)
    If the target side vocabulary is different from the source side's, you need to specify the target's namespace here. If not, we'll assume it is "tokens", which is also the default choice for the source side, and this might cause them to share vocabularies.
- target_embedding_dim : `int`, optional (default = `source_embedding_dim`)
    You can specify an embedding dimensionality for the target side. If not, we'll use the same value as the source embedder's.
- target_pretrain_file : `str`, optional (default = `None`)
    Path to a file of pretrained embeddings for the target vocabulary.
- target_decoder_layers : `int`, optional (default = `1`)
    Number of layers in the decoder.
- attention : `Attention`, optional (default = `None`)
    If you want to use attention to get a dynamic summary of the encoder outputs at each step of decoding, this is the function used to compute similarity between the decoder hidden state and encoder outputs.
- beam_size : `int`, optional (default = `None`)
    Width of the beam for beam search. If not specified, greedy decoding is used.
- scheduled_sampling_ratio : `float`, optional (default = `0.`)
    At each timestep during training, we sample a random number between 0 and 1, and if it is not less than this value, we use the ground truth labels for the whole batch. Else, we use the predictions from the previous time step for the whole batch. If this value is 0.0 (default), this corresponds to teacher forcing, and if it is 1.0, it corresponds to not using target side ground truth labels. See the following paper for more information: Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks. Bengio et al., 2015.
- use_bleu : `bool`, optional (default = `True`)
    If True, the BLEU metric will be calculated during validation.
- bleu_ngram_weights : `Iterable[float]`, optional (default = `(0.25, 0.25, 0.25, 0.25)`)
    Weights to assign to scores for each ngram size.
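To make these parameters concrete, here is a minimal construction sketch in Python. The tiny vocabulary, the dimensions, and the choice of `LstmSeq2SeqEncoder` and `DotProductAttention` are illustrative assumptions, not requirements of the class; any `Seq2SeqEncoder` and `Attention` will do, provided the dimensions line up.

```python
from allennlp.common.util import START_SYMBOL, END_SYMBOL
from allennlp.data import Vocabulary
from allennlp.modules.attention import DotProductAttention
from allennlp.modules.seq2seq_encoders import LstmSeq2SeqEncoder
from allennlp.modules.text_field_embedders import BasicTextFieldEmbedder
from allennlp.modules.token_embedders import Embedding
from allennlp_models.generation.models.simple_seq2seq import SimpleSeq2Seq

# A toy shared vocabulary; the decoder needs the start/end sentinels
# to be present in the target namespace.
vocab = Vocabulary()
vocab.add_tokens_to_namespace(
    ["hello", "world", START_SYMBOL, END_SYMBOL], namespace="tokens"
)

embedding_dim, hidden_dim = 16, 32
source_embedder = BasicTextFieldEmbedder(
    {"tokens": Embedding(embedding_dim=embedding_dim,
                         num_embeddings=vocab.get_vocab_size("tokens"))}
)
encoder = LstmSeq2SeqEncoder(input_size=embedding_dim, hidden_size=hidden_dim)

model = SimpleSeq2Seq(
    vocab=vocab,
    source_embedder=source_embedder,
    encoder=encoder,
    max_decoding_steps=20,
    attention=DotProductAttention(),  # dynamic summary of encoder outputs
    beam_size=4,                      # omit for greedy decoding
)
```

Since the decoder's hidden size follows the encoder's output size, a plain dot-product attention between the decoder state and the (unidirectional) encoder outputs is dimensionally valid here.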
take_step#
class SimpleSeq2Seq(Model):
| ...
| def take_step(
| self,
| last_predictions: torch.Tensor,
| state: Dict[str, torch.Tensor],
| step: int
| ) -> Tuple[torch.Tensor, Dict[str, torch.Tensor]]
Take a decoding step. This is called by the beam search class.

Parameters

- last_predictions : `torch.Tensor`
    A tensor of shape `(group_size,)`, which gives the indices of the predictions during the last time step.
- state : `Dict[str, torch.Tensor]`
    A dictionary of tensors that contain the current state information needed to predict the next step, which includes the encoder outputs, the source mask, and the decoder hidden state and context. Each of these tensors has shape `(group_size, *)`, where `*` can be any other number of dimensions.
- step : `int`
    The time step in beam search decoding.

Returns

- `Tuple[torch.Tensor, Dict[str, torch.Tensor]]`
    A tuple of `(log_probabilities, updated_state)`, where `log_probabilities` is a tensor of shape `(group_size, num_classes)` containing the predicted log probability of each class for the next step, for each item in the group, while `updated_state` is a dictionary of tensors containing the encoder outputs, source mask, and updated decoder hidden state and context.

Notes

We treat the inputs as a batch, even though `group_size` is not necessarily equal to `batch_size`, since the group may contain multiple states for each source sentence in the batch.
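As an illustration of this contract, the sketch below (reusing `model`, `embedding_dim`, and `hidden_dim` from the construction sketch above) calls `take_step` directly on a hand-built state. The specific state keys (`source_mask`, `encoder_outputs`, `decoder_hidden`, `decoder_context`) and the private `_start_index` attribute are assumptions about this model's internals; in normal use the beam search class builds the state and drives these calls.

```python
import torch

group_size, source_length = 2, 7

# Hand-built state with the tensors described above; in practice this
# comes from encoding a real batch of source tokens.
state = {
    "source_mask": torch.ones(group_size, source_length, dtype=torch.bool),
    "encoder_outputs": torch.randn(group_size, source_length, hidden_dim),
    "decoder_hidden": torch.randn(group_size, hidden_dim),
    "decoder_context": torch.zeros(group_size, hidden_dim),
}

# Start every item in the group from the start symbol.
last_predictions = torch.full((group_size,), model._start_index, dtype=torch.long)

log_probabilities, updated_state = model.take_step(last_predictions, state, step=0)
print(log_probabilities.shape)  # (group_size, num_classes)
```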
forward#
class SimpleSeq2Seq(Model):
| ...
| @overrides
| def forward(
| self,
| source_tokens: TextFieldTensors,
| target_tokens: TextFieldTensors = None
| ) -> Dict[str, torch.Tensor]
Make a forward pass with decoder logic for producing the entire target sequence.

Parameters

- source_tokens : `TextFieldTensors`
    The output of `TextField.as_array()` applied on the source `TextField`. This will be passed through a `TextFieldEmbedder` and then through an encoder.
- target_tokens : `TextFieldTensors`, optional (default = `None`)
    Output of `TextField.as_array()` applied on target `TextField`. We assume that the target tokens are also represented as a `TextField`.

Returns

- `Dict[str, torch.Tensor]`
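For a single-id token indexer named `tokens` (as in the construction sketch above), the nested structure of `TextFieldTensors` is a dict of dicts of index tensors. The sketch below, reusing `model` and `vocab` from that sketch, makes an inference-time call without `target_tokens`; the token indices are arbitrary, and the `predictions` output key is an assumption about this version's beam search output.

```python
import torch

model.eval()  # at inference time, forward runs beam search decoding

# Batch of two source sequences, already converted to vocabulary indices.
source_indices = torch.tensor([
    [vocab.get_token_index("hello"), vocab.get_token_index("world")],
    [vocab.get_token_index("world"), vocab.get_token_index("hello")],
])
source_tokens = {"tokens": {"tokens": source_indices}}

output_dict = model(source_tokens)
print(output_dict["predictions"].shape)  # roughly (batch_size, beam_size, max_length)
```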
make_output_human_readable#
class SimpleSeq2Seq(Model):
| ...
| @overrides
| def make_output_human_readable(
| self,
| output_dict: Dict[str, Any]
| ) -> Dict[str, Any]
Finalize predictions.

This method overrides `Model.make_output_human_readable`, which gets called after `Model.forward`, at test time, to finalize predictions. The logic for the decoder part of the encoder-decoder lives within the `forward` method.

This method trims the output predictions to the first end symbol, replaces indices with corresponding tokens, and adds a field called `predicted_tokens` to the `output_dict`.
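Continuing the inference sketch above, a short usage example; `predicted_tokens` is the field named in the paragraph above.

```python
readable = model.make_output_human_readable(output_dict)

# One list of decoded tokens per input instance, trimmed at the first
# end symbol (for beam search output, the top-scoring beam is used).
print(readable["predicted_tokens"])
```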
get_metrics#
class SimpleSeq2Seq(Model):
| ...
| @overrides
| def get_metrics(self, reset: bool = False) -> Dict[str, float]
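No docstring is attached here. With `use_bleu=True`, this model accumulates BLEU over validation batches (see the `use_bleu` parameter above), and `get_metrics` is how a trainer reads it back; the `"BLEU"` key name below is an assumption about this version.

```python
model.eval()

# After validation batches (with target_tokens) have passed through
# forward(), this returns the accumulated metrics; reset=True clears
# the internal counts for the next epoch.
metrics = model.get_metrics(reset=True)
print(metrics.get("BLEU"))  # assumed key name; None if BLEU is disabled
```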
default_predictor#
class SimpleSeq2Seq(Model):
| ...
| default_predictor = "seq2seq"
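Because `default_predictor` is set, a trained archive of this model resolves to the `"seq2seq"` predictor without naming it explicitly. A hypothetical loading sketch; the archive path is a placeholder:

```python
import allennlp_models.generation  # ensures the model and predictor are registered
from allennlp.predictors import Predictor

# "model.tar.gz" is a placeholder path to a trained SimpleSeq2Seq archive.
predictor = Predictor.from_path("model.tar.gz")
print(predictor.predict(source="hello world"))  # the "seq2seq" predictor accepts a source string
```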