gated_cnn_encoder

allennlp.modules.seq2seq_encoders.gated_cnn_encoder

ResidualBlock#

class ResidualBlock(torch.nn.Module):
 | def __init__(
 |     self,
 |     input_dim: int,
 |     layers: Sequence[Sequence[int]],
 |     direction: str,
 |     do_weight_norm: bool = True,
 |     dropout: float = 0.0
 | ) -> None

forward#

class ResidualBlock(torch.nn.Module):
 | ...
 | def forward(self, x: torch.Tensor) -> torch.Tensor

Input: x of shape (batch_size, dim, timesteps).
Output: f(x) + x, of the same shape (batch_size, dim, timesteps).
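A minimal usage sketch, using the import path of this module. Note that the final layer's output_dim must equal input_dim so that the residual addition is well-defined:

    import torch
    from allennlp.modules.seq2seq_encoders.gated_cnn_encoder import ResidualBlock

    # Two gated convolutions, each with kernel size 4 and 512 output channels.
    block = ResidualBlock(input_dim=512, layers=[[4, 512], [4, 512]], direction="forward")

    x = torch.randn(2, 512, 50)  # (batch_size, dim, timesteps)
    out = block(x)               # same shape: f(x) + x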

GatedCnnEncoder#

@Seq2SeqEncoder.register("gated-cnn-encoder")
class GatedCnnEncoder(Seq2SeqEncoder):
 | def __init__(
 |     self,
 |     input_dim: int,
 |     layers: Sequence[Sequence[Sequence[int]]],
 |     dropout: float = 0.0,
 |     return_all_layers: bool = False
 | ) -> None

This is a work in progress and has not been fully tested yet. Use at your own risk!

A Seq2SeqEncoder that uses a Gated CNN.

See:

  • Language Modeling with Gated Convolutional Networks, Yann N. Dauphin et al., ICML 2017. https://arxiv.org/abs/1612.08083
  • Convolutional Sequence to Sequence Learning, Jonas Gehring et al., ICML 2017. https://arxiv.org/abs/1705.03122

Some possibilities:

Each element of the list is wrapped in a residual block:

    input_dim = 512
    layers = [
        [[4, 512]],
        [[4, 512], [4, 512]],
        [[4, 512], [4, 512]],
        [[4, 512], [4, 512]],
    ]
    dropout = 0.05

A "bottleneck architecture":

    input_dim = 512
    layers = [
        [[4, 512]],
        [[1, 128], [5, 128], [1, 512]],
        ...
    ]

An architecture with dilated convolutions:

    input_dim = 512
    layers = [
        [[2, 512, 1]], [[2, 512, 2]], [[2, 512, 4]], [[2, 512, 8]],  # receptive field == 16
        [[2, 512, 1]], [[2, 512, 2]], [[2, 512, 4]], [[2, 512, 8]],  # receptive field == 31
        [[2, 512, 1]], [[2, 512, 2]], [[2, 512, 4]], [[2, 512, 8]],  # receptive field == 46
        [[2, 512, 1]], [[2, 512, 2]], [[2, 512, 4]], [[2, 512, 8]],  # receptive field == 57
    ]
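For instance, the first configuration above can be built directly in Python. A minimal sketch (argument meanings are described under Parameters below):

    from allennlp.modules.seq2seq_encoders import GatedCnnEncoder

    encoder = GatedCnnEncoder(
        input_dim=512,
        layers=[
            [[4, 512]],
            [[4, 512], [4, 512]],
            [[4, 512], [4, 512]],
            [[4, 512], [4, 512]],
        ],
        dropout=0.05,
    )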

Registered as a Seq2SeqEncoder with name "gated-cnn-encoder".

Parameters

  • input_dim : int
    The dimension of the inputs.
  • layers : Sequence[Sequence[Sequence[int]]]
    The layer dimensions for each ResidualBlock. Each innermost list describes one convolution, as [kernel_size, output_dim] or, with dilation, [kernel_size, output_dim, dilation].
  • dropout : float, optional (default = 0.0)
    The dropout for each ResidualBlock.
  • return_all_layers : bool, optional (default = False)
    Whether to return all layers or just the last layer.
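Because the class is registered under "gated-cnn-encoder", an encoder can also be built from configuration. A sketch using the standard AllenNLP Params API, with keys mirroring the constructor arguments above:

    from allennlp.common import Params
    from allennlp.modules.seq2seq_encoders import Seq2SeqEncoder

    encoder = Seq2SeqEncoder.from_params(Params({
        "type": "gated-cnn-encoder",
        "input_dim": 512,
        "layers": [[[4, 512]], [[4, 512], [4, 512]]],
        "dropout": 0.05,
    }))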

forward#

class GatedCnnEncoder(Seq2SeqEncoder):
 | ...
 | def forward(
 |     self,
 |     token_embeddings: torch.Tensor,
 |     mask: torch.BoolTensor
 | )

The convolutions operate on channel-first input, so token_embeddings of shape (batch_size, timesteps, input_dim) is transposed to (batch_size, input_dim, timesteps) before the residual blocks run, and the result is transposed back.
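A forward-pass sketch, assuming the encoder constructed in the earlier example (the output dimension is discussed under get_output_dim below):

    import torch

    token_embeddings = torch.randn(2, 50, 512)  # (batch_size, timesteps, input_dim)
    mask = torch.ones(2, 50, dtype=torch.bool)  # (batch_size, timesteps)

    output = encoder(token_embeddings, mask)    # (batch_size, timesteps, get_output_dim())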

get_input_dim#

class GatedCnnEncoder(Seq2SeqEncoder):
 | ...
 | def get_input_dim(self) -> int

get_output_dim#

class GatedCnnEncoder(Seq2SeqEncoder):
 | ...
 | def get_output_dim(self) -> int

is_bidirectional#

class GatedCnnEncoder(Seq2SeqEncoder):
 | ...
 | def is_bidirectional(self) -> bool
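For the encoder sketched above, the accessors report the following. The output dimension is 2 * input_dim because the encoder runs forward and backward residual blocks and concatenates their outputs:

    encoder.get_input_dim()     # 512
    encoder.get_output_dim()    # 1024, i.e. 2 * input_dim
    encoder.is_bidirectional()  # True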