




class SinusoidalPositionalEncoding(torch.nn.Module, FromParams):
 | def __init__(
 |     self,
 |     min_timescale: float = 1.0,
 |     max_timescale: float = 1.0e4
 | )

Implements the frequency-based positional encoding described in "Attention Is All You Need".

Adds a sinusoid of a different frequency and phase to each dimension of the input Tensor. This allows the attention heads to use both absolute and relative positions.

The number of timescales is equal to hidden_dim / 2, and every timescale lies in the range from min_timescale to max_timescale. For each timescale, the two sinusoidal signals sin(timestep / timescale) and cos(timestep / timescale) are generated, and all of the signals are concatenated along the hidden_dim dimension.
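The arithmetic described above can be sketched in plain Python, without torch, to make the shapes concrete. This is an illustration rather than the library's implementation: the function name `sinusoidal_features` is hypothetical, the geometric spacing of the timescales between min_timescale and max_timescale is an assumption (the text only says they lie within that range), and padding an odd hidden_dim with a zero column is also an assumption.

```python
import math


def sinusoidal_features(timesteps, hidden_dim, min_timescale=1.0, max_timescale=1.0e4):
    """Build a timesteps x hidden_dim table (list of lists) of sinusoids.

    Hypothetical helper for illustration only.
    """
    num_timescales = hidden_dim // 2

    # Assumption: timescales are spaced geometrically from min to max.
    if num_timescales > 1:
        log_increment = math.log(max_timescale / min_timescale) / (num_timescales - 1)
    else:
        log_increment = 0.0
    timescales = [min_timescale * math.exp(i * log_increment) for i in range(num_timescales)]

    table = []
    for t in range(timesteps):
        # All sines first, then all cosines, concatenated along hidden_dim.
        sines = [math.sin(t / scale) for scale in timescales]
        cosines = [math.cos(t / scale) for scale in timescales]
        row = sines + cosines
        # Assumption: an odd hidden_dim is padded with a trailing zero.
        row += [0.0] * (hidden_dim - len(row))
        table.append(row)
    return table


features = sinusoidal_features(timesteps=4, hidden_dim=6)
```

At timestep 0 every sine is 0 and every cosine is 1, so `features[0]` is `[0.0, 0.0, 0.0, 1.0, 1.0, 1.0]`. Adding row `t` of this table to the hidden vector at position `t` gives the augmented input.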


Parameters

  • tensor : torch.Tensor
    a Tensor with shape (batch_size, timesteps, hidden_dim).
  • min_timescale : float, optional (default = 1.0)
    The smallest timescale to use.
  • max_timescale : float, optional (default = 1.0e4)
    The largest timescale to use.


Returns

  • torch.Tensor
    The input tensor augmented with the sinusoidal frequencies.


class SinusoidalPositionalEncoding(torch.nn.Module, FromParams):
 | ...
 | def forward(self, input_tensor: torch.Tensor)

TODO: Another option is to specify the expected size in init, so that we can construct the positional encoding beforehand, and simply add it to the input tensor in forward.
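One way the option in this TODO might look is sketched below. It is not part of the library: the class name `PrecomputedSinusoidalEncoding` and the `max_timesteps` and `hidden_dim` constructor arguments are hypothetical, and the geometric spacing of timescales and the zero padding for an odd hidden_dim are assumptions. The encoding is built once in `__init__` and stored as a buffer, so `forward` only slices and adds.

```python
import math

import torch


class PrecomputedSinusoidalEncoding(torch.nn.Module):
    """Hypothetical variant that builds the encoding once for a fixed maximum length."""

    def __init__(self, max_timesteps, hidden_dim, min_timescale=1.0, max_timescale=1.0e4):
        super().__init__()
        num_timescales = hidden_dim // 2
        positions = torch.arange(max_timesteps, dtype=torch.float).unsqueeze(1)
        exponents = torch.arange(num_timescales, dtype=torch.float)

        # Assumption: timescales are spaced geometrically from min to max.
        log_increment = math.log(max_timescale / min_timescale) / max(num_timescales - 1, 1)
        inverse_timescales = torch.exp(-exponents * log_increment) / min_timescale

        # Shape (max_timesteps, num_timescales): timestep / timescale.
        scaled_time = positions * inverse_timescales.unsqueeze(0)
        encoding = torch.cat([torch.sin(scaled_time), torch.cos(scaled_time)], dim=1)
        if hidden_dim % 2 != 0:
            # Assumption: an odd hidden_dim is padded with a trailing zero column.
            encoding = torch.cat([encoding, encoding.new_zeros(max_timesteps, 1)], dim=1)

        # A buffer follows the module across devices but is not a trainable parameter.
        self.register_buffer("encoding", encoding.unsqueeze(0), persistent=False)

    def forward(self, input_tensor):
        # Inputs longer than max_timesteps are not handled in this sketch.
        timesteps = input_tensor.size(1)
        return input_tensor + self.encoding[:, :timesteps]


encoder = PrecomputedSinusoidalEncoding(max_timesteps=10, hidden_dim=6)
output = encoder(torch.zeros(2, 4, 6))
```

The trade-off is that the maximum sequence length and hidden size must be known when the module is constructed, whereas computing the sinusoids inside `forward` adapts to whatever shape arrives.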