
polynomial_decay

allennlp.training.learning_rate_schedulers.polynomial_decay


PolynomialDecay#

@LearningRateScheduler.register("polynomial_decay")
class PolynomialDecay(LearningRateScheduler):
 | def __init__(
 |     self,
 |     optimizer: torch.optim.Optimizer,
 |     num_epochs: int,
 |     num_steps_per_epoch: int,
 |     power=1.0,
 |     warmup_steps=0,
 |     end_learning_rate=0.0,
 |     last_epoch: int = -1
 | )

Implements polynomial decay learning rate scheduling. The learning rate is first increased linearly over the first warmup_steps training steps. It is then decayed over the remaining total_steps - warmup_steps steps, from the initial learning rate down to end_learning_rate, using a polynomial of degree power.

Formally,

lr = (initial_lr - end_learning_rate) * ((total_steps - steps) / (total_steps - warmup_steps)) ** power + end_learning_rate
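
For intuition, the schedule can be restated as a standalone function. This is an illustrative sketch of the formula above, not the class's actual implementation:

    def polynomial_decay_lr(
        step: int,
        initial_lr: float,
        total_steps: int,
        warmup_steps: int = 0,
        power: float = 1.0,
        end_learning_rate: float = 0.0,
    ) -> float:
        # Linear warmup from 0 up to initial_lr over the first warmup_steps steps.
        if warmup_steps > 0 and step < warmup_steps:
            return initial_lr * step / warmup_steps
        # After total_steps, stay at the final learning rate.
        if step >= total_steps:
            return end_learning_rate
        # Polynomial decay from initial_lr down to end_learning_rate.
        frac = (total_steps - step) / (total_steps - warmup_steps)
        return (initial_lr - end_learning_rate) * frac ** power + end_learning_rate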

Parameters

  • num_epochs : int
    The number of epochs to train for. Together with num_steps_per_epoch this defines the decay horizon: total_steps = num_epochs * num_steps_per_epoch.
  • num_steps_per_epoch : int
    The number of training steps (batches) in each epoch.
  • warmup_steps : int, optional (default = 0)
    The number of steps over which to linearly increase the learning rate.
  • power : float, optional (default = 1.0)
    The power of the polynomial used for decaying.
  • end_learning_rate : float, optional (default = 0.0)
    Final learning rate to decay towards.
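
A minimal construction sketch, assuming PolynomialDecay is importable from allennlp.training.learning_rate_schedulers; the toy model, optimizer, and hyperparameter values are illustrative only:

    import torch
    from allennlp.training.learning_rate_schedulers import PolynomialDecay

    model = torch.nn.Linear(10, 1)                    # toy model, illustration only
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

    scheduler = PolynomialDecay(
        optimizer,
        num_epochs=5,
        num_steps_per_epoch=100,   # decay horizon: 5 * 100 = 500 steps
        warmup_steps=50,           # linear warmup over the first 50 steps
        power=1.0,                 # degree-1 polynomial, i.e. linear decay
        end_learning_rate=0.0,
    )

In config-driven training the scheduler is selected by its registered name, "polynomial_decay".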

get_values#

class PolynomialDecay(LearningRateScheduler):
 | ...
 | @overrides
 | def get_values(self)

step#

class PolynomialDecay(LearningRateScheduler):
 | ...
 | @overrides
 | def step(self, metric: float = None) -> None

step_batch#

class PolynomialDecay(LearningRateScheduler):
 | ...
 | @overrides
 | def step_batch(self, batch_num_total: int = None) -> None
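
A rough sketch of how a training loop drives these hooks; AllenNLP's trainer normally calls them for you, and the model, data, and loss below are illustrative only:

    import torch
    from allennlp.training.learning_rate_schedulers import PolynomialDecay

    model = torch.nn.Linear(10, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    scheduler = PolynomialDecay(optimizer, num_epochs=2, num_steps_per_epoch=10, warmup_steps=5)

    batch_num_total = 0
    for epoch in range(2):
        for _ in range(10):
            optimizer.zero_grad()
            loss = model(torch.randn(8, 10)).sum()   # illustrative forward pass and loss
            loss.backward()
            optimizer.step()

            batch_num_total += 1
            scheduler.step_batch(batch_num_total)    # apply the schedule after every batch
        print(epoch, scheduler.get_values())         # one learning rate per parameter group
        scheduler.step()                             # per-epoch hook; the decay itself is applied per batch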