
slanted_triangular

allennlp.training.learning_rate_schedulers.slanted_triangular



SlantedTriangular

@LearningRateScheduler.register("slanted_triangular")
class SlantedTriangular(LearningRateScheduler):
 | def __init__(
 |     self,
 |     optimizer: torch.optim.Optimizer,
 |     num_epochs: int,
 |     num_steps_per_epoch: Optional[int] = None,
 |     cut_frac: float = 0.1,
 |     ratio: int = 32,
 |     last_epoch: int = -1,
 |     gradual_unfreezing: bool = False,
 |     discriminative_fine_tuning: bool = False,
 |     decay_factor: float = 0.38
 | ) -> None

Implements the Slanted Triangular Learning Rate schedule with optional gradual unfreezing and discriminative fine-tuning. The schedule corresponds to first linearly increasing the learning rate over some number of epochs, and then linearly decreasing it over the remaining epochs.
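The shape of the schedule can be sketched as a small standalone function following the ULMFiT formulation (a sketch, not this class's implementation; lr_max stands in for the optimizer's base learning rate and is not a parameter of this scheduler, while cut_frac and ratio correspond to the parameters below):

```python
def slanted_triangular_lr(step, total_steps, lr_max=0.01, cut_frac=0.1, ratio=32):
    """Slanted triangular schedule: a short linear increase over
    cut_frac of the steps, then a long linear decay back to lr_max / ratio."""
    cut = int(total_steps * cut_frac)  # step at which the peak is reached
    if step < cut:
        p = step / cut                               # warmup: fraction completed
    else:
        p = 1 - (step - cut) / (total_steps - cut)   # decay: fraction remaining
    return lr_max * (1 + p * (ratio - 1)) / ratio
```

With ratio = 32 the schedule starts and ends at 1/32 of the peak learning rate, which is what the ratio parameter below controls.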

If we gradually unfreeze, then in the first epoch of training only the top layer is trained; in the second epoch, the top two layers are trained; and so on. While some layers are still frozen, the learning rate is increased and annealed over one epoch at a time. Once all layers are unfrozen, the learning rate is increased and annealed over the remaining training iterations.

Note that with this schedule, early stopping should typically be avoided.

Registered as a LearningRateScheduler with name "slanted_triangular".

Parameters

  • optimizer : torch.optim.Optimizer
    This argument does not get an entry in a configuration file for the object.
  • num_epochs : int
    The total number of epochs for which the model should be trained.
  • num_steps_per_epoch : Optional[int], optional (default = None)
    The number of steps (updates, batches) per training epoch.
  • cut_frac : float, optional (default = 0.1)
    The fraction of the total steps during which the learning rate is increased.
  • ratio : int, optional (default = 32)
    The ratio of the largest (peak) learning rate to the smallest; the schedule starts and ends at the peak learning rate divided by this value.
  • gradual_unfreezing : bool, optional (default = False)
    Whether gradual unfreezing should be used.
  • discriminative_fine_tuning : bool, optional (default = False)
    Whether discriminative fine-tuning (different learning rates per layer) is used.
  • decay_factor : float, optional (default = 0.38)
    The factor by which the learning rate is reduced for each successively deeper layer when discriminative fine-tuning is used.
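Since the scheduler is registered under the name "slanted_triangular", it would typically appear in an AllenNLP training configuration along these lines (an illustrative sketch: the surrounding trainer keys and the specific values shown are examples, not requirements):

```json
"trainer": {
    "num_epochs": 10,
    "learning_rate_scheduler": {
        "type": "slanted_triangular",
        "num_epochs": 10,
        "gradual_unfreezing": true,
        "discriminative_fine_tuning": true,
        "decay_factor": 0.38
    }
}
```

Note that the optimizer argument is supplied by the trainer and does not get an entry in the configuration file.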

step

class SlantedTriangular(LearningRateScheduler):
 | ...
 | def step(self, metric: float = None) -> None

step_batch

class SlantedTriangular(LearningRateScheduler):
 | ...
 | def step_batch(self, batch_num_total: int = None)
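The two hooks operate at different granularities: step_batch is called after every batch, and step at the end of every epoch. A minimal sketch of that calling convention, using a hypothetical stand-in scheduler that merely records calls (this is an illustration of the pattern, not AllenNLP's actual trainer):

```python
class RecordingScheduler:
    """Hypothetical stand-in that records calls, to illustrate the sequence."""

    def __init__(self):
        self.calls = []

    def step_batch(self, batch_num_total=None):
        self.calls.append(("batch", batch_num_total))

    def step(self, metric=None):
        self.calls.append(("epoch", metric))


def run_training(scheduler, num_epochs, batches_per_epoch):
    """Sketch of a trainer loop: step_batch after every batch,
    step after every epoch."""
    batch_num_total = 0
    for _ in range(num_epochs):
        for _ in range(batches_per_epoch):
            batch_num_total += 1
            # ... forward/backward/optimizer.step() would happen here ...
            scheduler.step_batch(batch_num_total)
        scheduler.step(metric=None)


sched = RecordingScheduler()
run_training(sched, num_epochs=2, batches_per_epoch=3)
```

After this loop, step_batch has been called six times and step twice, matching two epochs of three batches each.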

get_values

class SlantedTriangular(LearningRateScheduler):
 | ...
 | def get_values(self)

Gets the learning rate values, using the actual number of batches per epoch seen in training.