AllenNLP just uses PyTorch optimizers, with a thin wrapper that allows them to be registered and instantiated via from_params.

The available optimizers are

class allennlp.training.optimizers.DenseSparseAdam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08)

Bases: torch.optim.optimizer.Optimizer

NOTE: This class has been copied verbatim from the separate Dense and Sparse versions of Adam in PyTorch.

Implements the Adam algorithm with dense and sparse gradients. It was proposed in "Adam: A Method for Stochastic Optimization".


params : iterable

An iterable of parameters to optimize, or dicts defining parameter groups.

lr : float, optional (default: 1e-3)

The learning rate.

betas : Tuple[float, float], optional (default: (0.9, 0.999))

Coefficients used for computing running averages of the gradient and its square.

eps : float, optional (default: 1e-8)

A term added to the denominator to improve numerical stability.
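To make the roles of lr, betas and eps concrete, the following is a minimal sketch of one dense Adam update on a single scalar, written in plain Python. It uses a common formulation of the algorithm and is not the library's code, which operates on tensors and also handles sparse gradients. The names adam_step and state are assumptions made for this illustration.

```python
import math

def adam_step(param, grad, state, lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
    """One dense Adam update for a single scalar (illustrative sketch only)."""
    beta1, beta2 = betas
    state["step"] += 1
    # Running averages of the gradient and of its square (controlled by betas).
    state["exp_avg"] = beta1 * state["exp_avg"] + (1 - beta1) * grad
    state["exp_avg_sq"] = beta2 * state["exp_avg_sq"] + (1 - beta2) * grad * grad
    # Bias correction compensates for the zero-initialised averages.
    bias1 = 1 - beta1 ** state["step"]
    bias2 = 1 - beta2 ** state["step"]
    step_size = lr * math.sqrt(bias2) / bias1
    # eps is added to the denominator to avoid division by zero.
    denom = math.sqrt(state["exp_avg_sq"]) + eps
    return param - step_size * state["exp_avg"] / denom

# Minimise f(x) = x**2, whose gradient is 2 * x.
state = {"step": 0, "exp_avg": 0.0, "exp_avg_sq": 0.0}
x = 1.0
for _ in range(100):
    x = adam_step(x, 2 * x, state)
```

With a steady gradient direction, each update moves the parameter by roughly lr, which is why the first step here takes x from 1.0 to about 0.999.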

step(self, closure=None)

Performs a single optimization step.

closure : callable, optional

A closure that reevaluates the model and returns the loss.
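The closure contract can be illustrated with a short sketch. It assumes PyTorch is installed and uses torch.optim.Adam as a stand-in, since it exposes the same step(closure) signature; the toy parameter w and the quadratic loss are made up for this example.

```python
import torch

# A single trainable parameter with a toy quadratic loss (hypothetical setup).
w = torch.nn.Parameter(torch.tensor([1.0]))

# torch.optim.Adam is a stand-in for any optimizer with the same step() contract.
optimizer = torch.optim.Adam([w], lr=1e-3, betas=(0.9, 0.999), eps=1e-8)

def closure():
    # Re-evaluate the model: clear old gradients, recompute the loss, backpropagate.
    optimizer.zero_grad()
    loss = (w ** 2).sum()
    loss.backward()
    return loss

loss_before = closure().item()
optimizer.step(closure)  # the optimizer calls closure() itself before updating w
loss_after = (w.detach() ** 2).sum().item()
```

Most training loops call step() with no arguments after computing gradients themselves; the closure form matters mainly for algorithms that need to re-evaluate the loss during a step.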


class allennlp.training.optimizers.Optimizer

Bases: allennlp.common.registrable.Registrable

This class just allows us to implement Registrable for PyTorch optimizers.

default_implementation: str = 'adam'
classmethod from_params(model_parameters: List, params: allennlp.common.params.Params)

This is the automatic implementation of from_params. Any class that subclasses FromParams (or Registrable, which itself subclasses FromParams) gets this implementation for free. If you want your class to be instantiated from params in the “obvious” way – pop off parameters and hand them to your constructor with the same names – this provides that functionality.

If you need more complex logic in your from_params method, you’ll have to implement your own method that overrides this one.
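The "pop off parameters and hand them to your constructor" idea can be sketched in plain Python as follows. This is not AllenNLP's implementation: SketchOptimizer, SketchAdam, the register decorator and the use of a plain dict with a "type" key in place of Params are all assumptions made for this illustration.

```python
from typing import Any, Callable, Dict, List, Type

class SketchOptimizer:
    """Hypothetical stand-in for a registrable optimizer base class."""
    _registry: Dict[str, Type["SketchOptimizer"]] = {}
    default_implementation = "adam"

    @classmethod
    def register(cls, name: str) -> Callable:
        # Decorator that records a subclass under a string name.
        def decorator(subclass):
            cls._registry[name] = subclass
            return subclass
        return decorator

    @classmethod
    def from_params(cls, model_parameters: List[Any], params: Dict[str, Any]):
        params = dict(params)  # copy so the caller's dict is not mutated
        # Pop the implementation name, falling back to the default.
        name = params.pop("type", cls.default_implementation)
        subclass = cls._registry[name]
        # Remaining keys go to the constructor under the same names.
        return subclass(model_parameters, **params)

@SketchOptimizer.register("adam")
class SketchAdam(SketchOptimizer):
    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
        self.params = list(params)
        self.lr = lr
        self.betas = betas
        self.eps = eps

opt = SketchOptimizer.from_params([0.0, 1.0], {"type": "adam", "lr": 0.01})
default_opt = SketchOptimizer.from_params([0.0], {})
```

Here opt is built with lr=0.01 and the constructor defaults for everything else, while default_opt falls back to the "adam" entry because no type was given.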