allennlp.training.optimizers#

AllenNLP just uses [PyTorch optimizers](https://pytorch.org/docs/master/optim.html), with a thin wrapper to allow registering them and instantiating them `from_params`.

The available optimizers are:

  • ["adadelta"](https://pytorch.org/docs/master/optim.html#torch.optim.Adadelta)
  • ["adagrad"](https://pytorch.org/docs/master/optim.html#torch.optim.Adagrad)
  • ["adam"](https://pytorch.org/docs/master/optim.html#torch.optim.Adam)
  • ["adamw"](https://pytorch.org/docs/master/optim.html#torch.optim.AdamW)
  • ["huggingface_adamw"](https://huggingface.co/transformers/main_classes/optimizer_schedules.html#transformers.AdamW)
  • ["sparse_adam"](https://pytorch.org/docs/master/optim.html#torch.optim.SparseAdam)
  • ["sgd"](https://pytorch.org/docs/master/optim.html#torch.optim.SGD)
  • ["rmsprop"](https://pytorch.org/docs/master/optim.html#torch.optim.RMSprop)
  • ["adamax"](https://pytorch.org/docs/master/optim.html#torch.optim.Adamax)
  • ["averaged_sgd"](https://pytorch.org/docs/master/optim.html#torch.optim.ASGD)

AdadeltaOptimizer#

AdadeltaOptimizer(
    self,
    model_parameters: List[Tuple[str, torch.nn.parameter.Parameter]],
    parameter_groups: List[Tuple[List[str], Dict[str, Any]]] = None,
    lr: float = 1.0,
    rho: float = 0.9,
    eps: float = 1e-06,
    weight_decay: float = 0.0,
)

Registered as an Optimizer with name "adadelta".

AdagradOptimizer#

AdagradOptimizer(
    self,
    model_parameters: List[Tuple[str, torch.nn.parameter.Parameter]],
    parameter_groups: List[Tuple[List[str], Dict[str, Any]]] = None,
    lr: float = 0.01,
    lr_decay: float = 0.0,
    weight_decay: float = 0.0,
    initial_accumulator_value: float = 0.0,
    eps: float = 1e-10,
)

Registered as an Optimizer with name "adagrad".

AdamaxOptimizer#

AdamaxOptimizer(
    self,
    model_parameters: List[Tuple[str, torch.nn.parameter.Parameter]],
    parameter_groups: List[Tuple[List[str], Dict[str, Any]]] = None,
    lr: float = 0.002,
    betas: Tuple[float, float] = (0.9, 0.999),
    eps: float = 1e-08,
    weight_decay: float = 0.0,
)

Registered as an Optimizer with name "adamax".

AdamOptimizer#

AdamOptimizer(
    self,
    model_parameters: List[Tuple[str, torch.nn.parameter.Parameter]],
    parameter_groups: List[Tuple[List[str], Dict[str, Any]]] = None,
    lr: float = 0.001,
    betas: Tuple[float, float] = (0.9, 0.999),
    eps: float = 1e-08,
    weight_decay: float = 0.0,
    amsgrad: bool = False,
)

Registered as an Optimizer with name "adam".

AdamWOptimizer#

AdamWOptimizer(
    self,
    model_parameters: List[Tuple[str, torch.nn.parameter.Parameter]],
    parameter_groups: List[Tuple[List[str], Dict[str, Any]]] = None,
    lr: float = 0.001,
    betas: Tuple[float, float] = (0.9, 0.999),
    eps: float = 1e-08,
    weight_decay: float = 0.01,
    amsgrad: bool = False,
)

Registered as an Optimizer with name "adamw".

AveragedSgdOptimizer#

AveragedSgdOptimizer(
    self,
    model_parameters: List[Tuple[str, torch.nn.parameter.Parameter]],
    parameter_groups: List[Tuple[List[str], Dict[str, Any]]] = None,
    lr: float = 0.01,
    lambd: float = 0.0001,
    alpha: float = 0.75,
    t0: float = 1000000.0,
    weight_decay: float = 0.0,
)

Registered as an Optimizer with name "averaged_sgd".

DenseSparseAdam#

DenseSparseAdam(
    self,
    model_parameters: List[Tuple[str, torch.nn.parameter.Parameter]],
    parameter_groups: List[Tuple[List[str], Dict[str, Any]]] = None,
    lr = 0.001,
    betas = (0.9, 0.999),
    eps = 1e-08,
)

NOTE: This class has been copied verbatim from the separate dense and sparse versions of Adam in PyTorch.

Implements the Adam algorithm for both dense and sparse gradients. It was proposed in Adam: A Method for Stochastic Optimization.

Registered as an Optimizer with name "dense_sparse_adam".

Parameters

  • params : iterable. An iterable of parameters to optimize, or dicts defining parameter groups.
  • lr : float, optional (default: 1e-3). The learning rate.
  • betas : Tuple[float, float], optional (default: (0.9, 0.999)). Coefficients used for computing running averages of the gradient and its square.
  • eps : float, optional (default: 1e-8). A term added to the denominator to improve numerical stability.

step#

DenseSparseAdam.step(self, closure=None)

Performs a single optimization step.

Parameters

  • closure : callable, optional. A closure that reevaluates the model and returns the loss.
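
The sketch below illustrates the intended use case, pairing a sparse `torch.nn.Embedding` (which produces sparse gradients) with a dense `torch.nn.Linear`; the modules and shapes are illustrative.

```python
# A minimal sketch: one parameter with sparse gradients, one with dense gradients.
import torch

from allennlp.training.optimizers import DenseSparseAdam

embedding = torch.nn.Embedding(10, 4, sparse=True)  # yields sparse gradients
projection = torch.nn.Linear(4, 2)                  # yields dense gradients
model_parameters = list(embedding.named_parameters()) + list(projection.named_parameters())

optimizer = DenseSparseAdam(model_parameters=model_parameters, lr=1e-3)

loss = projection(embedding(torch.tensor([1, 2, 3]))).sum()
loss.backward()
optimizer.step()       # updates both the sparse and the dense parameters
optimizer.zero_grad()
```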

HuggingfaceAdamWOptimizer#

HuggingfaceAdamWOptimizer(
    self,
    model_parameters: List[Tuple[str, torch.nn.parameter.Parameter]],
    parameter_groups: List[Tuple[List[str], Dict[str, Any]]] = None,
    lr: float = 0.001,
    betas: Tuple[float, float] = (0.9, 0.999),
    eps: float = 1e-06,
    weight_decay: float = 0.0,
    correct_bias: bool = False,
)

Registered as an Optimizer with name "huggingface_adamw".
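
A common pattern with transformer models is to exempt biases and layer-norm parameters from weight decay via `parameter_groups`. The sketch below shows one way that might look; the `TinyModel` class and the regexes are illustrative, not part of the library.

```python
# A hedged sketch: huggingface_adamw with a parameter group that disables weight
# decay for bias and layer-norm parameters. Real configurations use regexes that
# match the actual parameter names in the model.
import torch

from allennlp.common.params import Params
from allennlp.training.optimizers import Optimizer


class TinyModel(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.encoder = torch.nn.Linear(8, 8)
        self.layer_norm = torch.nn.LayerNorm(8)


model = TinyModel()
model_parameters = [
    (name, param) for name, param in model.named_parameters() if param.requires_grad
]

optimizer = Optimizer.from_params(
    model_parameters=model_parameters,
    params=Params({
        "type": "huggingface_adamw",
        "lr": 2e-5,
        "weight_decay": 0.01,
        "parameter_groups": [
            # no weight decay for anything matching these regexes
            [["bias", "layer_norm"], {"weight_decay": 0.0}],
        ],
    }),
)
```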

make_parameter_groups#

make_parameter_groups(
    model_parameters: List[Tuple[str, torch.nn.parameter.Parameter]],
    groups: List[Tuple[List[str], Dict[str, Any]]] = None,
) -> Union[List[Dict[str, Any]], List[torch.nn.parameter.Parameter]]

Takes a list of model parameters with associated names (typically coming from something like `model.named_parameters()`), along with a grouping (as specified below), and prepares them to be passed to the `__init__` function of a `torch.Optimizer`. This means separating the parameters into groups with the given regexes, and prepping whatever keyword arguments are given for those regexes in `groups`.

`groups` contains something like:

```python
[
    (["regex1", "regex2"], {"lr": 1e-3}),
    (["regex3"], {"lr": 1e-4}),
]
```

The return value is in the right format to be passed directly as the `params` argument to a PyTorch `Optimizer`. If multiple groups are specified, this is a list of dictionaries, where each dict contains a "parameter group" and group-specific options, e.g., {'params': [list of parameters], 'lr': 1e-3, ...}. Any config option not specified in the additional options (e.g. for the default group) is inherited from the top level arguments given in the constructor. See: https://pytorch.org/docs/0.3.0/optim.html?#per-parameter-options. See also our `test_optimizer_parameter_groups` test for an example of how this works in this code.

The dictionary's value type is labeled as `Any`, because it can be a `List[torch.nn.Parameter]` (for the "params" key), or anything else (typically a float) for the other keys.

Optimizer#

Optimizer(self, /, *args, **kwargs)

This class just allows us to implement Registrable for PyTorch Optimizers. We do something a little bit different with Optimizers, because they are implemented as classes in PyTorch, and we want to use those classes. To make things easy, we just inherit from those classes, using multiple inheritance to also inherit from Optimizer. The only reason we do this is to make type inference on parameters possible, so we can construct these objects using our configuration framework. If you are writing your own script, you can safely ignore these classes and just use the torch.optim classes directly.

If you are implementing one of these classes, the model_parameters and parameter_groups arguments to __init__ are important, and should always be present. The trainer will pass the trainable parameters in the model to the optimizer using the name model_parameters, so if you use a different name, your code will crash. Nothing will technically crash if you use a name other than parameter_groups for your second argument; it will just be annoyingly inconsistent.
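
If you do want to wrap another optimizer yourself, a sketch of the pattern described above might look like the following; the registered name "my_sgd" and the class are hypothetical, not part of the library.

```python
# A hypothetical wrapper following the convention described above: multiple
# inheritance from Optimizer and the torch class, with model_parameters and
# parameter_groups as the first two constructor arguments.
from typing import Any, Dict, List, Tuple

import torch

from allennlp.training.optimizers import Optimizer, make_parameter_groups


@Optimizer.register("my_sgd")
class MySgdOptimizer(Optimizer, torch.optim.SGD):
    def __init__(
        self,
        model_parameters: List[Tuple[str, torch.nn.Parameter]],
        parameter_groups: List[Tuple[List[str], Dict[str, Any]]] = None,
        lr: float = 0.01,
        momentum: float = 0.0,
    ) -> None:
        # make_parameter_groups prepares the (name, parameter) pairs for the torch class
        super().__init__(
            params=make_parameter_groups(model_parameters, parameter_groups),
            lr=lr,
            momentum=momentum,
        )
```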

default_implementation#

The default implementation is "adam": if a configuration does not specify an optimizer type, that registered Optimizer is used.

RmsPropOptimizer#

RmsPropOptimizer(
    self,
    model_parameters: List[Tuple[str, torch.nn.parameter.Parameter]],
    parameter_groups: List[Tuple[List[str], Dict[str, Any]]] = None,
    lr: float = 0.01,
    alpha: float = 0.99,
    eps: float = 1e-08,
    weight_decay: float = 0.0,
    momentum: float = 0.0,
    centered: bool = False,
)

Registered as an Optimizer with name "rmsprop".

SgdOptimizer#

SgdOptimizer(
    self,
    model_parameters: List[Tuple[str, torch.nn.parameter.Parameter]],
    lr: float,
    parameter_groups: List[Tuple[List[str], Dict[str, Any]]] = None,
    momentum: float = 0.0,
    dampening: float = 0,
    weight_decay: float = 0.0,
    nesterov: bool = False,
)

Registered as an Optimizer with name "sgd".

SparseAdamOptimizer#

SparseAdamOptimizer(
    self,
    model_parameters: List[Tuple[str, torch.nn.parameter.Parameter]],
    parameter_groups: List[Tuple[List[str], Dict[str, Any]]] = None,
    lr: float = 0.001,
    betas: Tuple[float, float] = (0.9, 0.999),
    eps: float = 1e-08,
)

Registered as an Optimizer with name "sparse_adam".