BatchCallback(self, /, *args, **kwargs)
An optional callback that you can pass to the
GradientDescentTrainer that will be called at
the end of every batch, during both training and validation. We have no default implementation
of this, but you can implement your own callback and do whatever you want, such as saving
predictions to disk or doing extra logging.
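For example, a batch callback that logs the loss might look like the following sketch. The __call__ signature shown here follows the trainer module in AllenNLP 1.0 and may differ in other versions; the class name, registration key, and Dict[str, Any] input type are illustrative placeholders, so verify against your installed version.

    from typing import Any, Dict, List
    from allennlp.training.trainer import BatchCallback, GradientDescentTrainer

    @BatchCallback.register("loss_logging")  # hypothetical name; makes it usable from config files
    class LossLoggingCallback(BatchCallback):
        def __call__(
            self,
            trainer: GradientDescentTrainer,
            batch_inputs: List[List[Dict[str, Any]]],
            batch_outputs: List[Dict[str, Any]],
            epoch: int,
            batch_number: int,
            is_training: bool,
            is_master: bool,
        ) -> None:
            # Log the loss of the first micro-batch, if the model returned one.
            if is_master and batch_outputs and "loss" in batch_outputs[0]:
                phase = "train" if is_training else "validation"
                print(f"epoch {epoch}, {phase} batch {batch_number}: "
                      f"loss = {batch_outputs[0]['loss']}")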
EpochCallback(self, /, *args, **kwargs)
An optional callback that you can pass to the
GradientDescentTrainer that will be called at
the end of every epoch (and before the start of training, with
epoch=-1). We have no default
implementation of this, but you can implement your own callback and do whatever you want, such
as making additional modifications to the trainer's state in between epochs.
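For example, an epoch callback that prints the epoch's metrics might look like this sketch. The __call__ signature shown follows the trainer module in AllenNLP 1.0; the class name and registration key are illustrative placeholders.

    from typing import Any, Dict
    from allennlp.training.trainer import EpochCallback, GradientDescentTrainer

    @EpochCallback.register("metrics_printer")  # hypothetical name
    class MetricsPrinter(EpochCallback):
        def __call__(
            self,
            trainer: GradientDescentTrainer,
            metrics: Dict[str, Any],
            epoch: int,
            is_master: bool,
        ) -> None:
            # epoch == -1 is the call made before training starts.
            if is_master and epoch >= 0:
                print(f"epoch {epoch} finished: {metrics}")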
GradientDescentTrainer( self, model: allennlp.models.model.Model, optimizer: torch.optim.optimizer.Optimizer, data_loader: torch.utils.data.dataloader.DataLoader, patience: Optional[int] = None, validation_metric: str = '-loss', validation_data_loader: torch.utils.data.dataloader.DataLoader = None, num_epochs: int = 20, serialization_dir: Optional[str] = None, checkpointer: allennlp.training.checkpointer.Checkpointer = None, cuda_device: int = -1, grad_norm: Optional[float] = None, grad_clipping: Optional[float] = None, learning_rate_scheduler: Optional[allennlp.training.learning_rate_schedulers.learning_rate_scheduler.LearningRateScheduler] = None, momentum_scheduler: Optional[allennlp.training.momentum_schedulers.momentum_scheduler.MomentumScheduler] = None, tensorboard_writer: allennlp.training.tensorboard_writer.TensorboardWriter = None, moving_average: Optional[allennlp.training.moving_average.MovingAverage] = None, batch_callbacks: List[allennlp.training.trainer.BatchCallback] = None, epoch_callbacks: List[allennlp.training.trainer.EpochCallback] = None, distributed: bool = False, local_rank: int = 0, world_size: int = 1, num_gradient_accumulation_steps: int = 1, opt_level: Optional[str] = None, ) -> None
A trainer for doing supervised learning with gradient descent. It just takes a DataLoader over a
labeled dataset, and uses the supplied
Optimizer to learn the weights for your model over
some fixed number of epochs. You can also pass in a validation dataloader and enable early
stopping. There are many other bells and whistles as well.
Registered as a
Trainer with the name "gradient_descent" (and it is also the default
Trainer). The constructor that is registered is
from_partial_objects - see the arguments to that
function for the exact keys that should be used, if you are using a configuration file. They
largely match the arguments to
__init__, and we don't repeat their docstrings in
from_partial_objects. (A direct-construction usage sketch follows the parameter list below.)
Parameters:
- model : Model, required. An AllenNLP model to be optimized. PyTorch Modules can also be optimized if their forward method returns a dictionary with a "loss" key, containing a scalar tensor representing the loss function to be optimized. If you are training your model using GPUs, your model should already be on the correct device. (If you are using our train command this will be handled for you.)
- optimizer : torch.nn.Optimizer, required. An instance of a Pytorch Optimizer, instantiated with the parameters of the model to be optimized.
- data_loader : DataLoader, required. A pytorch DataLoader over your Dataset, yielding padded indexed batches.
- patience : Optional[int] > 0, optional (default = None) Number of epochs to be patient before early stopping: the training is stopped after patience epochs with no improvement. If given, it must be > 0. If None, early stopping is disabled.
- validation_metric : str, optional (default = "-loss") Validation metric to measure for whether to stop training using patience and whether to serialize an is_best model each epoch. The metric name must be prepended with either "+" or "-", which specifies whether the metric is an increasing or decreasing function.
- validation_data_loader : DataLoader, optional (default = None) A DataLoader to use for the validation set. If None, then use the training DataLoader with the validation data.
- num_epochs : int, optional (default = 20) Number of training epochs.
- serialization_dir : str, optional (default = None) Path to directory for saving and loading model files. Models will not be saved if this parameter is not passed.
- checkpointer : Checkpointer, optional (default = None) A Checkpointer is responsible for periodically saving model weights. If none is given here, we will construct one with default parameters.
- cuda_device : int, optional (default = -1) An integer specifying the CUDA device to use for this process. If -1, the CPU is used. Data parallelism is controlled at the allennlp train level, so each trainer will have a single GPU.
- grad_norm : float, optional (default = None) If provided, gradient norms will be rescaled to have a maximum of this value.
- grad_clipping : float, optional (default = None) If provided, gradients will be clipped during the backward pass to have an (absolute) maximum of this value. If you are getting NaNs in your gradients during training that are not solved by using grad_norm, you may need this.
- learning_rate_scheduler : LearningRateScheduler, optional (default = None) If specified, the learning rate will be decayed with respect to this schedule at the end of each epoch (or batch, if the scheduler implements the step_batch method). If you use torch.optim.lr_scheduler.ReduceLROnPlateau, this will use the validation_metric provided to determine if learning has plateaued. To support updating the learning rate on every batch, this can optionally implement step_batch(batch_num_total), which updates the learning rate given the batch number.
- momentum_scheduler : MomentumScheduler, optional (default = None) If specified, the momentum will be updated at the end of each batch or epoch according to the schedule.
- tensorboard_writer : TensorboardWriter, optional (default = None) If this is not provided, we will construct a TensorboardWriter with default parameters and use that.
- moving_average : MovingAverage, optional (default = None) If provided, we will maintain moving averages for all parameters. During training, we employ a shadow variable for each parameter, which maintains the moving average. During evaluation, we back up the original parameters and assign the moving averages to the corresponding parameters. Be careful that when saving the checkpoint, we will save the moving averages of parameters. This is necessary because we want the saved model to perform as well as the validated model if we load it later. But this may cause problems if you restart training from a checkpoint.
- batch_callbacks : List[BatchCallback], optional (default = None) A list of callbacks that will be called at the end of every batch, during both training and validation.
- epoch_callbacks : List[EpochCallback], optional (default = None) A list of callbacks that will be called at the end of every epoch, and at the start of training (with epoch = -1).
- distributed : bool, optional (default = False) If set, PyTorch's DistributedDataParallel is used to train the model on multiple GPUs. This also requires world_size to be greater than 1.
- local_rank : int, optional (default = 0) This is the unique identifier of the Trainer in a distributed process group. The GPU device id is used as the rank.
- world_size : int, optional (default = 1) The number of Trainer workers participating in the distributed training.
- num_gradient_accumulation_steps : int, optional (default = 1) Gradients are accumulated for the given number of steps before doing an optimizer step. This can be useful to accommodate batches that are larger than the RAM size. Refer to Thomas Wolf's post for details on gradient accumulation.
- opt_level : str, optional (default = None) Each opt_level establishes a set of properties that govern Amp's implementation of pure or mixed precision training. Must be a choice of "O0", "O1", "O2", or "O3". See the Apex documentation for more details. If None, Amp is not used.
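As a direct-construction usage sketch: the signature above is from this module, but the model, data loaders, hyperparameter values, and serialization path here are stand-ins you would replace with your own.

    import torch
    from allennlp.training.trainer import GradientDescentTrainer

    # `model`, `train_loader`, and `dev_loader` are assumed to exist already:
    # an allennlp Model and two DataLoaders over indexed instances.
    trainer = GradientDescentTrainer(
        model=model,
        optimizer=torch.optim.Adam(model.parameters(), lr=1e-3),
        data_loader=train_loader,
        validation_data_loader=dev_loader,
        validation_metric="-loss",   # stop when validation loss stops improving
        patience=5,
        num_epochs=10,
        serialization_dir="output/",  # hypothetical path
        cuda_device=-1,               # CPU; set to a GPU id to use CUDA
    )
    metrics = trainer.train()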
GradientDescentTrainer.batch_outputs( self, batch: Dict[str, Union[torch.Tensor, Dict[str, torch.Tensor]]], for_training: bool, ) -> Dict[str, torch.Tensor]
Does a forward pass on the given batch and returns the output dictionary that the model returns, after adding any specified regularization penalty to the loss (if training).
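For instance, continuing the construction sketch above (and assuming batch is one already-indexed batch from the trainer's data loader, which is an assumption of this sketch):

    # Forward pass on a single batch; during training, any regularization
    # penalty from the model is added into the returned "loss".
    output_dict = trainer.batch_outputs(batch, for_training=True)
    loss = output_dict["loss"]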
GradientDescentTrainer.from_partial_objects( model: allennlp.models.model.Model, serialization_dir: str, data_loader: allennlp.data.dataloader.DataLoader, validation_data_loader: allennlp.data.dataloader.DataLoader = None, local_rank: int = 0, patience: int = None, validation_metric: str = '-loss', num_epochs: int = 20, cuda_device: int = -1, grad_norm: float = None, grad_clipping: float = None, distributed: bool = None, world_size: int = 1, num_gradient_accumulation_steps: int = 1, opt_level: Optional[str] = None, no_grad: List[str] = None, optimizer: allennlp.common.lazy.Lazy = None, learning_rate_scheduler: allennlp.common.lazy.Lazy = None, momentum_scheduler: allennlp.common.lazy.Lazy = None, tensorboard_writer: allennlp.common.lazy.Lazy = None, moving_average: allennlp.common.lazy.Lazy = None, checkpointer: allennlp.common.lazy.Lazy = None, batch_callbacks: List[allennlp.training.trainer.BatchCallback] = None, epoch_callbacks: List[allennlp.training.trainer.EpochCallback] = None, ) -> 'Trainer'
This method exists so that we can have a documented method to construct this class using
FromParams. If you are not using
FromParams or config files, you can safely ignore this method.
The reason we can't just use
FromParams here is because there are
sequential dependencies to this class's arguments. Anything that has a
Lazy annotation needs something from one of the non-
Lazy arguments. The
Optimizer needs to
have the parameters from the
Model before it's constructed, and the
Schedulers need to have the
Optimizer. Because of this, the typical way we construct things
with FromParams doesn't work, so we use
Lazy to allow for constructing the objects sequentially.
If you're not using
FromParams, you can just construct these arguments in the right order
yourself in your code and call the constructor directly.
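A minimal sketch of that ordering, outside of FromParams (the Linear module and torch scheduler here are stand-ins for an allennlp Model and scheduler wrapper, used only to show the dependency chain):

    import torch

    # Sequential dependencies: the optimizer needs the model's parameters,
    # and the scheduler needs the optimizer, so construction order matters.
    model = torch.nn.Linear(4, 2)  # stand-in for an allennlp Model
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1)
    # from_partial_objects follows the same pattern internally: it takes the
    # constructed model first, then constructs each Lazy argument in order.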
GradientDescentTrainer.rescale_gradients(self) -> Union[float, NoneType]
Performs gradient rescaling. Is a no-op if gradient rescaling is not enabled.
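Conceptually, the rescaling corresponds to the standard PyTorch utility shown in this hedged sketch (not this trainer's exact code path; the model and grad_norm value are stand-ins):

    import torch

    model = torch.nn.Linear(4, 2)   # stand-in model
    grad_norm = 5.0                 # the trainer's grad_norm setting
    loss = model(torch.randn(8, 4)).sum()
    loss.backward()
    # Gradients whose total norm exceeds grad_norm are scaled down so the
    # norm equals grad_norm; smaller gradients are left untouched.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=grad_norm)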
GradientDescentTrainer.train(self) -> Dict[str, Any]
Trains the supplied model with the supplied parameters.
Trainer( self, serialization_dir: str, cuda_device: int = -1, distributed: bool = False, local_rank: int = 0, world_size: int = 1, ) -> None
The base class for an AllenNLP trainer. It can do pretty much
anything you want. Your subclass should implement
train and also probably
get_checkpoint_state.
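A skeletal subclass might look like this sketch; the class name and no-op behavior are illustrative only, and a real subclass would also call the base __init__ with a serialization_dir.

    from contextlib import contextmanager
    from typing import Any, Dict, Iterator, Tuple

    from allennlp.training.trainer import Trainer

    class NoOpTrainer(Trainer):
        def train(self) -> Dict[str, Any]:
            # A real implementation runs the optimization loop here and
            # returns the final training/validation metrics.
            return {}

        @contextmanager
        def get_checkpoint_state(self) -> Iterator[Tuple[Dict[str, Any], Dict[str, Any]]]:
            # Yields (model state, training state); empty for illustration.
            yield {}, {}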
Trainer.get_checkpoint_state( self, ) -> Iterator[Tuple[Dict[str, Any], Dict[str, Any]]]
Returns a tuple of (model state, training state), where training state could have several internal components (e.g., for an optimizer, a learning rate scheduler, etc.).
This is a context manager, and should be called as
with trainer.get_checkpoint_state() as
state:, so that the trainer has the opportunity to change and restore its internal state
for checkpointing. This is used, e.g., for moving averages of model weights.
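For example (a sketch; the file paths are hypothetical and the checkpointing details are up to the caller):

    import torch

    with trainer.get_checkpoint_state() as state:
        model_state, training_state = state
        torch.save(model_state, "model_state.th")         # hypothetical path
        torch.save(training_state, "training_state.th")   # hypothetical path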
Trainer.train(self) -> Dict[str, Any]
Train a model and return the results.