Helper functions for Trainers


Bases: object

tqdm_ignores_underscores = False

allennlp.common.params.Params, serialization_dir: str, recover: bool, force: bool) → None

This function creates the serialization directory if it doesn’t exist. If it already exists and is non-empty, then it verifies that we’re recovering from a training run with an identical configuration.
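The directory check described above can be sketched with the standard library alone. This is a minimal illustration and not the library's implementation: `prepare_serialization_dir` is a hypothetical name, the configuration is a plain dict instead of a ``Params`` object, storing it as `config.json` inside the directory is an assumption, and so is rejecting `recover` and `force` together.

```python
import json
import os
import shutil


def prepare_serialization_dir(config: dict, serialization_dir: str,
                              recover: bool, force: bool) -> None:
    """Hypothetical stand-in that mirrors the documented behaviour."""
    # Assumption: the configuration is stored as config.json in the directory.
    config_path = os.path.join(serialization_dir, "config.json")

    # Assumption: asking to both recover and overwrite is treated as an error.
    if recover and force:
        raise ValueError("recover and force cannot both be True")

    # force: discard whatever is already there.
    if force and os.path.isdir(serialization_dir):
        shutil.rmtree(serialization_dir)

    if os.path.isdir(serialization_dir) and os.listdir(serialization_dir):
        # Non-empty directory: only acceptable when recovering the same run.
        if not recover:
            raise RuntimeError("directory exists and is not empty")
        if not os.path.exists(config_path):
            raise RuntimeError("no saved configuration to compare against")
        with open(config_path) as config_file:
            saved_config = json.load(config_file)
        if saved_config != config:
            raise RuntimeError("saved configuration does not match")
        return

    # Missing or empty directory: there is nothing to recover from.
    if recover:
        raise RuntimeError("nothing to recover from")
    os.makedirs(serialization_dir, exist_ok=True)
    with open(config_path, "w") as config_file:
        json.dump(config, config_file)
```

The sketch fails loudly in every ambiguous case, which matches the documented intent of crashing when a recovery target is missing or does not match the configuration.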

params: ``Params``

A parameter object specifying an AllenNLP Experiment.

serialization_dir: ``str``

The directory in which to save results and logs.

recover: ``bool``

If True, we will try to recover from an existing serialization directory, and crash if the directory doesn’t exist, or doesn’t match the configuration we’re given.

force: ``bool``

If True, we will overwrite the serialization directory if it already exists.

List[Dict[str, Union[torch.Tensor, Dict[str, torch.Tensor]]]], model: allennlp.models.model.Model, cuda_devices: List) → Dict[str, torch.Tensor]

Performs a forward pass using multiple GPUs. This is a simplification of torch.nn.parallel.data_parallel to support the allennlp model interface.

allennlp.common.params.Params, cache_directory: str = None, cache_prefix: str = None) → Dict[str, Iterable[]]

Load all the datasets specified by the config.
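The caching scheme described for cache_directory and cache_prefix below can be sketched as follows. This is only an illustration under stated assumptions: `cache_location` is a hypothetical helper, the reader parameters are a plain dict, and hashing their sorted JSON form with SHA-256 is an assumed choice, since the text does not say which hash is used.

```python
import hashlib
import json
import os
from typing import Optional


def cache_location(cache_directory: str, reader_params: dict,
                   cache_prefix: Optional[str] = None) -> str:
    """Hypothetical helper: base directory plus a parameter-derived prefix."""
    if cache_prefix is None:
        # Assumption: hash a canonical JSON dump of the reader parameters so
        # that any change in settings leads to a different cache location.
        serialized = json.dumps(reader_params, sort_keys=True)
        digest = hashlib.sha256(serialized.encode("utf-8")).hexdigest()
        cache_prefix = digest[:16]
    return os.path.join(cache_directory, cache_prefix)
```

Two readers built with different tokenization settings would get different prefixes and therefore never read each other's cached instances, while a manually chosen cache_prefix is used verbatim.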

cache_directory: ``str``, optional

If given, we will instruct the DatasetReaders that we construct to cache their instances in this location (or read their instances from caches in this location, if a suitable cache already exists). This is essentially a base directory for the cache, as we will additionally add the cache_prefix to this directory, giving an actual cache location of cache_directory + cache_prefix.

cache_prefix: ``str``, optional

This works in conjunction with the cache_directory. The idea is that the cache_directory contains caches for all different parameter settings, while the cache_prefix captures a specific set of parameters that led to a particular cache file. That is, if you change the tokenization settings inside your DatasetReader, you don’t want to read cached data that used the old settings. In order to avoid this, we compute a hash of the parameters used to construct each DatasetReader and use that as a “prefix” to the cache files inside the base cache_directory. So, a given input_file would be cached essentially as cache_directory + cache_prefix + input_file, where you specify a cache_directory, the cache_prefix is based on the dataset reader parameters, and the input_file is whatever path you provided.

In order to allow you to give recognizable names to these prefixes if you want them, you can manually specify the cache_prefix. Note that in some rare cases this can be dangerous, as we’ll use the same prefix for both train and validation dataset readers.

Dict[str, float]) → str

allennlp.models.model.Model, grad_clipping: Union[float, NoneType]) → None

allennlp.models.model.Model, instances: Iterable[], data_iterator:, cuda_device: int, batch_weight_key: str) → Dict[str, Any]

Union[Dict, torch.Tensor]) → int

Returns the size of the batch dimension. Assumes a well-formed batch, returns 0 otherwise.

allennlp.models.model.Model, total_loss: float, num_batches: int, reset: bool = False) → Dict[str, float]

Gets the metrics but sets "loss" to the total loss divided by the num_batches so that the "loss" metric is “average loss per batch”.
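The averaging step is simple arithmetic and can be sketched as follows. `with_average_loss` is a hypothetical helper that takes an already computed metrics dict instead of a model, and returning 0.0 when there are no batches is an assumption made to avoid a division by zero.

```python
from typing import Dict


def with_average_loss(metrics: Dict[str, float], total_loss: float,
                      num_batches: int) -> Dict[str, float]:
    """Hypothetical helper: copy the metrics and add the per-batch loss."""
    result = dict(metrics)
    # Assumption: report 0.0 rather than dividing by zero when no batch ran.
    result["loss"] = total_loss / num_batches if num_batches > 0 else 0.0
    return result
```

For example, a total loss of 6.0 accumulated over 3 batches is reported as a "loss" of 2.0.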

Move the optimizer state to GPU, if necessary. After calling, any parameter specific state in the optimizer will be located on the same device as the parameter.

allennlp.models.model.Model, grad_norm: Union[float, NoneType] = None) → Union[float, NoneType]

Performs gradient rescaling. Is a no-op if gradient rescaling is not enabled.

, max_norm, norm_type=2) → float

Clips gradient norm of an iterable of parameters.

The norm is computed over all gradients together, as if they were concatenated into a single vector. Gradients are modified in-place. Supports sparse gradients.
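The computation can be illustrated in plain Python, with each gradient represented as a list of floats instead of a tensor. This is a sketch of the general technique, not the library's code: `clip_grad_norm` is a hypothetical name, and the small epsilon in the scaling factor is an assumption borrowed from common practice to avoid dividing by zero.

```python
from typing import List, Union


def clip_grad_norm(grads: List[List[float]], max_norm: float,
                   norm_type: Union[float, str] = 2) -> float:
    """Hypothetical sketch: clip gradients in place, return the total norm."""
    # Treat all gradients as if concatenated into a single vector.
    flat = [value for grad in grads for value in grad]
    if norm_type in ("inf", float("inf")):
        total_norm = max((abs(value) for value in flat), default=0.0)
    else:
        total_norm = sum(abs(value) ** norm_type for value in flat) ** (1.0 / norm_type)

    # Assumption: a small epsilon keeps the division well defined.
    clip_coef = max_norm / (total_norm + 1e-6)
    if clip_coef < 1:
        # Rescale every gradient by the same factor, modifying them in place.
        for grad in grads:
            for index in range(len(grad)):
                grad[index] *= clip_coef
    return total_norm
```

With gradients `[[3.0], [4.0]]` and `max_norm=1.0`, the combined 2-norm is 5.0, so both gradients are scaled by roughly 0.2 and the clipped vector has a norm of about 1.0.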


An iterable of Tensors that will have gradients normalized.


max_norm

The max norm of the gradients.

norm_type

The type of the used p-norm. Can be 'inf' for infinity norm.

Total norm of the parameters (viewed as a single vector).

str) → datetime.datetime

Convert human readable string to datetime.datetime.

int) → str

Convert seconds since the epoch to a human-readable string.
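The two conversions can be sketched as a round trip. The function names here are hypothetical, and both the `%Y-%m-%d-%H-%M-%S` format and the use of UTC are assumptions made for this illustration; the text above does not specify the string format or the time zone.

```python
from datetime import datetime, timezone

# Assumption: a sortable, dash-separated timestamp format.
TIME_FORMAT = "%Y-%m-%d-%H-%M-%S"


def seconds_to_string(timestamp: int) -> str:
    """Hypothetical helper: seconds since the epoch to a readable string (UTC)."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).strftime(TIME_FORMAT)


def string_to_datetime(time_str: str) -> datetime:
    """Hypothetical helper: parse the readable string back into a datetime."""
    return datetime.strptime(time_str, TIME_FORMAT)
```

Because the format puts the most significant fields first, the resulting strings sort chronologically when sorted as text.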