allennlp.training.util
Helper functions for Trainers
- allennlp.training.util.create_serialization_dir(params: allennlp.common.params.Params, serialization_dir: str, recover: bool, force: bool) → None
This function creates the serialization directory if it doesn’t exist. If it already exists and is non-empty, then it verifies that we’re recovering from a training with an identical configuration.
- Parameters
- params: ``Params``
A parameter object specifying an AllenNLP Experiment.
- serialization_dir: ``str``
The directory in which to save results and logs.
- recover: ``bool``
If ``True``, we will try to recover from an existing serialization directory, and crash if the directory doesn’t exist or doesn’t match the configuration we’re given.
- force: ``bool``
If ``True``, we will overwrite the serialization directory if it already exists.
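A minimal usage sketch; the config file path and serialization directory below are hypothetical::

    from allennlp.common.params import Params
    from allennlp.training.util import create_serialization_dir

    # Hypothetical experiment config; any AllenNLP experiment file works here.
    params = Params.from_file("experiment.jsonnet")
    create_serialization_dir(
        params,
        serialization_dir="/tmp/my_experiment",  # hypothetical output directory
        recover=False,  # don't resume from an existing directory
        force=False,    # don't overwrite an existing, non-empty directory
    )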
- allennlp.training.util.data_parallel(batch_group: List[Dict[str, Union[torch.Tensor, Dict[str, torch.Tensor]]]], model: allennlp.models.model.Model, cuda_devices: List) → Dict[str, torch.Tensor]
Performs a forward pass using multiple GPUs. This is a simplification of torch.nn.parallel.data_parallel to support the allennlp model interface.
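A rough sketch of the pattern this helper follows, built from the standard torch.nn.parallel primitives. This is illustrative only, not the library’s implementation, and it assumes each batch in ``batch_group`` is matched to the corresponding entry in ``cuda_devices``::

    import torch
    from torch.nn.parallel import parallel_apply, replicate
    from allennlp.nn import util as nn_util

    def data_parallel_sketch(batch_group, model, cuda_devices):
        used_devices = cuda_devices[:len(batch_group)]
        # Move each sub-batch onto its own device (the model is assumed to
        # already live on used_devices[0]), then replicate the model.
        moved = [nn_util.move_to_device(batch, device)
                 for batch, device in zip(batch_group, used_devices)]
        replicas = replicate(model, used_devices)
        # Each replica receives its sub-batch as keyword arguments, matching
        # the AllenNLP Model.forward interface.
        outputs = parallel_apply(replicas, [()] * len(moved), moved, used_devices)
        # Average the per-device losses on the first device.
        losses = [output["loss"].to(used_devices[0]) for output in outputs]
        return {"loss": torch.stack(losses).mean()}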
- allennlp.training.util.datasets_from_params(params: allennlp.common.params.Params, cache_directory: str = None, cache_prefix: str = None) → Dict[str, Iterable[allennlp.data.instance.Instance]]
Load all the datasets specified by the config.
- Parameters
- params: ``Params``
- cache_directory: ``str``, optional
If given, we will instruct the ``DatasetReaders`` that we construct to cache their instances in this location (or read their instances from caches in this location, if a suitable cache already exists). This is essentially a base directory for the cache, as we will additionally add the ``cache_prefix`` to this directory, giving an actual cache location of ``cache_directory + cache_prefix``.
- cache_prefix: ``str``, optional
This works in conjunction with the ``cache_directory``. The idea is that the ``cache_directory`` contains caches for all different parameter settings, while the ``cache_prefix`` captures a specific set of parameters that led to a particular cache file. That is, if you change the tokenization settings inside your ``DatasetReader``, you don’t want to read cached data that used the old settings. In order to avoid this, we compute a hash of the parameters used to construct each ``DatasetReader`` and use that as a “prefix” to the cache files inside the base ``cache_directory``. So, a given ``input_file`` would be cached essentially as ``cache_directory + cache_prefix + input_file``, where you specify a ``cache_directory``, the ``cache_prefix`` is based on the dataset reader parameters, and the ``input_file`` is whatever path you provided to ``DatasetReader.read()``. In order to allow you to give recognizable names to these prefixes if you want them, you can manually specify the ``cache_prefix``. Note that in some rare cases this can be dangerous, as we’ll use the same prefix for both train and validation dataset readers.
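A minimal usage sketch; the config file, cache directory, and prefix below are hypothetical, and when ``cache_prefix`` is omitted a hash of the dataset reader parameters is used instead::

    from allennlp.common.params import Params
    from allennlp.training.util import datasets_from_params

    params = Params.from_file("experiment.jsonnet")  # hypothetical config file
    datasets = datasets_from_params(
        params,
        cache_directory="/tmp/allennlp_cache",  # base directory for all caches
        cache_prefix="tokenization_v1",         # hypothetical, human-readable prefix
    )
    # Keys mirror the datasets defined in the config, e.g. "train", "validation".
    train_instances = datasets["train"]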
- allennlp.training.util.enable_gradient_clipping(model: allennlp.models.model.Model, grad_clipping: Union[float, NoneType]) → None
- allennlp.training.util.evaluate(model: allennlp.models.model.Model, instances: Iterable[allennlp.data.instance.Instance], data_iterator: allennlp.data.iterators.data_iterator.DataIterator, cuda_device: int, batch_weight_key: str) → Dict[str, Any]
- allennlp.training.util.get_batch_size(batch: Union[Dict, torch.Tensor]) → int
Returns the size of the batch dimension for a well-formed batch (a tensor or a possibly nested dict of tensors), and 0 otherwise.
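For illustration, a typical AllenNLP batch is a nested dict of tensors, and the batch size is the leading dimension of the first tensor found. The field names below are hypothetical::

    import torch
    from allennlp.training.util import get_batch_size

    batch = {"tokens": {"tokens": torch.zeros(32, 20, dtype=torch.long)}}
    print(get_batch_size(batch))  # 32 for the shapes above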
- allennlp.training.util.get_metrics(model: allennlp.models.model.Model, total_loss: float, num_batches: int, reset: bool = False) → Dict[str, float]
Gets the metrics but sets ``"loss"`` to the total loss divided by the ``num_batches`` so that the ``"loss"`` metric is “average loss per batch”.
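A minimal sketch of that behaviour, assuming ``model`` follows the AllenNLP ``Model`` API (i.e. it provides ``get_metrics(reset)`` returning a ``Dict[str, float]``)::

    def get_metrics_sketch(model, total_loss, num_batches, reset=False):
        metrics = model.get_metrics(reset=reset)
        # Report average loss per batch, guarding against a zero batch count.
        metrics["loss"] = float(total_loss / num_batches) if num_batches > 0 else 0.0
        return metrics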
- allennlp.training.util.move_optimizer_to_cuda(optimizer)
Move the optimizer state to GPU, if necessary. After calling, any parameter-specific state in the optimizer will be located on the same device as the parameter.
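A minimal sketch, assuming a CUDA device is available; the optimizer needs to have taken at least one step so that there is per-parameter state (e.g. Adam’s moment estimates) to move::

    import torch
    from allennlp.training.util import move_optimizer_to_cuda

    model = torch.nn.Linear(10, 2)
    optimizer = torch.optim.Adam(model.parameters())
    loss = model(torch.randn(4, 10)).sum()
    loss.backward()
    optimizer.step()  # creates per-parameter state on the CPU

    model.cuda()                       # parameters move to the GPU
    move_optimizer_to_cuda(optimizer)  # optimizer state follows the parameters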
- allennlp.training.util.rescale_gradients(model: allennlp.models.model.Model, grad_norm: Union[float, NoneType] = None) → Union[float, NoneType]
Performs gradient rescaling. Is a no-op if gradient rescaling is not enabled.
- allennlp.training.util.sparse_clip_norm(parameters, max_norm, norm_type=2) → float
Clips gradient norm of an iterable of parameters.
The norm is computed over all gradients together, as if they were concatenated into a single vector. Gradients are modified in-place. Supports sparse gradients.
- Parameters
- parameters: ``Iterable[torch.Tensor]``
An iterable of Tensors that will have gradients normalized.
- max_norm: ``float``
The max norm of the gradients.
- norm_type: ``float``
The type of the used p-norm. Can be ``'inf'`` for infinity norm.
- Returns
- Total norm of the parameters (viewed as a single vector).
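A minimal usage sketch, clipping a model’s combined gradient norm to 5.0 after a backward pass; the model and threshold here are arbitrary::

    import torch
    from allennlp.training.util import sparse_clip_norm

    model = torch.nn.Linear(10, 2)
    loss = model(torch.randn(4, 10)).sum()
    loss.backward()

    # Returns the total norm of all gradients (viewed as one vector) before clipping.
    total_norm = sparse_clip_norm(model.parameters(), max_norm=5.0)
    print(total_norm)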