allennlp.training.util
Helper functions for Trainers
- allennlp.training.util.create_serialization_dir(params: allennlp.common.params.Params, serialization_dir: str, recover: bool, force: bool) → None
This function creates the serialization directory if it doesn’t exist. If it already exists and is non-empty, then it verifies that we’re recovering from a training with an identical configuration.
- Parameters
- params: ``Params``
A parameter object specifying an AllenNLP Experiment.
- serialization_dir: ``str``
The directory in which to save results and logs.
- recover: ``bool``
If ``True``, we will try to recover from an existing serialization directory, and crash if the directory doesn’t exist or doesn’t match the configuration we’re given.
- force: ``bool``
If ``True``, we will overwrite the serialization directory if it already exists.
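A minimal usage sketch; the configuration path ``experiment.jsonnet`` and the output directory ``output/my_run`` below are hypothetical::

    from allennlp.common.params import Params
    from allennlp.training.util import create_serialization_dir

    # Load the experiment configuration (hypothetical path).
    params = Params.from_file("experiment.jsonnet")

    # Create the output directory. With recover=False and force=False this
    # crashes if the directory already exists and is non-empty.
    create_serialization_dir(params, serialization_dir="output/my_run",
                             recover=False, force=False)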
- allennlp.training.util.data_parallel(batch_group: List[Dict[str, Union[torch.Tensor, Dict[str, torch.Tensor]]]], model: allennlp.models.model.Model, cuda_devices: List) → Dict[str, torch.Tensor]
Performs a forward pass using multiple GPUs. This is a simplification of torch.nn.parallel.data_parallel to support the allennlp model interface.
- allennlp.training.util.datasets_from_params(params: allennlp.common.params.Params, cache_directory: str = None, cache_prefix: str = None) → Dict[str, Iterable[allennlp.data.instance.Instance]]
Load all the datasets specified by the config.
- Parameters
- params: ``Params``
- cache_directory: ``str``, optional
If given, we will instruct the ``DatasetReaders`` that we construct to cache their instances in this location (or read their instances from caches in this location, if a suitable cache already exists). This is essentially a base directory for the cache, as we will additionally add the ``cache_prefix`` to this directory, giving an actual cache location of ``cache_directory + cache_prefix``.
- cache_prefix: ``str``, optional
This works in conjunction with the ``cache_directory``. The idea is that the ``cache_directory`` contains caches for all different parameter settings, while the ``cache_prefix`` captures a specific set of parameters that led to a particular cache file. That is, if you change the tokenization settings inside your ``DatasetReader``, you don’t want to read cached data that used the old settings. In order to avoid this, we compute a hash of the parameters used to construct each ``DatasetReader`` and use that as a “prefix” to the cache files inside the base ``cache_directory``. So, a given ``input_file`` would be cached essentially as ``cache_directory + cache_prefix + input_file``, where you specify a ``cache_directory``, the ``cache_prefix`` is based on the dataset reader parameters, and the ``input_file`` is whatever path you provided to ``DatasetReader.read()``. In order to allow you to give recognizable names to these prefixes if you want them, you can manually specify the ``cache_prefix``. Note that in some rare cases this can be dangerous, as we’ll use the same prefix for both train and validation dataset readers.
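A usage sketch, assuming a standard experiment configuration that defines a ``dataset_reader`` and a ``train_data_path``; the config path and cache locations below are hypothetical::

    from allennlp.common.params import Params
    from allennlp.training.util import datasets_from_params

    params = Params.from_file("experiment.jsonnet")  # hypothetical config path

    # Returns a dict keyed by "train" (plus "validation"/"test" when those
    # paths are present in the config). With a cache_directory given,
    # instances are cached under cache_directory + cache_prefix.
    datasets = datasets_from_params(params,
                                    cache_directory="caches/",
                                    cache_prefix="my_reader_settings")
    train_instances = datasets["train"]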
- allennlp.training.util.enable_gradient_clipping(model: allennlp.models.model.Model, grad_clipping: Union[float, NoneType]) → None
- allennlp.training.util.evaluate(model: allennlp.models.model.Model, instances: Iterable[allennlp.data.instance.Instance], data_iterator: allennlp.data.iterators.data_iterator.DataIterator, cuda_device: int, batch_weight_key: str) → Dict[str, Any]
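No docstring is given above, but the signature suggests a call along these lines. This is a sketch only: ``model``, ``test_instances``, and ``iterator`` are assumed to be an already constructed ``Model``, a list of ``Instance`` objects, and a ``DataIterator``::

    from allennlp.training.util import evaluate

    # `model`, `test_instances`, and `iterator` are assumed to exist already.
    metrics = evaluate(model,
                       instances=test_instances,
                       data_iterator=iterator,
                       cuda_device=-1,        # -1 runs evaluation on the CPU
                       batch_weight_key="")   # assumption: an empty string
                                              # skips per-batch loss weighting
    print(metrics)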
- allennlp.training.util.get_batch_size(batch: Union[Dict, torch.Tensor]) → int
Returns the size of the batch dimension. Assumes a well-formed batch; returns 0 otherwise.
- allennlp.training.util.get_metrics(model: allennlp.models.model.Model, total_loss: float, num_batches: int, reset: bool = False) → Dict[str, float]
Gets the metrics but sets ``"loss"`` to the total loss divided by the ``num_batches`` so that the ``"loss"`` metric is “average loss per batch”.
- allennlp.training.util.move_optimizer_to_cuda(optimizer)
Move the optimizer state to GPU, if necessary. After calling, any parameter-specific state in the optimizer will be located on the same device as the parameter.
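A sketch of where this call fits; ``model`` is assumed to be a constructed ``Model`` that has already been moved to a GPU::

    import torch
    from allennlp.training.util import move_optimizer_to_cuda

    optimizer = torch.optim.Adam(model.parameters())

    # After this call, any per-parameter optimizer state (e.g. Adam's moment
    # estimates) lives on the same device as the corresponding parameter.
    move_optimizer_to_cuda(optimizer)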
- allennlp.training.util.rescale_gradients(model: allennlp.models.model.Model, grad_norm: Union[float, NoneType] = None) → Union[float, NoneType]
Performs gradient rescaling. Is a no-op if gradient rescaling is not enabled.
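A sketch of the typical call site inside a training loop; ``model`` is assumed to be a constructed ``Model`` whose gradients were just populated by ``loss.backward()``, and the value 5.0 is an arbitrary example::

    from allennlp.training.util import rescale_gradients

    # Call after loss.backward() and before optimizer.step(). With
    # grad_norm=None this is a no-op and returns None.
    total_norm = rescale_gradients(model, grad_norm=5.0)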
- allennlp.training.util.sparse_clip_norm(parameters, max_norm, norm_type=2) → float
Clips gradient norm of an iterable of parameters.
The norm is computed over all gradients together, as if they were concatenated into a single vector. Gradients are modified in-place. Supports sparse gradients.
- Parameters
- parameters: ``Iterable[torch.Tensor]``
An iterable of Tensors that will have gradients normalized.
- max_norm: ``float``
The max norm of the gradients.
- norm_type: ``float``
The type of the used p-norm. Can be ``'inf'`` for infinity norm.
- Returns
- Total norm of the parameters (viewed as a single vector).
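A self-contained toy example with a single dense parameter; in practice you would pass ``model.parameters()``::

    import torch
    from allennlp.training.util import sparse_clip_norm

    p = torch.nn.Parameter(torch.ones(4))
    p.grad = torch.full((4,), 10.0)      # combined L2 norm is 20.0

    # Scales gradients in place so their combined norm is at most max_norm,
    # and returns the norm measured before clipping (20.0 here).
    total_norm = sparse_clip_norm([p], max_norm=1.0, norm_type=2)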