
initializers

[ allennlp.nn.initializers ]


An initializer is just a PyTorch function. Here we implement a proxy class that allows us to register them and supply any additional function arguments (for example, the mean and std of a normal initializer) as named arguments to the constructor.

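The available initialization functions, listed by their registered names, are:

  • "normal"
  • "orthogonal"
  • "uniform"
  • "constant"
  • "dirac"
  • "xavier_uniform"
  • "xavier_normal"
  • "kaiming_uniform"
  • "kaiming_normal"
  • "sparse"
  • "eye"
  • "block_orthogonal"
  • "uniform_unit_scaling"
  • "zero"
  • "lstm_hidden_bias"
  • "pretrained"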

Initializer Objects

class Initializer(Registrable)

An initializer is really just a bare PyTorch function. This class is a proxy that allows us to implement Registrable for those functions.
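The proxy idea can be sketched in a few lines of plain Python. This is a hypothetical, simplified stand-in for AllenNLP's Registrable machinery and is not its actual implementation. `REGISTRY`, `register`, `WrappedInitializer` and `fill_constant` are illustrative names, and a list of floats takes the place of a tensor so that the sketch has no dependencies.

```python
from typing import Callable, Dict, List

# Hypothetical registry mapping a name to a factory that builds a configured initializer.
REGISTRY: Dict[str, Callable[..., "WrappedInitializer"]] = {}


class WrappedInitializer:
    """Stores extra keyword arguments and forwards them to a bare function."""

    def __init__(self, init_function: Callable[..., None], **kwargs) -> None:
        self._init_function = init_function
        self._kwargs = kwargs

    def __call__(self, tensor: List[float]) -> None:
        # Same call shape as torch.nn.init functions: tensor first, options as keywords.
        self._init_function(tensor, **self._kwargs)


def register(name: str, init_function: Callable[..., None]) -> None:
    # The factory captures the bare function; its extra arguments are supplied later.
    REGISTRY[name] = lambda **kwargs: WrappedInitializer(init_function, **kwargs)


def fill_constant(tensor: List[float], val: float) -> None:
    # Stand-in for an in-place PyTorch initializer such as torch.nn.init.constant_.
    for i in range(len(tensor)):
        tensor[i] = val


register("constant", fill_constant)

# Look the initializer up by name and pass the function's extra argument to the "constructor".
initializer = REGISTRY["constant"](val=0.5)
weights = [0.0, 0.0, 0.0]
initializer(weights)
print(weights)  # [0.5, 0.5, 0.5]
```

Keeping the arguments on the wrapper means the registry only has to store names and functions, while each configured instance still carries its own options.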

default_implementation

default_implementation = "normal"

uniform_unit_scaling

def uniform_unit_scaling(
    tensor: torch.Tensor,
    nonlinearity: str = "linear"
)

An initialiser which preserves output variance for approximately Gaussian-distributed inputs. This boils down to initialising layers using a uniform distribution in the range (-sqrt(3 / dim[0]) * scale, sqrt(3 / dim[0]) * scale), where dim[0] is equal to the input dimension of the parameter and scale is a constant scaling factor that depends on the non-linearity used.

See "Random Walk Initialisation for Training Very Deep Feedforward Networks" (https://www.semanticscholar.org/paper/Random-Walk-Initialization-for-Training-Very-Deep-Sussillo-Abbott/be9728a0728b6acf7a485225b1e41592176eda0b) for more information.
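A short calculation shows why this range preserves variance. It assumes n = dim[0] inputs feed each output unit, weights and inputs are mutually independent, inputs have zero mean and a common variance, and s denotes the non-linearity-dependent scale:

$$
\begin{aligned}
w_{ij} &\sim \mathcal{U}(-a, a), \qquad a = s\sqrt{3/n} \\
\operatorname{Var}(w_{ij}) &= \frac{(2a)^2}{12} = \frac{a^2}{3} = \frac{s^2}{n} \\
y_j &= \sum_{i=1}^{n} w_{ij}\, x_i \\
\operatorname{Var}(y_j) &= \sum_{i=1}^{n} \operatorname{Var}(w_{ij})\operatorname{Var}(x_i) = n \cdot \frac{s^2}{n}\operatorname{Var}(x) = s^2\operatorname{Var}(x)
\end{aligned}
$$

With s = 1 (the "linear" case), the output variance equals the input variance. A larger s compensates in advance for a non-linearity that shrinks the signal.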

Parameters

  • tensor : torch.Tensor
    The tensor to initialise.
  • nonlinearity : str, optional (default = "linear")
    The non-linearity which is performed after the projection that this tensor is involved in. This must be the name of a function contained in the torch.nn.functional package.

Returns

  • The initialised tensor.

block_orthogonal

def block_orthogonal(
    tensor: torch.Tensor,
    split_sizes: List[int],
    gain: float = 1.0
) -> None

An initializer which allows initializing model parameters in "blocks". This is helpful for recurrent models that apply multiple gates to linear projections. These projections can be computed efficiently when their weights are concatenated into one tensor, but they are still separate parameters and should be initialized independently.

Parameters

  • tensor : torch.Tensor
    A tensor to initialize.
  • split_sizes : List[int]
    A list of length tensor.dim() specifying the size of the blocks along each dimension. For example, [10, 20] splits the tensor into chunks of size 10 along the first dimension and 20 along the second.
  • gain : float, optional (default = 1.0)
    The gain (scaling) applied to the orthogonal initialization.
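The block-wise scheme can be sketched with standard PyTorch calls. `block_orthogonal_sketch` is a hypothetical name and not AllenNLP's code. The sketch assumes every dimension divides evenly by its block size, and it draws each block from `torch.nn.init.orthogonal_`, so blocks need at least two dimensions.

```python
import itertools
from typing import List

import torch


def block_orthogonal_sketch(tensor: torch.Tensor, split_sizes: List[int], gain: float = 1.0) -> None:
    """Orthogonally initialize each block of `tensor` independently, in place."""
    sizes = list(tensor.size())
    # Assumption of this sketch: one block size per dimension, each dividing that dimension evenly.
    if len(split_sizes) != tensor.dim() or any(s % b != 0 for s, b in zip(sizes, split_sizes)):
        raise ValueError(f"sizes {sizes} do not split evenly into blocks of {split_sizes}")
    # Start offset of every block along each dimension.
    starts = [range(0, size, block) for size, block in zip(sizes, split_sizes)]
    with torch.no_grad():
        for block_start in itertools.product(*starts):
            block = tuple(slice(start, start + step) for start, step in zip(block_start, split_sizes))
            # Draw a fresh orthogonal matrix for this block and copy it into place.
            tensor[block] = torch.nn.init.orthogonal_(torch.empty(split_sizes), gain=gain)


torch.manual_seed(0)
hidden_size, input_size = 3, 5
# An LSTM-style weight: four gate projections stacked along the first dimension.
weight = torch.empty(4 * hidden_size, input_size)
block_orthogonal_sketch(weight, [hidden_size, input_size])
gate = weight[hidden_size : 2 * hidden_size]
# Each 3 x 5 block has orthonormal rows, so gate @ gate.T is (close to) the identity.
print(torch.allclose(gate @ gate.T, torch.eye(hidden_size), atol=1e-5))
```

Initializing the whole 12 x 5 matrix orthogonally at once would give only 5 mutually orthogonal columns overall. Splitting it into four blocks gives each gate its own orthogonal projection.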

zero

def zero(tensor: torch.Tensor) -> None

Fills the tensor with zeros, in place.

lstm_hidden_bias

def lstm_hidden_bias(tensor: torch.Tensor) -> None

Initialize the biases of the forget gate to 1 and all other gates to 0, following Jozefowicz et al., "An Empirical Exploration of Recurrent Network Architectures".
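PyTorch's `torch.nn.LSTM` stores its hidden-hidden bias `bias_hh_l0` as the four gate biases stacked in the order input, forget, cell, output. The forget gate therefore occupies the second quarter of the vector. The sketch below assumes that layout. `lstm_hidden_bias_sketch` is a hypothetical name and not AllenNLP's function.

```python
import torch


def lstm_hidden_bias_sketch(bias: torch.Tensor) -> None:
    """Set the forget-gate chunk of an LSTM bias to 1 and every other chunk to 0, in place."""
    # torch.nn.LSTM stacks its gate biases as (input | forget | cell | output), hidden_size each.
    hidden_size = bias.shape[0] // 4
    with torch.no_grad():
        bias.zero_()
        bias[hidden_size : 2 * hidden_size] = 1.0


lstm = torch.nn.LSTM(input_size=8, hidden_size=4)
lstm_hidden_bias_sketch(lstm.bias_hh_l0)
# One row per gate; only the second (forget gate) row is all ones.
print(lstm.bias_hh_l0.detach().view(4, 4))
```

PyTorch adds `bias_ih_l0` to `bias_hh_l0`, so the effective forget-gate bias is 1 plus whatever the input-hidden bias holds for that chunk.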

NormalInitializer Objects

class NormalInitializer(_InitializerWrapper):
 | def __init__(self, mean: float = 0.0, std: float = 0.1)

Registered as an Initializer with name "normal".

OrthogonalInitializer Objects

class OrthogonalInitializer(_InitializerWrapper):
 | def __init__(self, gain: float = 1.0)

Registered as an Initializer with name "orthogonal".

UniformInitializer Objects

class UniformInitializer(_InitializerWrapper):
 | def __init__(self, a: float = 0.0, b: float = 1.0)

Registered as an Initializer with name "uniform".

ConstantInitializer Objects

class ConstantInitializer(_InitializerWrapper):
 | def __init__(self, val: float)

Registered as an Initializer with name "constant".

DiracInitializer Objects

class DiracInitializer(_InitializerWrapper):
 | def __init__(self)

Registered as an Initializer with name "dirac".

XavierUniformInitializer Objects

class XavierUniformInitializer(_InitializerWrapper):
 | def __init__(self, gain: float = 1.0)

Registered as an Initializer with name "xavier_uniform".

XavierNormalInitializer Objects

class XavierNormalInitializer(_InitializerWrapper):
 | def __init__(self, gain: float = 1.0)

Registered as an Initializer with name "xavier_normal".

KaimingUniformInitializer Objects

class KaimingUniformInitializer(_InitializerWrapper):
 | def __init__(
 |     self,
 |     a: float = 0.0,
 |     mode: str = "fan_in",
 |     nonlinearity: str = "leaky_relu"
 | )

Registered as an Initializer with name "kaiming_uniform".

KaimingNormalInitializer Objects

class KaimingNormalInitializer(_InitializerWrapper):
 | def __init__(
 |     self,
 |     a: float = 0.0,
 |     mode: str = "fan_in",
 |     nonlinearity: str = "leaky_relu"
 | )

Registered as an Initializer with name "kaiming_normal".

SparseInitializer Objects

class SparseInitializer(_InitializerWrapper):
 | def __init__(self, sparsity: float, std: float = 0.01)

Registered as an Initializer with name "sparse".

EyeInitializer Objects

class EyeInitializer(_InitializerWrapper):
 | def __init__(self)

Registered as an Initializer with name "eye".

BlockOrthogonalInitializer Objects

class BlockOrthogonalInitializer(_InitializerWrapper):
 | def __init__(self, split_sizes: List[int], gain: float = 1.0)

Registered as an Initializer with name "block_orthogonal".

UniformUnitScalingInitializer Objects

class UniformUnitScalingInitializer(_InitializerWrapper):
 | def __init__(self, nonlinearity: str = "linear")

Registered as an Initializer with name "uniform_unit_scaling".

ZeroInitializer Objects

class ZeroInitializer(_InitializerWrapper):
 | def __init__(self)

Registered as an Initializer with name "zero".

LstmHiddenBiasInitializer Objects

class LstmHiddenBiasInitializer(_InitializerWrapper):
 | def __init__(self)

Registered as an Initializer with name "lstm_hidden_bias".

PretrainedModelInitializer Objects

class PretrainedModelInitializer(Initializer):
 | def __init__(
 |     self,
 |     weights_file_path: str,
 |     parameter_name_overrides: Dict[str, str] = None
 | ) -> None

An initializer which allows initializing parameters using a pretrained model. The initializer will load all of the weights from the weights_file_path and use the name of each new parameter to index into the pretrained parameters. Therefore, by default, the names of the new and pretrained parameters must be the same. However, this behavior can be overridden using parameter_name_overrides, which maps the name of a new parameter to the key that should be used to index into the pretrained parameters.

The initializer will load all of the weights from the weights_file_path regardless of which parameters will actually be used to initialize the new model. So, if you need to initialize several parameters using a pretrained model, the most memory-efficient way to do this is to use one PretrainedModelInitializer per weights file and use a regex to match all of the new parameters which need to be initialized.

If you are using a configuration file to instantiate this object, the following entry in the InitializerApplicator parameters will initialize linear_1.weight and linear_2.weight using a pretrained model. linear_1.weight will be initialized to the pretrained parameter called linear_1.weight, but linear_2.weight will be initialized to the pretrained parameter called linear_3.weight:

    ["linear_1.weight|linear_2.weight",
        {
            "type": "pretrained",
            "weights_file_path": "best.th",
            "parameter_name_overrides": {
                "linear_2.weight": "linear_3.weight"
            }
        }
    ]

To initialize weights for all the parameters from a pretrained model (assuming their names remain unchanged), use the following instead:

    [".*",
        {
            "type": "pretrained",
            "weights_file_path": "best.th",
            "parameter_name_overrides": {}
        }
    ]

Registered as an Initializer with name "pretrained".

Parameters

  • weights_file_path : str
    The path to the weights file which has the pretrained model parameters.
  • parameter_name_overrides : Dict[str, str], optional (default = None)
    The mapping from the new parameter name to the name which should be used to index into the pretrained model parameters. If a parameter name is not specified, the initializer will use the parameter's default name as the key.
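The name lookup described above can be sketched in plain Python. `pretrained_key` and `pretrained_state` are hypothetical names. The dictionary stands in for the weights loaded once from weights_file_path, and its string values are placeholders for tensors.

```python
from typing import Dict, Optional


def pretrained_key(parameter_name: str, overrides: Optional[Dict[str, str]] = None) -> str:
    """Return the key used to look up `parameter_name` in the pretrained weights."""
    # Names without an override are looked up unchanged.
    return (overrides or {}).get(parameter_name, parameter_name)


# Stand-in for the full set of weights loaded from weights_file_path.
pretrained_state = {"linear_1.weight": "W1", "linear_3.weight": "W3"}
overrides = {"linear_2.weight": "linear_3.weight"}

for name in ["linear_1.weight", "linear_2.weight"]:
    print(name, "<-", pretrained_state[pretrained_key(name, overrides)])
# linear_1.weight <- W1
# linear_2.weight <- W3
```

Because every lookup goes through the same loaded dictionary, one initializer per weights file can serve any number of matching parameters without reloading the file.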

InitializerApplicator Objects

class InitializerApplicator(FromParams):
 | def __init__(
 |     self,
 |     regexes: List[Tuple[str, Initializer]] = None,
 |     prevent_regexes: List[str] = None
 | ) -> None

Applies initializers to the parameters of a Module based on regex matches. Any parameter that does not match a regex is left untouched and keeps whatever default initialization the module's code applied.

If you are instantiating this object from a config file, an example configuration is as follows:

{
    "regexes": [
        ["parameter_regex_match1",
            {
                "type": "normal",
                "mean": 0.01,
                "std": 0.1
            }
        ],
        ["parameter_regex_match2", "uniform"]
    ],
    "prevent_regexes": ["prevent_init_regex"]
}

where the first item in each tuple under regexes is a regex that is matched against parameter names, and the second item specifies an Initializer. This second item can be either a string, in which case it is the name of an initializer, or a dictionary. A dictionary must contain a "type" key giving the name of an initializer and may also contain auxiliary named parameters that are passed to the initializer itself. To determine valid auxiliary parameters, please refer to the torch.nn.init documentation.
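The two accepted forms of an initializer entry can be normalized as follows. `resolve_spec` is a hypothetical helper written for illustration and is not part of AllenNLP's FromParams machinery. The sketch parses a copy of the example configuration with Python's json module.

```python
import json
from typing import Any, Dict, Tuple, Union

CONFIG = """
{
    "regexes": [
        ["parameter_regex_match1", {"type": "normal", "mean": 0.01, "std": 0.1}],
        ["parameter_regex_match2", "uniform"]
    ],
    "prevent_regexes": ["prevent_init_regex"]
}
"""


def resolve_spec(spec: Union[str, Dict[str, Any]]) -> Tuple[str, Dict[str, Any]]:
    """Split an initializer entry into (registered name, keyword arguments)."""
    if isinstance(spec, str):
        # A bare string is just the initializer's registered name.
        return spec, {}
    kwargs = dict(spec)
    # "type" is required; every other key is forwarded to the initializer.
    name = kwargs.pop("type")
    return name, kwargs


config = json.loads(CONFIG)
for regex, spec in config["regexes"]:
    print(regex, resolve_spec(spec))
# parameter_regex_match1 ('normal', {'mean': 0.01, 'std': 0.1})
# parameter_regex_match2 ('uniform', {})
```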

Parameters

  • regexes : List[Tuple[str, Initializer]], optional (default = [])
    A list mapping parameter regexes to initializers. We will check each parameter against each regex in turn, and apply the initializer paired with the first matching regex, if any.

  • prevent_regexes : List[str], optional (default = None)
    Any parameter name matching one of these regexes will not be initialized, regardless of whether it matches one of the regexes passed in the regexes parameter.
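The matching rules above can be sketched over plain parameter names. `apply_initializers` is a hypothetical function, and its initializers receive a name rather than a tensor so that the example stays dependency-free. Matching uses `re.search`, so a regex may match anywhere in a name. This is an assumption of the sketch.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple


def apply_initializers(
    parameter_names: List[str],
    regexes: List[Tuple[str, Callable[[str], None]]],
    prevent_regexes: Optional[List[str]] = None,
) -> Dict[str, str]:
    """Return {parameter name: regex used}; unmatched names keep their defaults."""
    applied: Dict[str, str] = {}
    for name in parameter_names:
        # Prevented parameters are skipped no matter which regexes they match.
        if prevent_regexes and any(re.search(p, name) for p in prevent_regexes):
            continue
        for regex, initializer in regexes:
            if re.search(regex, name):
                initializer(name)  # stand-in for initializing the parameter tensor
                applied[name] = regex
                break  # only the first matching regex is used
    return applied


names = ["encoder.weight", "encoder.bias", "decoder.weight", "decoder.bias"]
log: List[str] = []
result = apply_initializers(
    names,
    [("weight", lambda n: log.append(f"orthogonal:{n}")), (".*", lambda n: log.append(f"zero:{n}"))],
    prevent_regexes=["^decoder"],
)
print(result)  # {'encoder.weight': 'weight', 'encoder.bias': '.*'}
```

Because the first match wins, list specific regexes before catch-all patterns such as ".*".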