initializers
allennlp.nn.initializers
An initializer is just a PyTorch function. Here we implement a proxy class that allows us to register these functions and to supply any additional function arguments (for example, the mean and std of a normal initializer) as named arguments to the constructor.
The available initialization functions are
- "normal"
- "uniform"
- "constant"
- "eye"
- "dirac"
- "xavier_uniform"
- "xavier_normal"
- "kaiming_uniform"
- "kaiming_normal"
- "orthogonal"
- "sparse"
- "block_orthogonal"
- "uniform_unit_scaling"
- "pretrained"
Initializer¶
class Initializer(Registrable)
An initializer is really just a bare pytorch function. This class
is a proxy that allows us to implement Registrable
for those functions.
default_implementation¶
class Initializer(Registrable):
| ...
| default_implementation = "normal"
__call__¶
class Initializer(Registrable):
| ...
| def __call__(self, tensor: torch.Tensor, **kwargs) -> None
This function is here just to make mypy happy. We expect initialization functions to follow this API; the built-in PyTorch initialization functions follow it just fine, even though they don't subclass Initializer. We're just making it explicit here, so mypy knows that initializers are callable like this.
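For instance, the built-in torch.nn.init functions already fit this shape: they take the tensor as the first argument, accept extra keyword arguments, and modify the tensor in place:

```python
import torch

weight = torch.empty(3, 4)
# Both calls mutate `weight` in place and return it, matching the
# __call__ API described above.
torch.nn.init.normal_(weight, mean=0.0, std=0.02)
torch.nn.init.xavier_uniform_(weight, gain=1.0)
```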
uniform_unit_scaling¶
def uniform_unit_scaling(
tensor: torch.Tensor,
nonlinearity: str = "linear"
)
An initializer which preserves output variance for approximately Gaussian-distributed inputs. This boils down to initializing layers using a uniform distribution in the range (-sqrt(3 / dim[0]) * scale, sqrt(3 / dim[0]) * scale), where dim[0] is the input dimension of the parameter and scale is a constant scaling factor which depends on the non-linearity used.
See [Random Walk Initialization for Training Very Deep Feedforward Networks](https://www.semanticscholar.org/paper/Random-Walk-Initialization-for-Training-Very-Deep-Sussillo-Abbott/be9728a0728b6acf7a485225b1e41592176eda0b) for more information.
Parameters¶
- tensor : torch.Tensor
  The tensor to initialize.
- nonlinearity : str, optional (default = "linear")
  The non-linearity which is performed after the projection that this tensor is involved in. This must be the name of a function contained in the torch.nn.functional package.
Returns¶
- The initialized tensor.
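As a rough sketch of the idea (not the library's exact implementation, which may compute the fan-in differently), the bound is sqrt(3 / fan_in) scaled by the gain of the chosen non-linearity, which torch.nn.init.calculate_gain can supply:

```python
import math

import torch


def uniform_unit_scaling_sketch(tensor: torch.Tensor, nonlinearity: str = "linear") -> None:
    # Treat the first dimension as the input (fan-in) dimension, as in the
    # description above; this is an illustrative approximation.
    fan_in = tensor.size(0)
    scale = torch.nn.init.calculate_gain(nonlinearity)
    bound = math.sqrt(3.0 / fan_in) * scale
    tensor.data.uniform_(-bound, bound)


weight = torch.empty(20, 30)
uniform_unit_scaling_sketch(weight, nonlinearity="relu")
```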
block_orthogonal¶
def block_orthogonal(
tensor: torch.Tensor,
split_sizes: List[int],
gain: float = 1.0
) -> None
An initializer which allows initializing model parameters in "blocks". This is helpful for recurrent models that apply multiple gates to linear projections: the gates can be computed efficiently if their weights are concatenated together, but they are conceptually separate parameters which should be initialized independently.
Parameters¶
- tensor : torch.Tensor
  A tensor to initialize.
- split_sizes : List[int]
  A list of length tensor.ndim() specifying the size of the blocks along that particular dimension. E.g. [10, 20] would result in the tensor being split into chunks of size 10 along the first dimension and 20 along the second.
- gain : float, optional (default = 1.0)
  The gain (scaling) applied to the orthogonal initialization.
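For example (a usage sketch with illustrative sizes), an LSTM-style parameter that stacks four gate projections along its first dimension can be initialized so that each gate's block is orthogonal on its own:

```python
import torch

from allennlp.nn.initializers import block_orthogonal

hidden_dim, input_dim = 100, 100  # illustrative sizes
# Four gate projections stacked along dimension 0, as in an LSTM.
stacked_weight = torch.empty(4 * hidden_dim, input_dim)
# split_sizes has one entry per tensor dimension, so each (100, 100)
# block of this (400, 100) tensor is initialized orthogonally and
# independently of the others.
block_orthogonal(stacked_weight, split_sizes=[hidden_dim, input_dim])
```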
zero¶
def zero(tensor: torch.Tensor) -> None
lstm_hidden_bias¶
def lstm_hidden_bias(tensor: torch.Tensor) -> None
Initialize the biases of the forget gate to 1, and all other gates to 0, following Jozefowicz et al., An Empirical Exploration of Recurrent Network Architectures
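A usage sketch, assuming PyTorch's standard LSTM bias layout of four gates stacked along a single dimension:

```python
import torch

from allennlp.nn.initializers import lstm_hidden_bias

lstm = torch.nn.LSTM(input_size=50, hidden_size=100)
# bias_ih_l0 has shape (4 * hidden_size,); after this call the forget-gate
# slice is 1 and the remaining gate slices are 0, per the description above.
lstm_hidden_bias(lstm.bias_ih_l0)
```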
NormalInitializer¶
@Initializer.register("normal")
class NormalInitializer(_InitializerWrapper):
| def __init__(self, mean: float = 0.0, std: float = 0.1)
Registered as an Initializer
with name "normal".
OrthogonalInitializer¶
@Initializer.register("orthogonal")
class OrthogonalInitializer(_InitializerWrapper):
| def __init__(self, gain: float = 1.0)
Registered as an Initializer
with name "orthogonal".
UniformInitializer¶
@Initializer.register("uniform")
class UniformInitializer(_InitializerWrapper):
| def __init__(self, a: float = 0.0, b: float = 1.0)
Registered as an Initializer
with name "uniform".
ConstantInitializer¶
@Initializer.register("constant")
class ConstantInitializer(_InitializerWrapper):
| def __init__(self, val: float)
Registered as an Initializer
with name "constant".
DiracInitializer¶
@Initializer.register("dirac")
class DiracInitializer(_InitializerWrapper):
| def __init__(self)
Registered as an Initializer
with name "dirac".
XavierUniformInitializer¶
@Initializer.register("xavier_uniform")
class XavierUniformInitializer(_InitializerWrapper):
| def __init__(self, gain: float = 1.0)
Registered as an Initializer
with name "xavir_uniform".
XavierNormalInitializer¶
@Initializer.register("xavier_normal")
class XavierNormalInitializer(_InitializerWrapper):
| def __init__(self, gain: float = 1.0)
Registered as an Initializer
with name "xavier_normal".
KaimingUniformInitializer¶
@Initializer.register("kaiming_uniform")
class KaimingUniformInitializer(_InitializerWrapper):
| def __init__(
| self,
| a: float = 0.0,
| mode: str = "fan_in",
| nonlinearity: str = "leaky_relu"
| )
Registered as an Initializer
with name "kaiming_uniform".
KaimingNormalInitializer¶
@Initializer.register("kaiming_normal")
class KaimingNormalInitializer(_InitializerWrapper):
| def __init__(
| self,
| a: float = 0.0,
| mode: str = "fan_in",
| nonlinearity: str = "leaky_relu"
| )
Registered as an Initializer
with name "kaiming_normal".
SparseInitializer¶
@Initializer.register("sparse")
class SparseInitializer(_InitializerWrapper):
| def __init__(self, sparsity: float, std: float = 0.01)
Registered as an Initializer
with name "sparse".
EyeInitializer¶
@Initializer.register("eye")
class EyeInitializer(_InitializerWrapper):
| def __init__(self)
Registered as an Initializer
with name "eye".
BlockOrthogonalInitializer¶
@Initializer.register("block_orthogonal")
class BlockOrthogonalInitializer(_InitializerWrapper):
| def __init__(self, split_sizes: List[int], gain: float = 1.0)
Registered as an Initializer
with name "block_orthogonal".
UniformUnitScalingInitializer¶
@Initializer.register("uniform_unit_scaling")
class UniformUnitScalingInitializer(_InitializerWrapper):
| def __init__(self, nonlinearity: str = "linear")
Registered as an Initializer
with name "uniform_unit_scaling".
ZeroInitializer¶
@Initializer.register("zero")
class ZeroInitializer(_InitializerWrapper):
| def __init__(self)
Registered as an Initializer
with name "zero".
LstmHiddenBiasInitializer¶
@Initializer.register("lstm_hidden_bias")
class LstmHiddenBiasInitializer(_InitializerWrapper):
| def __init__(self)
Registered as an Initializer
with name "lstm_hidden_bias".
PretrainedModelInitializer¶
@Initializer.register("pretrained")
class PretrainedModelInitializer(Initializer):
| def __init__(
| self,
| weights_file_path: str,
| parameter_name_overrides: Dict[str, str] = None
| ) -> None
An initializer which allows initializing parameters using a pretrained model. The
initializer will load all of the weights from the weights_file_path
and use the
name of the new parameters to index into the pretrained parameters. Therefore,
by default, the names of the new and pretrained parameters must be the same.
However, this behavior can be overridden using parameter_name_overrides, which remaps the name of the new parameter to the key which should be used to index into the pretrained parameters.
The initializer will load all of the weights from the weights_file_path
regardless of which parameters will actually be used to initialize the new model.
So, if you need to initialize several parameters using a pretrained model, the most
memory-efficient way to do this is to use one PretrainedModelInitializer
per
weights file and use a regex to match all of the new parameters which need to be
initialized.
If you are using a configuration file to instantiate this object, the below entry
in the InitializerApplicator
parameters will initialize linear_1.weight
and
linear_2.weight
using a pretrained model. linear_1.weight will be initialized to the pretrained parameters called linear_1.weight, but linear_2.weight will be initialized to the pretrained parameters called linear_3.weight:
["linear_1.weight|linear_2.weight",
{
"type": "pretrained",
"weights_file_path": "best.th",
"parameter_name_overrides": {
"linear_2.weight": "linear_3.weight"
}
}
]
To initialize weights for all the parameters from a pretrained model (assuming their names remain unchanged), use the following instead:
[".*",
{
"type": "pretrained",
"weights_file_path": "best.th",
"parameter_name_overrides": {}
}
]
Registered as an Initializer
with name "pretrained".
Parameters¶
- weights_file_path : str
  The path to the weights file which has the pretrained model parameters.
- parameter_name_overrides : Dict[str, str], optional (default = None)
  The mapping from the new parameter name to the name which should be used to index into the pretrained model parameters. If a parameter name is not specified, the initializer will use the parameter's default name as the key.
__call__¶
class PretrainedModelInitializer(Initializer):
| ...
| def __call__(
| self,
| tensor: torch.Tensor,
| parameter_name: str,
| **kwargs
| ) -> None
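The same initializer can be constructed programmatically; a sketch using the file path and parameter names from the config example above (and assuming the shapes of the new and pretrained parameters match):

```python
import torch

from allennlp.nn.initializers import PretrainedModelInitializer

initializer = PretrainedModelInitializer(
    weights_file_path="best.th",
    parameter_name_overrides={"linear_2.weight": "linear_3.weight"},
)

new_weight = torch.empty(10, 10)  # illustrative shape
# `parameter_name` is used to look up the pretrained tensor; the override
# above remaps "linear_2.weight" to the pretrained "linear_3.weight".
initializer(new_weight, parameter_name="linear_2.weight")
```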
InitializerApplicator¶
class InitializerApplicator(FromParams):
| def __init__(
| self,
| regexes: List[Tuple[str, Initializer]] = None,
| prevent_regexes: List[str] = None
| ) -> None
Applies initializers to the parameters of a Module based on regex matches. Any parameter not explicitly matching a regex will not be initialized and is left with whatever default initialization the module's code provides.
If you are instantiating this object from a config file, an example configuration is as follows:
{
"regexes": [
["parameter_regex_match1",
{
"type": "normal"
"mean": 0.01
"std": 0.1
}
],
["parameter_regex_match2", "uniform"]
],
"prevent_regexes": ["prevent_init_regex"]
}
where the first item in each tuple under the regexes parameter is a regex that is matched against parameter names, and the second item specifies an Initializer.
These values can either be strings,
in which case they correspond to the names of initializers, or dictionaries, in which case they
must contain the "type" key, corresponding to the name of an initializer. In addition, they may
contain auxiliary named parameters which will be fed to the initializer itself. To determine
valid auxiliary parameters, please refer to the torch.nn.init documentation.
Parameters¶
- regexes : List[Tuple[str, Initializer]], optional (default = [])
  A list mapping parameter regexes to initializers. We will check each parameter against each regex in turn, and apply the initializer paired with the first matching regex, if any.
- prevent_regexes : List[str], optional (default = None)
  Any parameter name matching one of these regexes will not be initialized, regardless of whether it matches one of the regexes passed in the regexes parameter.
__call__¶
class InitializerApplicator(FromParams):
| ...
| def __call__(self, module: torch.nn.Module) -> None
Applies an initializer to all parameters in a module that match one of the regexes we were given in this object's constructor. Does nothing to parameters that do not match.
Parameters¶
- module :
torch.nn.Module
The PyTorch module to apply the initializers to.
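Putting it together, a sketch that builds a small (made-up) module and applies initializers chosen by regex:

```python
import torch

from allennlp.nn import InitializerApplicator
from allennlp.nn.initializers import Initializer

module = torch.nn.Sequential(
    torch.nn.Linear(10, 20),
    torch.nn.ReLU(),
    torch.nn.Linear(20, 5),
)

applicator = InitializerApplicator(
    regexes=[
        # Weight matrices get Xavier-uniform initialization ...
        (".*weight", Initializer.by_name("xavier_uniform")()),
        # ... and bias vectors are zeroed.
        (".*bias", Initializer.by_name("zero")()),
    ],
)
applicator(module)
```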