Skip to content

archival

allennlp.models.archival

[SOURCE]


Helper functions for archiving models and restoring archived models.

Archive

class Archive(NamedTuple)

An archive comprises a Model and its experimental config

model

class Archive(NamedTuple):
 | ...
 | model: Model = None

config

class Archive(NamedTuple):
 | ...
 | config: Params = None

dataset_reader

class Archive(NamedTuple):
 | ...
 | dataset_reader: DatasetReader = None

validation_dataset_reader

class Archive(NamedTuple):
 | ...
 | validation_dataset_reader: DatasetReader = None

meta

class Archive(NamedTuple):
 | ...
 | meta: Optional[Meta] = None

extract_module

class Archive(NamedTuple):
 | ...
 | def extract_module(self, path: str, freeze: bool = True) -> Module

This method can be used to load a module from the pretrained model archive.

It is also used implicitly in FromParams based construction. So instead of using standard params to construct a module, you can instead load a pretrained module from the model archive directly. For eg, instead of using params like {"type": "module_type", ...}, you can use the following template::

{
    "_pretrained": {
        "archive_file": "../path/to/model.tar.gz",
        "path": "path.to.module.in.model",
        "freeze": False
    }
}

If you use this feature with FromParams, take care of the following caveat: Call to initializer(self) at end of model initializer can potentially wipe the transferred parameters by reinitializing them. This can happen if you have setup initializer regex that also matches parameters of the transferred module. To safe-guard against this, you can either update your initializer regex to prevent conflicting match or add extra initializer::

[
    [".*transferred_module_name.*", "prevent"]]
]

Parameters

  • path : str
    Path of target module to be loaded from the model. Eg. "_textfield_embedder.token_embedder_tokens"
  • freeze : bool, optional (default = True)
    Whether to freeze the module parameters or not.

CONFIG_NAME

CONFIG_NAME = "config.json"

verify_include_in_archive

def verify_include_in_archive(
    include_in_archive: Optional[List[str]] = None
)

archive_model

def archive_model(
    serialization_dir: Union[str, PathLike],
    weights: str = _DEFAULT_WEIGHTS,
    archive_path: Union[str, PathLike] = None,
    include_in_archive: Optional[List[str]] = None
) -> str

Archive the model weights, its training configuration, and its vocabulary to model.tar.gz.

Parameters

  • serialization_dir : str
    The directory where the weights and vocabulary are written out.
  • weights : str, optional (default = _DEFAULT_WEIGHTS)
    Which weights file to include in the archive. The default is best.th.
  • archive_path : str, optional (default = None)
    A full path to serialize the model to. The default is "model.tar.gz" inside the serialization_dir. If you pass a directory here, we'll serialize the model to "model.tar.gz" inside the directory.
  • include_in_archive : List[str], optional (default = None)
    Paths relative to serialization_dir that should be archived in addition to the default ones.

Returns

  • The final archive path.

load_archive

def load_archive(
    archive_file: Union[str, PathLike],
    cuda_device: int = -1,
    overrides: Union[str, Dict[str, Any]] = "",
    weights_file: str = None
) -> Archive

Instantiates an Archive from an archived tar.gz file.

Parameters

  • archive_file : Union[str, PathLike]
    The archive file to load the model from.
  • cuda_device : int, optional (default = -1)
    If cuda_device is >= 0, the model will be loaded onto the corresponding GPU. Otherwise it will be loaded onto the CPU.
  • overrides : Union[str, Dict[str, Any]], optional (default = "")
    JSON overrides to apply to the unarchived Params object.
  • weights_file : str, optional (default = None)
    The weights file to use. If unspecified, weights.th in the archive_file will be used.

get_weights_path

def get_weights_path(serialization_dir)

extracted_archive

@contextmanager
def extracted_archive(resolved_archive_file, cleanup=True)