train

allennlp.commands.train

The train subcommand can be used to train a model. It requires a configuration file and a directory in which to write the results.
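
For example, a typical command-line invocation looks like this (the config and output paths are illustrative):

allennlp train experiments/my_experiment.jsonnet -s output/my_experiment

Here -s is shorthand for --serialization-dir.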

Train

@Subcommand.register("train")
class Train(Subcommand)

add_subparser

class Train(Subcommand):
 | ...
 | def add_subparser(
 |     self,
 |     parser: argparse._SubParsersAction
 | ) -> argparse.ArgumentParser

train_model_from_args

def train_model_from_args(args: argparse.Namespace)

Just converts from an argparse.Namespace object to string paths.

train_model_from_file

def train_model_from_file(
    parameter_filename: Union[str, PathLike],
    serialization_dir: Union[str, PathLike],
    overrides: Union[str, Dict[str, Any]] = "",
    recover: bool = False,
    force: bool = False,
    node_rank: int = 0,
    include_package: List[str] = None,
    dry_run: bool = False,
    file_friendly_logging: bool = False,
    return_model: Optional[bool] = None
) -> Optional[Model]

A wrapper around train_model which loads the params from a file.

Parameters

  • parameter_filename : Union[str, PathLike]
    A json parameter file specifying an AllenNLP experiment.
  • serialization_dir : Union[str, PathLike]
    The directory in which to save results and logs. We just pass this along to train_model.
  • overrides : Union[str, Dict[str, Any]], optional (default = "")
    A JSON string or a dict that we will use to override values in the input parameter file.
  • recover : bool, optional (default = False)
    If True, we will try to recover a training run from an existing serialization directory. This is only intended for use when something actually crashed in the middle of a run. For continuing to train a model on new data, see Model.from_archive.
  • force : bool, optional (default = False)
    If True, we will overwrite the serialization directory if it already exists.
  • node_rank : int, optional (default = 0)
    Rank of the current node in distributed training.
  • include_package : List[str], optional (default = None)
    In distributed mode, extra packages mentioned will be imported in trainer workers.
  • dry_run : bool, optional (default = False)
    Do not train a model, but create a vocabulary, show dataset statistics and other training information.
  • file_friendly_logging : bool, optional (default = False)
    If True, we add newlines to tqdm output, even on an interactive terminal, and we slow down tqdm's output to only once every 10 seconds.
  • return_model : Optional[bool], optional (default = None)
    Whether or not to return the final model. If not specified, this defaults to False for distributed training and True otherwise.

Returns

  • best_model : Optional[Model]
    The model with the best epoch weights, or None, depending on the values of return_model and dry_run.
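
A minimal sketch of calling this wrapper directly from Python; the config and output paths are hypothetical:

from allennlp.commands.train import train_model_from_file

model = train_model_from_file(
    parameter_filename="experiments/my_experiment.jsonnet",  # hypothetical config
    serialization_dir="output/my_experiment",  # hypothetical output directory
    overrides={"trainer": {"num_epochs": 1}},  # dict overrides are accepted
    force=True,  # overwrite the serialization directory if it already exists
)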

train_model

def train_model(
    params: Params,
    serialization_dir: Union[str, PathLike],
    recover: bool = False,
    force: bool = False,
    node_rank: int = 0,
    include_package: List[str] = None,
    dry_run: bool = False,
    file_friendly_logging: bool = False,
    return_model: Optional[bool] = None
) -> Optional[Model]

Trains the model specified in the given Params object, using the data and training parameters also specified in that object, and saves the results in serialization_dir.

Parameters

  • params : Params
    A parameter object specifying an AllenNLP Experiment.
  • serialization_dir : Union[str, PathLike]
    The directory in which to save results and logs.
  • recover : bool, optional (default = False)
    If True, we will try to recover a training run from an existing serialization directory. This is only intended for use when something actually crashed in the middle of a run. For continuing to train a model on new data, see Model.from_archive.
  • force : bool, optional (default = False)
    If True, we will overwrite the serialization directory if it already exists.
  • node_rank : int, optional (default = 0)
    Rank of the current node in distributed training.
  • include_package : List[str], optional (default = None)
    In distributed mode, extra packages mentioned will be imported in trainer workers.
  • dry_run : bool, optional (default = False)
    Do not train a model, but create a vocabulary, show dataset statistics and other training information.
  • file_friendly_logging : bool, optional (default = False)
    If True, we add newlines to tqdm output, even on an interactive terminal, and we slow down tqdm's output to only once every 10 seconds.
  • return_model : Optional[bool], optional (default = None)
    Whether or not to return the final model. If not specified, this defaults to False for distributed training and True otherwise.

Returns

  • best_model : Optional[Model]
    The model with the best epoch weights or None, depending on the value of return_model and dry_run.
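
A sketch of calling train_model directly, assuming a config file on disk; Params.from_file reads the same files the CLI accepts, and the path here is hypothetical:

from allennlp.common import Params
from allennlp.commands.train import train_model

params = Params.from_file("experiments/my_experiment.jsonnet")  # hypothetical path
# With dry_run=True this builds the vocabulary and logs dataset statistics,
# then returns None instead of a trained model.
model = train_model(params, serialization_dir="output/my_experiment", dry_run=True)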

TrainModel

class TrainModel(Registrable):
 | def __init__(
 |     self,
 |     serialization_dir: str,
 |     model: Model,
 |     trainer: Trainer,
 |     evaluation_data_loader: DataLoader = None,
 |     evaluate_on_test: bool = False,
 |     batch_weight_key: str = ""
 | ) -> None

This class exists so that we can easily read a configuration file with the allennlp train command. The basic logic is that we call train_loop = TrainModel.from_params(params_from_config_file), then train_loop.run(). This class performs very little logic, pushing most of it to the Trainer, which has a train() method. The point here is to construct all of the dependencies for the Trainer in a way that we can do using from_params(), while having all of those dependencies transparently documented and not hidden in calls to params.pop().

If you are writing your own training loop, you almost certainly should not use this class, but you might look at the code for this class to see what we do, to make writing your training loop easier.

In particular, if you are tempted to call the __init__ method of this class, you are probably doing something unnecessary. Literally all we do after __init__ is call trainer.train(). You can do that yourself, if you've constructed a Trainer already. What this class gives you is a way to construct the Trainer by means of a config file. The actual constructor that we use with from_params in this class is from_partial_objects. See that method for a description of all of the allowed top-level keys in a configuration file used with allennlp train.
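
A minimal sketch of that pattern, roughly what train_model does internally; the config path is hypothetical:

from allennlp.common import Params
from allennlp.commands.train import TrainModel

params = Params.from_file("experiments/my_experiment.jsonnet")  # hypothetical path
train_loop = TrainModel.from_params(
    params=params,
    serialization_dir="output/my_experiment",  # passed as an extra, not a config key
    local_rank=0,
)
metrics = train_loop.run()
train_loop.finish(metrics)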

default_implementation

class TrainModel(Registrable):
 | ...
 | default_implementation = "default"

The default implementation is registered as 'default'.

run

class TrainModel(Registrable):
 | ...
 | def run(self) -> Dict[str, Any]

finish

class TrainModel(Registrable):
 | ...
 | def finish(self, metrics: Dict[str, Any])

from_partial_objects

class TrainModel(Registrable):
 | ...
 | @classmethod
 | def from_partial_objects(
 |     cls,
 |     serialization_dir: str,
 |     local_rank: int,
 |     dataset_reader: DatasetReader,
 |     train_data_path: Any,
 |     model: Lazy[Model],
 |     data_loader: Lazy[DataLoader],
 |     trainer: Lazy[Trainer],
 |     vocabulary: Lazy[Vocabulary] = Lazy(Vocabulary),
 |     datasets_for_vocab_creation: List[str] = None,
 |     validation_dataset_reader: DatasetReader = None,
 |     validation_data_path: Any = None,
 |     validation_data_loader: Lazy[DataLoader] = None,
 |     test_data_path: Any = None,
 |     evaluate_on_test: bool = False,
 |     batch_weight_key: str = "",
 |     ddp_accelerator: Optional[DdpAccelerator] = None
 | ) -> "TrainModel"

This method is intended for use with our FromParams logic, to construct a TrainModel object from a config file passed to the allennlp train command. The arguments to this method are the allowed top-level keys in a configuration file (except for the first two, which are obtained separately).

You could use this outside of our FromParams logic if you really want to, but there might be easier ways to accomplish your goal than instantiating Lazy objects. If you are writing your own training loop, we recommend that you look at the implementation of this method for inspiration and possibly some utility functions you can call, but you very likely should not use this method directly.

The Lazy type annotations here are a mechanism for building dependencies to an object sequentially - the TrainModel object needs data, a model, and a trainer, but the model needs to see the data before it's constructed (to create a vocabulary) and the trainer needs the data and the model before it's constructed. Objects that have sequential dependencies like this are labeled as Lazy in their type annotations, and we pass the missing dependencies when we call their construct() method, which you can see in the code below.
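
A simplified sketch of that sequencing, with variable names that are illustrative and several arguments from the real implementation omitted:

# The data loader is constructed from the reader first; the vocabulary is
# built from the instances it yields; the model needs the vocabulary; and
# the trainer is constructed last, once the model and data loader exist.
loader = data_loader.construct(reader=dataset_reader, data_path=train_data_path)
vocab = vocabulary.construct(instances=loader.iter_instances())
model_ = model.construct(vocab=vocab)
trainer_ = trainer.construct(model=model_, data_loader=loader)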

Parameters

  • serialization_dir : str
    The directory where logs and model archives will be saved.

    In a typical AllenNLP configuration file, this parameter does not get an entry as a top-level key; it is passed in separately.

  • local_rank : int
    The process index that is initialized using the GPU device id.

    In a typical AllenNLP configuration file, this parameter does not get an entry as a top-level key; it is passed in separately.

  • dataset_reader : DatasetReader
    The DatasetReader that will be used for training and (by default) for validation.

  • train_data_path : str
    The file (or directory) that will be passed to dataset_reader.read() to construct the training data.

  • model : Lazy[Model]
    The model that we will train. This is lazy because it depends on the Vocabulary; after constructing the vocabulary we call model.construct(vocab=vocabulary).

  • data_loader : Lazy[DataLoader]
    The data_loader we use to batch instances from the dataset reader at training and (by default) validation time. This is lazy because it takes a dataset in its constructor.

  • trainer : Lazy[Trainer]
    The Trainer that actually implements the training loop. This is a lazy object because it depends on the model that's going to be trained.

  • vocabulary : Lazy[Vocabulary], optional (default = Lazy(Vocabulary))
    The Vocabulary that we will use to convert strings in the data to integer ids (and possibly set sizes of embedding matrices in the Model). By default we construct the vocabulary from the instances that we read.

  • datasets_for_vocab_creation : List[str], optional (default = None)
    If you pass in more than one dataset but don't want to use all of them to construct a vocabulary, you can pass in this key to limit it. Valid entries in the list are "train", "validation" and "test".

  • validation_dataset_reader : DatasetReader, optional (default = None)
    If given, we will use this dataset reader for the validation data instead of dataset_reader.

  • validation_data_path : str, optional (default = None)
    If given, we will use this data for computing validation metrics and early stopping.

  • validation_data_loader : Lazy[DataLoader], optional (default = None)
    If given, the data_loader we use to batch instances from the dataset reader at validation and test time. This is lazy because it takes a dataset in its constructor.

  • test_data_path : str, optional (default = None)
    If given, we will use this as test data. This makes it available for vocab creation by default, but nothing else.

  • evaluate_on_test : bool, optional (default = False)
    If True, we will evaluate the final model on the test data at the end of training. Note that we do not recommend using this with actual test data in everyday experimentation; you should only very rarely evaluate your model on real test data.

  • batch_weight_key : str, optional (default = "")
    The name of the metric used to weight the loss on a per-batch basis. This is only used during evaluation on final test data, if you've specified evaluate_on_test=True.

  • ddp_accelerator : Optional[DdpAccelerator], optional (default = None)
    A DdpAccelerator to use in distributed trainer. Passed to the model and the trainer.
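
Putting the allowed keys together, a skeletal configuration might look like the following, expressed here as a Params dict; all registered names and paths are hypothetical placeholders:

from allennlp.common import Params

params = Params({
    "dataset_reader": {"type": "my_reader"},  # hypothetical registered reader
    "train_data_path": "data/train.jsonl",  # hypothetical path
    "validation_data_path": "data/dev.jsonl",  # hypothetical path
    "model": {"type": "my_model"},  # hypothetical registered model
    "data_loader": {"batch_size": 32},
    "trainer": {"optimizer": "adam", "num_epochs": 5},
})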