train
allennlp.commands.train
The `train` subcommand can be used to train a model. It requires a configuration file and a directory in which to write the results (for example, `allennlp train my_config.jsonnet -s output_dir`).
Train¶
@Subcommand.register("train")
class Train(Subcommand)
add_subparser¶
class Train(Subcommand):
| ...
| def add_subparser(
| self,
| parser: argparse._SubParsersAction
| ) -> argparse.ArgumentParser
train_model_from_args¶
def train_model_from_args(args: argparse.Namespace)
Just converts from an `argparse.Namespace` object to string paths.
train_model_from_file¶
def train_model_from_file(
parameter_filename: Union[str, PathLike],
serialization_dir: Union[str, PathLike],
overrides: Union[str, Dict[str, Any]] = "",
recover: bool = False,
force: bool = False,
node_rank: int = 0,
include_package: List[str] = None,
dry_run: bool = False,
file_friendly_logging: bool = False,
return_model: Optional[bool] = None
) -> Optional[Model]
A wrapper around `train_model` which loads the params from a file.
Parameters¶
- parameter_filename : `str`
    A JSON parameter file specifying an AllenNLP experiment.
- serialization_dir : `str`
    The directory in which to save results and logs. We just pass this along to `train_model`.
- overrides : `Union[str, Dict[str, Any]]`, optional (default = `""`)
    A JSON string or a dict that we will use to override values in the input parameter file.
- recover : `bool`, optional (default = `False`)
    If `True`, we will try to recover a training run from an existing serialization directory. This is only intended for use when something actually crashed during the middle of a run. For continuing to train a model on new data, see `Model.from_archive`.
- force : `bool`, optional (default = `False`)
    If `True`, we will overwrite the serialization directory if it already exists.
- node_rank : `int`, optional
    Rank of the current node in distributed training.
- include_package : `List[str]`, optional
    In distributed mode, extra packages mentioned here will be imported in the trainer workers.
- dry_run : `bool`, optional (default = `False`)
    Do not train a model, but create a vocabulary, show dataset statistics, and print other training information.
- file_friendly_logging : `bool`, optional (default = `False`)
    If `True`, we add newlines to tqdm output, even on an interactive terminal, and we slow down tqdm's output to only once every 10 seconds.
- return_model : `Optional[bool]`, optional (default = `None`)
    Whether or not to return the final model. If not specified, this defaults to `False` for distributed training and `True` otherwise.
Returns¶
- best_model : `Optional[Model]`
    The model with the best epoch weights, or `None`, depending on the values of `return_model` and `dry_run`.
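For illustration, a minimal sketch of calling this wrapper programmatically. The config path, output directory, and override values are placeholders, not part of the API:

```python
from allennlp.commands.train import train_model_from_file

# Hypothetical paths: substitute your own experiment config and output
# directory. `overrides` patches values in the config without editing the file.
model = train_model_from_file(
    parameter_filename="my_experiment.jsonnet",
    serialization_dir="output/my_experiment",
    overrides={"trainer": {"num_epochs": 1}},
    force=True,  # overwrite output/my_experiment if it already exists
)
```

In non-distributed training the trained `Model` is returned, so it can be inspected or reused directly.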
train_model¶
def train_model(
params: Params,
serialization_dir: Union[str, PathLike],
recover: bool = False,
force: bool = False,
node_rank: int = 0,
include_package: List[str] = None,
dry_run: bool = False,
file_friendly_logging: bool = False,
return_model: Optional[bool] = None
) -> Optional[Model]
Trains the model specified in the given `Params` object, using the data and training parameters also specified in that object, and saves the results in `serialization_dir`.
Parameters¶
- params : `Params`
    A parameter object specifying an AllenNLP experiment.
- serialization_dir : `str`
    The directory in which to save results and logs.
- recover : `bool`, optional (default = `False`)
    If `True`, we will try to recover a training run from an existing serialization directory. This is only intended for use when something actually crashed during the middle of a run. For continuing to train a model on new data, see `Model.from_archive`.
- force : `bool`, optional (default = `False`)
    If `True`, we will overwrite the serialization directory if it already exists.
- node_rank : `int`, optional
    Rank of the current node in distributed training.
- include_package : `List[str]`, optional
    In distributed mode, extra packages mentioned here will be imported in the trainer workers.
- dry_run : `bool`, optional (default = `False`)
    Do not train a model, but create a vocabulary, show dataset statistics, and print other training information.
- file_friendly_logging : `bool`, optional (default = `False`)
    If `True`, we add newlines to tqdm output, even on an interactive terminal, and we slow down tqdm's output to only once every 10 seconds.
- return_model : `Optional[bool]`, optional (default = `None`)
    Whether or not to return the final model. If not specified, this defaults to `False` for distributed training and `True` otherwise.
Returns¶
- best_model : `Optional[Model]`
    The model with the best epoch weights, or `None`, depending on the values of `return_model` and `dry_run`.
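As a sketch, the same training run driven by a `Params` object that you load yourself (the config path is a placeholder):

```python
from allennlp.common import Params
from allennlp.commands.train import train_model

params = Params.from_file("my_experiment.jsonnet")  # hypothetical config file
# With dry_run=True this builds the vocabulary and prints dataset statistics,
# then returns None instead of a trained model.
train_model(params, serialization_dir="output/my_experiment", dry_run=True)
```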
TrainModel¶
class TrainModel(Registrable):
| def __init__(
| self,
| serialization_dir: str,
| model: Model,
| trainer: Trainer,
| evaluation_data_loader: DataLoader = None,
| evaluate_on_test: bool = False,
| batch_weight_key: str = ""
| ) -> None
This class exists so that we can easily read a configuration file with the `allennlp train` command. The basic logic is that we call `train_loop = TrainModel.from_params(params_from_config_file)`, then `train_loop.run()`. This class performs very little logic, pushing most of it to the `Trainer` that has a `train()` method. The point here is to construct all of the dependencies for the `Trainer` in a way that we can do it using `from_params()`, while having all of those dependencies transparently documented and not hidden in calls to `params.pop()`. If you are writing your own training loop, you almost certainly should not use this class, but you might look at the code for this class to see what we do, to make writing your training loop easier.

In particular, if you are tempted to call the `__init__` method of this class, you are probably doing something unnecessary. Literally all we do after `__init__` is call `trainer.train()`. You can do that yourself, if you've constructed a `Trainer` already. What this class gives you is a way to construct the `Trainer` by means of a config file. The actual constructor that we use with `from_params` in this class is `from_partial_objects`. See that method for a description of all of the allowed top-level keys in a configuration file used with `allennlp train`.
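The two-step pattern described above looks roughly like this in code (a sketch; the real command also handles recovery, node setup, and distributed workers):

```python
from allennlp.common import Params
from allennlp.commands.train import TrainModel

params = Params.from_file("my_experiment.jsonnet")  # hypothetical config file
train_loop = TrainModel.from_params(
    params=params,
    serialization_dir="output/my_experiment",
    local_rank=0,  # single-process training
)
metrics = train_loop.run()
train_loop.finish(metrics)  # writes final metrics; evaluates on test data if configured
```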
default_implementation¶
class TrainModel(Registrable):
| ...
| default_implementation = "default"
The default implementation is registered as 'default'.
run¶
class TrainModel(Registrable):
| ...
| def run(self) -> Dict[str, Any]
finish¶
class TrainModel(Registrable):
| ...
| def finish(self, metrics: Dict[str, Any])
from_partial_objects¶
class TrainModel(Registrable):
| ...
| @classmethod
| def from_partial_objects(
| cls,
| serialization_dir: str,
| local_rank: int,
| dataset_reader: DatasetReader,
| train_data_path: Any,
| model: Lazy[Model],
| data_loader: Lazy[DataLoader],
| trainer: Lazy[Trainer],
| vocabulary: Lazy[Vocabulary] = Lazy(Vocabulary),
| datasets_for_vocab_creation: List[str] = None,
| validation_dataset_reader: DatasetReader = None,
| validation_data_path: Any = None,
| validation_data_loader: Lazy[DataLoader] = None,
| test_data_path: Any = None,
| evaluate_on_test: bool = False,
| batch_weight_key: str = "",
| ddp_accelerator: Optional[DdpAccelerator] = None
| ) -> "TrainModel"
This method is intended for use with our `FromParams` logic, to construct a `TrainModel` object from a config file passed to the `allennlp train` command. The arguments to this method are the allowed top-level keys in a configuration file (except for the first three, which are obtained separately).

You could use this outside of our `FromParams` logic if you really want to, but there might be easier ways to accomplish your goal than instantiating `Lazy` objects. If you are writing your own training loop, we recommend that you look at the implementation of this method for inspiration and possibly some utility functions you can call, but you very likely should not use this method directly.

The `Lazy` type annotations here are a mechanism for building dependencies to an object sequentially: the `TrainModel` object needs data, a model, and a trainer, but the model needs to see the data before it's constructed (to create a vocabulary), and the trainer needs the data and the model before it's constructed. Objects that have sequential dependencies like this are labeled as `Lazy` in their type annotations, and we pass the missing dependencies when we call their `construct()` method, which you can see in the code below.
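As a toy illustration of the `Lazy` mechanism (an assumption-laden sketch, not the actual implementation; real code passes the instances produced by the `DatasetReader` rather than an empty list):

```python
from allennlp.common import Lazy
from allennlp.data import Vocabulary

# A Lazy[Vocabulary] carries the configuration for a Vocabulary without
# building one; construct() supplies the dependency that arrives late.
lazy_vocab: Lazy[Vocabulary] = Lazy(Vocabulary)
vocab = lazy_vocab.construct(instances=[])  # normally the instances read from data
print(vocab.get_vocab_size())  # only padding and OOV tokens for an empty dataset
```

A `Lazy[Model]` is handled the same way: once the vocabulary exists, it is passed in via `model.construct(vocab=vocabulary)`, as noted in the parameter descriptions below.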
Parameters¶
- serialization_dir : `str`
    The directory where logs and model archives will be saved. In a typical AllenNLP configuration file, this parameter does not get an entry as a top-level key; it gets passed in separately.
- local_rank : `int`
    The process index that is initialized using the GPU device id. In a typical AllenNLP configuration file, this parameter does not get an entry as a top-level key; it gets passed in separately.
- dataset_reader : `DatasetReader`
    The `DatasetReader` that will be used for training and (by default) for validation.
- train_data_path : `str`
    The file (or directory) that will be passed to `dataset_reader.read()` to construct the training data.
- model : `Lazy[Model]`
    The model that we will train. This is lazy because it depends on the `Vocabulary`; after constructing the vocabulary we call `model.construct(vocab=vocabulary)`.
- data_loader : `Lazy[DataLoader]`
    The data_loader we use to batch instances from the dataset reader at training and (by default) validation time. This is lazy because it takes a dataset in its constructor.
- trainer : `Lazy[Trainer]`
    The `Trainer` that actually implements the training loop. This is a lazy object because it depends on the model that's going to be trained.
- vocabulary : `Lazy[Vocabulary]`, optional (default = `Lazy(Vocabulary)`)
    The `Vocabulary` that we will use to convert strings in the data to integer ids (and possibly set sizes of embedding matrices in the `Model`). By default we construct the vocabulary from the instances that we read.
- datasets_for_vocab_creation : `List[str]`, optional (default = `None`)
    If you pass in more than one dataset but don't want to use all of them to construct a vocabulary, you can pass in this key to limit it. Valid entries in the list are "train", "validation" and "test".
- validation_dataset_reader : `DatasetReader`, optional (default = `None`)
    If given, we will use this dataset reader for the validation data instead of `dataset_reader`.
- validation_data_path : `str`, optional (default = `None`)
    If given, we will use this data for computing validation metrics and early stopping.
- validation_data_loader : `Lazy[DataLoader]`, optional (default = `None`)
    If given, the data_loader we use to batch instances from the dataset reader at validation and test time. This is lazy because it takes a dataset in its constructor.
- test_data_path : `str`, optional (default = `None`)
    If given, we will use this as test data. This makes it available for vocab creation by default, but nothing else.
- evaluate_on_test : `bool`, optional (default = `False`)
    If `True`, we will evaluate the final model on this data at the end of training. Note that we do not recommend using this for actual test data in every-day experimentation; you should only very rarely evaluate your model on actual test data.
- batch_weight_key : `str`, optional (default = `""`)
    The name of the metric used to weight the loss on a per-batch basis. This is only used during evaluation on final test data, if you've specified `evaluate_on_test=True`.
- ddp_accelerator : `Optional[DdpAccelerator]`, optional (default = `None`)
    A `DdpAccelerator` to use in distributed training. Passed to the model and the trainer.
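To tie the keys together, here is a skeletal configuration expressed as a `Params` object. Every `type` name and path below is a placeholder; a real experiment needs matching registered components:

```python
from allennlp.common import Params

params = Params({
    "dataset_reader": {"type": "my_reader"},        # hypothetical registered reader
    "train_data_path": "data/train.jsonl",          # placeholder path
    "validation_data_path": "data/dev.jsonl",
    "model": {"type": "my_model"},                  # hypothetical registered model
    "data_loader": {"batch_size": 32, "shuffle": True},
    "trainer": {"optimizer": "adam", "num_epochs": 5},
})
```

The same dictionary, written as JSON or Jsonnet, is what the `allennlp train` command reads; `serialization_dir` and `local_rank` are supplied separately, as noted above.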