Changelog¶
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
v2.10.1 - 2022-10-18¶
Fixed¶
- Updated dependencies
v2.10.0 - 2022-07-14¶
Added¶
- Added the metric `FBetaVerboseMeasure`, which extends `FBetaMeasure` to ensure compatibility with logging plugins and adds some options.
- Added three sample weighting techniques to `ConditionalRandomField` by supplying three new subclasses: `ConditionalRandomFieldWeightEmission`, `ConditionalRandomFieldWeightTrans`, and `ConditionalRandomFieldWeightLannoy`.
Fixed¶
- Fixed an error from a `cached-path` version update.
v2.9.3 - 2022-04-13¶
Added¶
- Added a `verification_tokens` argument to `TestPretrainedTransformerTokenizer`.
Fixed¶
- Updated various dependencies
v2.9.2 - 2022-03-21¶
Fixed¶
- Removed unnecessary dependencies
- Restored CLI functionality when the now-optional `checklist` package is absent.
v2.9.1 - 2022-03-09¶
Fixed¶
- Updated dependencies, especially around doc creation.
- Running the test suite out-of-tree (e.g. after installation) is now possible by pointing the environment variable `ALLENNLP_SRC_DIR` to the sources.
- Silenced a warning that happens when you inappropriately clone a tensor.
- Added more clarification to the `Vocabulary` documentation around `min_pretrained_embeddings` and `only_include_pretrained_words`.
- Fixed a type-mismatch bug caused by the latest release of `cached-path`, which now returns a `Path` instead of a `str`.
Added¶
- We can now transparently read compressed input files during prediction.
- LZMA compression is now supported.
- Added a way to give JSON blobs as input to dataset readers in the `evaluate` command.
- Added the argument `sub_module` in `PretrainedTransformerMismatchedEmbedder`.
- Updated the docs for `PytorchSeq2VecWrapper` to clarify that `mask` is required rather than sequence lengths.
Changed¶
- You can automatically include all words from a pretrained file when building a vocabulary by setting the value in `min_pretrained_embeddings` to `-1` for that particular namespace, as in the sketch below.
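A minimal, hedged sketch of that setting, assuming AllenNLP ≥ 2.9.1 and the usual `Vocabulary.from_instances` parameters; the embedding file contents and field names are made up for illustration:

```python
import pathlib
import tempfile

from allennlp.data import Instance, Vocabulary
from allennlp.data.fields import TextField
from allennlp.data.token_indexers import SingleIdTokenIndexer
from allennlp.data.tokenizers import Token

# A tiny GloVe-style embedding file; "banana" and "cherry" never appear in the data.
embedding_file = pathlib.Path(tempfile.mkdtemp()) / "toy_embeddings.txt"
embedding_file.write_text("apple 0.1 0.2\nbanana 0.3 0.4\ncherry 0.5 0.6\n")

instances = [
    Instance({"text": TextField([Token("apple")], {"tokens": SingleIdTokenIndexer()})})
]

vocab = Vocabulary.from_instances(
    instances,
    pretrained_files={"tokens": str(embedding_file)},
    # -1 means: pull in *every* word from the pretrained file for this namespace,
    # not just the words that appear in the instances.
    min_pretrained_embeddings={"tokens": -1},
)
print(sorted(vocab.get_token_to_index_vocabulary("tokens")))
```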
v2.9.0 - 2022-01-27¶
Added¶
- Added an `Evaluator` class to make comparing source, target, and predictions easier.
- Added a way to resize the vocabulary in the T5 module.
- Added an argument `reinit_modules` to `cached_transformers.get()` that allows you to re-initialize the pretrained weights of a transformer model, using layer indices or regex strings.
- Added the attribute `_should_validate_this_epoch` to `GradientDescentTrainer` that controls whether validation is run at the end of each epoch.
- Added `ShouldValidateCallback`, which can be used to configure the frequency of validation during training.
- Added a `MaxPoolingSpanExtractor`. This `SpanExtractor` represents each span by a component-wise max-pooling operation.
- Added support for the `dist_metric` kwarg when initializing fairness metrics, which allows optionally using `wasserstein` distance (previously only KL divergence was supported).
Fixed¶
- Fixed the docstring information for the `FBetaMultiLabelMeasure` metric.
- Various fixes for Python 3.9.
- Fixed the name that the `push-to-hf` command uses to store weights.
- `FBetaMultiLabelMeasure` now works with multiple dimensions.
- Support for inferior operating systems when making hardlinks.
- Use `,` as a separator for filenames in the `evaluate` command, thus allowing for URLs (e.g. `gs://...`) as input files.
- Removed a spurious error message ("'torch.cuda' has no attribute '_check_driver'") that would appear in the logs when a `ConfigurationError` for a missing GPU was raised.
- Load the model on CPU after training to save GPU memory.
- Fixed a bug in `ShouldValidateCallback` that led to validation occurring after the first epoch regardless of the `validation_start` value.
- Fixed a bug in `ShouldValidateCallback` that led to validation occurring every `validation_interval + 1` epochs instead of every `validation_interval` epochs.
- Fixed a bug in `ShouldValidateCallback` that led to validation never occurring at the end of training.
Removed¶
- Removed dependency on the overrides package
- Removed Tango components, since they now live at https://github.com/allenai/tango.
Changed¶
- Made `checklist` an optional dependency.
v2.8.0 - 2021-11-01¶
Added¶
- Added support for pushing models directly to the Hugging Face Hub with the command `allennlp push-to-hf`.
- More default tests for the `TextualEntailmentSuite`.
Changed¶
- The behavior of `--overrides` has changed. Previously the final configuration params were simply taken as the union of the original params and the `--overrides` params. Now you can use `--overrides` to completely replace any part of the original config. For example, passing `--overrides '{"model":{"type":"foo"}}'` will completely replace the "model" part of the original config. However, when you just want to change a single field in the JSON structure without removing or replacing adjacent fields, you can still use the "dot" syntax. For example, `--overrides '{"model.num_layers":3}'` will only change the `num_layers` parameter of the "model" part of the config, leaving everything else unchanged.
- Integrated the `cached_path` library to replace existing functionality in `common.file_utils`. This introduces some improvements without any breaking changes.
Fixed¶
- Fixed the implementation of `PairedPCABiasDirection` in `allennlp.fairness.bias_direction`, where the difference vectors should not be centered when performing the PCA.
- Fixed the docstring of `ExponentialMovingAverage`, which was causing its argument descriptions to render incorrectly in the docs.
v2.7.0 - 2021-09-01¶
Added¶
- Added a default behavior to the `_to_params` method of `Registrable` so that, in case it is not implemented by the child class, it will still produce a parameter dictionary.
- Added `_to_params` implementations to all tokenizers.
- Added support for evaluating multiple datasets and producing corresponding output files in the `evaluate` command.
- Added more documentation to the learning rate schedulers, including a sample config object showing how to use each one.
- Moved the PyTorch learning rate scheduler wrappers to their own file, called `pytorch_lr_schedulers.py`, so that they will have their own documentation page.
- Added a module `allennlp.nn.parallel` with a new base class, `DdpAccelerator`, which generalizes PyTorch's `DistributedDataParallel` wrapper to support other implementations. Two implementations of this class are provided. The default is `TorchDdpAccelerator` (registered as "torch"), which is just a thin wrapper around `DistributedDataParallel`. The other is `FairScaleFsdpAccelerator`, which wraps FairScale's `FullyShardedDataParallel`. You can specify the `DdpAccelerator` in the "distributed" section of a configuration file under the key "ddp_accelerator".
- Added a module `allennlp.nn.checkpoint` with a new base class, `CheckpointWrapper`, for implementations of activation/gradient checkpointing. Two implementations are provided. The default implementation is `TorchCheckpointWrapper` (registered as "torch"), which exposes PyTorch's checkpoint functionality. The other is `FairScaleCheckpointWrapper`, which exposes the more flexible checkpointing functionality from FairScale.
- The `Model` base class now takes a `ddp_accelerator` parameter (an instance of `DdpAccelerator`) which will be available as `self.ddp_accelerator` during distributed training. This is useful when, for example, instantiating submodules in your model's `__init__()` method by wrapping them with `self.ddp_accelerator.wrap_module()`. See `allennlp.modules.transformer.t5` for an example.
- We now log batch metrics to tensorboard and wandb.
- Added Tango components, to be explored in detail in a later post.
- Added `ScaledDotProductMatrixAttention`, and converted the transformer toolkit to use it.
- Added tests to ensure that all `Attention` and `MatrixAttention` implementations are interchangeable.
- Added a way for AllenNLP Tango to read and write datasets lazily.
- Added a way to remix datasets flexibly.
- Added a `from_pretrained_transformer_and_instances` constructor to `Vocabulary`.
- `TransformerTextField` now supports `__len__`.
Fixed¶
- Fixed a bug in `ConditionalRandomField`: `transitions` and `tag_sequence` tensors were not initialized on the desired device, causing high CPU usage (see https://github.com/allenai/allennlp/issues/2884).
- Fixed a misspelling: the parameter `contructor_extras` in `Lazy()` is now correctly called `constructor_extras`.
- Fixed broken links in the `allennlp.nn.initializers` docs.
- Fixed a bug in `BeamSearch` where `last_backpointers` was not being passed to any `Constraint`s.
- `TransformerTextField` can now take tensors of shape `(1, n)`, like the tensors produced by a HuggingFace tokenizer.
- The `tqdm` lock is now set inside `MultiProcessDataLoader` when new workers are spawned, to avoid contention when writing output.
- `ConfigurationError` is now pickleable.
- Checkpointer cleaning was fixed to work on Windows paths.
- Multitask models now support `TextFieldTensor` in heads, not just in the backbone.
- Fixed the signature of `ScaledDotProductAttention` to match the other `Attention` classes.
- `allennlp` commands will now catch `SIGTERM` signals and handle them like `SIGINT` (keyboard interrupt).
- The `MultiProcessDataLoader` will properly shut down its workers when a `SIGTERM` is received.
- Fixed the way names are applied to Tango `Step` instances.
- Fixed a bug in calculating loss in the distributed setting.
- Fixed a bug when extending a sparse sequence by 0 items.
Changed¶
- The type of the `grad_norm` parameter of `GradientDescentTrainer` is now `Union[float, bool]`, with a default value of `False`. `False` means gradients are not rescaled and the gradient norm is never even calculated. `True` means the gradients are still not rescaled but the gradient norm is calculated and passed on to callbacks. A `float` value means gradients are rescaled. See the sketch below.
- `TensorCache` now supports more concurrent readers and writers.
- We no longer log parameter statistics to tensorboard or wandb by default.
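A hedged sketch of what the three `grad_norm` settings might look like in the "trainer" section of a training config, written here as Python dicts purely for illustration (the registered name "gradient_descent" and the surrounding config keys are assumptions, not a verified config):

```python
# Hedged sketch only: the "trainer" portion of a training config as Python dicts.
trainer_default = {"type": "gradient_descent", "grad_norm": False}  # never compute the gradient norm
trainer_log_norm = {"type": "gradient_descent", "grad_norm": True}  # compute the norm for callbacks, no rescaling
trainer_clipped = {"type": "gradient_descent", "grad_norm": 5.0}    # rescale gradients to a norm of 5.0
```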
v2.6.0 - 2021-07-19¶
Added¶
- Added an `on_backward` training callback which allows for control over backpropagation and gradient manipulation.
- Added `AdversarialBiasMitigator`, a Model wrapper to adversarially mitigate biases in predictions produced by a pretrained model for a downstream task.
- Added a `which_loss` parameter to `ensure_model_can_train_save_and_load` in `ModelTestCase` to specify which loss to test.
- Added `**kwargs` to `Predictor.from_path()`. These keyword arguments will be passed on to the `Predictor`'s constructor (see the sketch after this list).
- The activation layer in the transformer toolkit can now be queried for its output dimension.
- `TransformerEmbeddings` now takes, but ignores, a parameter for the attention mask. This is needed for compatibility with some other modules that get called the same way and use the mask.
- `TransformerPooler` can now be instantiated from a pretrained transformer module, just like the other modules in the transformer toolkit.
- Added `TransformerTextField`, for cases where you don't care about AllenNLP's advanced text handling capabilities.
- Added the `TransformerModule._post_load_pretrained_state_dict_hook()` method. It can be used to modify `missing_keys` and `unexpected_keys` after loading a pretrained state dictionary. This is useful when tying weights, for example.
- Added an end-to-end test for the Transformer Toolkit.
- Added a `vocab` argument to `BeamSearch`, which is passed to each constraint in `constraints` (if provided).
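A minimal sketch of forwarding extra constructor arguments through `Predictor.from_path()`. The archive path, the "my_predictor" name, and the `beam_size` argument are hypothetical examples:

```python
from allennlp.data import DatasetReader
from allennlp.models import Model
from allennlp.predictors import Predictor


@Predictor.register("my_predictor")
class MyPredictor(Predictor):
    # Hypothetical predictor with an extra constructor argument.
    def __init__(self, model: Model, dataset_reader: DatasetReader, beam_size: int = 5) -> None:
        super().__init__(model, dataset_reader)
        self.beam_size = beam_size


# The extra keyword argument is forwarded to MyPredictor's constructor.
# "/path/to/model.tar.gz" is a placeholder for a real archive.
predictor = Predictor.from_path(
    "/path/to/model.tar.gz", predictor_name="my_predictor", beam_size=10
)
```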
Fixed¶
- Fixed missing device mapping in the `allennlp.modules.conditional_random_field.py` file.
- Fixed a broken link in the `allennlp.fairness.fairness_metrics.Separation` docs.
- Ensured all `allennlp` submodules are imported with `allennlp.common.plugins.import_plugins()`.
- Fixed an `IndexOutOfBoundsException` in `MultiOptimizer` when checking whether an optimizer received any parameters.
- Removed a confusing zero mask from VilBERT.
- Ensured `ensure_model_can_train_save_and_load` is consistently random.
- Fixed the weight tying logic in the `T5` transformer module. Previously input/output embeddings were always tied. Now this is optional, and the default behavior is taken from the `config.tie_word_embeddings` value when instantiating `from_pretrained_module()`.
- Implemented slightly faster label smoothing.
- Fixed the docs for `PytorchTransformerWrapper`.
- Fixed recovering training jobs with models that expect `get_metrics()` not to be called until they have seen at least one batch.
- Made the Transformer Toolkit compatible with transformers that don't start their positional embeddings at 0.
- Weights & Biases training callback ("wandb") now works when resuming training jobs.
Changed¶
- Changed the behavior of `MultiOptimizer` so that while a default optimizer is still required, an error is not thrown if the default optimizer receives no parameters.
- Made the epsilon parameter for the layer normalization in token embeddings configurable.
Removed¶
- Removed `TransformerModule._tied_weights`. Weights should now just be tied directly in the `__init__()` method. You can also override `TransformerModule._post_load_pretrained_state_dict_hook()` to remove keys associated with tied weights from `missing_keys` after loading a pretrained state dictionary.
v2.5.0 - 2021-06-03¶
Added¶
- Added a `TaskSuite` base class and command line functionality for running `checklist` test suites, along with implementations for `SentimentAnalysisSuite`, `QuestionAnsweringSuite`, and `TextualEntailmentSuite`. These can be found in the `allennlp.confidence_checks.task_checklists` module.
- Added `BiasMitigatorApplicator`, which wraps any Model and mitigates biases by finetuning on a downstream task.
- Added an `allennlp diff` command to compute a diff on model checkpoints, analogous to what `git diff` does on two files.
- Metadata defined by the class `allennlp.common.meta.Meta` is now saved in the serialization directory and archive file when training models from the command line. This is also now part of the `Archive` named tuple that's returned from `load_archive()`.
- Added the `nn.util.distributed_device()` helper function.
- Added the `allennlp.nn.util.load_state_dict` helper function.
- Added a way to avoid downloading and loading pretrained weights in modules that wrap transformers, such as the `PretrainedTransformerEmbedder` and `PretrainedTransformerMismatchedEmbedder`. You can do this by setting the parameter `load_weights` to `False` (see the sketch after this list). See PR #5172 for more details.
- Added `SpanExtractorWithSpanWidthEmbedding`, putting specific span embedding computations into the `_embed_spans` method and leaving the common code in `SpanExtractorWithSpanWidthEmbedding` to unify the arguments, and modified `BidirectionalEndpointSpanExtractor`, `EndpointSpanExtractor` and `SelfAttentiveSpanExtractor` accordingly. Now, `SelfAttentiveSpanExtractor` can also embed span widths.
- Added a `min_steps` parameter to `BeamSearch` to set a minimum length for the predicted sequences.
- Added the `FinalSequenceScorer` abstraction to calculate the final scores of the generated sequences in `BeamSearch`.
- Added a `shuffle` argument to `BucketBatchSampler` which allows for disabling shuffling.
- Added `allennlp.modules.transformer.attention_module`, which contains a generalized `AttentionModule`. `SelfAttention` and `T5Attention` both inherit from this.
- Added a `Constraint` abstract class to `BeamSearch`, which allows for incorporating constraints on the predictions found by `BeamSearch`, along with a `RepeatedNGramBlockingConstraint` constraint implementation, which allows for preventing repeated n-grams in the output from `BeamSearch`.
- Added `DataCollator` for dynamic operations on each batch.
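A minimal sketch of skipping the pretrained weight download described above; the model name is just an example, and `load_weights=False` is the new parameter:

```python
from allennlp.modules.token_embedders import PretrainedTransformerEmbedder

# Build the embedder with randomly initialized weights instead of downloading
# and loading the pretrained checkpoint (useful in tests, or when an archived
# model already contains the weights you need).
embedder = PretrainedTransformerEmbedder("bert-base-uncased", load_weights=False)
print(embedder.get_output_dim())
```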
Changed¶
- Use `dist_reduce_sum` in distributed metrics.
- Allow Google Cloud Storage paths in `cached_path` ("gs://...").
- Renamed `nn.util.load_state_dict()` to `read_state_dict` to avoid confusion with `torch.nn.Module.load_state_dict()`.
- `TransformerModule.from_pretrained_module` now only accepts a pretrained model ID (e.g. "bert-base-cased") instead of an actual `torch.nn.Module`. Other parameters to this method have changed as well.
- Print the first batch to the console by default.
- Renamed `sanity_checks` to `confidence_checks` (`sanity_checks` is deprecated and will be removed in AllenNLP 3.0).
- Trainer callbacks can now store and restore state in case a training run gets interrupted.
- The VilBERT backbone now rolls and unrolls extra dimensions to handle input with more than 3 dimensions.
- `BeamSearch` is now a `Registrable` class.
Fixed¶
- When `PretrainedTransformerIndexer` folds long sequences, it no longer loses the information from token type ids.
- Fixed the documentation for `GradientDescentTrainer.cuda_device`.
- Re-starting a training run from a checkpoint in the middle of an epoch now works correctly.
- When using the "moving average" weights smoothing feature of the trainer, training checkpoints would also get smoothed, with strange results for resuming a training job. This has been fixed.
- When re-starting an interrupted training job, the trainer will now read out the data loader even for epochs and batches that can be skipped. We do this to try to get any random number generators used by the reader or data loader into the same state as they were the first time the training job ran.
- Fixed the potential for a race condition with `cached_path()` when extracting archives. The race condition is still possible if used with `force_extract=True`.
- Fixed the `wandb` callback to work in distributed training.
- Fixed `tqdm` logging into multiple files with `allennlp-optuna`.
v2.4.0 - 2021-04-22¶
Added¶
- Added a T5 implementation to `modules.transformers`.
Changed¶
- The Weights & Biases callback can now work in anonymous mode (i.e. without the `WANDB_API_KEY` environment variable).
Fixed¶
- The `GradientDescentTrainer` no longer leaves stray model checkpoints around when it runs out of patience.
- Fixed `cached_path()` for "hf://" files.
- Improved the error message for the `PolynomialDecay` LR scheduler when `num_steps_per_epoch` is missing.
v2.3.1 - 2021-04-20¶
Added¶
- Added support for the HuggingFace Hub as an alternative way to handle loading files. Hub downloads should be made through the `hf://` URL scheme (see the sketch after this list).
- Added a new dimension to the `interpret` module: influence functions via the `InfluenceInterpreter` base class, along with a concrete implementation: `SimpleInfluence`.
- Added a `quiet` parameter to the `MultiProcessDataLoader` that disables `Tqdm` progress bars.
- The test for distributed metrics now takes a parameter specifying how often you want to run it.
- Created the fairness module and added three fairness metrics: `Independence`, `Separation`, and `Sufficiency`.
- Added four bias metrics to the fairness module: `WordEmbeddingAssociationTest`, `EmbeddingCoherenceTest`, `NaturalLanguageInference`, and `AssociationWithoutGroundTruth`.
- Added four bias direction methods (`PCABiasDirection`, `PairedPCABiasDirection`, `TwoMeansBiasDirection`, `ClassificationNormalBiasDirection`) and four bias mitigation methods (`LinearBiasMitigator`, `HardBiasMitigator`, `INLPBiasMitigator`, `OSCaRBiasMitigator`).
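A hedged sketch of pulling a file from the Hugging Face Hub through `cached_path`. The repository and filename in the URL are placeholders, and the exact `hf://` path layout shown here is an assumption:

```python
from allennlp.common.file_utils import cached_path

# "<user-or-org>/<repo>/<filename>" is a placeholder Hub path; replace it with a
# real repository and file. The download is cached locally like any other
# cached_path resource.
local_path = cached_path("hf://<user-or-org>/<repo>/<filename>")
print(local_path)
```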
Changed¶
- Updated CONTRIBUTING.md to remind reader to upgrade pip setuptools to avoid spaCy installation issues.
Fixed¶
- Fixed a bug with the `ShardedDatasetReader` when used with multi-process data loading (https://github.com/allenai/allennlp/issues/5132).
v2.3.0 - 2021-04-14¶
Added¶
- Ported the following Huggingface `LambdaLR`-based schedulers: `ConstantLearningRateScheduler`, `ConstantWithWarmupLearningRateScheduler`, `CosineWithWarmupLearningRateScheduler`, `CosineHardRestartsWithWarmupLearningRateScheduler`.
- Added a new `sub_token_mode` parameter to the `pretrained_transformer_mismatched_embedder` class to support first sub-token embedding.
- Added a way to run a multi-task model with a dataset reader as part of `allennlp predict`.
- Added a new `eval_mode` parameter in `PretrainedTransformerEmbedder`. If it is set to `True`, the transformer is always run in evaluation mode, which, e.g., disables dropout and does not update batch normalization statistics.
- Added additional parameters to the W&B callback: `entity`, `group`, `name`, `notes`, and `wandb_kwargs`.
Changed¶
- Sanity checks in the `GradientDescentTrainer` can now be turned off by setting the `run_sanity_checks` parameter to `False`.
- Allow the order of examples in the task cards to be specified explicitly.
- The `histogram_interval` parameter is now deprecated in `TensorboardWriter`; please use `distribution_interval` instead.
- Memory usage is no longer logged in tensorboard during training. `ConsoleLoggerCallback` should be used instead.
- If you use the `min_count` parameter of the Vocabulary, but you specify a namespace that does not exist, the vocabulary creation will raise a `ConfigurationError`.
- Documentation updates made to SoftmaxLoss regarding padding and the expected shapes of the input and output tensors of `forward`.
- Moved the data preparation script for coref into allennlp-models.
- If a transformer is not in cache but has override weights, the transformer's pretrained weights are no longer downloaded; that is, only its `config.json` file is downloaded.
- `SanityChecksCallback` now raises `SanityCheckError` instead of `AssertionError` when a check fails.
- `jsonpickle` removed from dependencies.
- Improved the error message from `Registrable.by_name()` when the name passed does not match any registered subclasses. The error message will include a suggestion if there is a close match between the name passed and a registered name.
Fixed¶
- Fixed a bug where some `Activation` implementations could not be pickled due to involving a lambda function.
- Fixed the `__str__()` method on the `ModelCardInfo` class.
- Fixed a stall when using distributed training and gradient accumulation at the same time.
- Fixed an issue where using the `from_pretrained_transformer` `Vocabulary` constructor in distributed training via the `allennlp train` command would result in the data being iterated through unnecessarily.
- Fixed a bug regarding token indexers with the `InterleavingDatasetReader` when used with multi-process data loading.
- Fixed a warning from `transformers` when using `max_length` in the `PretrainedTransformerTokenizer`.
Removed¶
- Removed the `stride` parameter to `PretrainedTransformerTokenizer`. This parameter had no effect.
v2.2.0 - 2021-03-26¶
Added¶
- Added a new method on the `Field` class: `.human_readable_repr() -> Any`.
- Added a new method on the `Instance` class: `.human_readable_dict() -> JsonDict` (see the sketch after this list).
- Added the `WandBCallback` class for Weights & Biases integration, registered as a callback under the name "wandb".
- Added `TensorBoardCallback` to replace the `TensorBoardWriter`. Registered as a callback under the name "tensorboard".
- Added `NormalizationBiasVerification` and `SanityChecksCallback` for model sanity checks. `SanityChecksCallback` runs by default from the `allennlp train` command. It can be turned off by setting `trainer.enable_default_callbacks` to `false` in your config.
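A minimal sketch of the new human-readable helpers; the field name and label value are arbitrary examples:

```python
from allennlp.data import Instance
from allennlp.data.fields import LabelField

instance = Instance({"label": LabelField("positive")})

# A plain-Python view of the instance, suitable for printing or logging.
print(instance.human_readable_dict())        # expected to look roughly like {"label": "positive"}
print(instance["label"].human_readable_repr())
```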
Changed¶
- Use attributes of the `ModelOutputs` object in `PretrainedTransformerEmbedder` instead of indexing.
- Added support for PyTorch version 1.8 and `torchvision` version 0.9.
- `Model.get_parameters_for_histogram_tensorboard_logging` is deprecated in favor of `Model.get_parameters_for_histogram_logging`.
Fixed¶
- Makes sure tensors that are stored in `TensorCache` always live on CPUs.
- Fixed a bug where `FromParams` objects wrapped in `Lazy()` couldn't be pickled.
- Fixed a bug where the `ROUGE` metric couldn't be pickled.
- Fixed a bug reported in https://github.com/allenai/allennlp/issues/5036: we now keep our spaCy POS tagger on.
Removed¶
- Removed `TensorBoardWriter`. Please use the `TensorBoardCallback` instead.
v2.1.0 - 2021-02-24¶
Changed¶
- The `coding_scheme` parameter is now deprecated in `Conll2003DatasetReader`; please use `convert_to_coding_scheme` instead.
- Support spaCy v3.
Added¶
- Added `ModelUsage` to the `ModelCard` class.
- Added a way to specify extra parameters to the predictor in an `allennlp predict` call.
- Added a way to initialize a `Vocabulary` from transformers models.
- Added the ability to use `Predictor`s with multitask models through the new `MultiTaskPredictor`.
- Added an example for fields of type `ListField[TextField]` to the `apply_token_indexers` API docs.
- Added `text_key` and `label_key` parameters to the `TextClassificationJsonReader` class (see the sketch after this list).
- Added `MultiOptimizer`, which allows you to use different optimizers for different parts of your model.
- Added a clarification to the `predictions_to_labeled_instances` API docs for attack-from-JSON.
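A minimal sketch of the new keys on `TextClassificationJsonReader`; the JSON field names ("review", "sentiment") are arbitrary examples:

```python
from allennlp.data.dataset_readers import TextClassificationJsonReader

# Read JSON-lines records like {"review": "...", "sentiment": "pos"} instead of
# the default {"text": ..., "label": ...} keys.
reader = TextClassificationJsonReader(text_key="review", label_key="sentiment")
```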
Fixed¶
- The `@Registrable.register(...)` decorator no longer masks the decorated class's annotations.
- Ensured that `MeanAbsoluteError` always returns a `float` metric value instead of a `Tensor`.
- Learning rate schedulers that rely on metrics from the validation set were broken in v2.0.0. This brings that functionality back.
- Fixed a bug where the `MultiProcessDataLoader` would crash when `num_workers > 0`, `start_method = "spawn"`, `max_instances_in_memory not None`, and `batches_per_epoch not None`.
- Fixed documentation and validation checks for `FBetaMultiLabelMetric`.
- Fixed handling of HTTP errors when fetching remote resources with `cached_path()`. Previously the content would be cached even when certain errors, like 404s, occurred. Now an `HTTPError` will be raised whenever the HTTP response is not OK.
- Fixed a bug where the `MultiTaskDataLoader` would crash when `num_workers > 0`.
- Fixed an import error that happens when PyTorch's distributed framework is unavailable on the system.
v2.0.1 - 2021-01-29¶
Added¶
- Added `tokenizer_kwargs` and `transformer_kwargs` arguments to `PretrainedTransformerBackbone`.
- Resize the transformers word embeddings layer for `additional_special_tokens`.
Changed¶
- `GradientDescentTrainer` creates the `serialization_dir` when it's instantiated, if it doesn't exist.
Fixed¶
- `common.util.sanitize` now handles sets.
v2.0.0 - 2021-01-27¶
Added¶
- The `TrainerCallback` constructor accepts `serialization_dir` provided by the `Trainer`. This can be useful for `Logger` callbacks that need to store files in the run directory.
- `TrainerCallback.on_start()` is fired at the start of training.
- The `TrainerCallback` event methods now accept `**kwargs`. This may make it easier to maintain backwards compatibility of callbacks in the future. E.g. we may decide to pass the exception/traceback object to `on_end()` in case of failure, and older callbacks may simply ignore the argument instead of raising a `TypeError`.
- Added a `TensorBoardCallback` which wraps the `TensorBoardWriter`.
Changed¶
- `TrainerCallback.on_epoch()` does not fire with `epoch=-1` at the start of training. Instead, `TrainerCallback.on_start()` should be used for these cases.
- `TensorBoardBatchMemoryUsage` is converted from a `BatchCallback` into a `TrainerCallback`.
- `TrackEpochCallback` is converted from an `EpochCallback` into a `TrainerCallback`.
- `Trainer` can accept callbacks simply with the name `callbacks` instead of `trainer_callbacks`.
- `TensorboardWriter` renamed to `TensorBoardWriter`, and removed as an argument to the `GradientDescentTrainer`. In order to enable TensorBoard logging during training, you should utilize the `TensorBoardCallback` instead.
Removed¶
- Removed `EpochCallback` and `BatchCallback` in favour of `TrainerCallback`. The metaclass-wrapping implementation is removed as well.
- Removed the `tensorboard_writer` parameter to `GradientDescentTrainer`. You should use the `TensorBoardCallback` now instead.
Fixed¶
- The Trainer now always fires `TrainerCallback.on_end()` so all the resources can be cleaned up properly.
- Fixed a misspelling: changed `TensoboardBatchMemoryUsage` to `TensorBoardBatchMemoryUsage`.
- We set a value for `epoch` so that the variable is bound when `TrainerCallback.on_end()` fires. Previously this could have led to an error when trying to recover a run after it had finished training.
v2.0.0rc1 - 2021-01-21¶
Added¶
- Added a `TensorCache` class for caching tensors on disk.
- Added abstraction and concrete implementation for image loading.
- Added abstraction and concrete implementation for `GridEmbedder`.
- Added abstraction and demo implementation for an image augmentation module.
- Added abstraction and concrete implementation for region detectors.
- A new high-performance default `DataLoader`: `MultiProcessDataLoader`.
- A `MultiTaskModel` and abstractions to use with it, including `Backbone` and `Head`. The `MultiTaskModel` first runs its inputs through the `Backbone`, then passes the result (and whatever other relevant inputs it got) to each `Head` that's in use.
- A `MultiTaskDataLoader`, with a corresponding `MultiTaskDatasetReader`, and a couple of new configuration objects: `MultiTaskEpochSampler` (for deciding what proportion to sample from each dataset at every epoch) and a `MultiTaskScheduler` (for ordering the instances within an epoch).
- Transformer toolkit to plug and play with modular components of transformer architectures.
- Added a command to count the number of instances we're going to be training with.
- Added a `FileLock` class to `common.file_utils`. This is just like the `FileLock` from the `filelock` library, except that it adds an optional flag `read_only_ok: bool`, which when set to `True` changes the behavior so that a warning will be emitted instead of an exception when lacking write permissions on an existing file lock. This makes it possible to use the `FileLock` class on a read-only file system (see the sketch after this list).
- Added a new learning rate scheduler: `CombinedLearningRateScheduler`. This can be used to combine different LR schedulers, using one after the other.
- Added an official CUDA 10.1 Docker image.
- Moved the `ModelCard` and `TaskCard` abstractions into the main repository.
- Added a util function `allennlp.nn.util.dist_reduce(...)` for handling distributed reductions. This is especially useful when implementing a distributed `Metric`.
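A minimal sketch of the read-only-friendly `FileLock` described above; the lock path is a placeholder:

```python
from allennlp.common.file_utils import FileLock

# On a read-only file system this emits a warning instead of raising, because
# read_only_ok=True; otherwise it behaves like filelock.FileLock.
with FileLock("/tmp/my-resource.lock", read_only_ok=True):
    pass  # read the shared resource here
```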
Changed¶
- `DatasetReader`s are now always lazy. This means there is no `lazy` parameter in the base class, and the `_read()` method should always be a generator.
- The `DataLoader` now decides whether to load instances lazily or not. With the `PyTorchDataLoader` this is controlled with the `lazy` parameter, but with the `MultiProcessDataLoader` this is controlled by the `max_instances_in_memory` setting.
- `ArrayField` is now called `TensorField`, and implemented in terms of torch tensors, not numpy.
- Improved the `nn.util.move_to_device` function by avoiding an unnecessary recursive check for tensors and adding a `non_blocking` optional argument, which is the same argument as in `torch.Tensor.to()`.
- If you are trying to create a heterogeneous batch, you now get a better error message.
- Readers using the new vision features now explicitly log how they are featurizing images.
- `master_addr` and `master_port` renamed to `primary_addr` and `primary_port`, respectively.
- The `is_master` parameter for training callbacks renamed to `is_primary`.
- The `master` branch renamed to `main`.
- Torch version bumped to 1.7.1 in Docker images.
Removed¶
- Removed `nn.util.has_tensor`.
Fixed¶
- The `build-vocab` command no longer crashes when the resulting vocab file is in the current working directory.
- VQA models now use the `vqa_score` metric for early stopping. This results in much better scores.
- Fixed a typo in the `LabelField` string representation: removed a trailing apostrophe.
- `Vocabulary.from_files` and `cached_path` will issue a warning, instead of failing, when a lock on an existing resource can't be acquired because the file system is read-only.
- `TrackEpochCallback` is now an `EpochCallback`.
v1.3.0 - 2020-12-15¶
Added¶
- Added links to source code in docs.
- Added `get_embedding_layer` and `get_text_field_embedder` to the `Predictor` class, to specify embedding layers for non-AllenNLP models.
- Added Gaussian Error Linear Unit (GELU) as an Activation.
Changed¶
- Renamed module `allennlp.data.tokenizers.token` to `allennlp.data.tokenizers.token_class` to avoid this bug.
- `transformers` dependency updated to version 4.0.1.
- `BasicClassifier`'s forward method now takes a metadata field.
Fixed¶
- Fixed a lot of instances where tensors were first created and then sent to a device with `.to(device)`. Instead, these tensors are now created directly on the target device.
- Fixed an issue with `GradientDescentTrainer` when constructed with `validation_data_loader=None` and `learning_rate_scheduler!=None`.
- Fixed a bug when removing all handlers in the root logger.
- `ShardedDatasetReader` now inherits parameters from `base_reader` when required.
- Fixed an issue in `FromParams` where parameters in the `params` object used to construct a class were not passed to the constructor if the value of the parameter was equal to the default value. This caused bugs in some edge cases where a subclass that takes `**kwargs` needs to inspect `kwargs` before passing them to its superclass.
- Improved the band-aid solution for segmentation faults and the "ImportError: dlopen: cannot load any more object with static TLS" by adding a `transformers` import.
- Added safety checks for extracting tar files.
- Turned a superfluous warning into an info message when extending the vocab in the embedding matrix, if no pretrained file was provided.
v1.2.2 - 2020-11-17¶
Added¶
- Added Docker builds for other torch-supported versions of CUDA.
- Adds `allennlp-semparse` as an official, default plugin.
Fixed¶
- `GumbelSampler` now sorts the beams by their true log prob.
v1.2.1 - 2020-11-10¶
Added¶
- Added an optional `seed` parameter to `ModelTestCase.set_up_model` which sets the random seed for `random`, `numpy`, and `torch`.
- Added support for a global plugins file at `~/.allennlp/plugins`.
- Added more documentation about plugins.
- Added a sampler class and parameter in beam search for non-deterministic search, with several implementations, including `MultinomialSampler`, `TopKSampler`, `TopPSampler`, and `GumbelSampler`. Utilizing `GumbelSampler` will give Stochastic Beam Search.
Changed¶
- Pass batch metrics to `BatchCallback`.
Fixed¶
- Fixed a bug where forward hooks were not cleaned up with saliency interpreters if there was an exception.
- Fixed the computation of saliency maps in the Interpret code when using mismatched indexing. Previously, we would compute gradients from the top of the transformer, after aggregation from wordpieces to tokens, which gives results that are not very informative. Now, we compute gradients with respect to the embedding layer, and aggregate wordpieces to tokens separately.
- Fixed the heuristics for finding embedding layers in the case of RoBERTa. An update in the `transformers` library broke our old heuristic.
- Fixed a typo in the registered name of the ROUGE metric. Previously it was `rogue`; fixed to `rouge`.
- Fixed default masks that were erroneously created on the CPU even when a GPU is available.
- Fixed pretrained embeddings for transformers that don't use end tokens.
- Fixed the transformer tokenizer cache when the tokenizers are initialized with custom kwargs.
v1.2.0 - 2020-10-29¶
Changed¶
- Enforced stricter typing requirements around the use of `Optional[T]` types.
- Changed the behavior of `Lazy` types in `from_params` methods. Previously, if you defined a `Lazy` parameter like `foo: Lazy[Foo] = None` in a custom `from_params` classmethod, then `foo` would actually never be `None`. This behavior is now different. If no params were given for `foo`, it will be `None`. You can also now set default values for `foo` like `foo: Lazy[Foo] = Lazy(Foo)`. Or, if you want a default value but also want to allow for `None` values, you can write it like this: `foo: Optional[Lazy[Foo]] = Lazy(Foo)`. See the sketch below.
- Added support for PyTorch version 1.7.
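A minimal sketch of the `Lazy` default styles described above, using a hypothetical `Foo` class:

```python
from typing import Optional

from allennlp.common import FromParams, Lazy


class Foo(FromParams):
    def __init__(self, x: int = 0) -> None:
        self.x = x


class Bar(FromParams):
    # If no params are given for `foo`, it stays None.
    def __init__(self, foo: Optional[Lazy[Foo]] = None) -> None:
        self.foo = None if foo is None else foo.construct()


class Baz(FromParams):
    # A default that always yields a Foo when `foo` is omitted from the params.
    def __init__(self, foo: Lazy[Foo] = Lazy(Foo)) -> None:
        self.foo = foo.construct()
```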
Fixed¶
- Made it possible to instantiate `TrainerCallback` from config files.
- Fixed the remaining broken internal links in the API docs.
- Fixed a bug where Hotflip would crash with a model that had multiple TokenIndexers and the input used rare vocabulary items.
- Fixed a bug where `BeamSearch` would fail if `max_steps` was equal to 1.
- Fixed `BasicTextFieldEmbedder` to not raise a `ConfigurationError` if it has embedders that are empty and not in the input.
v1.2.0rc1 - 2020-10-22¶
Added¶
- Added a warning when `batches_per_epoch` for the validation data loader is inherited from the train data loader.
- Added a `build-vocab` subcommand that can be used to build a vocabulary from a training config file.
- Added a `tokenizer_kwargs` argument to `PretrainedTransformerMismatchedIndexer`.
- Added `tokenizer_kwargs` and `transformer_kwargs` arguments to `PretrainedTransformerMismatchedEmbedder`.
- Added official support for Python 3.8.
- Added a script, `scripts/release_notes.py`, which automatically prepares markdown release notes from the CHANGELOG and commit history.
- Added a flag `--predictions-output-file` to the `evaluate` command, which tells AllenNLP to write the predictions from the given dataset to the file as JSON lines.
- Added the ability to ignore certain missing keys when loading a model from an archive. This is done by adding a class-level variable called `authorized_missing_keys` to any PyTorch module that a `Model` uses. If defined, `authorized_missing_keys` should be a list of regex string patterns (see the sketch after this list).
- Added `FBetaMultiLabelMeasure`, a multi-label Fbeta metric. This is a subclass of the existing `FBetaMeasure`.
- Added the ability to pass additional keyword arguments to `cached_transformers.get()`, which will be passed on to `AutoModel.from_pretrained()`.
- Added an `overrides` argument to `Predictor.from_path()`.
- Added a `cached-path` command.
- Added a function `inspect_cache` to `common.file_utils` that prints useful information about the cache. This can also be used from the `cached-path` command with `allennlp cached-path --inspect`.
- Added a function `remove_cache_entries` to `common.file_utils` that removes any cache entries matching the given glob patterns. This can be used from the `cached-path` command with `allennlp cached-path --remove some-files-*`.
- Added logging for the main process when running in distributed mode.
- Added a `TrainerCallback` object to support state sharing between batch- and epoch-level training callbacks.
- Added support for .tar.gz in `PretrainedModelInitializer`.
- Made `BeamSearch` instantiable `from_params`.
- Pass `serialization_dir` to `Model` and `DatasetReader`.
- Added an optional `include_in_archive` parameter to the top level of configuration files. When specified, `include_in_archive` should be a list of paths relative to the serialization directory which will be bundled up with the final archived model from a training run.
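A minimal sketch of the `authorized_missing_keys` mechanism described above; the module and the regex are hypothetical examples:

```python
import torch


class MyDecoder(torch.nn.Module):
    # Hypothetical submodule used inside an AllenNLP Model. State-dict keys
    # matching these regexes may be absent from an archive without triggering
    # a missing-key error when the model is loaded.
    authorized_missing_keys = [r"^generation_head\..*"]

    def __init__(self) -> None:
        super().__init__()
        self.generation_head = torch.nn.Linear(16, 16)
```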
Changed¶
- Subcommands that don't require plugins will no longer cause plugins to be loaded or have an `--include-package` flag.
- Allow overrides to be a JSON string or a `dict`.
- `transformers` dependency updated to version 3.1.0.
- When `cached_path` is called on a local archive with `extract_archive=True`, the archive is now extracted into a unique subdirectory of the cache root instead of a subdirectory of the archive's directory. The extraction directory is also unique to the modification time of the archive, so if the file changes, subsequent calls to `cached_path` will know to re-extract the archive.
- Removed the `truncation_strategy` parameter to `PretrainedTransformerTokenizer`. The way we're calling the tokenizer, the truncation strategy has no effect anyway.
- Don't use initializers when loading a model, as they are not needed.
- Distributed training will now automatically search for a local open port if the `master_port` parameter is not provided.
- In training, save model weights before evaluation.
- `allennlp.common.util.peak_memory_mb` renamed to `peak_cpu_memory`, and `allennlp.common.util.gpu_memory_mb` renamed to `peak_gpu_memory`, and they both now return the results in bytes as integers. Also, the `peak_gpu_memory` function now utilizes PyTorch functions to find the memory usage instead of shelling out to the `nvidia-smi` command. This is more efficient and also more accurate because it only takes into account the tensor allocations of the current PyTorch process.
- Make sure weights are first loaded to the CPU when using `PretrainedModelInitializer`, preventing wasted GPU memory.
- Load dataset readers in `load_archive`.
- Updated the `AllenNlpTestCase` docstring to remove the reference to `unittest.TestCase`.
Removed¶
- Removed the `common.util.is_master` function.
Fixed¶
- Fixed a CUDA/CPU device mismatch bug during distributed training for the categorical accuracy metric.
- Fixed a bug where the reported `batch_loss` metric was incorrect when training with gradient accumulation.
- Class decorators are now displayed in the API docs.
- Fixed up the documentation for the `allennlp.nn.beam_search` module.
- Ignore `*args` when constructing classes with `FromParams`.
- Ensured some consistency in the types of the values that metrics return.
- Fixed a PyTorch warning by explicitly providing the `as_tuple` argument (leaving it as its default value of `False`) to `Tensor.nonzero()`.
- Remove the temporary directory when extracting a model archive in `load_archive` at the end of the function rather than via `atexit`.
- Fixed a bug where using `cached_path()` offline could return a cached resource's lock file instead of the cache file.
- Fixed a bug where `cached_path()` would fail if passed a `cache_dir` with the user home shortcut `~/`.
- Fixed a bug in our doc building script where markdown links did not render properly if the "href" part of the link (the part inside the `()`) was on a new line.
- Changed how gradients are zeroed out with an optimization. See this video from NVIDIA at around the 9 minute mark.
- Fixed a bug where parameters to a `FromParams` class that are dictionaries wouldn't get logged when an instance is instantiated `from_params`.
- Fixed a bug in distributed training where the vocab would be saved from every worker, when it should have been saved by only the local master process.
- Fixed a bug in the calculation of rouge metrics during distributed training where the total sequence count was not being aggregated across GPUs.
- Fixed `allennlp.nn.util.add_sentence_boundary_token_ids()` to use the `device` parameter of the input tensor.
- Be sure to close the TensorBoard writer even when training doesn't finish.
- Fixed the docstring for `PyTorchSeq2VecWrapper`.
- Fixed a bug in the cnn_encoder where activations involving masked tokens could be picked up by the max.
- Fixed intra-word tokenization for `PretrainedTransformerTokenizer` when disabling the fast tokenizer.
v1.1.0 - 2020-09-08¶
Fixed¶
- Fixed handling of some edge cases when constructing classes with `FromParams` where the class accepts `**kwargs`.
- Fixed a division-by-zero error when there are zero-length spans in the input to a `PretrainedTransformerMismatchedIndexer`.
- Improved robustness of `cached_path` when extracting archives so that the cache won't be corrupted if a failure occurs during extraction.
- Fixed a bug with the `average` and `evalb_bracketing_score` metrics in distributed training.
Added¶
- `Predictor.capture_model_internals()` now accepts a regex specifying which modules to capture.
v1.1.0rc4 - 2020-08-20¶
Added¶
- Added a workflow to GitHub Actions that will automatically close unassigned stale issues and ping the assignees of assigned stale issues.
Fixed¶
- Fixed a bug in distributed metrics that caused nan values due to repeated addition of an accumulated variable.
v1.1.0rc3 - 2020-08-12¶
Fixed¶
- Fixed how truncation was handled with `PretrainedTransformerTokenizer`. Previously, if `max_length` was set to `None`, the tokenizer would still do truncation if the transformer model had a default max length in its config. Also, when `max_length` was set to a non-`None` value, several warnings would appear for certain transformer models around the use of the `truncation` parameter.
- Fixed evaluation of all metrics when using distributed training.
- Added a `py.typed` marker. Fixed type annotations in `allennlp.training.util`.
- Fixed a problem with automatically detecting whether tokenization is necessary. This affected primarily the Roberta SST model.
- Improved the help text for using the `--overrides` command line flag.
v1.1.0rc2 - 2020-07-31¶
Changed¶
- Upgraded PyTorch requirement to 1.6.
- Replaced the NVIDIA Apex AMP module with torch's native AMP module. The default trainer (`GradientDescentTrainer`) now takes a `use_amp: bool` parameter instead of the old `opt_level: str` parameter.
Fixed¶
- Removed an unnecessary warning about deadlocks in `DataLoader`.
- Fixed testing models that only return a loss when they are in training mode.
- Fixed a bug in `FromParams` that caused silent failure in case of the parameter type being `Optional[Union[...]]`.
- Fixed a bug where the program crashes if `evaluation_data_loader` is an `AllennlpLazyDataset`.
Added¶
- Added the option to specify `requires_grad: false` within an optimizer's parameter groups.
- Added the `file-friendly-logging` flag back to the `train` command. Also added this flag to the `predict`, `evaluate`, and `find-learning-rate` commands.
- Added an `EpochCallback` to track the current epoch as a model class member.
- Added the option to enable or disable gradient checkpointing for transformer token embedders via the boolean parameter `gradient_checkpointing`.
Removed¶
- Removed the `opt_level` parameter to `Model.load` and `load_archive`. In order to use AMP with a loaded model now, just run the model's forward pass within torch's `autocast` context.
v1.1.0rc1 - 2020-07-14¶
Fixed¶
- Reduced the amount of log messages produced by `allennlp.common.file_utils`.
- Fixed a bug where `PretrainedTransformerEmbedder` parameters appeared to be trainable in the log output even when `train_parameters` was set to `False`.
- Fixed a bug with the sharded dataset reader where it would only read a fraction of the instances in distributed training.
- Fixed checking equality of `TensorField`s.
- Fixed a bug where `NamespaceSwappingField` did not work correctly with `.empty_field()`.
- Put more sensible defaults on the `huggingface_adamw` optimizer.
- Simplified logging so that all logging output always goes to one file.
- Fixed interaction with the python command line debugger.
- Log the grad norm properly even when we're not clipping it.
- Fixed a bug where `PretrainedModelInitializer` fails to initialize a model with a 0-dim tensor.
- Fixed a bug with the layer unfreezing schedule of the `SlantedTriangular` learning rate scheduler.
- Fixed a regression with logging in the distributed setting. Only the main worker should write log output to the terminal.
- Pinned the version of boto3 for package managers (e.g. poetry).
- Fixed issue #4330 by updating the `tokenizers` dependency.
- Fixed a bug in `TextClassificationPredictor` so that it passes tokenized inputs to the `DatasetReader` in case it does not have a tokenizer.
- `reg_loss` is now only returned for models that have some regularization penalty configured.
- Fixed a bug that prevented `cached_path` from downloading assets from GitHub releases.
- Fixed a bug that erroneously increased the last label's false positive count when calculating fbeta metrics.
- `Tqdm` output now looks much better when the output is being piped or redirected.
- Small improvements to how the API documentation is rendered.
- Only show the validation progress bar from the main process in distributed training.
Added¶
- Adjust beam search to support multi-layer decoders.
- A method on `ModelTestCase` for running basic model tests when you aren't using config files.
- Added some convenience methods for reading files.
- Added an option to `file_utils.cached_path` to automatically extract archives (see the sketch after this list).
- Added the ability to pass an archive file instead of a local directory to `Vocab.from_files`.
- Added the ability to pass an archive file instead of a glob to `ShardedDatasetReader`.
- Added a new `"linear_with_warmup"` learning rate scheduler.
- Added a check in `ShardedDatasetReader` that ensures the base reader doesn't implement manual distributed sharding itself.
- Added an option to `PretrainedTransformerEmbedder` and `PretrainedTransformerMismatchedEmbedder` to use a scalar mix of all hidden layers from the transformer model instead of just the last layer. To utilize this, just set `last_layer_only` to `False`.
- `cached_path()` can now read files inside of archives.
- Training metrics now include `batch_loss` and `batch_reg_loss` in addition to aggregate loss across the number of batches.
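A hedged sketch of the archive-handling additions to `cached_path`; the URL is a placeholder, and the `!` separator for addressing a file inside an archive is an assumption based on the entry above about reading files inside archives:

```python
from allennlp.common.file_utils import cached_path

# Download (or reuse the cached copy of) an archive and extract it; the return
# value is the path to the extracted directory. The URL is a placeholder.
data_dir = cached_path(
    "https://example.com/datasets/my-dataset.tar.gz", extract_archive=True
)

# Resolve a single file inside the archive (assumed "!" separator).
train_file = cached_path(
    "https://example.com/datasets/my-dataset.tar.gz!train.jsonl",
    extract_archive=True,
)
```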
Changed¶
- Not specifying a `cuda_device` now automatically determines whether to use a GPU or not.
- Discovered plugins are logged so you can see what was loaded.
- `allennlp.data.DataLoader` is now an abstract registrable class. The default implementation remains the same, but was renamed to `allennlp.data.PyTorchDataLoader`.
- `BertPooler` can now unwrap and re-wrap extra dimensions if necessary.
- New `transformers` dependency. Only versions >=3.0 are now supported.
v1.0.0 - 2020-06-16¶
Fixed¶
- Lazy dataset readers now work correctly with multi-process data loading.
- Fixed race conditions that could occur when using a dataset cache.
Added¶
- A bug where all datasets would be loaded for vocab creation even if not needed.
- A parameter to the `DatasetReader` class: `manual_multi_process_sharding`. This is similar to the `manual_distributed_sharding` parameter, but applies when using a multi-process `DataLoader`.
v1.0.0rc6 - 2020-06-11¶
Fixed¶
- A bug where `TextField`s could not be duplicated, since some tokenizers cannot be deep-copied. See https://github.com/allenai/allennlp/issues/4270.
- Our caching mechanism had the potential to introduce race conditions if multiple processes were attempting to cache the same file at once. This was fixed by using a lock file tied to each cached file.
- `get_text_field_mask()` now supports padding indices that are not `0`.
- A bug where `predictor.get_gradients()` would return an empty dictionary if an embedding layer had trainable set to false.
- Fixes `PretrainedTransformerMismatchedIndexer` in the case where a token consists of zero word pieces.
- Fixes a bug when using a lazy dataset reader that results in a `UserWarning` from PyTorch being printed at every iteration during training.
- Predictor names were inconsistently switching between dashes and underscores. Now they all use underscores.
- `Predictor.from_path` now automatically loads plugins (unless you specify `load_plugins=False`) so that you don't have to manually import a bunch of modules when instantiating predictors from an archive path.
- `allennlp-server` is automatically found as a plugin once again.
Added¶
- A `duplicate()` method on `Instance`s and `Field`s, to be used instead of `copy.deepcopy()`.
- A batch sampler that makes sure each batch contains approximately the same number of tokens (`MaxTokensBatchSampler`).
- Functions to turn a sequence of token indices back into tokens.
- The ability to use Huggingface encoder/decoder models as token embedders.
- Improvements to beam search.
- ROUGE metric.
- Polynomial decay learning rate scheduler.
- A `BatchCallback` for logging CPU and GPU memory usage to tensorboard. This is mainly for debugging because using it can cause a significant slowdown in training.
- Ability to run pretrained transformers as an embedder without training the weights.
- Add Optuna Integrated badge to README.md.
Changed¶
- Similar to our caching mechanism, we introduced a lock file to the vocab to avoid race conditions when saving/loading the vocab from/to the same serialization directory in different processes.
- Changed the `Token`, `Instance`, and `Batch` classes, along with all `Field` classes, to "slots" classes. This dramatically reduces the size in memory of instances.
- SimpleTagger will no longer calculate the span-based F1 metric when `calculate_span_f1` is `False`.
- CPU memory for every worker is now reported in the logs and the metrics. Previously this was only reporting the CPU memory of the master process, and so it was only correct in the non-distributed setting.
- To be consistent with PyTorch `IterableDataset`, `AllennlpLazyDataset` no longer implements `__len__()`. Previously it would always return 1.
- Removed old tutorials, in favor of the new AllenNLP Guide.
- Changed the vocabulary loading to consider new lines for Windows/Linux and Mac.
v1.0.0rc5 - 2020-05-26¶
Fixed¶
- Fixed a bug where `PretrainedTransformerTokenizer` crashed with some transformers (#4267).
- Make `cached_path` work offline.
- Tons of docstring inconsistencies resolved.
- Nightly builds no longer run on forks.
- Distributed training now automatically figures out which worker should see which instances.
- A race condition bug in distributed training caused by saving the vocab to file from the master process while other processes might be reading those files.
- Unused dependencies in `setup.py` removed.
Added¶
- Additional CI checks to ensure docstrings are consistently formatted.
- Ability to train on CPU with multiple processes by setting `cuda_devices` to a list of negative integers in your training config. For example: `"distributed": {"cuda_devices": [-1, -1]}`. This is mainly to make it easier to test and debug distributed training code.
- Documentation for when parameters don't need config file entries.
Changed¶
- The `allennlp test-install` command now just ensures the core submodules can be imported successfully, and prints out some other useful information such as the version, PyTorch version, and the number of GPU devices available.
- All of the tests moved from `allennlp/tests` to `tests` at the root level, and `allennlp/tests/fixtures` moved to `test_fixtures` at the root level. The PyPI source and wheel distributions will no longer include tests and fixtures.
v1.0.0rc4 - 2020-05-14¶
We first introduced this CHANGELOG after release v1.0.0rc4, so please refer to the GitHub release
notes for this and earlier releases.