Changelog#
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
Unreleased#
v1.3.0 - 2020-12-15#
Added#
- Added links to source code in docs.
- Added `get_embedding_layer` and `get_text_field_embedder` to the `Predictor` class, to specify embedding layers for non-AllenNLP models.
- Added Gaussian Error Linear Unit (GELU) as an `Activation`.
 
Changed#
- Renamed module `allennlp.data.tokenizers.token` to `allennlp.data.tokenizers.token_class` to avoid this bug.
- `transformers` dependency updated to version 4.0.1.
Fixed#
- Fixed a lot of instances where tensors were first created and then sent to a device
  with `.to(device)`. Instead, these tensors are now created directly on the target device.
- Fixed issue with `GradientDescentTrainer` when constructed with `validation_data_loader=None` and `learning_rate_scheduler!=None`.
- Fixed a bug when removing all handlers in the root logger.
- `ShardedDatasetReader` now inherits parameters from `base_reader` when required.
- Fixed an issue in `FromParams` where parameters in the `params` object used to construct a class were not passed to the constructor if the value of the parameter was equal to the default value. This caused bugs in some edge cases where a subclass that takes `**kwargs` needs to inspect `kwargs` before passing them to its superclass.
- Improved the band-aid solution for segmentation faults and the "ImportError: dlopen: cannot load any more object with static TLS" error by adding a `transformers` import.
- Added safety checks for extracting tar files.
 
v1.2.2 - 2020-11-17#
Added#
- Added Docker builds for other torch-supported versions of CUDA.
- Adds `allennlp-semparse` as an official, default plugin.
Fixed#
- `GumbelSampler` now sorts the beams by their true log prob.
v1.2.1 - 2020-11-10#
Added#
- Added an optional `seed` parameter to `ModelTestCase.set_up_model` which sets the random seed for `random`, `numpy`, and `torch`.
- Added support for a global plugins file at `~/.allennlp/plugins`.
- Added more documentation about plugins.
- Added sampler class and parameter in beam search for non-deterministic search, with several
  implementations, including `MultinomialSampler`, `TopKSampler`, `TopPSampler`, and `GumbelSampler`.
  Utilizing `GumbelSampler` will give Stochastic Beam Search (see the sketch after this list).
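
A minimal sketch of plugging one of these samplers into beam search. It assumes the sampler classes live in `allennlp.nn.beam_search` alongside `BeamSearch` and that `BeamSearch` accepts a `sampler` argument as described above; the end index and hyperparameters are illustrative.

```python
from allennlp.nn.beam_search import BeamSearch, GumbelSampler, TopPSampler

END_INDEX = 2  # illustrative end-of-sequence token id

# Top-p (nucleus) sampling at each decoding step instead of deterministic top-k selection.
nucleus_search = BeamSearch(END_INDEX, beam_size=5, sampler=TopPSampler(p=0.9))

# Using GumbelSampler gives Stochastic Beam Search.
stochastic_search = BeamSearch(END_INDEX, beam_size=5, sampler=GumbelSampler())
```
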
Changed#
- Pass batch metrics to `BatchCallback`.
Fixed#
- Fixed a bug where forward hooks were not cleaned up with saliency interpreters if there was an exception.
 - Fixed the computation of saliency maps in the Interpret code when using mismatched indexing. Previously, we would compute gradients from the top of the transformer, after aggregation from wordpieces to tokens, which gives results that are not very informative. Now, we compute gradients with respect to the embedding layer, and aggregate wordpieces to tokens separately.
- Fixed the heuristics for finding embedding layers in the case of RoBERTa. An update in the
  `transformers` library broke our old heuristic.
- Fixed typo with registered name of ROUGE metric. Previously was `rogue`, fixed to `rouge`.
- Fixed default masks that were erroneously created on the CPU even when a GPU is available.
 - Fixed pretrained embeddings for transformers that don't use end tokens.
 - Fixed the transformer tokenizer cache when the tokenizers are initialized with custom kwargs.
 
v1.2.0 - 2020-10-29#
Changed#
- Enforced stricter typing requirements around the use of `Optional[T]` types.
- Changed the behavior of `Lazy` types in `from_params` methods. Previously, if you defined a `Lazy` parameter like `foo: Lazy[Foo] = None` in a custom `from_params` classmethod, then `foo` would actually never be `None`. This behavior is now different. If no params were given for `foo`, it will be `None`. You can also now set default values for `foo` like `foo: Lazy[Foo] = Lazy(Foo)`. Or, if you want a default value but also want to allow for `None` values, you can write it like this: `foo: Optional[Lazy[Foo]] = Lazy(Foo)`. See the sketch after this list.
- Added support for PyTorch version 1.7.
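
A minimal sketch of the new `Lazy` default-value patterns, using hypothetical `Foo` and `Bar` classes and the standard `FromParams` constructor flow rather than a hand-written `from_params` classmethod:

```python
from allennlp.common.from_params import FromParams
from allennlp.common.lazy import Lazy
from allennlp.common.params import Params


class Foo(FromParams):
    def __init__(self, size: int = 1) -> None:
        self.size = size


class Bar(FromParams):
    def __init__(self, foo: Lazy[Foo] = Lazy(Foo)) -> None:
        # `foo` is only constructed when we ask for it.
        self.foo = foo.construct()


# No "foo" params given: the `Lazy(Foo)` default is used instead of `None`.
bar = Bar.from_params(Params({}))
assert bar.foo.size == 1

# "foo" params given: they are passed through when the lazy object is constructed.
bar = Bar.from_params(Params({"foo": {"size": 3}}))
assert bar.foo.size == 3
```
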
 
Fixed#
- Made it possible to instantiate `TrainerCallback` from config files.
- Fixed the remaining broken internal links in the API docs.
- Fixed a bug where Hotflip would crash with a model that had multiple `TokenIndexer`s and the input used rare vocabulary items.
- Fixed a bug where `BeamSearch` would fail if `max_steps` was equal to 1.
- Fixed `BasicTextFieldEmbedder` to not raise a `ConfigurationError` if it has embedders that are empty and not in the input.
v1.2.0rc1 - 2020-10-22#
Added#
- Added a warning when `batches_per_epoch` for the validation data loader is inherited from the train data loader.
- Added a `build-vocab` subcommand that can be used to build a vocabulary from a training config file.
- Added `tokenizer_kwargs` argument to `PretrainedTransformerMismatchedIndexer`.
- Added `tokenizer_kwargs` and `transformer_kwargs` arguments to `PretrainedTransformerMismatchedEmbedder`.
- Added official support for Python 3.8.
- Added a script: `scripts/release_notes.py`, which automatically prepares markdown release notes from the CHANGELOG and commit history.
- Added a flag `--predictions-output-file` to the `evaluate` command, which tells AllenNLP to write the predictions from the given dataset to the file as JSON lines.
- Added the ability to ignore certain missing keys when loading a model from an archive. This is done by adding a class-level variable called `authorized_missing_keys` to any PyTorch module that a `Model` uses. If defined, `authorized_missing_keys` should be a list of regex string patterns (see the sketch after this list).
- Added `FBetaMultiLabelMeasure`, a multi-label Fbeta metric. This is a subclass of the existing `FBetaMeasure`.
- Added ability to pass additional keyword arguments to `cached_transformers.get()`, which will be passed on to `AutoModel.from_pretrained()`.
- Added an `overrides` argument to `Predictor.from_path()`.
- Added a `cached-path` command.
- Added a function `inspect_cache` to `common.file_utils` that prints useful information about the cache. This can also be used from the `cached-path` command with `allennlp cached-path --inspect`.
- Added a function `remove_cache_entries` to `common.file_utils` that removes any cache entries matching the given glob patterns. This can be used from the `cached-path` command with `allennlp cached-path --remove some-files-*`.
- Added logging for the main process when running in distributed mode.
- Added a `TrainerCallback` object to support state sharing between batch and epoch-level training callbacks.
- Added support for .tar.gz in `PretrainedModelInitializer`.
- Made `BeamSearch` instantiable `from_params`.
- Pass `serialization_dir` to `Model` and `DatasetReader`.
- Added an optional `include_in_archive` parameter to the top-level of configuration files. When specified, `include_in_archive` should be a list of paths relative to the serialization directory which will be bundled up with the final archived model from a training run.
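
A minimal sketch of the `authorized_missing_keys` mechanism described above; the module and regex pattern are hypothetical:

```python
import torch


class DecoderHead(torch.nn.Module):
    # Hypothetical module used inside an AllenNLP `Model`. State-dict keys that
    # match these regex patterns may be absent from a loaded archive without
    # raising an error.
    authorized_missing_keys = [r"^projection\..*"]

    def __init__(self, hidden_dim: int = 16, num_labels: int = 2) -> None:
        super().__init__()
        self.projection = torch.nn.Linear(hidden_dim, num_labels)
```
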
Changed#
- Subcommands that don't require plugins will no longer cause plugins to be loaded or have an `--include-package` flag.
- Allow overrides to be a JSON string or a `dict`.
- `transformers` dependency updated to version 3.1.0.
- When `cached_path` is called on a local archive with `extract_archive=True`, the archive is now extracted into a unique subdirectory of the cache root instead of a subdirectory of the archive's directory. The extraction directory is also unique to the modification time of the archive, so if the file changes, subsequent calls to `cached_path` will know to re-extract the archive.
- Removed the `truncation_strategy` parameter to `PretrainedTransformerTokenizer`. The way we're calling the tokenizer, the truncation strategy has no effect anyway.
- Don't use initializers when loading a model, as they are not needed.
- Distributed training will now automatically search for a local open port if the `master_port` parameter is not provided.
- In training, save model weights before evaluation.
- `allennlp.common.util.peak_memory_mb` renamed to `peak_cpu_memory`, and `allennlp.common.util.gpu_memory_mb` renamed to `peak_gpu_memory`, and they both now return the results in bytes as integers. Also, the `peak_gpu_memory` function now utilizes PyTorch functions to find the memory usage instead of shelling out to the `nvidia-smi` command. This is more efficient and also more accurate because it only takes into account the tensor allocations of the current PyTorch process.
- Make sure weights are first loaded to the CPU when using `PretrainedModelInitializer`, preventing wasted GPU memory.
- Load dataset readers in `load_archive`.
- Updated `AllenNlpTestCase` docstring to remove reference to `unittest.TestCase`.
Removed#
- Removed the `common.util.is_master` function.
Fixed#
- Fix CUDA/CPU device mismatch bug during distributed training for categorical accuracy metric.
- Fixed a bug where the reported `batch_loss` metric was incorrect when training with gradient accumulation.
- Class decorators now displayed in API docs.
- Fixed up the documentation for the `allennlp.nn.beam_search` module.
- Ignore `*args` when constructing classes with `FromParams`.
- Ensured some consistency in the types of the values that metrics return.
- Fix a PyTorch warning by explicitly providing the `as_tuple` argument (leaving it as its default value of `False`) to `Tensor.nonzero()`.
- Remove temporary directory when extracting model archive in `load_archive` at end of function rather than via `atexit`.
- Fixed a bug where using `cached_path()` offline could return a cached resource's lock file instead of the cache file.
- Fixed a bug where `cached_path()` would fail if passed a `cache_dir` with the user home shortcut `~/`.
- Fixed a bug in our doc building script where markdown links did not render properly if the "href" part of the link (the part inside the `()`) was on a new line.
- Changed how gradients are zeroed out with an optimization. See this video from NVIDIA at around the 9 minute mark.
- Fixed a bug where parameters to a `FromParams` class that are dictionaries wouldn't get logged when an instance is instantiated `from_params`.
- Fixed a bug in distributed training where the vocab would be saved from every worker, when it should have been saved by only the local master process.
- Fixed a bug in the calculation of rouge metrics during distributed training where the total sequence count was not being aggregated across GPUs.
- Fixed `allennlp.nn.util.add_sentence_boundary_token_ids()` to use the `device` parameter of the input tensor.
- Be sure to close the TensorBoard writer even when training doesn't finish.
- Fixed the docstring for `PyTorchSeq2VecWrapper`.
- Fixed a bug in the `cnn_encoder` where activations involving masked tokens could be picked up by the max.
- Fix intra-word tokenization for `PretrainedTransformerTokenizer` when disabling the fast tokenizer.
v1.1.0 - 2020-09-08#
Fixed#
- Fixed handling of some edge cases when constructing classes with `FromParams` where the class accepts `**kwargs`.
- Fixed division by zero error when there are zero-length spans in the input to a `PretrainedTransformerMismatchedIndexer`.
- Improved robustness of `cached_path` when extracting archives so that the cache won't be corrupted if a failure occurs during extraction.
- Fixed a bug with the `average` and `evalb_bracketing_score` metrics in distributed training.
Added#
- `Predictor.capture_model_internals()` now accepts a regex specifying which modules to capture (see the sketch below).
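
A hedged sketch of the new regex filter; the archive path, the `"sentence"` input key, and the module regex are illustrative, and this assumes `capture_model_internals()` is used as a context manager:

```python
from allennlp.predictors import Predictor

predictor = Predictor.from_path("model.tar.gz")  # hypothetical archive

# Only capture outputs of modules whose names match the regex.
with predictor.capture_model_internals(r"encoder.*") as internals:
    predictor.predict_json({"sentence": "AllenNLP is great."})

# `internals` now holds the captured outputs of the matching modules.
```
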
v1.1.0rc4 - 2020-08-20#
Added#
- Added a workflow to GitHub Actions that will automatically close unassigned stale issues and ping the assignees of assigned stale issues.
 
Fixed#
- Fixed a bug in distributed metrics that caused nan values due to repeated addition of an accumulated variable.
 
v1.1.0rc3 - 2020-08-12#
Fixed#
- Fixed how truncation was handled with `PretrainedTransformerTokenizer`. Previously, if `max_length` was set to `None`, the tokenizer would still do truncation if the transformer model had a default max length in its config. Also, when `max_length` was set to a non-`None` value, several warnings would appear for certain transformer models around the use of the `truncation` parameter.
- Fixed evaluation of all metrics when using distributed training.
- Added a `py.typed` marker. Fixed type annotations in `allennlp.training.util`.
- Fixed problem with automatically detecting whether tokenization is necessary. This affected primarily the Roberta SST model.
- Improved help text for using the `--overrides` command line flag.
 
v1.1.0rc2 - 2020-07-31#
Changed#
- Upgraded PyTorch requirement to 1.6.
- Replaced the NVIDIA Apex AMP module with torch's native AMP module. The default trainer (`GradientDescentTrainer`) now takes a `use_amp: bool` parameter instead of the old `opt_level: str` parameter.
Fixed#
- Removed unnecessary warning about deadlocks in `DataLoader`.
- Fixed testing models that only return a loss when they are in training mode.
- Fixed a bug in `FromParams` that caused silent failure in case of the parameter type being `Optional[Union[...]]`.
- Fixed a bug where the program crashes if `evaluation_data_loader` is an `AllennlpLazyDataset`.
Added#
- Added the option to specify `requires_grad: false` within an optimizer's parameter groups.
- Added the `file-friendly-logging` flag back to the `train` command. Also added this flag to the `predict`, `evaluate`, and `find-learning-rate` commands.
- Added an `EpochCallback` to track current epoch as a model class member.
- Added the option to enable or disable gradient checkpointing for transformer token embedders via boolean parameter `gradient_checkpointing` (see the sketch after this list).
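
A minimal sketch of enabling gradient checkpointing on a transformer token embedder; the model name is illustrative, and this assumes `PretrainedTransformerEmbedder` exposes the parameter directly:

```python
from allennlp.modules.token_embedders import PretrainedTransformerEmbedder

# Re-computes transformer activations during the backward pass to save memory.
embedder = PretrainedTransformerEmbedder(
    model_name="bert-base-uncased",
    gradient_checkpointing=True,
)
```
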
Removed#
- Removed the `opt_level` parameter to `Model.load` and `load_archive`. In order to use AMP with a loaded model now, just run the model's forward pass within torch's `autocast` context (see the sketch below).
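
A minimal sketch of the AMP usage described above, assuming a GPU is available and using a hypothetical archive path; the batch of tensors would come from your data loader:

```python
import torch
from allennlp.models.archival import load_archive


def predict_with_amp(archive_path: str, batch: dict) -> dict:
    """Run a loaded model's forward pass under torch's autocast context."""
    archive = load_archive(archive_path, cuda_device=0)  # assumes a GPU is available
    model = archive.model.eval()
    with torch.cuda.amp.autocast():
        # Mixed precision now comes from autocast rather than the old `opt_level`.
        return model(**batch)
```
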
v1.1.0rc1 - 2020-07-14#
Fixed#
- Reduced the amount of log messages produced by `allennlp.common.file_utils`.
- Fixed a bug where `PretrainedTransformerEmbedder` parameters appeared to be trainable in the log output even when `train_parameters` was set to `False`.
- Fixed a bug with the sharded dataset reader where it would only read a fraction of the instances in distributed training.
- Fixed checking equality of `ArrayField`s.
- Fixed a bug where `NamespaceSwappingField` did not work correctly with `.empty_field()`.
- Put more sensible defaults on the `huggingface_adamw` optimizer.
- Simplified logging so that all logging output always goes to one file.
- Fixed interaction with the python command line debugger.
- Log the grad norm properly even when we're not clipping it.
- Fixed a bug where `PretrainedModelInitializer` fails to initialize a model with a 0-dim tensor.
- Fixed a bug with the layer unfreezing schedule of the `SlantedTriangular` learning rate scheduler.
- Fixed a regression with logging in the distributed setting. Only the main worker should write log output to the terminal.
- Pinned the version of boto3 for package managers (e.g. poetry).
- Fixed issue #4330 by updating the `tokenizers` dependency.
- Fixed a bug in `TextClassificationPredictor` so that it passes tokenized inputs to the `DatasetReader` in case it does not have a tokenizer.
- `reg_loss` is now only returned for models that have some regularization penalty configured.
- Fixed a bug that prevented `cached_path` from downloading assets from GitHub releases.
- Fixed a bug that erroneously increased last label's false positive count in calculating fbeta metrics.
- `Tqdm` output now looks much better when the output is being piped or redirected.
- Small improvements to how the API documentation is rendered.
 - Only show validation progress bar from main process in distributed training.
 
Added#
- Adjust beam search to support multi-layer decoder.
 - A method to ModelTestCase for running basic model tests when you aren't using config files.
 - Added some convenience methods for reading files.
- Added an option to `file_utils.cached_path` to automatically extract archives.
- Added the ability to pass an archive file instead of a local directory to `Vocab.from_files`.
- Added the ability to pass an archive file instead of a glob to `ShardedDatasetReader`.
- Added a new `"linear_with_warmup"` learning rate scheduler.
- Added a check in `ShardedDatasetReader` that ensures the base reader doesn't implement manual distributed sharding itself.
- Added an option to `PretrainedTransformerEmbedder` and `PretrainedTransformerMismatchedEmbedder` to use a scalar mix of all hidden layers from the transformer model instead of just the last layer. To utilize this, just set `last_layer_only` to `False` (see the sketch after this list).
- `cached_path()` can now read files inside of archives.
- Training metrics now include `batch_loss` and `batch_reg_loss` in addition to aggregate loss across number of batches.
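
A minimal sketch of enabling the scalar mix of hidden layers mentioned above; the model name is illustrative:

```python
from allennlp.modules.token_embedders import PretrainedTransformerMismatchedEmbedder

# Learns scalar weights over all transformer hidden layers instead of using
# only the final layer's output.
embedder = PretrainedTransformerMismatchedEmbedder(
    model_name="roberta-base",
    last_layer_only=False,
)
```
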
Changed#
- Not specifying a `cuda_device` now automatically determines whether to use a GPU or not.
- Discovered plugins are logged so you can see what was loaded.
- `allennlp.data.DataLoader` is now an abstract registrable class. The default implementation remains the same, but was renamed to `allennlp.data.PyTorchDataLoader`.
- `BertPooler` can now unwrap and re-wrap extra dimensions if necessary.
- New `transformers` dependency. Only version >=3.0 now supported.
v1.0.0 - 2020-06-16#
Fixed#
- Lazy dataset readers now work correctly with multi-process data loading.
- Fixed race conditions that could occur when using a dataset cache.
- A bug where all datasets would be loaded for vocab creation even if not needed.
 
Added#
- A parameter to the `DatasetReader` class: `manual_multi_process_sharding`. This is similar to the `manual_distributed_sharding` parameter, but applies when using a multi-process `DataLoader`.
v1.0.0rc6 - 2020-06-11#
Fixed#
- A bug where `TextField`s could not be duplicated since some tokenizers cannot be deep-copied. See https://github.com/allenai/allennlp/issues/4270.
- Our caching mechanism had the potential to introduce race conditions if multiple processes were attempting to cache the same file at once. This was fixed by using a lock file tied to each cached file.
- `get_text_field_mask()` now supports padding indices that are not `0`.
- A bug where `predictor.get_gradients()` would return an empty dictionary if an embedding layer had trainable set to false.
- Fixes `PretrainedTransformerMismatchedIndexer` in the case where a token consists of zero word pieces.
- Fixes a bug when using a lazy dataset reader that results in a `UserWarning` from PyTorch being printed at every iteration during training.
- Predictor names were inconsistently switching between dashes and underscores. Now they all use underscores.
- `Predictor.from_path` now automatically loads plugins (unless you specify `load_plugins=False`) so that you don't have to manually import a bunch of modules when instantiating predictors from an archive path.
- `allennlp-server` automatically found as a plugin once again.
Added#
- A `duplicate()` method on `Instance`s and `Field`s, to be used instead of `copy.deepcopy()`.
- A batch sampler that makes sure each batch contains approximately the same number of tokens (`MaxTokensBatchSampler`).
- Functions to turn a sequence of token indices back into tokens.
- The ability to use Huggingface encoder/decoder models as token embedders.
- Improvements to beam search.
- ROUGE metric.
- Polynomial decay learning rate scheduler.
- A `BatchCallback` for logging CPU and GPU memory usage to tensorboard. This is mainly for debugging because using it can cause a significant slowdown in training.
- Ability to run pretrained transformers as an embedder without training the weights.
 - Add Optuna Integrated badge to README.md
 
Changed#
- Similar to our caching mechanism, we introduced a lock file to the vocab to avoid race conditions when saving/loading the vocab from/to the same serialization directory in different processes.
- Changed the `Token`, `Instance`, and `Batch` classes along with all `Field` classes to "slots" classes. This dramatically reduces the size in memory of instances.
- `SimpleTagger` will no longer calculate span-based F1 metric when `calculate_span_f1` is `False`.
- CPU memory for every worker is now reported in the logs and the metrics. Previously this was only reporting the CPU memory of the master process, and so it was only correct in the non-distributed setting.
- To be consistent with PyTorch `IterableDataset`, `AllennlpLazyDataset` no longer implements `__len__()`. Previously it would always return 1.
- Removed old tutorials, in favor of the new AllenNLP Guide.
 - Changed the vocabulary loading to consider new lines for Windows/Linux and Mac.
 
v1.0.0rc5 - 2020-05-26#
Fixed#
- Fix bug where `PretrainedTransformerTokenizer` crashed with some transformers (#4267).
- Make `cached_path` work offline.
- Tons of docstring inconsistencies resolved.
- Nightly builds no longer run on forks.
- Distributed training now automatically figures out which worker should see which instances.
- A race condition bug in distributed training caused from saving the vocab to file from the master process while other processes might be reading those files.
- Unused dependencies in `setup.py` removed.
Added#
- Additional CI checks to ensure docstrings are consistently formatted.
- Ability to train on CPU with multiple processes by setting `cuda_devices` to a list of negative integers in your training config. For example: `"distributed": {"cuda_devices": [-1, -1]}`. This is mainly to make it easier to test and debug distributed training code.
- Documentation for when parameters don't need config file entries.
 
Changed#
- The `allennlp test-install` command now just ensures the core submodules can be imported successfully, and prints out some other useful information such as the version, PyTorch version, and the number of GPU devices available.
- All of the tests moved from `allennlp/tests` to `tests` at the root level, and `allennlp/tests/fixtures` moved to `test_fixtures` at the root level. The PyPI source and wheel distributions will no longer include tests and fixtures.
v1.0.0rc4 - 2020-05-14#
We first introduced this CHANGELOG after release v1.0.0rc4, so please refer to the GitHub release
notes for this and earlier releases.