Various utilities that don't fit anywhere else.
add_noise_to_dict_values( dictionary: Dict[~A, float], noise_param: float, ) -> Dict[~A, float]
Returns a new dictionary with noise added to the value of every key in dictionary. The noise is uniformly distributed within noise_param percent of the value for every value in the dictionary.
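The behaviour described above can be sketched with the standard library alone. This is a minimal illustration, not the library's implementation: the name `add_noise_sketch` is hypothetical, and it assumes noise_param is given as a fraction (0.1 meaning 10 percent of each value).

```python
import random
from typing import Dict, Hashable


def add_noise_sketch(dictionary: Dict[Hashable, float], noise_param: float) -> Dict[Hashable, float]:
    """Hypothetical sketch: return a new dict whose values carry uniform noise."""
    noisy = {}
    for key, value in dictionary.items():
        # Assumption: noise_param is a fraction of the value, so the noise
        # is drawn uniformly from [-|value| * noise_param, +|value| * noise_param].
        bound = abs(value * noise_param)
        noisy[key] = value + random.uniform(-bound, bound)
    return noisy


original = {"a": 10.0, "b": -4.0}
noisy = add_noise_sketch(original, 0.1)
```

The input dictionary is left untouched; only the returned copy holds the perturbed values.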
ensure_list(iterable:Iterable[~A]) -> List[~A]
An Iterable may be a list or a generator. This ensures we get a list without making an unnecessary copy.
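A minimal sketch of the idea, using a hypothetical name `ensure_list_sketch`: a list is returned as-is (same object, no copy), and anything else is materialised once.

```python
from typing import Iterable, List, TypeVar

A = TypeVar("A")


def ensure_list_sketch(iterable: Iterable[A]) -> List[A]:
    """Hypothetical sketch: return the input itself if it is already a list."""
    if isinstance(iterable, list):
        return iterable  # no copy is made
    return list(iterable)  # generators and other iterables are consumed once


already_a_list = [1, 2, 3]
from_generator = ensure_list_sketch(x * x for x in range(4))
```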
TQDM and requests use carriage returns to get the training line to update for each batch without adding more lines to the terminal output. Displaying those in a file won't work correctly, so we'll just make sure that each batch shows up on its own line.
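One way to get that effect is sketched below. The helper name `carriage_returns_to_lines` is hypothetical, and the library's own handling may differ in detail; the sketch simply turns each carriage return into a newline so that every progress update lands on a separate line in a log file.

```python
def carriage_returns_to_lines(message: str) -> str:
    """Hypothetical helper: make each carriage-return update its own line."""
    # Normalise Windows line endings first so "\r\n" does not become two newlines.
    message = message.replace("\r\n", "\n")
    return message.replace("\r", "\n")


progress = "batch 1/3\rbatch 2/3\rbatch 3/3\n"
print(carriage_returns_to_lines(progress), end="")
```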
get_spacy_model( spacy_model_name: str, pos_tags: bool, parse: bool, ner: bool, ) -> spacy.language.Language
In order to avoid loading spacy models a whole bunch of times, we'll save references to them, keyed by the options we used to create the spacy model, so any particular configuration only gets loaded once.
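The caching pattern described here can be sketched without spaCy installed. Everything below is illustrative: `get_cached_model`, `fake_loader` and the `loader` argument are hypothetical stand-ins, and the model name is used only as a string key.

```python
from typing import Any, Callable, Dict, List, Tuple

# Cache keyed by the exact options used to build the model.
_LOADED_MODELS: Dict[Tuple[str, bool, bool, bool], Any] = {}


def get_cached_model(
    name: str,
    pos_tags: bool,
    parse: bool,
    ner: bool,
    loader: Callable[[str, bool, bool, bool], Any],
) -> Any:
    """Hypothetical sketch: load each configuration at most once."""
    key = (name, pos_tags, parse, ner)
    if key not in _LOADED_MODELS:
        _LOADED_MODELS[key] = loader(name, pos_tags, parse, ner)
    return _LOADED_MODELS[key]


load_calls: List[str] = []


def fake_loader(name: str, pos_tags: bool, parse: bool, ner: bool) -> object:
    # Stand-in for an expensive model load; records that it was called.
    load_calls.append(name)
    return object()


first = get_cached_model("en_core_web_sm", True, False, False, fake_loader)
second = get_cached_model("en_core_web_sm", True, False, False, fake_loader)
third = get_cached_model("en_core_web_sm", True, True, False, fake_loader)
```

The first two calls share one object because their options match; the third differs in `parse`, so it triggers a second load.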
gpu_memory_mb() -> Dict[int, int]
Get the current GPU memory usage. Based on https://discuss.pytorch.org/t/access-gpu-memory-usage-in-pytorch/3192/4
Keys are device ids as integers.
Values are memory usage as integers in MB.
Returns an empty dict if GPUs are not available.
group_by_count( iterable: List[Any], count: int, default_value: Any, ) -> List[List[Any]]
Takes a list and groups it into sublists of size count, using default_value to pad the list at the end if the list is not divisible by count.
>>> group_by_count([1, 2, 3, 4, 5, 6, 7], 3, 0)
[[1, 2, 3], [4, 5, 6], [7, 0, 0]]
This is a short method, but it's complicated and hard to remember as a one-liner, so we just make a function out of it.
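For reference, one common way to write such a one-liner uses `itertools.zip_longest`. This is a sketch under the name `group_by_count_sketch` (hypothetical), not necessarily the form the library uses.

```python
from itertools import zip_longest
from typing import Any, List


def group_by_count_sketch(items: List[Any], count: int, default_value: Any) -> List[List[Any]]:
    """Hypothetical sketch: fixed-size groups, padding the last one."""
    # The same iterator object repeated `count` times makes zip_longest
    # pull consecutive items into each group.
    iterators = [iter(items)] * count
    return [list(group) for group in zip_longest(*iterators, fillvalue=default_value)]


groups = group_by_count_sketch([1, 2, 3, 4, 5, 6, 7], 3, 0)
```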
import_module_and_submodules(package_name:str) -> None
Import all submodules under the given package. Primarily useful so that people using AllenNLP as a library can specify their own custom packages and have their custom classes get loaded and registered.
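A rough sketch of recursive importing with `importlib` and `pkgutil` follows. The function name `import_package_tree` is hypothetical, and the standard-library `email` package is used purely as a convenient example of a package with submodules.

```python
import importlib
import pkgutil
import sys


def import_package_tree(package_name: str) -> None:
    """Hypothetical sketch: import a package and everything beneath it."""
    module = importlib.import_module(package_name)
    # Plain modules have no __path__, so there is nothing further to walk.
    search_path = getattr(module, "__path__", [])
    for module_info in pkgutil.walk_packages(search_path, prefix=package_name + "."):
        importlib.import_module(module_info.name)


import_package_tree("email")
nested_module_loaded = "email.mime.text" in sys.modules
```

Importing every module is what gives registration decorators in those modules a chance to run.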
is_distributed() -> bool
Checks if the distributed process group is available and has been initialized.
is_lazy(iterable:Iterable[~A]) -> bool
Checks if the given iterable is lazy, which here just means it's not a list.
is_master( global_rank: int = None, world_size: int = None, num_procs_per_node: int = None, ) -> bool
Checks if the process is a "master" of its node in a distributed process group. If a
process group is not initialized, this returns
- global_rank : int ( default = None )
Global rank of the process if in a distributed process group. If not
given, rank is obtained using
- world_size : int ( default = None )
Number of processes in the distributed group. If not
given, this is obtained using
- num_procs_per_node : int ( default = None )
Number of GPU processes running per node.
lazy_groups_of(iterable:Iterable[~A], group_size:int) -> Iterator[List[~A]]
Takes an iterable and batches the individual instances into lists of the specified size. The last list may be smaller if there are instances left over.
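The batching behaviour can be sketched with `itertools.islice`. The name `lazy_groups_sketch` is hypothetical; the point is that groups are produced one at a time, so the input never has to be held in memory as a whole.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

A = TypeVar("A")


def lazy_groups_sketch(iterable: Iterable[A], group_size: int) -> Iterator[List[A]]:
    """Hypothetical sketch: yield lists of up to group_size items."""
    iterator = iter(iterable)
    while True:
        group = list(islice(iterator, group_size))
        if not group:
            return  # input exhausted
        yield group


batches = list(lazy_groups_sketch(range(7), 3))
```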
Matches a namespace pattern against a namespace string. For example,
tokens but not
pad_sequence_to_length( sequence: List, desired_length: int, default_value: Callable[[], Any] = lambda: 0, padding_on_right: bool = True, ) -> List
Takes a list of objects and pads it to the desired length, returning the padded list. The original list is not modified.
- sequence : List
A list of objects to be padded.
- desired_length : int
Maximum length of each sequence. Longer sequences are truncated to this length, and shorter ones are padded to it.
- default_value : Callable ( default = lambda: 0 )
Callable that outputs a default value (of any type) to use as padding values. This is a lambda to avoid using the same object when the default value is more complex, like a list.
- padding_on_right : bool ( default = True )
When we add padding tokens (or truncate the sequence), should we do it on the right or the left?
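The parameters above can be illustrated with a short sketch. `pad_sequence_sketch` is a hypothetical name, and the left-side truncation rule (keep the last desired_length items) is an assumption about what "truncate on the left" means.

```python
from typing import Any, Callable, List


def pad_sequence_sketch(
    sequence: List[Any],
    desired_length: int,
    default_value: Callable[[], Any] = lambda: 0,
    padding_on_right: bool = True,
) -> List[Any]:
    """Hypothetical sketch: pad or truncate to exactly desired_length."""
    if padding_on_right:
        kept = list(sequence[:desired_length])
    else:
        # Assumption: truncating on the left keeps the last items.
        kept = list(sequence[-desired_length:]) if desired_length > 0 else []
    # Call default_value once per slot so mutable defaults are not shared.
    padding = [default_value() for _ in range(desired_length - len(kept))]
    return kept + padding if padding_on_right else padding + kept


right_padded = pad_sequence_sketch([1, 2, 3], 5)
left_padded = pad_sequence_sketch([1, 2, 3], 5, padding_on_right=False)
rows = pad_sequence_sketch([], 2, default_value=list)
```

Because `default_value` is called for each padding slot, the two lists in `rows` are distinct objects.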
peak_memory_mb() -> float
Get peak memory usage for this process, as measured by max-resident-set size:
Only works on OSX and Linux, returns 0.0 otherwise.
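A sketch of how such a measurement can be taken with the standard `resource` module is shown below. The name `peak_memory_mb_sketch` is hypothetical. It relies on the fact that `ru_maxrss` is reported in kilobytes on Linux and in bytes on macOS.

```python
import sys


def peak_memory_mb_sketch() -> float:
    """Hypothetical sketch: peak resident set size of this process in MB."""
    try:
        import resource  # not available on Windows
    except ImportError:
        return 0.0
    if sys.platform not in ("linux", "darwin"):
        return 0.0
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if sys.platform == "darwin":
        return peak / 1_000_000  # macOS reports bytes
    return peak / 1_000  # Linux reports kilobytes


peak = peak_memory_mb_sketch()
```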
Sets random seeds for reproducible experiments. This may not work as expected if you use this from within a Python project in which you have already imported PyTorch. If you use the scripts/run_model.py entry point to train models with this library, your experiments should be reasonably reproducible. If you are using this from your own project, you will want to call this function before importing PyTorch. Complete determinism is very difficult to achieve with libraries doing optimized linear algebra due to massively parallel execution, which is exacerbated by using GPUs.
- params : Params object or dict, required.
A Params object or dict holding the JSON parameters.
push_python_path( path: Union[os.PathLike, str], ) -> Generator[NoneType, NoneType, NoneType]
Prepends the given path to sys.path. This method is intended to be used in a with statement, so after its usage, the path will be removed from sys.path.
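A context manager with this shape can be sketched as follows. It assumes the path list in question is `sys.path`, as the function name suggests; `push_python_path_sketch` and the example directory are hypothetical.

```python
import os
import sys
from contextlib import contextmanager
from typing import Iterator, Union


@contextmanager
def push_python_path_sketch(path: Union[os.PathLike, str]) -> Iterator[None]:
    """Hypothetical sketch: temporarily put a directory first on sys.path."""
    entry = os.fspath(path)
    sys.path.insert(0, entry)
    try:
        yield
    finally:
        # Runs even if the body raises, so the entry never leaks.
        sys.path.remove(entry)


before = list(sys.path)
with push_python_path_sketch("/hypothetical/plugins"):
    inside_first_entry = sys.path[0]
after = list(sys.path)
```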
pushd( new_dir: Union[os.PathLike, str], verbose: bool = False, ) -> Generator[NoneType, NoneType, NoneType]
Changes the current directory to the given path and prepends it to sys.path. This method is intended to be used in a with statement, so after its usage, the current directory will be set to the previous value.
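The directory-switching part can be sketched as a small context manager. `pushd_sketch` is a hypothetical name, and the sketch covers only changing and restoring the working directory.

```python
import os
import tempfile
from contextlib import contextmanager
from typing import Iterator, Union


@contextmanager
def pushd_sketch(new_dir: Union[os.PathLike, str], verbose: bool = False) -> Iterator[None]:
    """Hypothetical sketch: run a block inside new_dir, then go back."""
    previous_dir = os.getcwd()
    if verbose:
        print(f"Changing directory to {os.fspath(new_dir)}")
    os.chdir(new_dir)
    try:
        yield
    finally:
        # Restore the original directory even if the body raises.
        os.chdir(previous_dir)


start = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    with pushd_sketch(tmp):
        inside = os.getcwd()
    restored = os.getcwd()
```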
sanitize(x:Any) -> Any
Sanitize turns PyTorch and Numpy types into basic Python types so they can be serialized into JSON.
sanitize_wordpiece(wordpiece:str) -> str
Sanitizes wordpieces from BERT, RoBERTa or ALBERT tokenizers.
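As an illustration only, the sketch below assumes that sanitizing means removing the word-boundary marker each tokenizer family attaches to a piece: "##" for BERT continuation pieces, "Ġ" for RoBERTa's byte-level BPE, and "▁" for ALBERT's SentencePiece vocabulary. The name `sanitize_wordpiece_sketch` is hypothetical.

```python
# Assumption: these are the markers to strip, one per tokenizer family.
_MARKERS = ("##", "Ġ", "▁")


def sanitize_wordpiece_sketch(wordpiece: str) -> str:
    """Hypothetical sketch: drop a leading tokenizer marker if present."""
    for marker in _MARKERS:
        if wordpiece.startswith(marker):
            return wordpiece[len(marker):]
    return wordpiece


cleaned = [sanitize_wordpiece_sketch(p) for p in ["play", "##ing", "Ġhello", "▁world"]]
```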