allennlp.common.util
Various utilities that don’t fit anywhere else.
-
allennlp.common.util.add_noise_to_dict_values(dictionary: Dict[~A, float], noise_param: float) → Dict[~A, float]
Returns a new dictionary with noise added to every value in dictionary. The noise for each value is uniformly distributed within noise_param percent of that value.
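The behaviour described above can be sketched without AllenNLP installed. This is a minimal illustration, not the library's code: the name add_noise_sketch is hypothetical, and it assumes noise_param is given as a fraction (0.1 meaning 10 percent of each value).

```python
import random
from typing import Dict, Hashable


def add_noise_sketch(dictionary: Dict[Hashable, float], noise_param: float) -> Dict[Hashable, float]:
    """Return a new dict whose values are perturbed by bounded uniform noise."""
    noisy = {}
    for key, value in dictionary.items():
        # Assumption: noise_param is a fraction, so the noise lies in
        # [-|value * noise_param|, +|value * noise_param|].
        bound = abs(value * noise_param)
        noisy[key] = value + random.uniform(-bound, bound)
    return noisy


original = {"a": 10.0, "b": 200.0}
noisy = add_noise_sketch(original, 0.1)
print(noisy)
```

The input dictionary is left untouched, and each noisy value stays within 10 percent of its original.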
-
allennlp.common.util.cleanup_global_logging(stdout_handler: logging.FileHandler) → None
Closes any open file handles and logs set up by prepare_global_logging.
- Parameters
- stdout_handler: logging.FileHandler, required.
The file handler returned from prepare_global_logging, attached to the global logger.
-
allennlp.common.util.dump_metrics(file_path: str, metrics: Dict[str, Any], log: bool = False) → None
-
allennlp.common.util.ensure_list(iterable: Iterable[~A]) → List[~A]
An Iterable may be a list or a generator. This ensures we get a list without making an unnecessary copy.
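One way to read "without making an unnecessary copy" is that a list is handed back as the same object, while any other iterable is materialized once. The sketch below illustrates that reading; ensure_list_sketch is a hypothetical name, not the library function.

```python
from typing import Iterable, List, TypeVar

A = TypeVar("A")


def ensure_list_sketch(iterable: Iterable[A]) -> List[A]:
    """Return the input itself if it is already a list, else build a list from it."""
    if isinstance(iterable, list):
        return iterable  # no copy is made
    return list(iterable)  # consumes a generator or other iterable once


numbers = [1, 2, 3]
same = ensure_list_sketch(numbers)
from_generator = ensure_list_sketch(x * x for x in range(4))
print(same is numbers, from_generator)
```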
-
allennlp.common.util.get_frozen_and_tunable_parameter_names(model: torch.nn.modules.module.Module) → List
-
allennlp.common.util.get_spacy_model(spacy_model_name: str, pos_tags: bool, parse: bool, ner: bool) → spacy.language.Language
To avoid loading spaCy models repeatedly, we save references to them, keyed by the options used to create the model, so any particular configuration is loaded only once.
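The caching idea can be sketched as a dictionary keyed by the tuple of options. To keep the example runnable without spaCy, the loader is passed in as a parameter; get_cached_model, fake_loader, and the loader argument are all hypothetical and are not part of the AllenNLP API.

```python
from typing import Any, Callable, Dict, List, Tuple

Options = Tuple[str, bool, bool, bool]

# Module-level cache: one entry per distinct combination of options.
_loaded_models: Dict[Options, Any] = {}


def get_cached_model(name: str, pos_tags: bool, parse: bool, ner: bool,
                     loader: Callable[[str, bool, bool, bool], Any]) -> Any:
    """Load a model at most once per (name, pos_tags, parse, ner) combination."""
    options = (name, pos_tags, parse, ner)
    if options not in _loaded_models:
        _loaded_models[options] = loader(name, pos_tags, parse, ner)
    return _loaded_models[options]


load_calls: List[Options] = []


def fake_loader(name: str, pos_tags: bool, parse: bool, ner: bool) -> object:
    """Stand-in for an expensive model load; records each call."""
    load_calls.append((name, pos_tags, parse, ner))
    return object()


first = get_cached_model("en_core_web_sm", True, False, False, fake_loader)
second = get_cached_model("en_core_web_sm", True, False, False, fake_loader)
third = get_cached_model("en_core_web_sm", True, True, False, fake_loader)
print(len(load_calls))
```

The second call reuses the object created by the first; only a different combination of options triggers another load.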
-
allennlp.common.util.gpu_memory_mb() → Dict[int, int]
Get the current GPU memory usage. Based on https://discuss.pytorch.org/t/access-gpu-memory-usage-in-pytorch/3192/4
- Returns
Dict[int, int]
Keys are device ids as integers. Values are memory usage as integers in MB. Returns an empty dict if GPUs are not available.
-
allennlp.common.util.group_by_count(iterable: List[Any], count: int, default_value: Any) → List[List[Any]]
Takes a list and groups it into sublists of size count, using default_value to pad the final sublist if the length of the list is not divisible by count.
For example:
>>> group_by_count([1, 2, 3, 4, 5, 6, 7], 3, 0)
[[1, 2, 3], [4, 5, 6], [7, 0, 0]]
This is a short method, but it’s complicated and hard to remember as a one-liner, so we just make a function out of it.
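For readers curious what such a one-liner might look like, here is a sketch built on itertools.zip_longest. It reproduces the example above, but group_by_count_sketch is a hypothetical name and the source does not confirm that the library is written this way.

```python
from itertools import zip_longest
from typing import Any, List


def group_by_count_sketch(items: List[Any], count: int, default_value: Any) -> List[List[Any]]:
    """Split items into sublists of length count, padding the last one."""
    shared = iter(items)
    # Passing the same iterator `count` times makes zip_longest pull
    # `count` consecutive items for each output group.
    return [list(group) for group in zip_longest(*[shared] * count, fillvalue=default_value)]


groups = group_by_count_sketch([1, 2, 3, 4, 5, 6, 7], 3, 0)
print(groups)  # [[1, 2, 3], [4, 5, 6], [7, 0, 0]]
```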
-
allennlp.common.util.import_submodules(package_name: str) → None
Import all submodules under the given package. Primarily useful so that people using AllenNLP as a library can specify their own custom packages and have their custom classes get loaded and registered.
-
allennlp.common.util.is_lazy(iterable: Iterable[~A]) → bool
Checks if the given iterable is lazy, which here just means it’s not a list.
-
allennlp.common.util.lazy_groups_of(iterator: Iterator[~A], group_size: int) → Iterator[List[~A]]
Takes an iterator and batches the individual instances into lists of the specified size. The last list may be smaller if there are instances left over.
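The batching described above can be sketched as a generator that repeatedly slices group_size items off the iterator. This is an illustration under that assumption; lazy_groups_of_sketch is a hypothetical name, not the library function.

```python
from itertools import islice
from typing import Iterator, List, TypeVar

A = TypeVar("A")


def lazy_groups_of_sketch(iterator: Iterator[A], group_size: int) -> Iterator[List[A]]:
    """Yield lists of up to group_size items without reading the whole input first."""
    while True:
        group = list(islice(iterator, group_size))
        if not group:
            return  # the iterator is exhausted
        yield group


batches = list(lazy_groups_of_sketch(iter(range(7)), 3))
print(batches)  # [[0, 1, 2], [3, 4, 5], [6]]
```

Unlike group_by_count, nothing is padded here: the final batch simply holds whatever is left.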
-
allennlp.common.util.namespace_match(pattern: str, namespace: str)
Matches a namespace pattern against a namespace string. For example, *tags matches passage_tags and question_tags, and tokens matches tokens but not stemmed_tokens.
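The two examples suggest a simple rule: a pattern with a leading * matches any namespace ending in the rest of the pattern, and any other pattern must match exactly. The sketch below encodes that reading; it is an assumption drawn from the examples, and namespace_match_sketch is a hypothetical name.

```python
def namespace_match_sketch(pattern: str, namespace: str) -> bool:
    """Match a namespace against a pattern with an optional leading '*' wildcard."""
    if pattern.startswith("*"):
        # Assumed rule: "*tags" matches any namespace that ends with "tags".
        return namespace.endswith(pattern[1:])
    # Without a wildcard, only an exact match counts.
    return pattern == namespace


print(namespace_match_sketch("*tags", "passage_tags"))     # True
print(namespace_match_sketch("tokens", "stemmed_tokens"))  # False
```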
-
allennlp.common.util.pad_sequence_to_length(sequence: List, desired_length: int, default_value: Callable[[], Any] = lambda: 0, padding_on_right: bool = True) → List
Takes a list of objects and pads it to the desired length, returning the padded list. The original list is not modified.
- Parameters
- sequence: List
A list of objects to be padded.
- desired_length: int
The length of the returned list. Longer sequences are truncated to this length, and shorter ones are padded to it.
- default_value: Callable, default=lambda: 0
Callable that returns a default value (of any type) to use as padding. This is a callable rather than a plain value so that a fresh object is created for each padding position when the default is something mutable, like a list.
- padding_on_right: bool, default=True
Whether to add padding (or truncate the sequence) on the right or on the left.
- Returns
- padded_sequence: List
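The parameters above can be illustrated with a small stand-alone sketch. It follows the documented behaviour (truncate or pad, on the right or the left, calling default_value once per padding slot), but pad_sequence_sketch is a hypothetical name and the details are assumptions rather than the library's code.

```python
from typing import Any, Callable, List


def pad_sequence_sketch(sequence: List[Any], desired_length: int,
                        default_value: Callable[[], Any] = lambda: 0,
                        padding_on_right: bool = True) -> List[Any]:
    """Return a new list of exactly desired_length items."""
    if padding_on_right:
        trimmed = sequence[:desired_length]  # keep the leftmost items
    elif len(sequence) > desired_length:
        trimmed = sequence[len(sequence) - desired_length:]  # keep the rightmost items
    else:
        trimmed = list(sequence)  # copy so the input is never modified
    # Call default_value once per slot so mutable padding objects are not shared.
    padding = [default_value() for _ in range(desired_length - len(trimmed))]
    return trimmed + padding if padding_on_right else padding + trimmed


right = pad_sequence_sketch([1, 2, 3], 5)
left = pad_sequence_sketch([1, 2, 3], 5, padding_on_right=False)
truncated = pad_sequence_sketch([1, 2, 3, 4], 2)
truncated_left = pad_sequence_sketch([1, 2, 3, 4], 2, padding_on_right=False)
nested = pad_sequence_sketch([[1]], 3, default_value=list)
print(right, left, truncated, truncated_left, nested)
```

In the last call each padding slot receives its own empty list, which is the situation the callable default is meant to handle.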
-
allennlp.common.util.peak_memory_mb() → float
Get peak memory usage for this process, as measured by max-resident-set size.
Only works on OSX and Linux, returns 0.0 otherwise.
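A rough sketch of such a measurement using the standard resource module follows. It relies on ru_maxrss being reported in kilobytes on Linux and in bytes on macOS; peak_memory_mb_sketch is a hypothetical name, and the divisors used to convert to MB are an assumption rather than something the source states.

```python
import sys

try:
    import resource  # only available on Unix-like systems
except ImportError:
    resource = None


def peak_memory_mb_sketch() -> float:
    """Peak resident set size of this process in MB, or 0.0 if unsupported."""
    if resource is None or sys.platform not in ("linux", "darwin"):
        return 0.0
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if sys.platform == "darwin":
        return peak / 1_000_000  # macOS reports bytes
    return peak / 1_000  # Linux reports kilobytes


usage = peak_memory_mb_sketch()
print(f"peak memory: {usage:.1f} MB")
```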
-
allennlp.common.util.prepare_environment(params: allennlp.common.params.Params)
Sets random seeds for reproducible experiments. This may not work as expected if you call it from within a Python project in which you have already imported PyTorch. If you use the scripts/run_model.py entry point to train models with this library, your experiments should be reasonably reproducible. If you are using this from your own project, call this function before importing PyTorch. Complete determinism is very difficult to achieve with libraries doing optimized linear algebra due to massively parallel execution, which is exacerbated by using GPUs.
- Parameters
- params: Params object or dict, required.
A Params object or dict holding the json parameters.
-
allennlp.common.util.prepare_global_logging(serialization_dir: str, file_friendly_logging: bool) → logging.FileHandler
Configures three global logging attributes: streaming stdout and stderr to a file as well as the terminal, setting the formatting for the Python logging library, and setting the interval frequency for the Tqdm progress bar.
Note that this function does not set the logging level, which is set in allennlp/run.py.
- Parameters
- serialization_dir: str, required.
The directory to stream logs to.
- file_friendly_logging: bool, required.
Whether logs should clean the output to prevent carriage returns (used to update progress bars on a single terminal line). This option is typically only used if you are running in an environment without a terminal.
- Returns
logging.FileHandler
A logging file handler that can later be closed and removed from the global logger.