allennlp.tools

Modules containing official evaluators of various tasks for which we build models.

allennlp.tools.drop_eval.answer_json_to_strings(answer: Dict[str, Any]) → Tuple[Tuple[str, ...], str][source]

Takes an answer JSON blob from the DROP data release and converts it into strings used for evaluation.
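To make the conversion concrete, here is a simplified sketch (not the library code) of how a DROP answer blob maps to evaluation strings. The field names (`number`, `spans`, `date` with `day`/`month`/`year`) follow the DROP data release; the real implementation in `allennlp/tools/drop_eval.py` handles more edge cases.

```python
from typing import Any, Dict, Tuple


def answer_json_to_strings_sketch(answer: Dict[str, Any]) -> Tuple[Tuple[str, ...], str]:
    if answer.get("number"):
        # Numeric answers are stored as a single string under "number".
        return (str(answer["number"]),), "number"
    if answer.get("spans"):
        # Span answers are a list of strings; one span vs. many is distinguished.
        return tuple(answer["spans"]), "span" if len(answer["spans"]) == 1 else "spans"
    # Date answers keep day/month/year fields, joined into one string.
    date = answer.get("date", {})
    parts = (date.get("day"), date.get("month"), date.get("year"))
    return (" ".join(p for p in parts if p),), "date"
```

For example, `answer_json_to_strings_sketch({"number": "12", "spans": [], "date": {}})` yields `(("12",), "number")`.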

allennlp.tools.drop_eval.evaluate_json(annotations: Dict[str, Any], predicted_answers: Dict[str, Any]) → Tuple[float, float][source]

Takes gold annotations and predicted answers and evaluates the predictions for each question in the gold annotations. Both JSON dictionaries must have query_id keys, which are used to match predictions to gold annotations (note that these are somewhat deep in the JSON for the gold annotations, but must be top-level keys in the predicted answers).

The annotations are assumed to have the format of the dev set in the DROP data release. The predicted_answers JSON must be a dictionary keyed by query id, where the value is a string (or list of strings) that is the answer.

allennlp.tools.drop_eval.evaluate_prediction_file(prediction_path: str, gold_path: str, output_path: Union[str, NoneType] = None) → Tuple[float, float][source]

Takes a prediction file and a gold file and evaluates the predictions for each question in the gold file. Both files must be json formatted and must have query_id keys, which are used to match predictions to gold annotations. The gold file is assumed to have the format of the dev set in the DROP data release. The prediction file must be a JSON dictionary keyed by query id, where the value is either a JSON dictionary with an “answer” key, or just a string (or list of strings) that is the answer. Writes a json with global_em and global_f1 metrics to file at the specified output path, unless None is passed as output path.
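For concreteness, a prediction file matching the description above can be built as follows. The query ids here are invented for illustration; real ids come from the gold annotations.

```python
import json

# Both value shapes described above are accepted: a bare answer string
# (or list of strings), or a JSON dictionary with an "answer" key.
predictions = {
    "query_0001": "12 yards",              # bare string
    "query_0002": ["Tom Brady", "Brady"],  # list of strings (multi-span answer)
    "query_0003": {"answer": "12 yards"},  # dictionary with an "answer" key
}

with open("predictions.json", "w") as f:
    json.dump(predictions, f)
```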

allennlp.tools.drop_eval.get_metrics(predicted: Union[str, List[str], Tuple[str, ...]], gold: Union[str, List[str], Tuple[str, ...]]) → Tuple[float, float][source]

Takes a predicted answer and a gold answer (that are both either a string or a list of strings), and returns exact match and the DROP F1 metric for the prediction. If you are writing a script for evaluating objects in memory (say, the output of predictions during validation, or while training), this is the function you want to call, after using answer_json_to_strings() when reading the gold answer from the released data file.

This evaluation script relies heavily on the one for DROP (allennlp/tools/drop_eval.py). We need a separate script for Quoref only because the data formats are slightly different.

allennlp.tools.quoref_eval.evaluate_json(annotations: Dict[str, Any], predicted_answers: Dict[str, Any]) → Tuple[float, float][source]

Takes gold annotations and predicted answers and evaluates the predictions for each question in the gold annotations. Both JSON dictionaries must have query_id keys, which are used to match predictions to gold annotations.

The predicted_answers JSON must be a dictionary keyed by query id, where the value is a list of strings (or just one string) that is the answer. The annotations are assumed to have either the format of the dev set in the Quoref data release, or the same format as the predicted answers file.

allennlp.tools.quoref_eval.evaluate_prediction_file(prediction_path: str, gold_path: str, output_path: Union[str, NoneType] = None) → Tuple[float, float][source]

Takes a prediction file and a gold file and evaluates the predictions for each question in the gold file. Both files must be json formatted and must have query_id keys, which are used to match predictions to gold annotations. Writes a json with global_em and global_f1 metrics to file at the specified output path, unless None is passed as output path.

Official evaluation script for v1.1 of the SQuAD dataset.

allennlp.tools.squad_eval.evaluate(dataset, predictions)[source]
allennlp.tools.squad_eval.exact_match_score(prediction, ground_truth)[source]
allennlp.tools.squad_eval.f1_score(prediction, ground_truth)[source]
allennlp.tools.squad_eval.metric_max_over_ground_truths(metric_fn, prediction, ground_truths)[source]
allennlp.tools.squad_eval.normalize_answer(s)[source]

Lower text and remove punctuation, articles and extra whitespace.

This is the official evaluator taken from the original dataset. I made minimal changes to make it Python 3 compatible, and conform to our style guidelines.
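These helpers are short; a self-contained sketch mirroring the official v1.1 logic (the same normalization steps, token-overlap F1, and maximum over ground truths) looks like this:

```python
import re
import string
from collections import Counter


def normalize_answer(s):
    """Lower text and remove punctuation, articles and extra whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in set(string.punctuation))
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def exact_match_score(prediction, ground_truth):
    return normalize_answer(prediction) == normalize_answer(ground_truth)


def f1_score(prediction, ground_truth):
    # Token-overlap F1 between normalized prediction and gold answer.
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(ground_truth).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def metric_max_over_ground_truths(metric_fn, prediction, ground_truths):
    # SQuAD scores a prediction against the best-matching gold answer.
    return max(metric_fn(prediction, gt) for gt in ground_truths)
```

For instance, `metric_max_over_ground_truths(f1_score, "The Patriots", ["Patriots", "New England Patriots"])` returns `1.0`, since article removal makes the first ground truth an exact token match.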

class allennlp.tools.wikitables_evaluator.DateValue(year, month, day, original_string=None)[source]

Bases: allennlp.tools.wikitables_evaluator.Value

match(self, other)[source]

Return True if the value matches the other value.

Args:

other (Value)

Returns:

a boolean

static parse(text)[source]

Try to parse into a date.

Return:

tuple (year, month, day) if successful; otherwise None.
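As a rough sketch of the parsing logic (simplified; the real parser handles more formats): dates in the WikiTableQuestions data look like "1990-01-25", with "xx" standing in for unknown fields, and unknown parts become -1.

```python
def parse_date_sketch(text):
    # Expect "year-month-day", e.g. "1990-01-25" or "xx-07-xx".
    parts = text.strip().split("-")
    if len(parts) != 3:
        return None
    try:
        year, month, day = (-1 if p == "xx" else int(p) for p in parts)
    except ValueError:
        return None
    if year == month == day == -1:
        return None
    return (year, month, day)
```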

property ymd
class allennlp.tools.wikitables_evaluator.NumberValue(amount, original_string=None)[source]

Bases: allennlp.tools.wikitables_evaluator.Value

property amount
match(self, other)[source]

Return True if the value matches the other value.

Args:

other (Value)

Returns:

a boolean

static parse(text)[source]

Try to parse into a number.

Return:

the number (int or float) if successful; otherwise None.
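A simplified sketch of the number-parsing behavior (the real parser also handles some edge cases): commas are treated as thousands separators, integers are preferred over floats, and anything unparseable yields None.

```python
def parse_number_sketch(text):
    # "1,234" -> 1234; "3.5" -> 3.5; "n/a" -> None.
    text = text.strip().replace(",", "")
    try:
        return int(text)
    except ValueError:
        try:
            return float(text)
        except ValueError:
            return None
```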

class allennlp.tools.wikitables_evaluator.StringValue(content)[source]

Bases: allennlp.tools.wikitables_evaluator.Value

match(self, other)[source]

Return True if the value matches the other value.

Args:

other (Value)

Returns:

a boolean

class allennlp.tools.wikitables_evaluator.Value[source]

Bases: object

abstract match(self, other)[source]

Return True if the value matches the other value.

Args:

other (Value)

Returns:

a boolean

property normalized
allennlp.tools.wikitables_evaluator.check_denotation(target_values, predicted_values)[source]

Return True if the predicted denotation is correct.

Args:

target_values (list[Value])
predicted_values (list[Value])

Returns:

bool
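The check itself is small; a minimal sketch follows, where `StringEq` is a stand-in for the real Value subclasses: the two lists must have the same size, and every target value must be matched by some predicted value.

```python
class StringEq:
    """Toy stand-in for a Value subclass with an exact-match rule."""

    def __init__(self, content):
        self.content = content

    def match(self, other):
        return self.content == other.content


def check_denotation_sketch(target_values, predicted_values):
    # Sizes must agree, and each target must be matched by some prediction.
    if len(target_values) != len(predicted_values):
        return False
    return all(
        any(target.match(pred) for pred in predicted_values)
        for target in target_values
    )
```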

allennlp.tools.wikitables_evaluator.main()[source]
allennlp.tools.wikitables_evaluator.normalize(x)[source]
allennlp.tools.wikitables_evaluator.to_value(original_string, corenlp_value=None)[source]

Convert the string to a Value object.

Args:

original_string (basestring): Original string
corenlp_value (basestring): Optional value returned from CoreNLP

Returns:

Value

allennlp.tools.wikitables_evaluator.to_value_list(original_strings, corenlp_values=None)[source]

Convert a list of strings to a list of Values.

Args:

original_strings (list[basestring])
corenlp_values (list[basestring or None])

Returns:

list[Value]

allennlp.tools.wikitables_evaluator.tsv_unescape(x)[source]

Unescape strings in the TSV file. Escaped characters include: newline (0x0A) -> backslash + n; vertical bar (0x7C) -> backslash + p; backslash (0x5C) -> backslash + backslash.

Args:

x (str)

Returns:

str
allennlp.tools.wikitables_evaluator.tsv_unescape_list(x)[source]

Unescape a list in the TSV file. List items are joined with vertical bars (0x7C).

Args:

x (str or unicode)

Returns:

a list of unicodes
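Following the escape table above, a self-contained sketch of the two helpers:

```python
def tsv_unescape(x):
    # Mirrors the escape table: \n -> newline, \p -> vertical bar, \\ -> backslash.
    return x.replace(r"\n", "\n").replace(r"\p", "|").replace("\\\\", "\\")


def tsv_unescape_list(x):
    # List items are joined with vertical bars (0x7C) in the TSV file,
    # which is why literal bars inside items are escaped as \p.
    return [tsv_unescape(item) for item in x.split("|")]
```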

Helper script for modifying config.json files that are locked inside model.tar.gz archives. This is useful if you need to rename things or add or remove values, usually because of changes to the library.

This script will untar the archive to a temp directory, launch an editor to modify the config.json, and then re-tar everything to a new archive. If your $EDITOR environment variable is not set, you’ll have to explicitly specify which editor to use.

allennlp.tools.archive_surgery.main()[source]
allennlp.tools.create_elmo_embeddings_from_vocab.main(vocab_path: str, elmo_config_path: str, elmo_weights_path: str, output_dir: str, batch_size: int, device: int, use_custom_oov_token: bool = False)[source]

Creates ELMo word representations from a vocabulary file. These word representations are _independent_ - they are the result of running the CNN and Highway layers of the ELMo model, but not the Bidirectional LSTM. ELMo requires 2 additional tokens: <S> and </S>. The first token in this file is assumed to be an unknown token.

This script produces two artifacts: a new vocabulary file with the <S> and </S> tokens inserted, and a GloVe-formatted embedding file containing word : vector pairs, one per line, with all values separated by a space.
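Since the output is GloVe-formatted, each line is a word followed by its space-separated vector components. A minimal reader for lines in that format (the function name is ours, not the library's):

```python
def read_glove_lines(lines):
    # Each line: "word v1 v2 ..." with values separated by spaces.
    embeddings = {}
    for line in lines:
        word, *values = line.rstrip("\n").split(" ")
        embeddings[word] = [float(v) for v in values]
    return embeddings
```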

allennlp.tools.inspect_cache.main()[source]