allennlp.tango.evaluation


AllenNLP Tango is an experimental API and parts of it might change or disappear every time we release a new version.

EvaluationStep

@Step.register("evaluation")
class EvaluationStep(Step)

This step evaluates a given model on a given dataset.

DETERMINISTIC

class EvaluationStep(Step):
 | ...
 | DETERMINISTIC = True

VERSION

class EvaluationStep(Step):
 | ...
 | VERSION = "002"

FORMAT

class EvaluationStep(Step):
 | ...
 | FORMAT: Format = JsonFormat(compress="gz")
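`JsonFormat(compress="gz")` means the step's result is persisted as gzip-compressed JSON. As a rough illustration of that storage behavior, here is a self-contained sketch using only the standard library; the function names are hypothetical and the actual `JsonFormat` class in AllenNLP handles serialization with more machinery than shown here:

```python
import gzip
import json
from typing import Any


def write_gzipped_json(artifact: Any, path: str) -> None:
    """Serialize `artifact` as JSON and gzip-compress it on disk,
    roughly what a JsonFormat(compress="gz") does for a step result."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        json.dump(artifact, f)


def read_gzipped_json(path: str) -> Any:
    """Load a gzip-compressed JSON artifact back into Python objects."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)


# Round-trip a toy metrics dict, as an evaluation step might produce.
metrics = {"accuracy": 0.92, "loss": 0.31}
write_gzipped_json(metrics, "/tmp/metrics.json.gz")
restored = read_gzipped_json("/tmp/metrics.json.gz")
```

One caveat of JSON storage: dictionary keys come back as strings and tuples come back as lists, so step results should stick to JSON-friendly types.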

EvaluationResult

@dataclasses.dataclass
class EvaluationResult

metrics

class EvaluationResult:
 | ...
 | metrics: Dict[str, Any] = None

predictions

class EvaluationResult:
 | ...
 | predictions: List[Dict[str, Any]] = None
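To make the shape of the result concrete, here is a simplified, self-contained stand-in for `EvaluationResult` (the real class lives in `allennlp.tango.evaluation`; the field names and types are taken from the signatures above, with `Optional` added so the `None` defaults type-check):

```python
import dataclasses
from typing import Any, Dict, List, Optional


@dataclasses.dataclass
class EvaluationResult:
    """Simplified stand-in mirroring the two documented fields:
    aggregate metrics for the split, and per-instance predictions."""
    metrics: Optional[Dict[str, Any]] = None
    predictions: Optional[List[Dict[str, Any]]] = None


# Hypothetical values, just to show how downstream code reads the result.
result = EvaluationResult(
    metrics={"accuracy": 0.87},
    predictions=[{"label": "positive", "score": 0.91}],
)
best = result.metrics["accuracy"]
```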

run

class EvaluationStep(Step):
 | ...
 | def run(
 |     self,
 |     model: Model,
 |     dataset: DatasetDict,
 |     split: str = "validation",
 |     data_loader: Optional[Lazy[TangoDataLoader]] = None
 | ) -> EvaluationResult

Runs an evaluation on a dataset.

  • model is the model we want to evaluate.
  • dataset is the dataset we want to evaluate on.
  • split is the name of the split we want to evaluate on.
  • data_loader lets you specify a custom data loader for the evaluation. By default, this step evaluates in batches of 32 instances each.
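In a Tango experiment configuration, this step might be wired up as follows. The step names `trained_model` and `my_dataset` are hypothetical placeholders for earlier steps in the same experiment, and the exact step-reference syntax may differ between Tango versions; only the `"type": "evaluation"` registration and the parameter names come from the signature above:

```json
{
    "steps": {
        "evaluate": {
            "type": "evaluation",
            "model": {"type": "ref", "ref": "trained_model"},
            "dataset": {"type": "ref", "ref": "my_dataset"},
            "split": "validation"
        }
    }
}
```

Because the step is marked `DETERMINISTIC = True`, Tango can cache its gzipped-JSON result and skip re-running the evaluation when neither the model nor the dataset inputs have changed.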