allennlp.tango.evaluation


AllenNLP Tango is an experimental API and parts of it might change or disappear every time we release a new version.

EvaluationStep

@Step.register("evaluation")
class EvaluationStep(Step)

This step evaluates a given model on a given dataset.

DETERMINISTIC

class EvaluationStep(Step):
 | ...
 | DETERMINISTIC = True

VERSION

class EvaluationStep(Step):
 | ...
 | VERSION = "002"

FORMAT

class EvaluationStep(Step):
 | ...
 | FORMAT: Format = JsonFormat(compress="gz")
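`JsonFormat(compress="gz")` means the step's result is persisted as gzip-compressed JSON. As a rough illustration of that storage behavior, here is a self-contained sketch using only the standard library; the function names are hypothetical and the actual `JsonFormat` class in AllenNLP handles serialization with more machinery than shown here:

```python
import gzip
import json
from typing import Any


def write_gzipped_json(artifact: Any, path: str) -> None:
    """Serialize `artifact` as JSON and gzip-compress it on disk,
    roughly what a JsonFormat(compress="gz") does for a step result."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        json.dump(artifact, f)


def read_gzipped_json(path: str) -> Any:
    """Load a gzip-compressed JSON artifact back into Python objects."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)


# Round-trip a toy metrics dict, as an evaluation step might produce.
metrics = {"accuracy": 0.92, "loss": 0.31}
write_gzipped_json(metrics, "/tmp/metrics.json.gz")
restored = read_gzipped_json("/tmp/metrics.json.gz")
```

One caveat of JSON storage: dictionary keys come back as strings and tuples come back as lists, so step results should stick to JSON-friendly types.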

EvaluationResult

@dataclasses.dataclass
class EvaluationResult

metrics

class EvaluationResult:
 | ...
 | metrics: Dict[str, Any] = None

predictions

class EvaluationResult:
 | ...
 | predictions: List[Dict[str, Any]] = None
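To make the shape of the result concrete, here is a simplified, self-contained stand-in for `EvaluationResult` (the real class lives in `allennlp.tango.evaluation`; the field names and types are taken from the signatures above, with `Optional` added so the `None` defaults type-check):

```python
import dataclasses
from typing import Any, Dict, List, Optional


@dataclasses.dataclass
class EvaluationResult:
    """Simplified stand-in mirroring the two documented fields:
    aggregate metrics for the split, and per-instance predictions."""
    metrics: Optional[Dict[str, Any]] = None
    predictions: Optional[List[Dict[str, Any]]] = None


# Hypothetical values, just to show how downstream code reads the result.
result = EvaluationResult(
    metrics={"accuracy": 0.87},
    predictions=[{"label": "positive", "score": 0.91}],
)
best = result.metrics["accuracy"]
```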

run

class EvaluationStep(Step):
 | ...
 | def run(
 |     self,
 |     model: Model,
 |     dataset: DatasetDict,
 |     split: str = "validation",
 |     data_loader: Optional[Lazy[TangoDataLoader]] = None
 | ) -> EvaluationResult

Runs an evaluation on a dataset.

  • model is the model we want to evaluate.
  • dataset is the dataset we want to evaluate on.
  • split is the name of the split we want to evaluate on.
  • data_loader lets you specify a custom data loader for the evaluation. By default, this step evaluates in batches of 32 instances each.
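In a Tango experiment configuration, this step might be wired up as follows. The step names `trained_model` and `my_dataset` are hypothetical placeholders for earlier steps in the same experiment, and the exact step-reference syntax may differ between Tango versions; only the `"type": "evaluation"` registration and the parameter names come from the signature above:

```json
{
    "steps": {
        "evaluate": {
            "type": "evaluation",
            "model": {"type": "ref", "ref": "trained_model"},
            "dataset": {"type": "ref", "ref": "my_dataset"},
            "split": "validation"
        }
    }
}
```

Because the step is marked `DETERMINISTIC = True`, Tango can cache its gzipped-JSON result and skip re-running the evaluation when neither the model nor the dataset inputs have changed.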