



The evaluate subcommand can be used to evaluate a trained model against a dataset and report any metrics calculated by the model.


class Evaluate(Subcommand)


class Evaluate(Subcommand):
 | ...
 | def add_subparser(
 |     self,
 |     parser: argparse._SubParsersAction
 | ) -> argparse.ArgumentParser


def evaluate_from_args(args: argparse.Namespace) -> Dict[str, Any]


def evaluate_from_archive(
    archive_file: Union[str, PathLike],
    input_file: str,
    metrics_output_file: Optional[str] = None,
    predictions_output_file: Optional[str] = None,
    batch_size: Optional[int] = None,
    cmd_overrides: Union[str, Dict[str, Any]] = "",
    cuda_device: int = -1,
    embedding_sources_mapping: str = None,
    extend_vocab: bool = False,
    weights_file: str = None,
    file_friendly_logging: bool = False,
    batch_weight_key: str = None,
    auto_names: str = "NONE"
) -> Dict[str, Any]
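As a minimal sketch, a call matching the signature above might be assembled as follows. The import path `allennlp.commands.evaluate` is an assumption (this page does not state the module name), and every file path is a hypothetical placeholder. The helper names `build_evaluate_kwargs` and `run_evaluation` are illustrative only. The import is deferred into `run_evaluation` so that the argument dictionary can be inspected without the library installed.

```python
import json
from typing import Any, Dict


def build_evaluate_kwargs() -> Dict[str, Any]:
    """Collect keyword arguments that match the documented signature.

    All paths below are hypothetical placeholders.
    """
    return {
        "archive_file": "model.tar.gz",
        # Two evaluation files, separated by ":" as the parameter docs describe.
        "input_file": "dev.jsonl:test.jsonl",
        "metrics_output_file": "dev_metrics.json:test_metrics.json",
        "predictions_output_file": "dev_preds.jsonl:test_preds.jsonl",
        "batch_size": 32,
        # Override given as a JSON string, using dot syntax for the nested key.
        "cmd_overrides": json.dumps({"iterator.batch_size": 16}),
        "cuda_device": -1,  # the documented default
    }


def run_evaluation() -> Dict[str, Any]:
    """Run the evaluation; requires the library and the files to exist."""
    # Assumption: the function is importable from this module path.
    from allennlp.commands.evaluate import evaluate_from_archive

    return evaluate_from_archive(**build_evaluate_kwargs())
```

Calling `run_evaluation()` would return the metrics dictionary, provided the archive and input files actually exist at the given paths.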


  • archive_file : Union[str, PathLike]
    Path to an archived trained model.

  • input_file : str
    path to the file containing the evaluation data (for multiple files, put ":" between filenames e.g., input1.txt:input2.txt)

  • metrics_output_file : str, optional (default = None)
    optional path to write the metrics to as JSON (for multiple files, put ":" between filenames e.g., output1.txt:output2.txt)

  • predictions_output_file : str, optional (default = None)
    optional path to write the predictions to (for multiple files, put ":" between filenames e.g., output1.jsonl:output2.jsonl)

  • batch_size : int, optional (default = None)
    If specified, the batch size to use during evaluation.

  • cmd_overrides : Union[str, Dict[str, Any]], optional (default = "")
    a json(net) structure used to override the experiment configuration, e.g., '{"iterator.batch_size": 16}'. Nested parameters can be specified either with nested dictionaries or with dot syntax.

  • cuda_device : int, optional (default = -1)
    id of GPU to use (if any)

  • embedding_sources_mapping : str, optional (default = None)
    a JSON dict defining mapping from embedding module path to embedding pretrained-file used during training. If not passed, and embedding needs to be extended, we will try to use the original file paths used during training. If they are not available we will use random vectors for embedding extension.

  • extend_vocab : bool, optional (default = False)
    if specified, we will use the instances in your new dataset to extend your vocabulary. If pretrained-file was used to initialize embedding layers, you may also need to pass --embedding-sources-mapping.

  • weights_file : str, optional (default = None)
    A path that overrides which weights file to use

  • file_friendly_logging : bool, optional (default = False)
    If True, we add newlines to tqdm output, even on an interactive terminal, and we slow down tqdm's output to only once every 10 seconds.

  • batch_weight_key : str, optional (default = None)
    If non-empty, name of metric used to weight the loss on a per-batch basis.

  • auto_names : str, optional (default = "NONE")
    Automatically create output names for each evaluation file. NONE will not automatically generate a file name for either the metrics or the predictions; in this case you will need to pass in both metrics_output_file and predictions_output_file. METRICS will only automatically create a file name for the metrics file. PREDS will only automatically create a file name for the predictions outputs. ALL will create a file name for both the metrics and the predictions.
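Two conventions from the parameter list above can be illustrated in plain Python: the ":"-separated file lists and the dot syntax for nested overrides. This is a sketch of the conventions as described, not the library's own parsing code, which may differ in detail. The function names `split_paths`, `pair_inputs_with_outputs`, and `expand_dot_syntax` are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def split_paths(spec: str) -> List[str]:
    """Split a ":"-separated specification into individual paths."""
    return [part for part in spec.split(":") if part]


def pair_inputs_with_outputs(input_spec: str, output_spec: str) -> List[Tuple[str, str]]:
    """Pair each input file with the output file in the same position."""
    inputs = split_paths(input_spec)
    outputs = split_paths(output_spec)
    if len(inputs) != len(outputs):
        raise ValueError(
            f"expected {len(inputs)} output paths, got {len(outputs)}"
        )
    return list(zip(inputs, outputs))


def expand_dot_syntax(flat: Dict[str, Any]) -> Dict[str, Any]:
    """Turn {"a.b": 1} into the equivalent nested form {"a": {"b": 1}}."""
    nested: Dict[str, Any] = {}
    for dotted_key, value in flat.items():
        *parents, leaf = dotted_key.split(".")
        node = nested
        for name in parents:
            # Create intermediate dictionaries as needed.
            node = node.setdefault(name, {})
        node[leaf] = value
    return nested
```

For example, `expand_dot_syntax({"iterator.batch_size": 16})` yields the nested dictionary form of the same override. Splitting on ":" this simply would mishandle paths that themselves contain a colon, such as Windows drive letters.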


  • all_metrics : Dict[str, Any]
    The value returned by the function: the metrics from every evaluation file passed.