
rouge

allennlp.training.metrics.rouge



ROUGE

@Metric.register("rouge")
class ROUGE(Metric):
 | def __init__(
 |     self,
 |     ngram_size: int = 2,
 |     exclude_indices: Set[int] = None
 | ) -> None

Recall-Oriented Understudy for Gisting Evaluation (ROUGE)

ROUGE is a metric for measuring the quality of summaries. It is based on n-gram recall: the fraction of n-grams in a set of reference summaries that also appear in the predicted summary. See Lin, "ROUGE: A Package for Automatic Evaluation of Summaries", 2004.
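As a rough illustration of n-gram recall, the sketch below computes ROUGE-N recall for one predicted sequence against a single reference, using clipped n-gram counts. The function names `ngram_counts` and `rouge_n_recall` are hypothetical and not part of AllenNLP; the library's own implementation operates on batched tensors and may differ in detail.

```python
from collections import Counter
from typing import Sequence


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Count every contiguous n-gram in `tokens`."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(prediction: Sequence[str], reference: Sequence[str], n: int) -> float:
    """Fraction of reference n-grams that also occur in the prediction."""
    pred = ngram_counts(prediction, n)
    ref = ngram_counts(reference, n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Clip each match so a reference n-gram is never credited more often
    # than it appears in the prediction.
    overlap = sum(min(count, pred[gram]) for gram, count in ref.items())
    return overlap / total


prediction = "the cat sat on the mat".split()
reference = "the cat lay on the mat".split()
print(rouge_n_recall(prediction, reference, 1))  # 5 of 6 reference unigrams matched
print(rouge_n_recall(prediction, reference, 2))  # 3 of 5 reference bigrams matched
```

Recall divides by the number of n-grams in the reference, so a prediction is penalised for omitting reference content but not for adding extra tokens.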

Parameters

  • ngram_size : int, optional (default = 2)
    ROUGE scores are calculated for ROUGE-1 .. ROUGE-ngram_size
  • exclude_indices : Set[int], optional (default = None)
    Indices to exclude when calculating ngrams. This should usually include the indices of the start, end, and pad tokens.
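
To illustrate why special-token indices are usually excluded, the sketch below counts n-grams over integer token IDs and drops any n-gram that contains an excluded index. Treating exclusion as "skip the whole n-gram" is an assumption made for this example, and the helper `count_ngrams` and the IDs chosen for the start, end, and pad tokens are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence, Set

# Hypothetical vocabulary indices, chosen for this example only.
PAD, START, END = 0, 1, 2


def count_ngrams(
    token_ids: Sequence[int],
    n: int,
    exclude_indices: Optional[Set[int]] = None,
) -> Counter:
    """Count n-grams, skipping any that contain an excluded index (assumed behaviour)."""
    exclude = exclude_indices or set()
    counts: Counter = Counter()
    for i in range(len(token_ids) - n + 1):
        gram = tuple(token_ids[i:i + n])
        if any(token in exclude for token in gram):
            continue
        counts[gram] += 1
    return counts


padded = [START, 7, 8, 9, END, PAD, PAD]
with_special = count_ngrams(padded, 2)
without_special = count_ngrams(padded, 2, {PAD, START, END})
print(sum(with_special.values()))  # 6 bigrams, including (PAD, PAD)
print(sorted(without_special))     # [(7, 8), (8, 9)]
```

Without the exclusion, bigrams such as `(PAD, PAD)` would match between any two padded sequences and inflate the score.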

reset

class ROUGE(Metric):
 | ...
 | def reset(self) -> None

__call__

class ROUGE(Metric):
 | ...
 | def __call__(
 |     self,
 |     predictions: torch.LongTensor,
 |     gold_targets: torch.LongTensor,
 |     mask: Optional[torch.BoolTensor] = None
 | ) -> None

Update recall counts.

Parameters

  • predictions : torch.LongTensor
    Batched predicted tokens of shape (batch_size, max_sequence_length).
  • gold_targets : torch.LongTensor
    Batched reference (gold) sequences with shape (batch_size, max_gold_sequence_length).

Returns

  • None

get_metric

class ROUGE(Metric):
 | ...
 | def get_metric(self, reset: bool = False) -> Dict[str, float]

Parameters

  • reset : bool, optional (default = False)
    Reset any accumulators or internal state.

Returns

  • Dict[str, float]:
    A dictionary containing ROUGE-1 .. ROUGE-ngram_size scores.
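
The accumulate-then-report pattern behind `__call__`, `get_metric`, and `reset` can be sketched in plain Python. The class `RunningRougeRecall` and the dictionary keys it returns are hypothetical. The sketch works on lists of token IDs rather than tensors and reports recall only, so it illustrates the pattern and should not be read as the library's implementation.

```python
from collections import Counter
from typing import Dict, Sequence


class RunningRougeRecall:
    """Hypothetical stand-in that accumulates n-gram recall counts across batches."""

    def __init__(self, ngram_size: int = 2) -> None:
        self._ngram_size = ngram_size
        self.reset()

    def reset(self) -> None:
        # Per-n accumulators: matched reference n-grams and total reference n-grams.
        self._matched: Counter = Counter()
        self._total: Counter = Counter()

    @staticmethod
    def _ngrams(tokens: Sequence[int], n: int) -> Counter:
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    def __call__(
        self,
        predictions: Sequence[Sequence[int]],
        gold_targets: Sequence[Sequence[int]],
    ) -> None:
        # Update the running counts with one batch; nothing is returned.
        for pred, gold in zip(predictions, gold_targets):
            for n in range(1, self._ngram_size + 1):
                pred_counts = self._ngrams(pred, n)
                gold_counts = self._ngrams(gold, n)
                self._matched[n] += sum(
                    min(count, pred_counts[gram]) for gram, count in gold_counts.items()
                )
                self._total[n] += sum(gold_counts.values())

    def get_metric(self, reset: bool = False) -> Dict[str, float]:
        scores = {
            f"ROUGE-{n}": (self._matched[n] / self._total[n] if self._total[n] else 0.0)
            for n in range(1, self._ngram_size + 1)
        }
        if reset:
            self.reset()
        return scores


metric = RunningRougeRecall(ngram_size=2)
metric([[5, 6, 7, 8]], [[5, 6, 9, 8]])  # first batch
metric([[3, 4]], [[3, 4]])              # second batch
scores = metric.get_metric(reset=True)  # aggregated over both batches, then cleared
after_reset = metric.get_metric()       # all zeros until new batches arrive
print(scores)
print(after_reset)
```

Passing `reset=True` at the end of an epoch keeps counts from one epoch from leaking into the next.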