rouge

allennlp.training.metrics.rouge

ROUGE#

@Metric.register("rouge")
class ROUGE(Metric):
 | def __init__(
 |     self,
 |     ngram_size: int = 2,
 |     exclude_indices: Set[int] = None
 | ) -> None

Recall-Oriented Understudy for Gisting Evaluation (ROUGE)

ROUGE is a metric for measuring the quality of summaries. It computes recall over n-grams: the share of n-grams in a set of reference summaries that also appear in the predicted summary. See Lin, "ROUGE: A Package For Automatic Evaluation Of Summaries", 2004.
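To make the recall idea concrete, here is a minimal pure-Python sketch of n-gram recall for a single prediction against a single reference. It is an illustration, not AllenNLP's implementation: the helper names `count_ngrams` and `ngram_recall` are hypothetical, the inputs are plain lists of token indices rather than tensors, and matches are clipped in the way ROUGE-N is usually described.

```python
from collections import Counter
from typing import List


def count_ngrams(tokens: List[int], n: int) -> Counter:
    """Count every contiguous n-gram in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_recall(prediction: List[int], reference: List[int], n: int) -> float:
    """Fraction of reference n-grams that also appear in the prediction."""
    pred_counts = count_ngrams(prediction, n)
    ref_counts = count_ngrams(reference, n)
    total = sum(ref_counts.values())
    if total == 0:
        return 0.0
    # Clip matches so a reference n-gram is never credited more often
    # than it occurs in the prediction.
    matches = sum(min(count, pred_counts[ngram]) for ngram, count in ref_counts.items())
    return matches / total


prediction = [5, 6, 7, 8]
reference = [5, 6, 9, 8]
unigram_recall = ngram_recall(prediction, reference, 1)  # 3 of 4 unigrams match
bigram_recall = ngram_recall(prediction, reference, 2)   # 1 of 3 bigrams match
```

With ngram_size = 2, the metric would report scores for both of these orders (ROUGE-1 and ROUGE-2).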

Parameters

ngram_size : int, optional (default = 2)
ROUGE scores are calculated for ROUGE-1 .. ROUGE-ngram_size.
exclude_indices : Set[int], optional (default = None)
Indices to exclude when calculating ngrams. This should usually include the indices of the start, end, and pad tokens.
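The effect of exclude_indices can be sketched in plain Python. The helper `count_ngrams_excluding` and the vocabulary layout (0 = pad, 1 = start, 2 = end) are hypothetical, and the rule used here, dropping any n-gram that contains an excluded index, is an assumption about how the exclusion is applied rather than a confirmed detail of the library.

```python
from collections import Counter
from typing import List, Set


def count_ngrams_excluding(tokens: List[int], n: int, exclude_indices: Set[int]) -> Counter:
    """Count n-grams, skipping any that contain an excluded index."""
    counts: Counter = Counter()
    for i in range(len(tokens) - n + 1):
        ngram = tuple(tokens[i:i + n])
        # Assumption: a single excluded token disqualifies the whole n-gram.
        if any(token in exclude_indices for token in ngram):
            continue
        counts[ngram] += 1
    return counts


# Hypothetical vocabulary: 0 = pad, 1 = start, 2 = end.
tokens = [1, 5, 6, 2, 0, 0]
bigrams = count_ngrams_excluding(tokens, 2, exclude_indices={0, 1, 2})
unfiltered = count_ngrams_excluding(tokens, 2, exclude_indices=set())
```

Without the exclusion set, padding and boundary tokens contribute n-grams of their own, such as the pad-pad bigram, which can match between prediction and reference and distort the score.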

reset#

class ROUGE(Metric):
 | ...
 | @overrides
 | def reset(self) -> None

__call__#

class ROUGE(Metric):
 | ...
 | @overrides
 | def __call__(
 |     self,
 |     predictions: torch.LongTensor,
 |     gold_targets: torch.LongTensor
 | ) -> None

Update recall counts.

Parameters

predictions : torch.LongTensor
Batched predicted tokens of shape (batch_size, max_sequence_length).
gold_targets : torch.LongTensor
Batched reference (gold) sequences with shape (batch_size, max_gold_sequence_length).

Returns

None

get_metric#

class ROUGE(Metric):
 | ...
 | @overrides
 | def get_metric(self, reset: bool = False) -> Dict[str, float]

Parameters

reset : bool, optional (default = False)
Reset any accumulators or internal state.

Returns

Dict[str, float]:
A dictionary containing ROUGE-1 .. ROUGE-ngram_size scores.
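The accumulate, report, and reset cycle shared by __call__, get_metric, and reset can be sketched with a toy stand-in. `ToyRecallMetric` is hypothetical and is not the ROUGE class: it takes plain lists instead of tensors, scores only unigram overlap, and pools counts across batches, which is an assumption made for brevity and says nothing about how ROUGE averages its scores. The key name "toy-recall" is likewise made up.

```python
from typing import Dict, List


class ToyRecallMetric:
    """Hypothetical stand-in that mimics the accumulate / report / reset cycle."""

    def __init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        # Clear all accumulated state.
        self._matched = 0
        self._total = 0

    def __call__(self, predictions: List[List[int]], gold_targets: List[List[int]]) -> None:
        # Accumulate counts for one batch; nothing is returned.
        for predicted, gold in zip(predictions, gold_targets):
            predicted_set = set(predicted)
            self._matched += sum(1 for token in gold if token in predicted_set)
            self._total += len(gold)

    def get_metric(self, reset: bool = False) -> Dict[str, float]:
        # Report the score over everything seen since the last reset.
        score = self._matched / self._total if self._total else 0.0
        if reset:
            self.reset()
        return {"toy-recall": score}


metric = ToyRecallMetric()
metric([[5, 6, 7]], [[5, 6, 9, 8]])    # batch 1: 2 of 4 gold tokens found
metric([[1, 2]], [[1, 2]])             # batch 2: 2 of 2 gold tokens found
running = metric.get_metric()          # state is kept
final = metric.get_metric(reset=True)  # same score, then state is cleared
after_reset = metric.get_metric()      # nothing accumulated yet
```

Calling get_metric with reset=False suits progress reporting during an epoch, while reset=True is the natural choice at the end of an epoch so the next one starts from empty accumulators.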
