rouge
allennlp.training.metrics.rouge
ROUGE
@Metric.register("rouge")
class ROUGE(Metric):
| def __init__(
| self,
| ngram_size: int = 2,
| exclude_indices: Set[int] = None
| ) -> None
Recall-Oriented Understudy for Gisting Evaluation (ROUGE)
ROUGE is a metric for measuring the quality of summaries. It is based on calculating the recall between ngrams in the predicted summary and a set of reference summaries. See Lin, "ROUGE: A Package For Automatic Evaluation Of Summaries", 2004.
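The n-gram recall idea described above can be illustrated with a minimal pure-Python sketch. It is not the AllenNLP implementation: the helper names `ngram_counts` and `rouge_n_recall` are made up for this example, and it works on plain lists of token indices rather than tensors.

```python
from collections import Counter
from typing import Sequence


def ngram_counts(tokens: Sequence[int], n: int) -> Counter:
    """Count every contiguous n-gram in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(predicted: Sequence[int], reference: Sequence[int], n: int) -> float:
    """Fraction of reference n-grams that also occur in the prediction.

    Counts are clipped, so a repeated n-gram in the reference is matched
    at most as many times as it occurs in the prediction.
    """
    ref = ngram_counts(reference, n)
    pred = ngram_counts(predicted, n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, pred[gram]) for gram, count in ref.items())
    return overlap / total


# Illustrative token indices: the sequences differ in one position.
predicted = [5, 6, 7, 8]
reference = [5, 6, 9, 8]

unigram_recall = rouge_n_recall(predicted, reference, 1)  # 3 of 4 unigrams match
bigram_recall = rouge_n_recall(predicted, reference, 2)   # 1 of 3 bigrams match
```

With a single substituted token, unigram recall stays high while bigram recall drops sharply, which is why reporting several n-gram sizes is informative.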
Parameters
- ngram_size : int, optional (default = 2)
  ROUGE scores are calculated for ROUGE-1 .. ROUGE-ngram_size.
- exclude_indices : Set[int], optional (default = None)
  Indices to exclude when calculating ngrams. This should usually include the indices of the start, end, and pad tokens.
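To see why excluding start, end, and pad indices matters, here is a small sketch of n-gram counting with an exclusion set. It assumes that any n-gram containing an excluded index is skipped entirely, which is one plausible reading of the parameter and may not match the library's exact behavior. The indices `PAD`, `START`, `END` and the helper `ngram_counts_excluding` are hypothetical.

```python
from collections import Counter
from typing import Sequence, Set

# Hypothetical vocabulary indices, chosen only for this example.
PAD, START, END = 0, 1, 2


def ngram_counts_excluding(
    tokens: Sequence[int], n: int, exclude_indices: Set[int]
) -> Counter:
    """Count n-grams, skipping any n-gram that contains an excluded index."""
    counts: Counter = Counter()
    for i in range(len(tokens) - n + 1):
        gram = tuple(tokens[i:i + n])
        if any(token in exclude_indices for token in gram):
            continue  # assumption: the whole n-gram is dropped
        counts[gram] += 1
    return counts


# A padded sequence as it might come out of a batch.
padded = [START, 5, 6, 7, END, PAD, PAD]

bigrams = ngram_counts_excluding(padded, 2, {PAD, START, END})
unfiltered = ngram_counts_excluding(padded, 2, set())
```

Without the exclusion set, bigrams such as `(PAD, PAD)` would be counted and could match between a padded prediction and a padded reference, inflating the score.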
reset
class ROUGE(Metric):
| ...
| @overrides
| def reset(self) -> None
__call__
class ROUGE(Metric):
| ...
| @overrides
| def __call__(
| self,
| predictions: torch.LongTensor,
| gold_targets: torch.LongTensor
| ) -> None
Update recall counts.
Parameters
- predictions : torch.LongTensor
  Batched predicted tokens of shape (batch_size, max_sequence_length).
- gold_targets : torch.LongTensor
  Batched reference (gold) sequences of shape (batch_size, max_gold_sequence_length).
Returns
- None
get_metric
class ROUGE(Metric):
| ...
| @overrides
| def get_metric(self, reset: bool = False) -> Dict[str, float]
Parameters
- reset : bool, optional (default = False)
  Reset any accumulators or internal state.
Returns
- Dict[str, float]: A dictionary containing ROUGE-1 .. ROUGE-ngram_size scores.
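The update, read, and reset cycle can be sketched with a toy accumulator in plain Python. This is a stand-in written for illustration, not the AllenNLP class: `RecallAccumulator` is a hypothetical name, it takes lists of token indices instead of tensors, it averages per-sequence recall, and the dictionary keys it returns are assumed rather than taken from the library.

```python
from collections import Counter
from typing import Dict, List, Sequence


class RecallAccumulator:
    """Toy metric that averages n-gram recall over all sequences seen so far."""

    def __init__(self, ngram_size: int = 2) -> None:
        self._ngram_size = ngram_size
        self.reset()

    def reset(self) -> None:
        # One running sum per n-gram size, plus the number of sequences seen.
        self._recall_sums = {n: 0.0 for n in range(1, self._ngram_size + 1)}
        self._num_sequences = 0

    @staticmethod
    def _recall(pred: Sequence[int], ref: Sequence[int], n: int) -> float:
        ref_counts = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        pred_counts = Counter(tuple(pred[i:i + n]) for i in range(len(pred) - n + 1))
        total = sum(ref_counts.values())
        if total == 0:
            return 0.0
        overlap = sum(min(c, pred_counts[g]) for g, c in ref_counts.items())
        return overlap / total

    def __call__(
        self, predictions: List[Sequence[int]], gold_targets: List[Sequence[int]]
    ) -> None:
        # Each call adds one batch to the running totals and returns nothing.
        for pred, ref in zip(predictions, gold_targets):
            for n in self._recall_sums:
                self._recall_sums[n] += self._recall(pred, ref, n)
            self._num_sequences += 1

    def get_metric(self, reset: bool = False) -> Dict[str, float]:
        count = max(self._num_sequences, 1)  # avoid division by zero
        scores = {f"ROUGE-{n}": s / count for n, s in self._recall_sums.items()}
        if reset:
            self.reset()
        return scores


metric = RecallAccumulator(ngram_size=2)
metric([[5, 6, 7, 8]], [[5, 6, 9, 8]])  # first batch
metric([[3, 4]], [[3, 4]])              # second batch, a perfect match
scores = metric.get_metric(reset=True)  # read the averages, then clear state
after_reset = metric.get_metric()       # accumulators are empty again
```

Passing `reset=True` at the end of an epoch keeps scores from one epoch from leaking into the next, while `reset=False` allows reading a running value mid-epoch.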