
allennlp_models.vision.metrics.vqa


VqaMeasure#

@Metric.register("vqa")
class VqaMeasure(Metric):
 | def __init__(self) -> None

Compute the VQA metric, as described in https://www.semanticscholar.org/paper/VQA%3A-Visual-Question-Answering-Agrawal-Lu/97ad70a9fa3f99adf18030e5e38ebe3d90daa2db

In VQA, we take the answer with the highest score and then check how many human annotators gave that same answer. The accuracy score for an answer is min(1.0, human_count / 3), so an answer chosen by at least three annotators receives full credit.

This metric takes the logits from the model, i.e., a score for each possible answer, together with the labels for the question and their weights.
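The scoring rule above can be sketched in isolation. The following is a hypothetical standalone helper, not the library's code: it computes the per-answer accuracy credit from the number of human annotators who gave that answer.

```python
def vqa_answer_score(human_count: int) -> float:
    """Accuracy credit for an answer given by `human_count` annotators.

    Full credit (1.0) is reached once three or more annotators agree;
    fewer annotators earn proportional partial credit.
    """
    return min(1.0, human_count / 3)


# 0 annotators -> 0.0, 1 -> 1/3, 2 -> 2/3, 3 or more -> 1.0
scores = [vqa_answer_score(n) for n in range(5)]
```

In practice these per-label scores are typically precomputed and passed to the metric as the `label_weights` tensor.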

__call__#

class VqaMeasure(Metric):
 | ...
 | def __call__(
 |     self,
 |     logits: torch.Tensor,
 |     labels: torch.Tensor,
 |     label_weights: torch.Tensor
 | )

Parameters

  • logits : torch.Tensor
    A tensor of predictions of shape (batch_size, num_classes).
  • labels : torch.Tensor
    A tensor of integer class labels of shape (batch_size, num_labels).
  • label_weights : torch.Tensor
    A tensor of floats of shape (batch_size, num_labels), giving a weight or score to every one of the labels.
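To make the parameter shapes concrete, here is a hypothetical plain-Python sketch (not the library's implementation, which operates on torch tensors) of how an update over one batch could combine these three inputs: take the arg-max answer per instance and credit it with that label's weight.

```python
def vqa_batch_score(logits, labels, label_weights):
    """Summed score for a batch, mimicking the metric's update step.

    logits:        per-instance class scores, shape (batch_size, num_classes)
    labels:        per-instance annotated label indices, shape (batch_size, num_labels)
    label_weights: per-label scores matching `labels`, e.g. min(1.0, count / 3)
    """
    total = 0.0
    for instance_logits, instance_labels, weights in zip(logits, labels, label_weights):
        # The model's prediction is the class with the highest logit.
        predicted = max(range(len(instance_logits)), key=instance_logits.__getitem__)
        # If the predicted class is among the annotated labels, the
        # instance earns that label's weight; otherwise it earns zero.
        for label, weight in zip(instance_labels, weights):
            if label == predicted:
                total += weight
                break
    return total


batch_score = vqa_batch_score(
    logits=[[0.1, 2.0, 0.3]],      # arg-max class is 1
    labels=[[1, 2]],               # annotated answer indices
    label_weights=[[1.0, 1 / 3]],  # scores for those answers
)
# batch_score == 1.0: the prediction matched the full-credit label
```

A running average of such scores over all instances seen so far is what `get_metric` would then report.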

get_metric#

class VqaMeasure(Metric):
 | ...
 | def get_metric(self, reset: bool = False)

reset#

class VqaMeasure(Metric):
 | ...
 | def reset(self) -> None