bias_metrics

allennlp.fairness.bias_metrics

A suite of metrics to quantify how much bias is encoded by word embeddings and determine the effectiveness of bias mitigation.

Bias metrics are based on:

  1. Caliskan, A., Bryson, J., & Narayanan, A. (2017). Semantics derived automatically from language corpora contain human-like biases. Science, 356, 183-186.

  2. Dev, S., & Phillips, J.M. (2019). Attenuating Bias in Word Vectors. AISTATS.

  3. Dev, S., Li, T., Phillips, J.M., & Srikumar, V. (2020). On Measuring and Mitigating Biased Inferences of Word Embeddings. ArXiv, abs/1908.09369.

  4. Rathore, A., Dev, S., Phillips, J.M., Srikumar, V., Zheng, Y., Yeh, C.M., Wang, J., Zhang, W., & Wang, B. (2021). VERB: Visualizing and Interpreting Bias Mitigation Techniques for Word Representations. ArXiv, abs/2104.02797.

  5. Aka, O., Burke, K., Bäuerle, A., Greer, C., & Mitchell, M. (2021). Measuring model biases in the absence of ground truth. ArXiv, abs/2103.03417.

WordEmbeddingAssociationTest

class WordEmbeddingAssociationTest

The Word Embedding Association Test (WEAT) score measures how unlikely it is that there is no difference between two sets of target words in terms of their relative similarity to two sets of attribute words, by computing the probability that a random permutation of the attribute words would produce the observed (or a greater) difference in sample means. It is an analog, for word embeddings, of the Implicit Association Test from psychology.

Based on: Caliskan, A., Bryson, J., & Narayanan, A. (2017). Semantics derived automatically from language corpora contain human-like biases. Science, 356, 183-186.
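
For reference, the test statistic in Caliskan et al. (2017) is built from the per-word association s(w, A, B) = mean_{a in A} cos(w, a) - mean_{b in B} cos(w, b), and the effect size reported is (mean_{x in X} s(x, A, B) - mean_{y in Y} s(y, A, B)) / std_{w in X ∪ Y} s(w, A, B).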

__call__

class WordEmbeddingAssociationTest:
 | ...
 | def __call__(
 |     self,
 |     target_embeddings1: torch.Tensor,
 |     target_embeddings2: torch.Tensor,
 |     attribute_embeddings1: torch.Tensor,
 |     attribute_embeddings2: torch.Tensor
 | ) -> torch.FloatTensor

Parameters

Note

In the examples below, we treat gender identity as binary, which does not accurately characterize gender in real life.

  • target_embeddings1 : torch.Tensor
    A tensor of size (target_embeddings_batch_size, ..., dim) containing target word embeddings related to a concept group. For example, if the concept is gender, target_embeddings1 could contain embeddings for linguistically masculine words, e.g. "man", "king", "brother", etc. Represented as X.

  • target_embeddings2 : torch.Tensor
    A tensor of the same size as target_embeddings1 containing target word embeddings related to a different group for the same concept. For example, target_embeddings2 could contain embeddings for linguistically feminine words, e.g. "woman", "queen", "sister", etc. Represented as Y.

  • attribute_embeddings1 : torch.Tensor
    A tensor of size (attribute_embeddings1_batch_size, ..., dim) containing attribute word embeddings related to a concept group associated with the concept group for target_embeddings1. For example, if the concept is professions, attribute_embeddings1 could contain embeddings for stereotypically male professions, e.g. "doctor", "banker", "engineer", etc. Represented as A.

  • attribute_embeddings2 : torch.Tensor
    A tensor of size (attribute_embeddings2_batch_size, ..., dim) containing attribute word embeddings related to a concept group associated with the concept group for target_embeddings2. For example, if the concept is professions, attribute_embeddings2 could contain embeddings for stereotypically female professions, e.g. "nurse", "receptionist", "homemaker", etc. Represented as B.

Note

While target_embeddings1 and target_embeddings2 must be the same size, attribute_embeddings1 and attribute_embeddings2 need not be the same size.

Returns

  • weat_score : torch.FloatTensor
    The unlikelihood that there is no difference between target_embeddings1 and target_embeddings2 in terms of their relative similarity to attribute_embeddings1 and attribute_embeddings2. Typical values lie in [-1, 1], with values closer to 0 indicating less biased associations.
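
A minimal usage sketch (the embeddings below are random placeholders; in practice they would be looked up from a trained embedder for the word lists above, and the batch sizes and dimension are illustrative):

    import torch
    from allennlp.fairness.bias_metrics import WordEmbeddingAssociationTest

    # Placeholder 50-dimensional embeddings standing in for real lookups.
    X = torch.randn(3, 50)  # e.g. "man", "king", "brother"
    Y = torch.randn(3, 50)  # e.g. "woman", "queen", "sister" (same size as X)
    A = torch.randn(4, 50)  # e.g. stereotypically male professions
    B = torch.randn(5, 50)  # attribute sets need not match in size

    weat = WordEmbeddingAssociationTest()
    weat_score = weat(X, Y, A, B)  # scalar FloatTensor; closer to 0 = less biased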

EmbeddingCoherenceTest

class EmbeddingCoherenceTest

The Embedding Coherence Test (ECT) score measures whether groups of words have stereotypical associations by computing the Spearman rank correlation coefficient of lists of attribute embeddings sorted by their similarity to the target embeddings.

Based on: Dev, S., & Phillips, J.M. (2019). Attenuating Bias in Word Vectors. AISTATS.
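
Concretely, following Dev & Phillips (2019), ECT computes the similarity of every attribute embedding to the mean embedding of X and to the mean embedding of Y, then reports the Spearman rank correlation between the two resulting lists of similarities; highly correlated rankings indicate that neither target group is preferentially associated with the attributes.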

__call__

class EmbeddingCoherenceTest:
 | ...
 | def __call__(
 |     self,
 |     target_embeddings1: torch.Tensor,
 |     target_embeddings2: torch.Tensor,
 |     attribute_embeddings: torch.Tensor
 | ) -> torch.FloatTensor

Parameters

Note

In the examples below, we treat gender identity as binary, which does not accurately characterize gender in real life.

  • target_embeddings1 : torch.Tensor
    A tensor of size (target_embeddings_batch_size, ..., dim) containing target word embeddings related to a concept group. For example, if the concept is gender, target_embeddings1 could contain embeddings for linguistically masculine words, e.g. "man", "king", "brother", etc. Represented as X.

  • target_embeddings2 : torch.Tensor
    A tensor of the same size as target_embeddings1 containing target word embeddings related to a different group for the same concept. For example, target_embeddings2 could contain embeddings for linguistically feminine words, e.g. "woman", "queen", "sister", etc. Represented as Y.

  • attribute_embeddings : torch.Tensor
    A tensor of size (attribute_embeddings_batch_size, ..., dim) containing attribute word embeddings related to a concept associated with target_embeddings1 and target_embeddings2. For example, if the concept is professions, attribute_embeddings could contain embeddings for "doctor", "banker", "engineer", etc. Represented as AB.

Returns

  • ect_score : torch.FloatTensor
    The Spearman rank correlation coefficient measuring the similarity of the lists of attribute embeddings sorted by their similarity to the target embeddings. Ranges over [-1, 1], with values closer to 1 indicating less biased associations.
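
A minimal usage sketch, again with random placeholder embeddings:

    import torch
    from allennlp.fairness.bias_metrics import EmbeddingCoherenceTest

    X = torch.randn(3, 50)   # e.g. masculine target words
    Y = torch.randn(3, 50)   # e.g. feminine target words (same size as X)
    AB = torch.randn(6, 50)  # profession attribute words

    ect = EmbeddingCoherenceTest()
    ect_score = ect(X, Y, AB)  # FloatTensor in [-1, 1]; closer to 1 = less biased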

spearman_correlation

class EmbeddingCoherenceTest:
 | ...
 | def spearman_correlation(self, x: torch.Tensor, y: torch.Tensor)
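
A helper that computes the Spearman rank correlation coefficient between tensors x and y (the Pearson correlation of their ranks), used internally to produce the ECT score.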

NaturalLanguageInference

@Metric.register("nli")
class NaturalLanguageInference(Metric):
 | def __init__(
 |     self,
 |     neutral_label: int = 2,
 |     taus: List[float] = [0.5, 0.7]
 | )

Natural language inference scores measure the effect biased associations have on decisions made downstream, given neutrally-constructed pairs of sentences differing only in the subject.

  1. Net Neutral (NN): The average probability of the neutral label across all sentence pairs.

  2. Fraction Neutral (FN): The fraction of sentence pairs predicted neutral.

  3. Threshold:tau (T:tau): A parameterized measure that reports the fraction of examples whose probability of neutral is above tau.

Parameters

  • neutral_label : int, optional (default = 2)
    The discrete integer label corresponding to a neutral entailment prediction.
  • taus : List[float], optional (default = [0.5, 0.7])
    All the taus for which to compute Threshold:tau.

Based on: Dev, S., Li, T., Phillips, J.M., & Srikumar, V. (2020). On Measuring and Mitigating Biased Inferences of Word Embeddings. ArXiv, abs/1908.09369.

__call__

class NaturalLanguageInference(Metric):
 | ...
 | def __call__(self, nli_probabilities: torch.Tensor) -> None

Parameters

Note

In the examples below, we treat gender identity as binary, which does not accurately characterize gender in real life.

  • nli_probabilities : torch.Tensor
    A tensor of size (batch_size, ..., 3) containing natural language inference (i.e. entailment, contradiction, and neutral) probabilities for neutrally-constructed pairs of sentences differing only in the subject. For example, if the concept is gender, nli_probabilities could contain the natural language inference probabilities of:

    • "The driver owns a cabinet." -> "The man owns a cabinet."

    • "The driver owns a cabinet." -> "The woman owns a cabinet."

    • "The doctor eats an apple." -> "The man eats an apple."

    • "The doctor eats an apple." -> "The woman eats an apple."

get_metric

class NaturalLanguageInference(Metric):
 | ...
 | def get_metric(self, reset: bool = False)

Returns

  • nli_scores : Dict[str, float]
    Contains the following keys:

    1. "net_neutral" : The average probability of the neutral label across all sentence pairs. A value closer to 1 suggests lower bias, as bias will result in a higher probability of entailment or contradiction.

    2. "fraction_neutral" : The fraction of sentence pairs predicted neutral. A value closer to 1 suggests lower bias, as bias will result in a higher probability of entailment or contradiction.

    3. "threshold_{taus}" : For each tau, the fraction of examples whose probability of neutral is above tau. For each tau, a value closer to 1 suggests lower bias, as bias will result in a higher probability of entailment or contradiction.

reset

class NaturalLanguageInference(Metric):
 | ...
 | def reset(self)

AssociationWithoutGroundTruth

@Metric.register("association_without_ground_truth")
class AssociationWithoutGroundTruth(Metric):
 | def __init__(
 |     self,
 |     num_classes: int,
 |     num_protected_variable_labels: int,
 |     association_metric: str = "npmixy",
 |     gap_type: str = "ova"
 | ) -> None

Association without ground truth, from: Aka, O.; Burke, K.; Bäuerle, A.; Greer, C.; and Mitchell, M. 2021. Measuring model biases in the absence of ground truth. arXiv preprint arXiv:2103.03417.

Parameters

  • num_classes : int
    Number of classes.
  • num_protected_variable_labels : int
    Number of protected variable labels.
  • association_metric : str, optional (default = "npmixy")
    A generic association metric A(x, y), where x is an identity label and y is any other label. Options include nPMIxy ("npmixy"), nPMIy ("npmiy"), PMI^2 ("pmisq"), and PMI ("pmi"); see the definitions after this list. Empirically, nPMIxy and nPMIy are more capable of capturing labels across a range of marginal frequencies.
  • gap_type : str, optional (default = "ova")
    Either one-vs-all ("ova") or pairwise ("pairwise"). The one-vs-all gap is A(x, y) - E[A(x', y)], where x' ranges over all protected variable labels other than x. The pairwise gaps are A(x, y) - A(x', y) for every protected variable label x' other than x.
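
The association metrics themselves are defined as follows (these definitions follow Aka et al. (2021); p(x, y) denotes the empirical joint probability of protected variable label x and prediction label y, estimated from counts):

  • PMI(x, y) = log(p(x, y) / (p(x) p(y)))

  • PMI^2(x, y) = log(p(x, y)^2 / (p(x) p(y)))

  • nPMIxy(x, y) = PMI(x, y) / (-log p(x, y))

  • nPMIy(x, y) = PMI(x, y) / (-log p(y))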

Note

Assumes integer predictions, with each item to be classified having a single correct class.

__call__

class AssociationWithoutGroundTruth(Metric):
 | ...
 | def __call__(
 |     self,
 |     predicted_labels: torch.Tensor,
 |     protected_variable_labels: torch.Tensor,
 |     mask: Optional[torch.BoolTensor] = None
 | ) -> None

Parameters

  • predicted_labels : torch.Tensor
    A tensor of predicted integer class labels of shape (batch_size, ...). Represented as Y.
  • protected_variable_labels : torch.Tensor
    A tensor of integer protected variable labels of shape (batch_size, ...). It must be the same shape as the predicted_labels tensor. Represented as X.
  • mask : torch.BoolTensor, optional (default = None)
    A boolean tensor of the same shape as predicted_labels; positions where the mask is False are ignored.

Note

All tensors are expected to be on the same device.

get_metric

class AssociationWithoutGroundTruth(Metric):
 | ...
 | def get_metric(
 |     self,
 |     reset: bool = False
 | ) -> Dict[int, Union[torch.Tensor, Dict[int, torch.Tensor]]]

Returns

  • gaps : Dict[int, Union[torch.FloatTensor, Dict[int, torch.FloatTensor]]]
    A dictionary mapping each protected variable label x to either:

    1. a tensor of the one-vs-all gaps (where the gap corresponding to prediction label i is at index i), or

    2. another dictionary mapping protected variable labels x' to a tensor of the pairwise gaps (where the gap corresponding to prediction label i is at index i).

    A gap of nearly 0 implies fairness on the basis of Association in the Absence of Ground Truth.

Note

If a possible class label is not present in Y, the expected behavior is that the gaps corresponding to this class label are NaN. Likewise, if a possible (class label, protected variable label) pair is not observed jointly in Y and X, the gap corresponding to that pair is NaN.
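
A minimal usage sketch with made-up predictions (2 classes, 2 protected variable labels, one-vs-all nPMIxy gaps):

    import torch
    from allennlp.fairness.bias_metrics import AssociationWithoutGroundTruth

    assoc = AssociationWithoutGroundTruth(
        num_classes=2,
        num_protected_variable_labels=2,
        association_metric="npmixy",
        gap_type="ova",
    )

    predicted_labels = torch.tensor([0, 1, 1, 0, 1, 0])           # Y
    protected_variable_labels = torch.tensor([0, 0, 1, 1, 1, 0])  # X
    assoc(predicted_labels, protected_variable_labels)  # accumulates counts

    gaps = assoc.get_metric(reset=True)
    # e.g. {0: tensor([gap_for_class_0, gap_for_class_1]),
    #       1: tensor([gap_for_class_0, gap_for_class_1])};
    # gaps near 0 indicate fairness under this metric.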

reset

class AssociationWithoutGroundTruth(Metric):
 | ...
 | def reset(self) -> None