Attacker(self, predictor: allennlp.predictors.predictor.Predictor) -> None
Attacker will modify an input (e.g., add or delete tokens) to try to change an AllenNLP
Predictor's output in a desired manner (e.g., make it incorrect).
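As a rough usage sketch (not part of this page): a concrete Attacker is constructed around the Predictor whose output it will perturb. Hotflip is one concrete subclass shipped in allennlp.interpret.attackers; the model archive path below is a placeholder.

```python
from allennlp.predictors.predictor import Predictor
from allennlp.interpret.attackers import Hotflip  # one concrete Attacker subclass

# Load a trained model through its Predictor; the archive path is a placeholder.
predictor = Predictor.from_path("path/to/model.tar.gz")

# The Attacker wraps the Predictor whose output it will try to change.
attacker = Hotflip(predictor)
attacker.initialize()  # expensive setup is deferred to initialize(), see below
```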
Attacker.attack_from_json(self, inputs: Dict[str, Any], input_field_to_attack: str, grad_input_field: str, ignore_tokens: List[str], target: Dict[str, Any]) -> Dict[str, Any]
This function finds a modification to the input text that would change the model's prediction in some desired manner (e.g., an adversarial attack).
- inputs : JsonDict
  The input you want to attack (the same as the argument to a Predictor, e.g., predict_json()).
- input_field_to_attack : str
  The key in the inputs JsonDict you want to attack, e.g., tokens.
- grad_input_field : str
  The field in the gradients dictionary that contains the input gradients. For example,
  grad_input_1 will be the field for single-input tasks. See get_gradients() in
  Predictor for more information on field names.
- target : JsonDict
  If given, this is a targeted attack, trying to change the prediction to a particular value,
  instead of just changing it from its original prediction. Subclasses are not required to
  accept this argument, as not all attacks make sense as targeted attacks. Perhaps that means
  we should make the API more crisp, but adding another class is not worth it.
Returns a JsonDict containing the final, sanitized input after adversarial modification.
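A minimal call sketch, continuing from the attacker constructed above. The "sentence" key is a placeholder for whatever the model's dataset reader expects, the ignore list is illustrative, and the exact keys of the returned dict depend on the concrete attacker.

```python
# Untargeted attack, using the attacker from the sketch above.
attack = attacker.attack_from_json(
    inputs={"sentence": "a very funny movie"},  # same shape as predict_json() input
    input_field_to_attack="tokens",    # field whose tokens may be modified
    grad_input_field="grad_input_1",   # single-input task, per get_gradients()
    ignore_tokens=["@@NULL@@"],        # tokens the attack must leave untouched
    target=None,                       # None means an untargeted attack
)
print(attack)  # sanitized JsonDict describing the modified input
```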
Attacker.initialize(self) -> None
Initializes any components of the Attacker that are expensive to compute, so that they are
not created on __init__(). The default implementation is a no-op.
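A hypothetical subclass illustrating where that expensive setup belongs; the class, attribute, and helper names below are invented for illustration and are not part of the AllenNLP API.

```python
from typing import Any, Dict, List

from allennlp.interpret.attackers import Attacker


class CandidateSwapAttacker(Attacker):
    """Hypothetical Attacker subclass; names are invented for illustration."""

    def initialize(self) -> None:
        # Build the large candidate structure here rather than in __init__, so
        # constructing the attacker stays cheap until an attack is requested.
        self._candidates = self._build_candidate_matrix()

    def _build_candidate_matrix(self):
        ...  # placeholder for the expensive computation

    def attack_from_json(
        self,
        inputs: Dict[str, Any],
        input_field_to_attack: str,
        grad_input_field: str,
        ignore_tokens: List[str],
        target: Dict[str, Any],
    ) -> Dict[str, Any]:
        ...  # use self._candidates to propose token modifications
```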