allennlp.interpret.attackers.attacker
Attacker#
class Attacker(Registrable):
| def __init__(self, predictor: Predictor) -> None
An Attacker
will modify an input (e.g., add or delete tokens) to try to change an AllenNLP
Predictor's output in a desired manner (e.g., make it incorrect).
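For concreteness, here is a minimal usage sketch with one of the built-in attackers, Hotflip; the model archive path and the input sentence are placeholders, not part of this module:

```python
from allennlp.predictors import Predictor
from allennlp.interpret.attackers import Hotflip

# Load any trained model through its Predictor wrapper (placeholder path).
predictor = Predictor.from_path("/path/to/model.tar.gz")

# Hotflip is one concrete Attacker: it swaps input tokens, guided by gradients,
# to try to flip the model's prediction.
attacker = Hotflip(predictor)
attacker.initialize()  # build the expensive parts (e.g., an embedding matrix) once

result = attacker.attack_from_json(
    inputs={"sentence": "a very enjoyable film"},  # the same JSON a Predictor takes
    input_field_to_attack="tokens",
    grad_input_field="grad_input_1",
)
print(result["final"])  # the adversarially modified tokens
```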
initialize#
class Attacker(Registrable):
| ...
| def initialize(self)
Initializes any components of the Attacker that are expensive to compute, so that they are not created in __init__(). The default implementation is a no-op (pass).
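As an illustration of the intended pattern, here is a hypothetical subclass that defers its expensive setup to initialize(); the class, its registration name, and the _build_embedding_matrix helper are invented for this sketch and are not part of the library:

```python
from typing import List, Optional

from allennlp.common.util import JsonDict
from allennlp.interpret.attackers import Attacker
from allennlp.predictors import Predictor


@Attacker.register("toy-attacker")  # hypothetical registration name
class ToyAttacker(Attacker):
    def __init__(self, predictor: Predictor) -> None:
        super().__init__(predictor)
        self._embedding_matrix = None  # deliberately *not* built here

    def initialize(self):
        # Build the expensive state only once, the first time it is needed.
        if self._embedding_matrix is None:
            self._embedding_matrix = self._build_embedding_matrix()

    def _build_embedding_matrix(self):
        ...  # hypothetical helper doing the costly work

    def attack_from_json(
        self,
        inputs: JsonDict,
        input_field_to_attack: str = "tokens",
        grad_input_field: str = "grad_input_1",
        ignore_tokens: Optional[List[str]] = None,
        target: Optional[JsonDict] = None,
    ) -> JsonDict:
        self.initialize()
        ...  # the actual attack logic
```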
attack_from_json#
class Attacker(Registrable):
| ...
| def attack_from_json(
| self,
| inputs: JsonDict,
| input_field_to_attack: str,
| grad_input_field: str,
| ignore_tokens: List[str],
| target: JsonDict
| ) -> JsonDict
This function finds a modification to the input text that would change the model's prediction in some desired manner (e.g., an adversarial attack).
Parameters
- inputs : JsonDict
  The input you want to attack (the same as the argument to a Predictor, e.g., predict_json()).
- input_field_to_attack : str
  The key in the inputs JsonDict you want to attack, e.g., tokens.
- grad_input_field : str
  The field in the gradients dictionary that contains the input gradients. For example, grad_input_1 will be the field for single input tasks. See get_gradients() in Predictor for more information on field names.
- target : JsonDict
  If given, this is a targeted attack, trying to change the prediction to a particular value, instead of just changing it from its original prediction. Subclasses are not required to accept this argument, as not all attacks make sense as targeted attacks. Perhaps that means we should make the API more crisp, but adding another class is not worth it.
Returns
- reduced_input : JsonDict
  Contains the final, sanitized input after adversarial modification.
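As a short example of what that returned JsonDict looks like in practice, here is a call using the built-in InputReduction attacker; the archive path and sentence are placeholders, and the original/final field names are what the built-in attackers return today, so other Attacker subclasses may differ:

```python
from allennlp.predictors import Predictor
from allennlp.interpret.attackers import InputReduction

predictor = Predictor.from_path("/path/to/model.tar.gz")  # placeholder path

reducer = InputReduction(predictor)
reduced_input = reducer.attack_from_json(
    inputs={"sentence": "this movie was surprisingly wonderful"},
    input_field_to_attack="tokens",
    grad_input_field="grad_input_1",
)

print(reduced_input["original"])  # tokens before the attack
print(reduced_input["final"])     # tokens after the attack (the reduced input)
```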