sentence_tagger

allennlp.predictors.sentence_tagger

SentenceTaggerPredictor

@Predictor.register("sentence_tagger")
class SentenceTaggerPredictor(Predictor):
 | def __init__(
 |     self,
 |     model: Model,
 |     dataset_reader: DatasetReader,
 |     language: str = "en_core_web_sm"
 | ) -> None

Predictor for any model that takes in a sentence and returns a single set of tags for it. In particular, it can be used with the CrfTagger and SimpleTagger models.

Registered as a Predictor with name "sentence_tagger".

predict

class SentenceTaggerPredictor(Predictor):
 | ...
 | def predict(self, sentence: str) -> JsonDict
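A minimal usage sketch, assuming AllenNLP is installed and a trained tagger archive is available. The helper name `tag_sentence` and its `archive_path` argument are hypothetical, introduced only for illustration. The keys of the returned dictionary depend on the underlying model, so none are assumed here.

```python
def tag_sentence(archive_path: str, sentence: str) -> dict:
    """Load a trained tagger archive and tag one raw sentence.

    `archive_path` is a placeholder for wherever your model archive lives.
    """
    # Imported lazily so the sketch can be read (and the function defined)
    # even on a machine where AllenNLP is not installed.
    from allennlp.predictors import Predictor

    # Ask for the predictor registered under the name "sentence_tagger".
    predictor = Predictor.from_path(archive_path, predictor_name="sentence_tagger")

    # predict takes the raw sentence string and returns a JsonDict.
    return predictor.predict(sentence=sentence)
```

A call would look like `tag_sentence("path/to/model.tar.gz", "Mary went to Seattle")`, where the path is a stand-in for a real archive.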

predictions_to_labeled_instances

class SentenceTaggerPredictor(Predictor):
 | ...
 | @overrides
 | def predictions_to_labeled_instances(
 |     self,
 |     instance: Instance,
 |     outputs: Dict[str, numpy.ndarray]
 | ) -> List[Instance]

This function currently only handles BIOUL tags.

Imagine an NER model predicts three named entities, each possibly spanning multiple tokens. For each entity, we create a new Instance in which only that entity keeps its labels and every other token is labeled as outside (O). We then return a list of those Instances.

For example:

Mary   went  to  Seattle  to  visit  Microsoft  Research
U-Per  O     O   U-Loc    O   O      B-Org      L-Org

We create three instances.

Mary   went  to  Seattle  to  visit  Microsoft  Research
U-Per  O     O   O        O   O      O          O

Mary   went  to  Seattle  to  visit  Microsoft  Research
O      O     O   U-Loc    O   O      O          O

Mary   went  to  Seattle  to  visit  Microsoft  Research
O      O     O   O        O   O      B-Org      L-Org

We additionally add a flag to these instances to tell the model to only compute loss on non-O tags, so that we get gradients that are specific to the particular span prediction that each instance represents.
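The splitting described above can be sketched in plain Python on lists of tag strings. This is an illustration of the idea, not the library's implementation: the function names `split_bioul_entities` and `non_o_mask` are hypothetical, and the sketch works on bare tag lists rather than on Instance objects.

```python
from typing import List


def split_bioul_entities(tags: List[str]) -> List[List[str]]:
    """Return one tag sequence per entity; all other positions become "O"."""
    spans = []  # inclusive (start, end) index pairs, one per entity
    i = 0
    while i < len(tags):
        if tags[i].startswith("U-"):
            # Unit-length entity: the span is this single token.
            spans.append((i, i))
        elif tags[i].startswith("B-"):
            start = i
            # Advance to the closing L- tag. If the sequence is malformed
            # and has no L- tag, the span runs to the end of the sentence.
            while i < len(tags) - 1 and not tags[i].startswith("L-"):
                i += 1
            spans.append((start, i))
        i += 1

    return [
        [tag if start <= j <= end else "O" for j, tag in enumerate(tags)]
        for start, end in spans
    ]


def non_o_mask(tags: List[str]) -> List[bool]:
    """Mark the positions a loss restricted to non-O tags would use."""
    return [tag != "O" for tag in tags]


tags = ["U-Per", "O", "O", "U-Loc", "O", "O", "B-Org", "L-Org"]
for entity_tags in split_bioul_entities(tags):
    print(entity_tags, non_o_mask(entity_tags))
```

For the example sentence this yields three tag sequences, one per entity. The mask is only a stand-in for the flag mentioned above, showing which positions would contribute to the loss.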