crf_tagger
allennlp_models.tagging.models.crf_tagger
CrfTagger#
@Model.register("crf_tagger")
class CrfTagger(Model):
| def __init__(
| self,
| vocab: Vocabulary,
| text_field_embedder: TextFieldEmbedder,
| encoder: Seq2SeqEncoder,
| label_namespace: str = "labels",
| feedforward: Optional[FeedForward] = None,
| label_encoding: Optional[str] = None,
| include_start_end_transitions: bool = True,
| constrain_crf_decoding: bool = None,
| calculate_span_f1: bool = None,
| dropout: Optional[float] = None,
| verbose_metrics: bool = False,
| initializer: InitializerApplicator = InitializerApplicator(),
| top_k: int = 1,
| ignore_loss_on_o_tags: bool = False,
| label_weights: Optional[Dict[str, float]] = None,
| weight_strategy: str = None,
| **kwargs
| ) -> None
The `CrfTagger` encodes a sequence of text with a `Seq2SeqEncoder`, then uses a Conditional Random Field model to predict a tag for each token in the sequence.

Registered as a `Model` with name "crf_tagger".
Parameters¶

- vocab : `Vocabulary`
    A `Vocabulary`, required in order to compute sizes for input/output projections.
- text_field_embedder : `TextFieldEmbedder`
    Used to embed the tokens of the `TextField` we get as input to the model.
- encoder : `Seq2SeqEncoder`
    The encoder that we will use in between embedding tokens and predicting output tags.
- label_namespace : `str`, optional (default = `"labels"`)
    This is needed to compute the `SpanBasedF1Measure` metric. Unless you did something unusual, the default value should be what you want.
- feedforward : `FeedForward`, optional (default = `None`)
    An optional feedforward layer to apply after the encoder.
- label_encoding : `str`, optional (default = `None`)
    Label encoding to use when calculating span F1 and constraining the CRF at decoding time. Valid options are "BIO", "BIOUL", "IOB1", "BMES". Required if `calculate_span_f1` or `constrain_crf_decoding` is true.
- include_start_end_transitions : `bool`, optional (default = `True`)
    Whether to include start and end transition parameters in the CRF.
- constrain_crf_decoding : `bool`, optional (default = `None`)
    If `True`, the CRF is constrained at decoding time to produce valid sequences of tags. If this is `True`, then `label_encoding` is required. If `None` and `label_encoding` is specified, this is set to `True`. If `None` and `label_encoding` is not specified, it defaults to `False`.
- calculate_span_f1 : `bool`, optional (default = `None`)
    Calculate span-level F1 metrics during training. If this is `True`, then `label_encoding` is required. If `None` and `label_encoding` is specified, this is set to `True`. If `None` and `label_encoding` is not specified, it defaults to `False`.
- dropout : `float`, optional (default = `None`)
    Dropout probability.
- verbose_metrics : `bool`, optional (default = `False`)
    If true, metrics will be returned per label class in addition to the overall statistics.
- initializer : `InitializerApplicator`, optional (default = `InitializerApplicator()`)
    Used to initialize the model parameters.
- top_k : `int`, optional (default = `1`)
    If provided, the number of parses to return from the CRF in `output_dict['top_k_tags']`. The top k parses are returned as a list of dicts, where each dict has the form `{"tags": List, "score": float}`. The "tags" value of the first dict in the list for each data item is the top choice, and equals the corresponding item in `output_dict['tags']`.
- ignore_loss_on_o_tags : `bool`, optional (default = `False`)
    If `True`, we compute the loss only for actual spans in `tags`, and not on `O` tokens. This is useful for computing gradients of the loss on a single span, for interpretation / attacking.
- label_weights : `Dict[str, float]`, optional (default = `None`)
    A mapping `{label: weight}` used in the loss function to give a different weight to each token depending on its label. This is useful for dealing with highly unbalanced datasets. There are three available strategies for handling weighted labels (see below). The default strategy is "emission".
- weight_strategy : `str`, optional (default = `None`)
    If `label_weights` is given and this is `None`, it is the same as "emission". It indicates which strategy is used to apply the label weights. Valid options are "emission", "emission_transition", and "lannoy". If "emission", the emission score of each tag is multiplied by the corresponding weight (as given by `label_weights`). If "emission_transition", both emission and transition scores of each tag are multiplied by the corresponding weight; in this case, a transition score `t(i, j)` between consecutive tokens `i` and `j` is multiplied by `w[tags[i]]`, i.e., the weight associated with the tag of token `i`. If `weight_strategy` is "lannoy", we use the strategy proposed by Lannoy et al. (2019). You can see an experimental comparison among these three strategies and a brief discussion of their differences here.
forward#
class CrfTagger(Model):
| ...
| def forward(
| self,
| tokens: TextFieldTensors,
| tags: torch.LongTensor = None,
| metadata: List[Dict[str, Any]] = None,
| ignore_loss_on_o_tags: Optional[bool] = None,
| **kwargs
| ) -> Dict[str, torch.Tensor]
Parameters¶

- tokens : `TextFieldTensors`
    The output of `TextField.as_array()`, which should typically be passed directly to a `TextFieldEmbedder`. This output is a dictionary mapping keys to `TokenIndexer` tensors. At its most basic, using a `SingleIdTokenIndexer`, this is: `{"tokens": Tensor(batch_size, num_tokens)}`. This dictionary will have the same keys as were used for the `TokenIndexers` when you created the `TextField` representing your sequence. The dictionary is designed to be passed directly to a `TextFieldEmbedder`, which knows how to combine different word representations into a single vector per token in your input.
- tags : `torch.LongTensor`, optional (default = `None`)
    A torch tensor representing the sequence of integer gold class labels of shape `(batch_size, num_tokens)`.
- metadata : `List[Dict[str, Any]]`, optional (default = `None`)
    Metadata containing the original words in the sentence to be tagged under a 'words' key.
- ignore_loss_on_o_tags : `Optional[bool]`, optional (default = `None`)
    If `True`, we compute the loss only for actual spans in `tags`, and not on `O` tokens. This is useful for computing gradients of the loss on a single span, for interpretation / attacking. If `None`, `self.ignore_loss_on_o_tags` is used instead.
Returns¶

An output dictionary consisting of:

- logits : `torch.FloatTensor`
    The logits that are the output of the `tag_projection_layer`.
- mask : `torch.BoolTensor`
    The text field mask for the input tokens.
- tags : `List[List[int]]`
    The predicted tags using the Viterbi algorithm.
- loss : `torch.FloatTensor`, optional
    A scalar loss to be optimised. Only computed if gold label `tags` are provided.
make_output_human_readable#
class CrfTagger(Model):
| ...
| def make_output_human_readable(
| self,
| output_dict: Dict[str, torch.Tensor]
| ) -> Dict[str, torch.Tensor]
Converts the tag ids to the actual tags. `output_dict["tags"]` is a list of lists of tag ids, so we use an ugly nested list comprehension.
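The conversion can be sketched in plain Python as follows, using an ordinary dict in place of the model's `Vocabulary` lookup; the function name `ids_to_tags` is illustrative, not part of the library.

```python
# Hedged sketch of the id-to-label conversion: the batch is a list of
# sequences, each a list of tag ids, hence the nested comprehension.

def ids_to_tags(tag_id_lists, index_to_label):
    """tag_id_lists: [batch][num_tokens] tag ids;
    index_to_label: {tag_id: label string}."""
    return [[index_to_label[tag_id] for tag_id in seq] for seq in tag_id_lists]
```

In the model itself, `index_to_label` corresponds to the vocabulary's index-to-token mapping for the `label_namespace`.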
get_metrics#
class CrfTagger(Model):
| ...
| def get_metrics(self, reset: bool = False) -> Dict[str, float]
default_predictor#
class CrfTagger(Model):
| ...
| default_predictor = "sentence_tagger"