util
allennlp_models.coref.util
make_coref_instance
def make_coref_instance(
    sentences: List[List[str]],
    token_indexers: Dict[str, TokenIndexer],
    max_span_width: int,
    gold_clusters: Optional[List[List[Tuple[int, int]]]] = None,
    wordpiece_modeling_tokenizer: PretrainedTransformerTokenizer = None,
    max_sentences: int = None,
    remove_singleton_clusters: bool = True
) -> Instance
Parameters
- sentences : List[List[str]]
  A list of lists representing the tokenised words and sentences in the document.
- token_indexers : Dict[str, TokenIndexer]
  This is used to index the words in the document. See TokenIndexer.
- max_span_width : int
  The maximum width of candidate spans to consider.
- gold_clusters : Optional[List[List[Tuple[int, int]]]], optional (default = None)
  A list of all clusters in the document, represented as word spans with absolute indices in the entire document. Each cluster contains some number of spans, which can be nested and overlap. If there are exact matches between clusters, they will be resolved using _canonicalize_clusters.
- wordpiece_modeling_tokenizer : PretrainedTransformerTokenizer, optional (default = None)
  If not None, subword tokenization is performed with the supplied tokenizer and the labels are distributed to the resulting wordpieces. All the modeling will then be based on wordpieces. If this is None (the default), the user is expected to use PretrainedTransformerMismatchedIndexer and PretrainedTransformerMismatchedEmbedder, and the modeling will be on the word level.
- max_sentences : int, optional (default = None)
  The maximum number of sentences in each document to keep. By default, all sentences are kept.
- remove_singleton_clusters : bool, optional (default = True)
  Some datasets contain clusters that are singletons (i.e. a mention with no coreferents). If True, these clusters are removed.
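To make the effect of max_sentences and remove_singleton_clusters on gold_clusters more concrete, here is a minimal sketch in plain Python. It is not the library's code: the helper names truncate_document and drop_singletons are hypothetical, and both the rule "drop any span that ends beyond the kept sentences" and the order of the two steps are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # (start, end) word indices in the whole document, both inclusive


def truncate_document(
    sentences: List[List[str]],
    gold_clusters: List[List[Span]],
    max_sentences: Optional[int],
) -> Tuple[List[List[str]], List[List[Span]]]:
    """Hypothetical helper: keep the first max_sentences sentences and
    discard spans that end past the kept text (assumed behaviour)."""
    if max_sentences is None or len(sentences) <= max_sentences:
        return sentences, gold_clusters
    kept = sentences[:max_sentences]
    total_length = sum(len(sentence) for sentence in kept)
    trimmed = []
    for cluster in gold_clusters:
        surviving = [span for span in cluster if span[1] < total_length]
        if surviving:  # drop clusters that lost every mention
            trimmed.append(surviving)
    return kept, trimmed


def drop_singletons(gold_clusters: List[List[Span]]) -> List[List[Span]]:
    """Hypothetical helper: remove clusters that contain a single mention."""
    return [cluster for cluster in gold_clusters if len(cluster) > 1]


# Toy document; word indices run 0..10 across all three sentences.
sentences = [
    ["Alice", "met", "Bob", "."],
    ["She", "greeted", "him", "."],
    ["Bob", "left", "."],
]
gold_clusters = [
    [(0, 0), (4, 4)],  # Alice / She
    [(2, 2), (8, 8)],  # Bob / Bob (second mention is in sentence 3)
    [(6, 6)],          # a singleton in this toy annotation
]

kept, clusters = truncate_document(sentences, gold_clusters, max_sentences=2)
clusters = drop_singletons(clusters)
print(kept)
print(clusters)
```

In this toy run, truncating to two sentences removes the mention at (8, 8), which turns the second cluster into a singleton, so only the Alice / She cluster remains after singleton removal.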
Returns
An Instance containing the following Fields:
- text : TextField
  The text of the full document.
- spans : ListField[SpanField]
  A ListField containing the spans, represented as SpanFields with respect to the document text.
- span_labels : SequenceLabelField, optional
  The id of the cluster that each candidate span belongs to, or -1 if it does not belong to a cluster. Because the number of labels varies with the number of spans considered, they are represented as a SequenceLabelField with respect to the spans ListField.
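The relationship between spans and span_labels can be illustrated with a short plain-Python sketch. It does not use the library's Field classes; the helper names enumerate_candidate_spans and label_spans are hypothetical, and the choice to enumerate spans per sentence (so that no span crosses a sentence boundary) is an assumption made for this example.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # (start, end) word indices in the whole document, both inclusive


def enumerate_candidate_spans(
    sentences: List[List[str]], max_span_width: int
) -> List[Span]:
    """Hypothetical helper: list every span of at most max_span_width words,
    sentence by sentence, using document-level indices."""
    spans = []
    offset = 0
    for sentence in sentences:
        for start in range(len(sentence)):
            last = min(start + max_span_width, len(sentence))
            for end in range(start, last):
                spans.append((offset + start, offset + end))
        offset += len(sentence)
    return spans


def label_spans(spans: List[Span], gold_clusters: List[List[Span]]) -> List[int]:
    """Hypothetical helper: one label per candidate span, the cluster id
    if the span is a gold mention and -1 otherwise."""
    cluster_of: Dict[Span, int] = {
        span: cluster_id
        for cluster_id, cluster in enumerate(gold_clusters)
        for span in cluster
    }
    return [cluster_of.get(span, -1) for span in spans]


sentences = [["Alice", "met", "Bob"], ["She", "smiled"]]
gold_clusters = [[(0, 0), (3, 3)]]  # Alice / She

spans = enumerate_candidate_spans(sentences, max_span_width=2)
labels = label_spans(spans, gold_clusters)
for span, label in zip(spans, labels):
    print(span, label)
```

The labels list has exactly one entry per candidate span, which is why a sequence-style label field aligned with the list of spans is a natural fit.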