def make_coref_instance(
    sentences: List[List[str]],
    token_indexers: Dict[str, TokenIndexer],
    max_span_width: int,
    gold_clusters: Optional[List[List[Tuple[int, int]]]] = None,
    wordpiece_modeling_tokenizer: PretrainedTransformerTokenizer = None,
    max_sentences: int = None,
    remove_singleton_clusters: bool = True,
) -> Instance
- sentences :
A list of lists representing the tokenised words and sentences in the document.
- token_indexers :
Used to index the words in the document. See TokenIndexer.
- max_span_width :
The maximum width of candidate spans to consider.
- gold_clusters :
Optional[List[List[Tuple[int, int]]]], optional (default = None)
A list of all clusters in the document, represented as word spans with absolute indices in the entire document. Each cluster contains some number of spans, which can be nested and overlap. If there are exact matches between clusters, they will be resolved using
- wordpiece_modeling_tokenizer :
PretrainedTransformerTokenizer, optional (default = None)
If not None, the supplied tokenizer is used for subword tokenization and the labels are distributed to the resulting wordpieces. All modeling is then based on wordpieces. If this is left as None (the default), the user is expected to use PretrainedTransformerMismatchedEmbedder, and the modeling will be at the word level.
- max_sentences :
int, optional (default = None)
The maximum number of sentences in each document to keep. By default keeps all sentences.
- remove_singleton_clusters :
bool, optional (default = True)
Some datasets contain singleton clusters (i.e. a mention with no coreferents). If True, these clusters are removed.
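The effect of max_sentences and remove_singleton_clusters on gold_clusters can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: the helper name truncate_and_filter is hypothetical, spans are assumed to be (start, end) pairs with an inclusive end, and a mention is assumed to be dropped when it ends beyond the retained sentences.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # (start, end): absolute word indices, end assumed inclusive


def truncate_and_filter(
    sentences: List[List[str]],
    gold_clusters: List[List[Span]],
    max_sentences: Optional[int] = None,
    remove_singleton_clusters: bool = True,
) -> Tuple[List[List[str]], List[List[Span]]]:
    """Hypothetical helper: apply max_sentences, then singleton removal."""
    if max_sentences is not None and len(sentences) > max_sentences:
        sentences = sentences[:max_sentences]
        total_length = sum(len(sentence) for sentence in sentences)
        # Keep only mentions that end inside the retained text.
        gold_clusters = [
            [span for span in cluster if span[1] < total_length]
            for cluster in gold_clusters
        ]
        # Drop clusters that lost all of their mentions.
        gold_clusters = [cluster for cluster in gold_clusters if cluster]
    if remove_singleton_clusters:
        gold_clusters = [cluster for cluster in gold_clusters if len(cluster) > 1]
    return sentences, gold_clusters


sentences = [
    ["Mary", "saw", "John", "."],
    ["She", "waved", "at", "him", "."],
    ["The", "dog", "barked", "."],
]
# Absolute indices: "Mary" = 0, "John" = 2, "She" = 4, "him" = 7, "The dog" = (9, 10).
gold_clusters = [[(0, 0), (4, 4)], [(2, 2), (7, 7)], [(9, 10)]]

kept_sentences, kept_clusters = truncate_and_filter(
    sentences, gold_clusters, max_sentences=2
)
print(kept_clusters)
```

Because the indices are absolute, truncating the document does not require renumbering the surviving mentions; only mentions that fall outside the kept sentences are discarded.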
An Instance containing the following fields:
- text :
TextField
The text of the full document.
- spans :
ListField[SpanField]
A ListField containing the spans represented as SpanFields with respect to the document text.
- span_labels :
SequenceLabelField, optional
The id of the cluster which each possible span belongs to, or -1 if it does not belong to a cluster. As these labels have variable length (it depends on how many spans we are considering), we represent this as a SequenceLabelField with respect to the spans ListField.
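The relationship between max_span_width, the candidate spans, and span_labels can be sketched as follows. This is a minimal plain-Python illustration, not the library's code: the function names enumerate_candidate_spans and label_spans are hypothetical, and it assumes that spans use inclusive end indices and never cross a sentence boundary.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # (start, end): absolute word indices, end assumed inclusive


def enumerate_candidate_spans(
    sentences: List[List[str]], max_span_width: int
) -> List[Span]:
    """Hypothetical helper: all spans of at most max_span_width words per sentence."""
    spans: List[Span] = []
    offset = 0  # absolute index of the first word of the current sentence
    for sentence in sentences:
        for start in range(len(sentence)):
            last = min(start + max_span_width, len(sentence))
            for end in range(start, last):
                spans.append((offset + start, offset + end))
        offset += len(sentence)
    return spans


def label_spans(spans: List[Span], gold_clusters: List[List[Span]]) -> List[int]:
    """Hypothetical helper: cluster id for each candidate span, or -1 if it has none."""
    cluster_of: Dict[Span, int] = {
        span: cluster_id
        for cluster_id, cluster in enumerate(gold_clusters)
        for span in cluster
    }
    return [cluster_of.get(span, -1) for span in spans]


sentences = [["Mary", "saw", "John", "."], ["She", "waved", "."]]
gold_clusters = [[(0, 0), (4, 4)]]  # "Mary" and "She" corefer

spans = enumerate_candidate_spans(sentences, max_span_width=2)
labels = label_spans(spans, gold_clusters)
for span, label in zip(spans, labels):
    print(span, label)
```

The label list has exactly one entry per candidate span, which is why its length varies with the document length and max_span_width, and why it is naturally expressed as a sequence of labels aligned with the spans.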