




class QuACReader(DatasetReader):
 | def __init__(
 |     self,
 |     tokenizer: Tokenizer = None,
 |     token_indexers: Dict[str, TokenIndexer] = None,
 |     num_context_answers: int = 0,
 |     **kwargs
 | ) -> None

Reads a JSON-formatted Question Answering in Context (QuAC) data file and returns a Dataset where the Instances have four fields: question, a ListField; passage, a TextField; and span_start and span_end, both ListFields composed of IndexFields into the passage TextField. Two further ListFields composed of LabelFields, yesno_list and followup_list, are added. We also add a MetadataField that stores the instance's ID, the original passage text, gold answer strings, and token offsets into the original passage, accessible as metadata['id'], metadata['original_passage'], metadata['answer_text_lists'], and metadata['token_offsets'].


  • tokenizer : Tokenizer, optional (default = SpacyTokenizer())
    We use this Tokenizer for both the question and the passage. See Tokenizer.
  • token_indexers : Dict[str, TokenIndexer], optional (default = {"tokens": SingleIdTokenIndexer()})
    We similarly use this for both the question and the passage. See TokenIndexer.
  • num_context_answers : int, optional (default = 0)
    How many previous question-answer pairs to consider as context.


class QuACReader(DatasetReader):
 | ...
 | def text_to_instance(
 |     self,
 |     question_text_list: List[str],
 |     passage_text: str,
 |     start_span_list: List[List[int]] = None,
 |     end_span_list: List[List[int]] = None,
 |     passage_tokens: List[Token] = None,
 |     yesno_list: Union[List[int], List[str]] = None,
 |     followup_list: Union[List[int], List[str]] = None,
 |     additional_metadata: Dict[str, Any] = None
 | ) -> Instance

We need to convert character indices in passage_text to token indices in passage_tokens, as the latter is what we'll actually use for supervision.
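
The character-to-token conversion described above can be illustrated with a minimal sketch. It assumes plain whitespace tokenization and uses two hypothetical helpers, token_offsets and char_to_token_span, that are not part of the reader; the reader itself works with the tokens produced by its configured Tokenizer, so treat this as an illustration of the idea rather than the actual implementation.

```python
import re
from typing import List, Tuple


def token_offsets(text: str) -> List[Tuple[int, int]]:
    """Hypothetical helper: (start, end) character offsets, end exclusive,
    for each whitespace-delimited token in the text."""
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def char_to_token_span(
    offsets: List[Tuple[int, int]], char_start: int, char_end: int
) -> Tuple[int, int]:
    """Hypothetical helper: map the character span [char_start, char_end)
    to inclusive (first_token, last_token) indices."""
    # A token belongs to the span if its character range overlaps the span.
    overlapping = [
        i for i, (start, end) in enumerate(offsets)
        if start < char_end and end > char_start
    ]
    if not overlapping:
        raise ValueError("character span does not overlap any token")
    return overlapping[0], overlapping[-1]


passage_text = "The quick brown fox jumps over the lazy dog"
answer_text = "brown fox"

# Character indices of the answer within the passage.
char_start = passage_text.index(answer_text)
char_end = char_start + len(answer_text)

offsets = token_offsets(passage_text)
span_start, span_end = char_to_token_span(offsets, char_start, char_end)
print(span_start, span_end)
```

Here the answer covers the third and fourth tokens, so the token-level span used for supervision is (2, 3). With a real tokenizer the offsets would come from the tokens themselves, and an answer boundary that falls inside a token would need an explicit policy; this sketch simply includes every token the span touches.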