class BoolQDatasetReader(DatasetReader):
 | def __init__(
 |     self,
 |     tokenizer: Tokenizer = None,
 |     token_indexers: Dict[str, TokenIndexer] = None,
 |     **kwargs
 | )

This DatasetReader reads the BoolQ data for the binary (yes/no) question-answering task. The output of read is a list of Instances with the fields tokens : TextField and label : LabelField. Registered as a DatasetReader with name "boolq".
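The reader's job starts with turning raw BoolQ records into (passage, question, label) triples. The following is a minimal plain-Python sketch of that step, not the reader's actual code. It assumes the data is JSON Lines with one object per line carrying "passage", "question" and an optional boolean "label" key (as in the SuperGLUE release of BoolQ); the helper name read_boolq_jsonl is hypothetical.

```python
import io
import json
from typing import Iterator, Optional, TextIO, Tuple

# Hypothetical record shape: (passage, question, label-or-None).
Record = Tuple[str, str, Optional[bool]]


def read_boolq_jsonl(lines: TextIO) -> Iterator[Record]:
    """Yield (passage, question, label) triples from BoolQ-style JSON Lines.

    Assumes one JSON object per line with "passage", "question" and an
    optional boolean "label" key. Blank lines are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # A missing "label" (e.g. unlabeled test data) becomes None.
        yield record["passage"], record["question"], record.get("label")


sample = io.StringIO(
    '{"passage": "Ice is frozen water.", "question": "is ice frozen water", "label": true}\n'
    '{"passage": "The sky is blue.", "question": "is the sky green"}\n'
)
for passage, question, label in read_boolq_jsonl(sample):
    print(question, "->", label)
```

Each triple would then be handed to text_to_instance, which builds the tokens and label fields.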

Parameters
  • tokenizer : Tokenizer, optional (default = WhitespaceTokenizer())
    Tokenizer to use to split the input sequences into words or other kinds of tokens.
  • token_indexers : Dict[str, TokenIndexer], optional (default = {"tokens": SingleIdTokenIndexer()})
    We use this to define the input representation for the text. See TokenIndexer.
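To make the two defaults concrete, here is a rough plain-Python sketch of what they do: whitespace tokenization, then mapping each token to a single integer ID from a vocabulary. It only mimics the idea behind WhitespaceTokenizer and SingleIdTokenIndexer; the helper functions and the reserved padding/unknown entries at IDs 0 and 1 are illustrative assumptions, not the library's API.

```python
from typing import Dict, List


def whitespace_tokenize(text: str) -> List[str]:
    # Split on runs of whitespace, the idea behind the default tokenizer.
    return text.split()


def build_vocab(token_lists: List[List[str]]) -> Dict[str, int]:
    # Hypothetical vocabulary: 0 = padding, 1 = out-of-vocabulary,
    # then every new token in first-seen order.
    vocab = {"@@PADDING@@": 0, "@@UNKNOWN@@": 1}
    for tokens in token_lists:
        for token in tokens:
            vocab.setdefault(token, len(vocab))
    return vocab


def single_id_index(tokens: List[str], vocab: Dict[str, int]) -> List[int]:
    # One integer ID per token; unseen tokens map to the unknown ID.
    return [vocab.get(token, vocab["@@UNKNOWN@@"]) for token in tokens]


vocab = build_vocab([whitespace_tokenize("is ice frozen water")])
print(single_id_index(whitespace_tokenize("is water frozen solid"), vocab))  # [2, 5, 4, 1]
```

Swapping in a different tokenizer or indexer changes only one of these two stages, which is why the reader exposes them as separate parameters.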


class BoolQDatasetReader(DatasetReader):
 | ...
 | def text_to_instance(
 |     self,
 |     passage: str,
 |     question: str,
 |     label: Optional[bool] = None
 | ) -> Instance

We take the passage and the question as input, tokenize them, and concatenate them.

Parameters
  • passage : str
    The passage in a given BoolQ record.
  • question : str
    The question in a given BoolQ record.
  • label : bool, optional (default = None)
    The label for the passage and the question.

Returns
  • An Instance containing the following fields:
    tokens : TextField
      The tokens in the concatenation of the passage and the question.
    label : LabelField
      The answer to the question.
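The field-building logic described above can be sketched in plain Python, with lists and integers standing in for TextField and LabelField. This is a hypothetical stand-in (boolq_fields is not part of the library) that assumes whitespace tokenization, passage-first concatenation, an optional hypothetical separator token, a 0/1 encoding of the boolean label, and that the label entry is simply omitted when no label is given.

```python
from typing import Dict, List, Optional, Union

Fields = Dict[str, Union[List[str], int]]


def boolq_fields(
    passage: str,
    question: str,
    label: Optional[bool] = None,
    sep: Optional[str] = None,
) -> Fields:
    """Hypothetical plain-Python stand-in for text_to_instance.

    Tokenizes passage and question on whitespace and concatenates them,
    passage first. `sep` is an optional, hypothetical separator token.
    """
    tokens = passage.split()
    if sep is not None:
        tokens.append(sep)
    tokens.extend(question.split())
    fields: Fields = {"tokens": tokens}
    if label is not None:
        # Assumed encoding of the boolean answer: True -> 1, False -> 0.
        fields["label"] = int(label)
    return fields


print(boolq_fields("Ice is frozen water.", "is ice frozen water", label=True))
```

Leaving out the label entry when it is None lets the same function serve both training (labeled) and prediction (unlabeled) inputs.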


class BoolQDatasetReader(DatasetReader):
 | ...
 | def apply_token_indexers(self, instance: Instance) -> None