boolq
allennlp_models.classification.dataset_readers.boolq
BoolQDatasetReader
@DatasetReader.register("boolq")
class BoolQDatasetReader(DatasetReader):
 | def __init__(
 |     self,
 |     tokenizer: Tokenizer = None,
 |     token_indexers: Dict[str, TokenIndexer] = None,
 |     **kwargs
 | )
This DatasetReader reads the BoolQ data for the binary QA task. The output of read is a list of Instances with the fields:

- tokens : TextField
- label : LabelField

Registered as a DatasetReader with name "boolq".
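Because the reader is registered, it can also be built from configuration. A minimal sketch, assuming allennlp and allennlp-models are installed (importing allennlp_models is what triggers the registration):

```python
from allennlp.common.params import Params
from allennlp.data import DatasetReader

import allennlp_models.classification  # noqa: F401  (importing registers "boolq")

# Build the reader from a params blob, exactly as it would appear under
# the "dataset_reader" key of a training configuration.
reader = DatasetReader.from_params(Params({"type": "boolq"}))
```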
Parameters

- tokenizer : Tokenizer, optional (default = WhitespaceTokenizer())
  Tokenizer to use to split the input sequences into words or other kinds of tokens.
- token_indexers : Dict[str, TokenIndexer], optional (default = {"tokens": SingleIdTokenIndexer()})
  We use this to define the input representation for the text. See TokenIndexer.
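Both parameters can be overridden together, for example to tokenize and index with a transformer vocabulary instead of the whitespace defaults. A sketch; the model name is only illustrative:

```python
from allennlp.data.token_indexers import PretrainedTransformerIndexer
from allennlp.data.tokenizers import PretrainedTransformerTokenizer

from allennlp_models.classification.dataset_readers.boolq import BoolQDatasetReader

# "bert-base-uncased" is an illustrative choice; any transformer model
# name usable by these classes would work the same way.
model_name = "bert-base-uncased"
reader = BoolQDatasetReader(
    tokenizer=PretrainedTransformerTokenizer(model_name),
    token_indexers={"tokens": PretrainedTransformerIndexer(model_name)},
)
```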
text_to_instance
class BoolQDatasetReader(DatasetReader):
 | ...
 | @overrides
 | def text_to_instance(
 |     self,
 |     passage: str,
 |     question: str,
 |     label: Optional[bool] = None
 | ) -> Instance
We take the passage and the question as input, tokenize each, and concatenate them into a single sequence of tokens.
Parameters

- passage : str
  The passage in a given BoolQ record.
- question : str
  The question in a given BoolQ record.
- label : bool, optional (default = None)
  The label for the passage and the question.
Returns

- An Instance containing the following fields:
  - tokens : TextField
    The tokens in the concatenation of the passage and the question.
  - label : LabelField
    The answer to the question.
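For illustration, calling text_to_instance directly on a reader built with the defaults produces the fields described above (a sketch; the passage and question strings are invented):

```python
from allennlp_models.classification.dataset_readers.boolq import BoolQDatasetReader

reader = BoolQDatasetReader()  # WhitespaceTokenizer + SingleIdTokenIndexer defaults
instance = reader.text_to_instance(
    passage="BoolQ is a reading comprehension dataset of naturally occurring yes/no questions.",
    question="is boolq made of yes/no questions",
    label=True,
)
print(sorted(instance.fields))  # ['label', 'tokens']
```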
apply_token_indexers
class BoolQDatasetReader(DatasetReader):
| ...
| def apply_token_indexers(self, instance: Instance) -> None
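No body is shown here, but in AllenNLP this hook attaches token indexers to an already-built instance, so that instances can be created (for example in data-loader worker processes) before indexers are applied. A minimal sketch of the conventional implementation, assuming the reader keeps its indexers on self.token_indexers and that text_to_instance names its text field "tokens":

```python
from allennlp.data import Instance
from allennlp.data.fields import TextField


def apply_token_indexers(self, instance: Instance) -> None:
    # Attach this reader's indexers to the "tokens" TextField so the
    # instance can later be indexed against a vocabulary.
    tokens_field = instance.fields["tokens"]
    assert isinstance(tokens_field, TextField)
    tokens_field.token_indexers = self.token_indexers  # attribute name assumed
```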