interleaving_dataset_reader
allennlp.data.dataset_readers.interleaving_dataset_reader
InterleavingDatasetReader
@DatasetReader.register("interleaving")
class InterleavingDatasetReader(DatasetReader):
| def __init__(
| self,
| readers: Dict[str, DatasetReader],
| dataset_field_name: str = "dataset",
| scheme: str = "round_robin",
| **kwargs
| ) -> None
A DatasetReader that wraps multiple other dataset readers and interleaves their instances, adding a MetadataField to indicate the provenance of each instance.
Unlike most of our other dataset readers, here the file_path passed into read() should be a JSON-serialized dictionary with one file_path per wrapped dataset reader, keyed by the same keys used in readers.
Registered as a DatasetReader with name "interleaving".
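The JSON-serialized file_path described above can be built with the standard json module. This is a minimal sketch: the keys "snli" and "squad" and both paths are hypothetical placeholders, and the only requirement taken from the description is that the keys correspond to the keys of readers.

```python
import json

# Hypothetical dataset keys and paths, for illustration only.
# Each key is assumed to match a key in the `readers` dictionary
# given to InterleavingDatasetReader.
file_paths = {
    "snli": "/path/to/snli_train.jsonl",
    "squad": "/path/to/squad_train.json",
}

# The single string that would be passed as file_path to read().
file_path = json.dumps(file_paths)

# Decoding the string recovers one path per wrapped reader.
decoded = json.loads(file_path)
for dataset_key, path in decoded.items():
    print(f"{dataset_key} -> {path}")
```

Serializing the mapping into one string lets the usual single file_path argument carry one path per wrapped reader.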
Parameters
- readers : Dict[str, DatasetReader]
  The dataset readers to wrap. The keys of this dictionary will be used as the values in the MetadataField indicating provenance.
- dataset_field_name : str, optional (default = "dataset")
  The name of the MetadataField indicating which dataset an instance came from.
- scheme : str, optional (default = "round_robin")
  Indicates how to interleave instances. Currently the two options are "round_robin", which repeatedly cycles through the datasets grabbing one instance from each; and "all_at_once", which yields all the instances from the first dataset, then all the instances from the second dataset, and so on. Some form of over- or under-sampling would also be possible, although this has not been implemented.
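The two scheme options can be illustrated with plain Python iterables. This is a sketch of the orderings described above, not the library's implementation: the function names round_robin and all_at_once are hypothetical helpers, the (key, item) pairs stand in for instances tagged with a provenance MetadataField, and continuing to cycle over the remaining datasets once a shorter one is exhausted is an assumption.

```python
from typing import Any, Dict, Iterable, Iterator, Tuple


def round_robin(datasets: Dict[str, Iterable[Any]]) -> Iterator[Tuple[str, Any]]:
    """Cycle through the datasets, taking one item from each per pass."""
    iterators = {key: iter(values) for key, values in datasets.items()}
    while iterators:
        # Copy the keys so exhausted datasets can be dropped mid-pass.
        for key in list(iterators):
            try:
                yield key, next(iterators[key])
            except StopIteration:
                # Assumption: an exhausted dataset is skipped from now on.
                del iterators[key]


def all_at_once(datasets: Dict[str, Iterable[Any]]) -> Iterator[Tuple[str, Any]]:
    """Yield every item of the first dataset, then the second, and so on."""
    for key, values in datasets.items():
        for item in values:
            yield key, item


# Toy data standing in for the instances produced by two wrapped readers.
datasets = {"a": [1, 2, 3], "b": [10]}

rr = list(round_robin(datasets))
aao = list(all_at_once(datasets))
print(rr)   # [('a', 1), ('b', 10), ('a', 2), ('a', 3)]
print(aao)  # [('a', 1), ('a', 2), ('a', 3), ('b', 10)]
```

Each yielded pair carries the dictionary key alongside the item, mirroring how the keys of readers become the provenance values stored under dataset_field_name.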
text_to_instance
class InterleavingDatasetReader(DatasetReader):
| ...
| def text_to_instance(
| self,
| dataset_key: str,
| *args,
| **kwargs
| ) -> Instance
apply_token_indexers
class InterleavingDatasetReader(DatasetReader):
| ...
| def apply_token_indexers(self, instance: Instance) -> None