allennlp.data.dataset_readers.interleaving_dataset_reader
class allennlp.data.dataset_readers.interleaving_dataset_reader.InterleavingDatasetReader(readers: Dict[str, allennlp.data.dataset_readers.dataset_reader.DatasetReader], dataset_field_name: str = 'dataset', scheme: str = 'round_robin', lazy: bool = False)
Bases: allennlp.data.dataset_readers.dataset_reader.DatasetReader
A DatasetReader that wraps multiple other dataset readers and interleaves their instances, adding a MetadataField to indicate the provenance of each instance.
Unlike most of our other dataset readers, here the file_path passed into read() should be a JSON-serialized dictionary with one file_path per wrapped dataset reader (and with corresponding keys).
- Parameters
- readers : Dict[str, DatasetReader]
The dataset readers to wrap. The keys of this dictionary will be used as the values in the MetadataField indicating provenance.
- dataset_field_name : str, optional (default = “dataset”)
The name of the MetadataField indicating which dataset an instance came from.
- scheme : str, optional (default = “round_robin”)
Indicates how to interleave instances. Currently the two options are “round_robin”, which repeatedly cycles through the datasets grabbing one instance from each; and “all_at_once”, which yields all the instances from the first dataset, then all the instances from the second dataset, and so on. You could imagine also implementing some sort of over- or under-sampling, although this hasn’t been done.
- lazy : bool, optional (default = False)
If this is true, instances() will return an object whose __iter__ method reloads the dataset each time it’s called. Otherwise, instances() returns a list.
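The file_path convention and the two interleaving schemes described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: `interleave`, `FAKE_FILES`, and the file names are hypothetical, the `(key, item)` tuples stand in for instances carrying a provenance MetadataField, and dropping an exhausted dataset while the others continue is an assumption about how round-robin handles datasets of unequal length.

```python
import json
from typing import Dict, Iterable, Iterator, Tuple


def interleave(
    datasets: Dict[str, Iterable[str]], scheme: str = "round_robin"
) -> Iterator[Tuple[str, str]]:
    """Yield (dataset_key, item) pairs; the key plays the role of the provenance field."""
    if scheme == "round_robin":
        # One iterator per dataset; cycle through them, taking one item from each.
        remaining = {key: iter(items) for key, items in datasets.items()}
        while remaining:
            for key, iterator in list(remaining.items()):
                try:
                    yield key, next(iterator)
                except StopIteration:
                    # Assumption: an exhausted dataset is dropped, the rest continue.
                    del remaining[key]
    elif scheme == "all_at_once":
        # Everything from the first dataset, then everything from the second, etc.
        for key, items in datasets.items():
            for item in items:
                yield key, item
    else:
        raise ValueError(f"unknown scheme: {scheme}")


# Hypothetical in-memory "files" so the sketch runs without touching disk.
FAKE_FILES = {"a.txt": ["a1", "a2", "a3"], "b.txt": ["b1"]}

# The single file_path argument is a JSON-serialized dict: one path per reader key.
file_path = json.dumps({"a": "a.txt", "b": "b.txt"})
paths = json.loads(file_path)
datasets = {key: FAKE_FILES[path] for key, path in paths.items()}

round_robin = list(interleave(datasets, "round_robin"))
# [("a", "a1"), ("b", "b1"), ("a", "a2"), ("a", "a3")]
all_at_once = list(interleave(datasets, "all_at_once"))
# [("a", "a1"), ("a", "a2"), ("a", "a3"), ("b", "b1")]
```

The keys of the JSON dictionary must match the keys of the readers dictionary, since each path is handed to the reader registered under the same key.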
text_to_instance(self) → allennlp.data.instance.Instance
Does whatever tokenization or processing is necessary to go from textual input to an Instance. The primary intended use for this is with a Predictor, which gets text input as a JSON object and needs to process it to be input to a model.
The intent here is to share code between _read() and what happens at model serving time, or any other time you want to make a prediction from new data. We need to process the data in the same way it was done at training time. Allowing the DatasetReader to process new text lets us accomplish this, as we can just call DatasetReader.text_to_instance when serving predictions.
The input type here is rather vaguely specified, unfortunately. The Predictor will have to make some assumptions about the kind of DatasetReader that it’s using, in order to pass it the right information.
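The code-sharing idea described above can be illustrated with a toy sketch. `ToyReader` and `ToyPredictor` are hypothetical stand-ins written for this example, not the AllenNLP classes, and the `"sentence"` JSON key is an assumption the predictor makes about its reader, which is exactly the kind of assumption the last paragraph refers to.

```python
from typing import Dict, Iterator, List


class ToyReader:
    """Hypothetical stand-in for a DatasetReader; a dict plays the role of an Instance."""

    def text_to_instance(self, text: str) -> Dict[str, List[str]]:
        # The single place where raw text is turned into model input.
        return {"tokens": text.lower().split()}

    def _read(self, lines: List[str]) -> Iterator[Dict[str, List[str]]]:
        # Training-time reading reuses text_to_instance for every line.
        for line in lines:
            yield self.text_to_instance(line)


class ToyPredictor:
    """Hypothetical stand-in for a Predictor that receives JSON-like input."""

    def __init__(self, reader: ToyReader) -> None:
        self.reader = reader

    def json_to_instance(self, json_dict: Dict[str, str]) -> Dict[str, List[str]]:
        # Assumption: this reader expects the text under the "sentence" key.
        return self.reader.text_to_instance(json_dict["sentence"])


reader = ToyReader()
train_instance = next(reader._read(["The Cat Sat"]))
serve_instance = ToyPredictor(reader).json_to_instance({"sentence": "The Cat Sat"})
```

Because both paths go through the same method, the instance built at serving time matches the one built during training for the same text.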