sharded_dataset_reader
allennlp.data.dataset_readers.sharded_dataset_reader
ShardedDatasetReader
@DatasetReader.register("sharded")
class ShardedDatasetReader(DatasetReader):
| def __init__(self, base_reader: DatasetReader, **kwargs) -> None
Wraps another dataset reader and uses it to read from multiple input files.
Note that in this case the file_path passed to read() should be either a glob path or a path or URL to an archive file ('.zip' or '.tar.gz').
The dataset reader will return instances from all files matching the glob, or all files within the archive.
Files are processed in a deterministic order so that instances can be filtered by worker rank in the distributed case.
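The role of deterministic ordering can be illustrated with a minimal, standard-library sketch. It does not use AllenNLP and is not the library's implementation: the helper names list_shards and read_shards, the one-line-per-instance file format, and the round-robin filter on rank and world_size are all assumptions made for illustration.

```python
import glob
import os
import tempfile
from typing import Iterator, List


def list_shards(pattern: str) -> List[str]:
    # glob.glob gives no ordering guarantee, so sort to get the same
    # shard order on every worker.
    return sorted(glob.glob(pattern))


def read_shards(pattern: str, rank: int = 0, world_size: int = 1) -> Iterator[str]:
    # Hypothetical stand-in for a reader: each line is one "instance".
    # Assumed filter: worker `rank` keeps every world_size-th instance.
    index = 0
    for shard in list_shards(pattern):
        with open(shard, encoding="utf-8") as f:
            for line in f:
                if index % world_size == rank:
                    yield line.rstrip("\n")
                index += 1


with tempfile.TemporaryDirectory() as tmp:
    # Write the shards out of order to show that sorting fixes the order.
    for name, lines in [("b.txt", ["b0", "b1"]), ("a.txt", ["a0", "a1"])]:
        with open(os.path.join(tmp, name), "w", encoding="utf-8") as f:
            f.write("\n".join(lines) + "\n")
    pattern = os.path.join(tmp, "*.txt")
    all_instances = list(read_shards(pattern))
    worker0 = list(read_shards(pattern, rank=0, world_size=2))
    worker1 = list(read_shards(pattern, rank=1, world_size=2))
```

Because every worker walks the shards in the same order, the two workers in this sketch receive disjoint subsets that together cover all instances. If the order varied between workers, an index-based filter like this one could duplicate or drop instances.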
Registered as a DatasetReader with name "sharded".
This class accepts all additional parameters of any DatasetReader class via **kwargs.
We give priority to the values set in the constructor for the instance of this class.
Optionally, we will automatically inherit attributes from the base_reader when required.
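One way to read the priority rule is sketched below with plain Python classes. This is an illustration only, not AllenNLP's code: BaseReaderSketch and WrapperSketch are hypothetical names, and max_instances is used merely as an example of an attribute that both the wrapper and the base reader could carry.

```python
from typing import Any, Optional


class BaseReaderSketch:
    """Hypothetical stand-in for the wrapped reader."""

    def __init__(self, max_instances: Optional[int] = None) -> None:
        self.max_instances = max_instances


class WrapperSketch:
    """Hypothetical wrapper: constructor values win, else inherit from the base reader."""

    def __init__(self, base_reader: BaseReaderSketch, **kwargs: Any) -> None:
        self.reader = base_reader
        if "max_instances" in kwargs:
            # A value passed to the wrapper's constructor takes priority.
            self.max_instances = kwargs["max_instances"]
        else:
            # Otherwise fall back to the attribute on the base reader.
            self.max_instances = base_reader.max_instances


inherited = WrapperSketch(BaseReaderSketch(max_instances=100))
overridden = WrapperSketch(BaseReaderSketch(max_instances=100), max_instances=10)
```

In this sketch, inherited takes its limit from the base reader, while overridden keeps the value given to the wrapper directly.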
Parameters
- base_reader : DatasetReader
  Reader with a read method that accepts a single file.
text_to_instance
class ShardedDatasetReader(DatasetReader):
| ...
| def text_to_instance(self, *args, **kwargs) -> Instance
Delegates to the base reader's text_to_instance.
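The delegation pattern can be sketched without AllenNLP as follows. EchoReader and DelegatingWrapper are hypothetical classes, and the sketch returns a plain dict where the real method returns an Instance.

```python
from typing import Any, Dict


class EchoReader:
    """Hypothetical base reader; returns a dict instead of an AllenNLP Instance."""

    def text_to_instance(self, text: str, label: str = "none") -> Dict[str, str]:
        return {"text": text, "label": label}


class DelegatingWrapper:
    """Hypothetical wrapper that forwards text_to_instance to its base reader."""

    def __init__(self, base_reader: EchoReader) -> None:
        self.reader = base_reader

    def text_to_instance(self, *args: Any, **kwargs: Any) -> Dict[str, str]:
        # Forward positional and keyword arguments unchanged.
        return self.reader.text_to_instance(*args, **kwargs)


wrapped = DelegatingWrapper(EchoReader())
result = wrapped.text_to_instance("hello", label="greeting")
```

Accepting *args and **kwargs keeps the wrapper independent of the base reader's signature, so any reader can be wrapped without changing the wrapper.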