data_loader
allennlp.data.data_loaders.data_loader
TensorDict¶
TensorDict = Dict[str, Union[torch.Tensor, Dict[str, torch.Tensor]]]
TensorDict is the type we use for batches.
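To make the shape of this alias concrete, here is a minimal sketch of a batch with one nested entry and one flat entry, plus a helper that flattens it. It is an illustration only: plain nested lists stand in for torch.Tensor so the snippet runs without PyTorch, and the names FakeTensor, FakeTensorDict, flatten_batch, and the keys "text", "token_ids", and "label" are all made up for this example.

```python
from typing import Dict, List, Union

# Assumption: nested lists stand in for torch.Tensor so this runs without PyTorch.
FakeTensor = List[List[int]]
# Same shape as TensorDict: each value is a tensor or a dict of named tensors.
FakeTensorDict = Dict[str, Union[FakeTensor, Dict[str, FakeTensor]]]


def flatten_batch(batch: FakeTensorDict) -> Dict[str, FakeTensor]:
    """Flatten the single allowed level of nesting into dotted keys."""
    flat: Dict[str, FakeTensor] = {}
    for name, value in batch.items():
        if isinstance(value, dict):
            # Nested entry: one named tensor per inner key.
            for inner_name, inner_value in value.items():
                flat[f"{name}.{inner_name}"] = inner_value
        else:
            # Flat entry: the value is already a tensor stand-in.
            flat[name] = value
    return flat


# A hypothetical batch of two instances.
example_batch: FakeTensorDict = {
    "text": {"token_ids": [[2, 5, 9], [4, 7, 0]]},
    "label": [[1], [0]],
}
flat_batch = flatten_batch(example_batch)
```

With real tensors the same two-level structure applies; only the leaf type changes.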
DataLoader¶
class DataLoader(Registrable)
A DataLoader is responsible for generating batches of instances from a DatasetReader, or another source of data.
This is purely an abstract base class. All concrete subclasses must provide implementations of the following methods:
- __iter__() that creates an iterable of TensorDicts,
- iter_instances() that creates an iterable of Instances,
- index_with() that should index the data with a vocabulary, and
- set_target_device(), which updates the device that batch tensors should be put on when they are generated in __iter__().
Additionally, subclasses should implement __len__() when possible.
The default implementation is MultiProcessDataLoader.
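The contract described above can be sketched as a toy loader that provides the same five methods. This is not the library's implementation and does not subclass the real DataLoader: ToyVocabulary, ToyInstance, and ToyDataLoader are hypothetical stand-ins written in plain Python so the snippet runs without AllenNLP or PyTorch. A string stands in for torch.device, lists of ids stand in for tensors, and __len__() is assumed here to mean the number of batches.

```python
from typing import Dict, Iterator, List, Optional


class ToyVocabulary:
    """Hypothetical stand-in for Vocabulary: maps tokens to integer ids."""

    def __init__(self, tokens: List[str]) -> None:
        self.token_to_id = {tok: i for i, tok in enumerate(tokens)}


class ToyInstance:
    """Hypothetical stand-in for Instance: tokens plus ids once indexed."""

    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens
        self.token_ids: Optional[List[int]] = None


class ToyDataLoader:
    """Provides the method set listed above, using only stand-in types."""

    def __init__(self, instances: List[ToyInstance], batch_size: int) -> None:
        self.instances = instances
        self.batch_size = batch_size
        self.vocab: Optional[ToyVocabulary] = None
        self.device = "cpu"  # assumption: a string stands in for torch.device

    def __len__(self) -> int:
        # Number of batches, rounding up for a final partial batch.
        return (len(self.instances) + self.batch_size - 1) // self.batch_size

    def iter_instances(self) -> Iterator[ToyInstance]:
        for instance in self.instances:
            if self.vocab is not None:
                # Index each instance with the vocabulary, if one was provided.
                instance.token_ids = [self.vocab.token_to_id[t] for t in instance.tokens]
            yield instance

    def __iter__(self) -> Iterator[Dict[str, object]]:
        if self.vocab is None:
            raise ValueError("call index_with() before iterating over batches")
        batch: List[ToyInstance] = []
        for instance in self.iter_instances():
            batch.append(instance)
            if len(batch) == self.batch_size:
                yield self._to_batch(batch)
                batch = []
        if batch:
            # Emit the final, possibly smaller, batch.
            yield self._to_batch(batch)

    def index_with(self, vocab: ToyVocabulary) -> None:
        self.vocab = vocab

    def set_target_device(self, device: str) -> None:
        self.device = device

    def _to_batch(self, batch: List[ToyInstance]) -> Dict[str, object]:
        # A real loader would pad and build tensors on self.device here.
        return {"token_ids": [inst.token_ids for inst in batch], "device": self.device}


loader = ToyDataLoader(
    [ToyInstance(["a", "b"]), ToyInstance(["b"]), ToyInstance(["a"])],
    batch_size=2,
)
loader.index_with(ToyVocabulary(["a", "b"]))
loader.set_target_device("cuda:0")
batches = list(loader)
```

The order of calls in the usage lines reflects the dependency between the methods: batches cannot be built until a vocabulary has been supplied, and the target device only matters at the point where batches are generated.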
default_implementation¶
class DataLoader(Registrable):
| ...
| default_implementation = "multiprocess"
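As a rough illustration of what a default_implementation name is for, the sketch below shows a toy registry that falls back to the default name when the caller does not ask for a specific implementation. It is a simplified stand-in rather than the library's Registrable: ToyRegistrable, ToyLoader, ToyMultiProcessLoader, ToySimpleLoader, the "simple" name, and the pick() method are hypothetical, chosen only for this example.

```python
from typing import Callable, Dict, Optional


class ToyRegistrable:
    """Hypothetical, simplified registry; not the real Registrable class."""

    _registry: Dict[str, type] = {}
    default_implementation: Optional[str] = None

    @classmethod
    def register(cls, name: str) -> Callable[[type], type]:
        def decorator(subclass: type) -> type:
            # Record the subclass under the given name.
            cls._registry[name] = subclass
            return subclass

        return decorator

    @classmethod
    def pick(cls, name: Optional[str] = None) -> type:
        # Fall back to the default name when no explicit name is given.
        chosen = name if name is not None else cls.default_implementation
        if chosen is None:
            raise ValueError("no name given and no default_implementation set")
        return cls._registry[chosen]


class ToyLoader(ToyRegistrable):
    default_implementation = "multiprocess"


@ToyLoader.register("multiprocess")
class ToyMultiProcessLoader(ToyLoader):
    pass


@ToyLoader.register("simple")
class ToySimpleLoader(ToyLoader):
    pass


default_cls = ToyLoader.pick()          # no name: uses "multiprocess"
explicit_cls = ToyLoader.pick("simple")  # explicit name wins
```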
__iter__¶
class DataLoader(Registrable):
| ...
| def __iter__(self) -> Iterator[TensorDict]
iter_instances¶
class DataLoader(Registrable):
| ...
| def iter_instances(self) -> Iterator[Instance]
index_with¶
class DataLoader(Registrable):
| ...
| def index_with(self, vocab: Vocabulary) -> None
set_target_device¶
class DataLoader(Registrable):
| ...
| def set_target_device(self, device: torch.device) -> None