data_loader
allennlp.data.data_loaders.data_loader
TensorDict¶
TensorDict = Dict[str, Union[torch.Tensor, Dict[str, torch.Tensor]]]
TensorDict is the type we use for batches.
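For illustration only, a batch for a small text-classification dataset might look like the following; the field names ("label", "tokens") and the indexer key ("token_ids") are hypothetical and depend entirely on the fields and token indexers in use.

import torch

# A hypothetical TensorDict for a batch of two instances: each value is either
# a tensor ("label") or a nested dict of tensors ("tokens"), matching the alias.
batch = {
    "label": torch.tensor([1, 0]),
    "tokens": {"token_ids": torch.tensor([[2, 5, 7, 0], [3, 8, 0, 0]])},
}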
DataLoader¶
class DataLoader(Registrable)
A DataLoader is responsible for generating batches of instances from a DatasetReader, or another source of data.
This is purely an abstract base class. All concrete subclasses must provide implementations of the following methods:
- __iter__(), which creates an iterable of TensorDicts,
- iter_instances(), which creates an iterable of Instances,
- index_with(), which should index the data with a vocabulary, and
- set_target_device(), which updates the device that batch tensors should be put on when they are generated in __iter__().
Additionally, this class should implement __len__() when possible.
The default implementation is MultiProcessDataLoader.
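For illustration, here is a minimal sketch of a concrete subclass that serves batches from an in-memory list of Instances. The registration name "simple_list", the batch_size parameter, and the naive sequential batching are assumptions made for this example, not part of the library.

from typing import Iterator, List, Optional

import torch

from allennlp.data import Batch, Instance, Vocabulary
from allennlp.data.data_loaders.data_loader import DataLoader, TensorDict
from allennlp.nn import util as nn_util


@DataLoader.register("simple_list")  # hypothetical name, for illustration only
class SimpleListDataLoader(DataLoader):
    def __init__(self, instances: List[Instance], batch_size: int) -> None:
        self._instances = instances
        self._batch_size = batch_size
        self._device: Optional[torch.device] = None

    def __len__(self) -> int:
        # Number of batches per epoch.
        return (len(self._instances) + self._batch_size - 1) // self._batch_size

    def __iter__(self) -> Iterator[TensorDict]:
        # Assumes index_with() has already been called so that instances
        # can be turned into padded index tensors.
        for start in range(0, len(self._instances), self._batch_size):
            batch = Batch(self._instances[start : start + self._batch_size])
            tensor_dict = batch.as_tensor_dict()
            if self._device is not None:
                tensor_dict = nn_util.move_to_device(tensor_dict, self._device)
            yield tensor_dict

    def iter_instances(self) -> Iterator[Instance]:
        yield from self._instances

    def index_with(self, vocab: Vocabulary) -> None:
        for instance in self._instances:
            instance.index_fields(vocab)

    def set_target_device(self, device: torch.device) -> None:
        self._device = device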
default_implementation¶
class DataLoader(Registrable):
| ...
| default_implementation = "multiprocess"
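As a sketch of what the default buys you: constructing a DataLoader from parameters with no explicit "type" key resolves to MultiProcessDataLoader. The reader object and data path below are placeholders.

from allennlp.common import Params
from allennlp.data import DataLoader, DatasetReader

my_reader: DatasetReader = ...  # placeholder for a DatasetReader you have built

loader = DataLoader.from_params(
    Params({"batch_size": 32}),        # no "type" key, so "multiprocess" is used
    reader=my_reader,
    data_path="/path/to/train.jsonl",  # placeholder path
)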
__iter__¶
class DataLoader(Registrable):
| ...
| def __iter__(self) -> Iterator[TensorDict]
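A sketch of the typical consumption pattern, assuming loader is a concrete DataLoader that has already been indexed with the model's vocabulary and model is an AllenNLP Model:

# Each yielded batch is a TensorDict whose keys line up with the model's
# forward() arguments, so it can be unpacked as keyword arguments.
for batch in loader:
    output_dict = model(**batch)
    loss = output_dict["loss"]  # AllenNLP models return a loss when labels are present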
iter_instances¶
class DataLoader(Registrable):
| ...
| def iter_instances(self) -> Iterator[Instance]
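One common use, sketched here, is streaming the un-tensorized instances to build a Vocabulary before training; loader is any concrete DataLoader:

from allennlp.data import Vocabulary

# iter_instances() yields Instance objects rather than tensor batches,
# which is what Vocabulary.from_instances expects.
vocab = Vocabulary.from_instances(loader.iter_instances())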
index_with¶
class DataLoader(Registrable):
| ...
| def index_with(self, vocab: Vocabulary) -> None
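Indexing has to happen before batches are generated, because tensorization needs the vocabulary's token-to-index mappings. A minimal sketch, assuming vocab was built as above or taken from an existing model:

loader.index_with(vocab)  # e.g. the vocabulary built from iter_instances()
# In a typical training setup you would pass the model's vocabulary instead:
# loader.index_with(model.vocab)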
set_target_device¶
class DataLoader(Registrable):
| ...
| def set_target_device(self, device: torch.device) -> None
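A sketch of pointing the loader at a GPU so that the batches yielded by __iter__() are already on that device; assumes loader is a concrete DataLoader:

import torch

if torch.cuda.is_available():
    loader.set_target_device(torch.device("cuda:0"))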