data_loader

allennlp.data.data_loaders.data_loader

TensorDict

TensorDict = Dict[str, Union[torch.Tensor, Dict[str, torch.Tensor]]]

TensorDict is the type we use for batches.
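
For example, a batch for instances with a "tokens" TextField and a LabelField might look like the following sketch. The keys are illustrative only; they depend entirely on the fields and token indexers you configure.

import torch
from allennlp.data.data_loaders.data_loader import TensorDict

# Hypothetical batch: a nested dict produced by a token indexer,
# plus a plain tensor at the top level for a label field.
batch: TensorDict = {
    "tokens": {
        "tokens": torch.tensor([[2, 15, 8, 0], [4, 9, 3, 7]]),
        "mask": torch.tensor([[True, True, True, False],
                              [True, True, True, True]]),
    },
    "label": torch.tensor([0, 1]),
}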

DataLoader

class DataLoader(Registrable)

A DataLoader is responsible for generating batches of instances from a DatasetReader or another source of data.

This is purely an abstract base class. All concrete subclasses must provide implementations of the following methods:

  • __iter__() that creates an iterable of TensorDicts,
  • iter_instances() that creates an iterable of Instances,
  • index_with() that should index the data with a vocabulary, and
  • set_target_device(), which updates the device that batch tensors should be put on when they are generated in __iter__().

Additionally, this class should implement __len__() when possible.

The default implementation is MultiProcessDataLoader.
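
As a rough usage sketch: the reader and data path below are placeholders, and parameters such as batch_size and shuffle follow the MultiProcessDataLoader constructor, but check its documentation for the full signature.

import torch
from allennlp.data import Vocabulary
from allennlp.data.data_loaders import MultiProcessDataLoader
from allennlp.data.dataset_readers import SequenceTaggingDatasetReader

reader = SequenceTaggingDatasetReader()
loader = MultiProcessDataLoader(reader, "path/to/train.tsv", batch_size=8, shuffle=True)

# Build a vocabulary from the un-indexed instances, then index the loader with it.
vocab = Vocabulary.from_instances(loader.iter_instances())
loader.index_with(vocab)

# Optionally choose the device that batch tensors are placed on.
loader.set_target_device(torch.device("cpu"))

for batch in loader:  # each `batch` is a TensorDict
    ...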

default_implementation

class DataLoader(Registrable):
 | ...
 | default_implementation = "multiprocess"

__iter__

class DataLoader(Registrable):
 | ...
 | def __iter__(self) -> Iterator[TensorDict]

iter_instances

class DataLoader(Registrable):
 | ...
 | def iter_instances(self) -> Iterator[Instance]

index_with

class DataLoader(Registrable):
 | ...
 | def index_with(self, vocab: Vocabulary) -> None

set_target_device

class DataLoader(Registrable):
 | ...
 | def set_target_device(self, device: torch.device) -> None
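
Putting the contract together, a minimal custom implementation might look like the sketch below. It is purely illustrative (a toy in-memory loader, not part of the library), and the registration name "in_memory" is hypothetical.

from typing import Iterator, List, Optional

import torch

from allennlp.data import Batch, Instance, Vocabulary
from allennlp.data.data_loaders.data_loader import DataLoader, TensorDict
from allennlp.nn.util import move_to_device


@DataLoader.register("in_memory")  # hypothetical name, for illustration only
class InMemoryDataLoader(DataLoader):
    """A toy loader that batches a fixed list of instances held in memory."""

    def __init__(self, instances: List[Instance], batch_size: int) -> None:
        self.instances = instances
        self.batch_size = batch_size
        self.device: Optional[torch.device] = None

    def __len__(self) -> int:
        # Number of batches per epoch.
        return (len(self.instances) + self.batch_size - 1) // self.batch_size

    def __iter__(self) -> Iterator[TensorDict]:
        # Assumes index_with() has already been called.
        for start in range(0, len(self.instances), self.batch_size):
            tensor_dict = Batch(self.instances[start : start + self.batch_size]).as_tensor_dict()
            if self.device is not None:
                tensor_dict = move_to_device(tensor_dict, self.device)
            yield tensor_dict

    def iter_instances(self) -> Iterator[Instance]:
        yield from self.instances

    def index_with(self, vocab: Vocabulary) -> None:
        for instance in self.instances:
            instance.index_fields(vocab)

    def set_target_device(self, device: torch.device) -> None:
        self.device = device

Once registered under a name of your choosing, such a subclass can be selected from configuration like any other Registrable.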