
allennlp.data.data_loaders.multitask_epoch_sampler



MultiTaskEpochSampler

class MultiTaskEpochSampler(Registrable)

A class that determines what proportion of instances should be sampled from each dataset for a given epoch. This is used by the MultiTaskDataLoader. The main output of this class is the task proportion dictionary returned by get_task_proportions, which specifies what percentage of the instances for the current epoch should come from each dataset. To adjust this behavior as training progresses, there is an update_from_epoch_metrics method, which should be called from a Callback during training.

get_task_proportions

class MultiTaskEpochSampler(Registrable):
 | ...
 | def get_task_proportions(
 |     self,
 |     data_loaders: Mapping[str, DataLoader]
 | ) -> Dict[str, float]

Given a dictionary of DataLoaders for each dataset, returns what percentage of the instances for the current epoch of training should come from each dataset. The input dictionary could be used to determine how many datasets there are (e.g., for uniform sampling) or how big each dataset is (e.g., for sampling based on size), or it could be ignored entirely.

update_from_epoch_metrics

class MultiTaskEpochSampler(Registrable):
 | ...
 | def update_from_epoch_metrics(
 |     self,
 |     epoch_metrics: Dict[str, Any]
 | ) -> None

Some implementations of MultiTaskEpochSampler change their behavior based on the current epoch's metrics. This method is meant to be called from a Callback, so the sampler can update its sampling proportions as training progresses. If your sampling technique does not depend on epoch metrics, you do not need to implement this method.
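To make the interplay between the two methods concrete, here is a hypothetical sampler (not part of AllenNLP) that gives tasks with lower accuracy a larger share of the next epoch. The metric keys ("&lt;task&gt;_accuracy") and the class itself are assumptions for illustration only:

```python
from typing import Any, Dict, List, Mapping

class AccuracyGapSampler:
    """Hypothetical sketch: upweight tasks whose latest accuracy is low."""

    def __init__(self, tasks: List[str]):
        # Start uniform until we have seen any metrics.
        self.weights: Dict[str, float] = {task: 1.0 for task in tasks}

    def update_from_epoch_metrics(self, epoch_metrics: Dict[str, Any]) -> None:
        # Called once per epoch, e.g. from a training Callback.
        for task in self.weights:
            accuracy = epoch_metrics.get(f"{task}_accuracy")  # assumed key format
            if accuracy is not None:
                # Weight by the remaining headroom, floored to stay positive.
                self.weights[task] = max(1.0 - accuracy, 0.05)

    def get_task_proportions(self, data_loaders: Mapping[str, object]) -> Dict[str, float]:
        # Normalize the weights into proportions; the loaders are ignored here.
        total = sum(self.weights.values())
        return {task: weight / total for task, weight in self.weights.items()}
```

After an epoch where "ner" scores 0.5 and "pos" scores 0.9, the next epoch would draw roughly five sixths of its instances from "ner".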

UniformSampler

@MultiTaskEpochSampler.register("uniform")
class UniformSampler(MultiTaskEpochSampler)

Returns a uniform distribution over datasets at every epoch.

Registered as a MultiTaskEpochSampler with name "uniform".
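Because the sampler is Registrable, the registered name is how you select it from a training configuration. A sketch, assuming the data loader is configured with a "sampler" key as in MultiTaskDataLoader's constructor:

```json
{
    "data_loader": {
        "type": "multitask",
        "sampler": {"type": "uniform"}
    }
}
```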

get_task_proportions

class UniformSampler(MultiTaskEpochSampler):
 | ...
 | def get_task_proportions(
 |     self,
 |     data_loaders: Mapping[str, DataLoader]
 | ) -> Dict[str, float]

WeightedSampler

@MultiTaskEpochSampler.register("weighted")
class WeightedSampler(MultiTaskEpochSampler):
 | def __init__(self, weights: Dict[str, float])

Returns a weighted distribution over datasets at every epoch, with each task's weight supplied to the constructor.

Registered as a MultiTaskEpochSampler with name "weighted".

get_task_proportions

class WeightedSampler(MultiTaskEpochSampler):
 | ...
 | def get_task_proportions(
 |     self,
 |     data_loaders: Mapping[str, DataLoader]
 | ) -> Dict[str, float]
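A sketch of weight-based proportions, assuming (this page does not say so explicitly) that the configured weights are normalized to sum to 1 over the tasks present:

```python
from typing import Dict, Mapping

class SketchWeightedSampler:
    """Illustrative stand-in for WeightedSampler, not the real implementation."""

    def __init__(self, weights: Dict[str, float]):
        self.weights = weights

    def get_task_proportions(self, data_loaders: Mapping[str, object]) -> Dict[str, float]:
        # Normalize the configured weights over the tasks we were given;
        # the loaders themselves are ignored.
        total = sum(self.weights[task] for task in data_loaders)
        return {task: self.weights[task] / total for task in data_loaders}
```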

ProportionalSampler

@MultiTaskEpochSampler.register("proportional")
class ProportionalSampler(MultiTaskEpochSampler)

Samples from every dataset according to its size. This has essentially the same effect as using all of the data at every epoch, but it lets you control the number of instances per epoch, if you want to do that. This requires that all data loaders have a __len__ (which means no lazy loading). If you need this functionality with lazy loading, implement your own sampler that takes dataset sizes as a constructor parameter.

Registered as a MultiTaskEpochSampler with name "proportional".

get_task_proportions

class ProportionalSampler(MultiTaskEpochSampler):
 | ...
 | def get_task_proportions(
 |     self,
 |     data_loaders: Mapping[str, DataLoader]
 | ) -> Dict[str, float]
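A sketch of size-proportional sampling (not AllenNLP's implementation). The stand-in loaders only need __len__; real DataLoaders would report their instance counts the same way:

```python
from typing import Dict, Mapping, Sized

def proportional_proportions(data_loaders: Mapping[str, Sized]) -> Dict[str, float]:
    # Each dataset's share is its size divided by the total instance count.
    sizes = {task: len(loader) for task, loader in data_loaders.items()}
    total = sum(sizes.values())
    return {task: size / total for task, size in sizes.items()}
```

A dataset ten times larger than another thus contributes ten times as many instances per epoch.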