class Instance(Mapping[str, Field]): | def __init__(self, fields: MutableMapping[str, Field]) -> None
An Instance is a collection of Field objects that specify the inputs and outputs to
some model. We don't make a distinction between inputs and outputs here, though: all
operations are done on all fields, and when we return arrays, we return them as dictionaries
keyed by field name. A model can then decide which fields it wants to use as inputs and which
as outputs.
The Fields in an Instance can start out either indexed or un-indexed. During the data
processing pipeline, all fields will be indexed, after which multiple instances can be combined
into a Batch and then converted into padded arrays.
- fields : MutableMapping[str, Field]
The Field objects that will be used to produce data arrays for this instance.
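The mapping behaviour described above can be illustrated with a minimal sketch. This is not the library's implementation: `ToyField` and `ToyInstance` are hypothetical stand-ins, and the only thing the sketch assumes from the signature is that an instance behaves as a read-only mapping from field name to field.

```python
from collections.abc import Mapping
from typing import Dict, Iterator, List


class ToyField:
    """Hypothetical stand-in for a Field: it only holds raw tokens."""

    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens


class ToyInstance(Mapping):
    """Hypothetical stand-in for Instance: a mapping from field name to field."""

    def __init__(self, fields: Dict[str, ToyField]) -> None:
        self.fields = fields
        self.indexed = False  # fields start out un-indexed in this sketch

    def __getitem__(self, key: str) -> ToyField:
        return self.fields[key]

    def __iter__(self) -> Iterator[str]:
        return iter(self.fields)

    def __len__(self) -> int:
        return len(self.fields)


# Inputs and outputs live side by side; the model decides which is which.
instance = ToyInstance({
    "text": ToyField(["the", "cat", "sat"]),
    "label": ToyField(["positive"]),
})
field_names = sorted(instance)         # iterating yields field names
text_tokens = instance["text"].tokens  # fields are looked up by name
```

Subclassing `Mapping` gives `keys()`, `items()`, `in`, and `get()` for free once `__getitem__`, `__iter__`, and `__len__` are defined.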
class Instance(Mapping[str, Field]): | ... | def add_field( | self, | field_name: str, | field: Field, | vocab: Vocabulary = None | ) -> None
Add the field to the existing fields mapping.
If we have already indexed the Instance, then we also index the new field, so
it is necessary to supply the vocab.
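A rough sketch of why the vocab matters when a field is added late. The classes below are hypothetical toys, and a plain dictionary from token to id stands in for a Vocabulary; the point is only that a field added after indexing must itself be indexed to stay consistent with the others.

```python
from typing import Dict, List, Optional

Vocab = Dict[str, int]  # assumption: a plain token-to-id dict stands in for Vocabulary


class ToyField:
    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens
        self.token_ids: Optional[List[int]] = None  # None means "not indexed yet"

    def index(self, vocab: Vocab) -> None:
        # Unknown tokens map to id 0 in this sketch.
        self.token_ids = [vocab.get(token, 0) for token in self.tokens]


class ToyInstance:
    def __init__(self, fields: Dict[str, ToyField]) -> None:
        self.fields = fields
        self.indexed = False

    def index_fields(self, vocab: Vocab) -> None:
        for field in self.fields.values():
            field.index(vocab)
        self.indexed = True

    def add_field(self, field_name: str, field: ToyField,
                  vocab: Optional[Vocab] = None) -> None:
        self.fields[field_name] = field
        # Only index the newcomer if the instance is already indexed
        # and a vocab was actually supplied.
        if self.indexed and vocab is not None:
            field.index(vocab)


vocab = {"the": 1, "cat": 2, "positive": 3}
instance = ToyInstance({"text": ToyField(["the", "cat"])})
instance.index_fields(vocab)

instance.add_field("label", ToyField(["positive"]), vocab)
label_ids = instance.fields["label"].token_ids

# Without a vocab, the late field stays un-indexed.
instance.add_field("extra", ToyField(["the"]))
extra_ids = instance.fields["extra"].token_ids
```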
class Instance(Mapping[str, Field]): | ... | def count_vocab_items(self, counter: Dict[str, Dict[str, int]])
Increments counts in the given counter for all of the vocabulary items in all of the
Fields in this Instance.
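The nested shape of the counter can be sketched as follows. The sketch assumes the outer key is a vocabulary namespace and the inner key is a token; the helper `count_tokens` and the namespace names are hypothetical and only illustrate the Dict[str, Dict[str, int]] structure.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple


def count_tokens(tokens: List[str], namespace: str,
                 counter: Dict[str, Dict[str, int]]) -> None:
    """Increment counter[namespace][token] once per occurrence."""
    for token in tokens:
        counter[namespace][token] += 1


# A nested defaultdict lets both levels be created on first use.
counter: DefaultDict[str, DefaultDict[str, int]] = defaultdict(
    lambda: defaultdict(int)
)

# Each toy field is (tokens, namespace); the names are illustrative only.
fields: Dict[str, Tuple[List[str], str]] = {
    "text": (["the", "cat", "the"], "tokens"),
    "label": (["positive"], "labels"),
}

# Counting over an instance means counting over every one of its fields.
for tokens, namespace in fields.values():
    count_tokens(tokens, namespace, counter)
```

Because the same counter is passed to every field, counts from many instances accumulate in one structure.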
class Instance(Mapping[str, Field]): | ... | def index_fields(self, vocab: Vocabulary) -> None
Indexes all fields in this Instance using the provided Vocabulary.
This mutates the current object; it does not return a new Instance.
A DataLoader will call this on each pass through a dataset; we use the indexed
flag to make sure that indexing only happens once.
This means that if for some reason you modify your vocabulary after you've indexed your instances, you might get unexpected behavior.
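The index-once behaviour, and the surprise it can cause, can be sketched with a toy class. Everything here is hypothetical: a plain dictionary stands in for the vocabulary, and a boolean attribute (named `indexed` in this sketch) guards against re-indexing.

```python
from typing import Dict, List, Optional


class ToyInstance:
    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens
        self.token_ids: Optional[List[int]] = None
        self.indexed = False

    def index_fields(self, vocab: Dict[str, int]) -> None:
        # Mutates self in place and returns nothing.
        if self.indexed:
            return  # already indexed: later calls are no-ops
        self.token_ids = [vocab.get(token, 0) for token in self.tokens]
        self.indexed = True


vocab = {"the": 1, "cat": 2}
instance = ToyInstance(["the", "cat", "sat"])

instance.index_fields(vocab)
first_pass = list(instance.token_ids)

# Modifying the vocabulary afterwards has no effect on this instance,
# because the flag short-circuits the second call.
vocab["sat"] = 3
instance.index_fields(vocab)
second_pass = list(instance.token_ids)
```

In this sketch "sat" keeps the unknown-token id 0 on the second pass even though the vocabulary now contains it, which is the kind of unexpected behaviour the note above warns about.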
class Instance(Mapping[str, Field]): | ... | def get_padding_lengths(self) -> Dict[str, Dict[str, int]]
Returns a dictionary of padding lengths, keyed by field name. Each
Field returns a
mapping from padding keys to actual lengths, and we just key that dictionary by field name.
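As an illustration of the return shape, the sketch below builds the nested dictionary from toy data. The padding key `num_tokens` and the helper `field_padding_lengths` are assumptions made for the example, not names taken from the source.

```python
from typing import Dict, List


def field_padding_lengths(token_ids: List[int]) -> Dict[str, int]:
    """Per-field mapping from padding key to actual length (toy version)."""
    return {"num_tokens": len(token_ids)}


# Already-indexed toy fields, keyed by field name.
indexed_fields: Dict[str, List[int]] = {
    "premise": [4, 8, 15],
    "hypothesis": [16, 23],
}

# The instance-level result just keys each field's mapping by field name.
padding_lengths: Dict[str, Dict[str, int]] = {
    name: field_padding_lengths(ids) for name, ids in indexed_fields.items()
}
```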
class Instance(Mapping[str, Field]): | ... | def as_tensor_dict( | self, | padding_lengths: Dict[str, Dict[str, int]] = None | ) -> Dict[str, DataArray]
Pads each Field in this instance to the lengths given in padding_lengths (which is
keyed by field name, then by padding key, the same as the return value of
get_padding_lengths), returning a dictionary, keyed by field name, of each field's torch tensors.
If padding_lengths is omitted, we will call self.get_padding_lengths() to get the
sizes of the tensors to create.
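The padding step and its fallback can be sketched without torch. In this toy version plain Python lists stand in for tensors, 0 is assumed to be the padding id, and `num_tokens` is an assumed padding key; the function names mirror the methods above but are free functions written only for illustration.

```python
from typing import Dict, List, Optional

PaddingLengths = Dict[str, Dict[str, int]]


def get_padding_lengths(fields: Dict[str, List[int]]) -> PaddingLengths:
    """Actual lengths, keyed by field name and then by padding key."""
    return {name: {"num_tokens": len(ids)} for name, ids in fields.items()}


def as_padded_dict(
    fields: Dict[str, List[int]],
    padding_lengths: Optional[PaddingLengths] = None,
) -> Dict[str, List[int]]:
    # Fall back to each field's own length when no lengths are given.
    if padding_lengths is None:
        padding_lengths = get_padding_lengths(fields)
    padded = {}
    for name, ids in fields.items():
        target = padding_lengths[name]["num_tokens"]
        # Right-pad with zeros, then cut to the target length.
        padded[name] = (ids + [0] * (target - len(ids)))[:target]
    return padded


fields = {"premise": [4, 8, 15], "hypothesis": [16, 23]}

# No lengths supplied: every field keeps its own size.
unpadded = as_padded_dict(fields)

# Lengths supplied, e.g. the maximum over a batch: fields are padded up to them.
batch_lengths = {
    "premise": {"num_tokens": 4},
    "hypothesis": {"num_tokens": 4},
}
padded = as_padded_dict(fields, batch_lengths)
```

Passing explicit lengths is what lets several instances share one shape when they are combined into a batch.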
class Instance(Mapping[str, Field]): | ... | def duplicate(self) -> "Instance"