
allennlp.data.instance


Instance Objects#

class Instance(Mapping[str, Field]):
 | def __init__(self, fields: MutableMapping[str, Field]) -> None

An Instance is a collection of Field objects, specifying the inputs and outputs to some model. We don't make a distinction between inputs and outputs here, though - all operations are done on all fields, and when we return arrays, we return them as dictionaries keyed by field name. A model can then decide which fields it wants to use as inputs and which as outputs.

The Fields in an Instance can start out either indexed or un-indexed. During the data processing pipeline, all fields will be indexed, after which multiple instances can be combined into a Batch and then converted into padded arrays.

Parameters

  • fields : MutableMapping[str, Field]
    The Field objects that will be used to produce data arrays for this instance.
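
A rough sketch of how an Instance is usually built (assuming the TextField, LabelField, WhitespaceTokenizer, and SingleIdTokenIndexer classes from allennlp.data; the field names here are arbitrary):

    from allennlp.data import Instance
    from allennlp.data.fields import LabelField, TextField
    from allennlp.data.token_indexers import SingleIdTokenIndexer
    from allennlp.data.tokenizers import WhitespaceTokenizer

    tokenizer = WhitespaceTokenizer()
    token_indexers = {"tokens": SingleIdTokenIndexer()}

    # One Field per input/output; the field names key the tensor dictionaries later.
    fields = {
        "text": TextField(tokenizer.tokenize("this movie was great"), token_indexers),
        "label": LabelField("positive"),
    }
    instance = Instance(fields)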

add_field#

 | def add_field(
 |     self,
 |     field_name: str,
 |     field: Field,
 |     vocab: Vocabulary = None
 | ) -> None

Add the field to the existing fields mapping. If we have already indexed the Instance, then we also index the new field, so in that case it is necessary to supply the vocab.
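
A minimal sketch (the field name is made up; vocab stands for the Vocabulary the instance was indexed with):

    # Attach an extra field after construction.
    instance.add_field("extra_label", LabelField("positive"))

    # If index_fields(vocab) has already been called on this instance, pass the
    # vocab so the new field gets indexed as well:
    instance.add_field("extra_label", LabelField("positive"), vocab=vocab)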

count_vocab_items#

 | def count_vocab_items(self, counter: Dict[str, Dict[str, int]])

Increments counts in the given counter for all of the vocabulary items in all of the Fields in this Instance.
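
This is the hook that Vocabulary.from_instances relies on to gather counts; a sketch of calling it directly (the nested-defaultdict counter is just one way to satisfy the Dict[str, Dict[str, int]] type):

    from collections import defaultdict

    # namespace -> token -> count
    counter = defaultdict(lambda: defaultdict(int))
    instance.count_vocab_items(counter)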

index_fields#

 | def index_fields(self, vocab: Vocabulary) -> None

Indexes all fields in this Instance using the provided Vocabulary. This mutates the current object; it does not return a new Instance. A DataLoader will call this on each pass through a dataset; we use the indexed flag to make sure that indexing only happens once.

This means that if for some reason you modify your vocabulary after you've indexed your instances, you might get unexpected behavior.
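
A typical flow, continuing the sketch above:

    from allennlp.data import Vocabulary

    # Build a vocabulary from the (un-indexed) instances, then index them in place.
    vocab = Vocabulary.from_instances([instance])
    instance.index_fields(vocab)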

get_padding_lengths#

 | def get_padding_lengths(self) -> Dict[str, Dict[str, int]]

Returns a dictionary of padding lengths, keyed by field name. Each Field returns a mapping from padding keys to actual lengths, and we just key that dictionary by field name.
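
Continuing from the indexed instance above (the padding-key names in the comment are indicative only; the exact keys depend on the Field types and token indexers):

    lengths = instance.get_padding_lengths()
    # Keyed by field name, then by padding key, e.g. something like
    # {"text": {"tokens___tokens": 4}, "label": {}}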

as_tensor_dict#

 | def as_tensor_dict(
 |     self,
 |     padding_lengths: Dict[str, Dict[str, int]] = None
 | ) -> Dict[str, DataArray]

Pads each Field in this instance to the lengths given in padding_lengths (which is keyed by field name, then by padding key, the same as the return value of get_padding_lengths), returning a dictionary of torch tensors keyed by field name.

If padding_lengths is omitted, we will call self.get_padding_lengths() to get the sizes of the tensors to create.
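
A sketch, assuming the instance has already been indexed:

    tensors = instance.as_tensor_dict()
    # A dict keyed by field name; the structure under each key depends on the Field
    # type (a TextField typically yields a nested dict of token tensors, a LabelField
    # a single scalar tensor).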

duplicate#

 | def duplicate(self) -> "Instance"
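
No description accompanies this signature here; judging by the name, it returns a copy of this Instance (with its Fields duplicated) that can be modified independently of the original, e.g.:

    copy = instance.duplicate()
    assert copy is not instance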