class Instance(fields: MutableMapping[str, Field])

Bases:, typing.Generic

An Instance is a collection of Field objects, specifying the inputs and outputs to some model. We don’t make a distinction between inputs and outputs here, though: all operations are done on all fields, and when we return arrays, we return them as dictionaries keyed by field name. A model can then decide which fields it wants to use as inputs and which as outputs.

The Fields in an Instance can start out either indexed or un-indexed. During the data processing pipeline, all fields will be indexed, after which multiple instances can be combined into a Batch and then converted into padded arrays.

fields: Dict[str, Field]

The Field objects that will be used to produce data arrays for this instance.

add_field(self, field_name: str, field: Field, vocab: Vocabulary = None) → None

Add the field to the existing fields mapping. If the Instance has already been indexed, the new field is indexed as well, so vocab must be supplied in that case.
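The interaction between add_field and the indexed state can be sketched with simplified stand-ins. ToyField and ToyInstance below are hypothetical classes written for illustration, not the library's implementation; the plain dict used as a vocabulary, the mapping of unknown tokens to 0, and the ValueError are all assumptions of this sketch.

```python
from typing import Dict, List, Optional


class ToyField:
    """Hypothetical stand-in for a Field holding string tokens."""

    def __init__(self, tokens: List[str]) -> None:
        self.tokens = list(tokens)
        self.indexed_tokens: Optional[List[int]] = None  # set by index()

    def index(self, vocab: Dict[str, int]) -> None:
        # Assumption: unknown tokens map to 0 in this toy vocabulary.
        self.indexed_tokens = [vocab.get(token, 0) for token in self.tokens]


class ToyInstance:
    """Hypothetical stand-in mirroring the add_field behaviour described above."""

    def __init__(self, fields: Dict[str, ToyField]) -> None:
        self.fields = fields
        self.indexed = False

    def index_fields(self, vocab: Dict[str, int]) -> None:
        if not self.indexed:
            self.indexed = True
            for field in self.fields.values():
                field.index(vocab)

    def add_field(self, field_name: str, field: ToyField,
                  vocab: Optional[Dict[str, int]] = None) -> None:
        self.fields[field_name] = field
        if self.indexed:
            # An already-indexed instance must index the new field too.
            if vocab is None:
                raise ValueError("vocab is required once the instance is indexed")
            field.index(vocab)


vocab = {"the": 1, "cat": 2}
instance = ToyInstance({"text": ToyField(["the", "cat"])})
instance.index_fields(vocab)

late_field = ToyField(["cat", "dog"])
instance.add_field("extra", late_field, vocab)
```

Here late_field is indexed as soon as it is added, because the instance was indexed beforehand; on a fresh instance the vocab argument could be left out.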

as_tensor_dict(self, padding_lengths: Dict[str, Dict[str, int]] = None) → Dict[str, ~DataArray]

Pads each Field in this instance to the lengths given in padding_lengths (which is keyed by field name, then by padding key, the same as the return value of get_padding_lengths()), returning a dictionary of torch tensors keyed by field name.

If padding_lengths is omitted, we will call self.get_padding_lengths() to get the sizes of the tensors to create.
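As a rough illustration of the nested padding_lengths structure, the sketch below pads plain Python lists instead of torch tensors. The function names, the "num_tokens" padding key, and the padding value 0 are assumptions made for this example only.

```python
from typing import Dict, List, Optional

PAD_ID = 0  # assumed padding value for this sketch


def get_padding_lengths(fields: Dict[str, List[int]]) -> Dict[str, Dict[str, int]]:
    # Keyed by field name, then by padding key.
    return {name: {"num_tokens": len(ids)} for name, ids in fields.items()}


def as_padded_dict(
    fields: Dict[str, List[int]],
    padding_lengths: Optional[Dict[str, Dict[str, int]]] = None,
) -> Dict[str, List[int]]:
    # With no lengths given, fall back to each field's own length.
    if padding_lengths is None:
        padding_lengths = get_padding_lengths(fields)
    padded = {}
    for name, ids in fields.items():
        target = padding_lengths[name]["num_tokens"]
        # Right-pad with PAD_ID, then cut to exactly `target` items.
        padded[name] = (ids + [PAD_ID] * target)[:target]
    return padded


fields = {"premise": [4, 7, 9], "hypothesis": [5, 2]}
padded = as_padded_dict(
    fields,
    {"premise": {"num_tokens": 5}, "hypothesis": {"num_tokens": 2}},
)
unpadded = as_padded_dict(fields)
```

The result stays keyed by field name, so a caller can pick out whichever entries it treats as inputs or outputs.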

count_vocab_items(self, counter: Dict[str, Dict[str, int]])

Increments counts in the given counter for all of the vocabulary items in all of the Fields in this Instance.
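A minimal sketch of how such a nested counter might be filled, assuming the outer key is a vocabulary namespace and the inner key is the item being counted. The tuple-based field representation and the namespace names are hypothetical and chosen only to keep the example self-contained.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical field representation: (namespace, items to count).
ToyFields = Dict[str, Tuple[str, List[str]]]


def count_vocab_items(fields: ToyFields, counter: Dict[str, Dict[str, int]]) -> None:
    # Mutates `counter` in place and returns nothing.
    for namespace, items in fields.values():
        for item in items:
            counter[namespace][item] += 1


counter: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))

first = {"text": ("tokens", ["the", "cat"]), "label": ("labels", ["animal"])}
second = {"text": ("tokens", ["the", "dog"]), "label": ("labels", ["animal"])}

# The same counter is shared across instances so counts accumulate.
for toy_fields in (first, second):
    count_vocab_items(toy_fields, counter)
```

Sharing one counter across all instances is what lets the counts be turned into a vocabulary afterwards.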

get_padding_lengths(self) → Dict[str, Dict[str, int]]

Returns a dictionary of padding lengths, keyed by field name. Each Field returns a mapping from padding keys to actual lengths, and we just key that dictionary by field name.
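The two-level structure can be sketched as follows. ToyTextField is a hypothetical stand-in for a Field, and the "num_tokens" padding key is an assumption of this example.

```python
from typing import Dict, List


class ToyTextField:
    """Hypothetical field that reports a single padding key."""

    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens

    def get_padding_lengths(self) -> Dict[str, int]:
        # Mapping from padding key to actual length for this field.
        return {"num_tokens": len(self.tokens)}


def instance_padding_lengths(fields: Dict[str, ToyTextField]) -> Dict[str, Dict[str, int]]:
    # Key each field's own mapping by the field's name.
    return {name: field.get_padding_lengths() for name, field in fields.items()}


lengths = instance_padding_lengths(
    {
        "premise": ToyTextField(["a", "cat", "sat"]),
        "hypothesis": ToyTextField(["it", "sat"]),
    }
)
```

In this sketch lengths maps "premise" to {"num_tokens": 3} and "hypothesis" to {"num_tokens": 2}, the same shape that as_tensor_dict accepts as padding_lengths.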

index_fields(self, vocab: Vocabulary) → None

Indexes all fields in this Instance using the provided Vocabulary. This mutates the current object; it does not return a new Instance. A DataIterator will call this on each pass through a dataset; we use the indexed flag to make sure that indexing only happens once.

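This means that if for some reason you modify your vocabulary after you’ve indexed your instances, you might get unexpected behavior: instances that are already indexed keep the indices from the earlier vocabulary and are not re-indexed.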
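The effect of the indexed flag can be sketched with a toy class. ToyInstance is a hypothetical stand-in, and the plain dict vocabulary and the mapping of unknown tokens to 0 are assumptions of this sketch rather than the library's behaviour.

```python
from typing import Dict, List, Optional


class ToyInstance:
    """Hypothetical stand-in that indexes its tokens at most once."""

    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens
        self.ids: Optional[List[int]] = None
        self.indexed = False

    def index_fields(self, vocab: Dict[str, int]) -> None:
        if self.indexed:
            return  # later calls are no-ops
        # Assumption: unknown tokens map to 0.
        self.ids = [vocab.get(token, 0) for token in self.tokens]
        self.indexed = True


vocab = {"cat": 1}
instance = ToyInstance(["cat", "dog"])

instance.index_fields(vocab)      # first pass: "dog" is unknown
first_ids = list(instance.ids)

vocab["dog"] = 2                  # vocabulary changed after indexing
instance.index_fields(vocab)      # second pass: skipped by the flag
stale_ids = list(instance.ids)
```

After the second call stale_ids is identical to first_ids, so "dog" still carries the unknown-token index even though the vocabulary now contains it.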