instance
allennlp.data.instance
Instance#
class Instance(Mapping[str, Field]):
| def __init__(self, fields: MutableMapping[str, Field]) -> None
An Instance is a collection of Field objects, specifying the inputs and outputs to
some model. We don't make a distinction between inputs and outputs here, though: all
operations are done on all fields, and when we return arrays, we return them as dictionaries
keyed by field name. A model can then decide which fields it wants to use as inputs and which
as outputs.
The Fields in an Instance can start out either indexed or un-indexed. During the data
processing pipeline, all fields will be indexed, after which multiple instances can be combined
into a Batch and then converted into padded arrays.
Parameters
- fields : Dict[str, Field]
  The Field objects that will be used to produce data arrays for this instance.
add_field#
class Instance(Mapping[str, Field]):
| ...
| def add_field(
| self,
| field_name: str,
| field: Field,
| vocab: Vocabulary = None
| ) -> None
Add the field to the existing fields mapping.
If the Instance has already been indexed, the new field is indexed as well, so
the vocab must be supplied in that case.
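The rule above can be illustrated with a small, self-contained sketch. ToyField and ToyInstance are hypothetical stand-ins written for this example, not AllenNLP classes, and the vocabulary is reduced to a plain dict. Raising an error when the vocab is missing is a choice made by this sketch; the library itself may handle that case differently.

```python
from typing import Dict, List, Optional


class ToyField:
    """Hypothetical stand-in for a Field; not an AllenNLP class."""

    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens
        self.indices: Optional[List[int]] = None

    def index(self, vocab: Dict[str, int]) -> None:
        # Unknown tokens map to index 0 in this sketch.
        self.indices = [vocab.get(token, 0) for token in self.tokens]


class ToyInstance:
    """Hypothetical stand-in for an Instance; not an AllenNLP class."""

    def __init__(self, fields: Dict[str, ToyField]) -> None:
        self.fields = fields
        self.indexed = False

    def index_fields(self, vocab: Dict[str, int]) -> None:
        for field in self.fields.values():
            field.index(vocab)
        self.indexed = True

    def add_field(self, field_name: str, field: ToyField,
                  vocab: Optional[Dict[str, int]] = None) -> None:
        # Sketch-only behavior: fail loudly if the vocab is missing.
        if self.indexed and vocab is None:
            raise ValueError("instance is already indexed; a vocab is required")
        self.fields[field_name] = field
        if self.indexed:
            field.index(vocab)


vocab = {"@@UNKNOWN@@": 0, "the": 1, "cat": 2, "sat": 3}
instance = ToyInstance({"tokens": ToyField(["the", "cat"])})

# Before indexing, no vocab is needed; the field is indexed later with the rest.
instance.add_field("early", ToyField(["sat"]))
instance.index_fields(vocab)

# After indexing, the new field has to be indexed on the spot.
instance.add_field("late", ToyField(["cat", "sat"]), vocab)
```

In this sketch, a field added before indexing is picked up by the later index_fields call, while a field added afterwards is indexed immediately with the vocab passed in.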
count_vocab_items#
class Instance(Mapping[str, Field]):
| ...
| def count_vocab_items(self, counter: Dict[str, Dict[str, int]])
Increments counts in the given counter for all of the vocabulary items in all of the
Fields in this Instance.
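As a rough illustration of the counter's shape, the sketch below assumes the outer key is a vocabulary namespace and the inner key is the item being counted. ToyTokenField and count_instance are hypothetical names used only here; they are not part of AllenNLP.

```python
from collections import defaultdict
from typing import Dict, List


class ToyTokenField:
    """Hypothetical stand-in for a Field; not an AllenNLP class."""

    def __init__(self, tokens: List[str], namespace: str) -> None:
        self.tokens = tokens
        self.namespace = namespace

    def count_vocab_items(self, counter: Dict[str, Dict[str, int]]) -> None:
        # Assumed layout: counter[namespace][item] -> count.
        for token in self.tokens:
            counter[self.namespace][token] += 1


def count_instance(fields: Dict[str, ToyTokenField],
                   counter: Dict[str, Dict[str, int]]) -> None:
    # Mirrors the documented behavior: every field adds its own counts.
    for field in fields.values():
        field.count_vocab_items(counter)


counter: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
fields = {
    "tokens": ToyTokenField(["the", "cat", "the"], namespace="tokens"),
    "label": ToyTokenField(["positive"], namespace="labels"),
}
count_instance(fields, counter)
```

The counter is updated in place, so the same object can be passed to many instances in turn to accumulate counts over a whole dataset.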
index_fields#
class Instance(Mapping[str, Field]):
| ...
| def index_fields(self, vocab: Vocabulary) -> None
Indexes all fields in this Instance using the provided Vocabulary.
This mutates the current object; it does not return a new Instance.
A DataLoader will call this on each pass through a dataset; we use the indexed
flag to make sure that indexing only happens once.
This means that if you modify your vocabulary after you've indexed your instances, those instances are not re-indexed, and you might get unexpected behavior.
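The effect of the indexed flag can be sketched in a few lines. The classes below are hypothetical stand-ins, not AllenNLP code, and the vocabulary is a plain dict in which unknown tokens map to index 0.

```python
from typing import Dict, List, Optional


class ToyField:
    """Hypothetical stand-in for a Field; not an AllenNLP class."""

    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens
        self.indices: Optional[List[int]] = None

    def index(self, vocab: Dict[str, int]) -> None:
        self.indices = [vocab.get(token, 0) for token in self.tokens]


class ToyInstance:
    """Hypothetical stand-in for an Instance; not an AllenNLP class."""

    def __init__(self, fields: Dict[str, ToyField]) -> None:
        self.fields = fields
        self.indexed = False

    def index_fields(self, vocab: Dict[str, int]) -> None:
        # Index in place, and only on the first call.
        if not self.indexed:
            for field in self.fields.values():
                field.index(vocab)
            self.indexed = True


vocab = {"@@UNKNOWN@@": 0, "the": 1, "cat": 2}
instance = ToyInstance({"tokens": ToyField(["the", "cat", "sat"])})

instance.index_fields(vocab)
first_pass = list(instance.fields["tokens"].indices)

# Changing the vocabulary afterwards has no effect on this instance:
# the second call is skipped because the flag is already set.
vocab["sat"] = 3
instance.index_fields(vocab)
second_pass = list(instance.fields["tokens"].indices)
```

Here "sat" keeps the unknown index after the vocabulary changes, which is the kind of stale result the warning above refers to.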
get_padding_lengths#
class Instance(Mapping[str, Field]):
| ...
| def get_padding_lengths(self) -> Dict[str, Dict[str, int]]
Returns a dictionary of padding lengths, keyed by field name. Each Field returns a
mapping from padding keys to actual lengths, and we just key that dictionary by field name.
as_tensor_dict#
class Instance(Mapping[str, Field]):
| ...
| def as_tensor_dict(
| self,
| padding_lengths: Dict[str, Dict[str, int]] = None
| ) -> Dict[str, DataArray]
Pads each Field in this instance to the lengths given in padding_lengths (which is
keyed by field name, then by padding key, the same as the return value of
get_padding_lengths), returning a dictionary of torch tensors keyed by field name.
If padding_lengths is omitted, we will call self.get_padding_lengths() to get the
sizes of the tensors to create.
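The relationship between the two-level padding_lengths dictionary and the padded output can be sketched without any dependencies. This is a simplified illustration: ToyIndexedField, the "num_tokens" padding key, and the helper functions are assumptions made for the example, and plain Python lists stand in for torch tensors.

```python
from typing import Dict, List, Optional


class ToyIndexedField:
    """Hypothetical stand-in for an indexed Field; not an AllenNLP class."""

    def __init__(self, indices: List[int]) -> None:
        self.indices = indices

    def get_padding_lengths(self) -> Dict[str, int]:
        # "num_tokens" is a padding key chosen for this sketch.
        return {"num_tokens": len(self.indices)}

    def as_padded(self, padding_lengths: Dict[str, int]) -> List[int]:
        target = padding_lengths["num_tokens"]
        # Pad with zeros, then truncate to the requested length.
        return (self.indices + [0] * target)[:target]


def get_padding_lengths(
    fields: Dict[str, ToyIndexedField]
) -> Dict[str, Dict[str, int]]:
    # Field name -> padding key -> length.
    return {name: field.get_padding_lengths() for name, field in fields.items()}


def as_padded_dict(
    fields: Dict[str, ToyIndexedField],
    padding_lengths: Optional[Dict[str, Dict[str, int]]] = None,
) -> Dict[str, List[int]]:
    # Fall back to the fields' own lengths when none are given.
    if padding_lengths is None:
        padding_lengths = get_padding_lengths(fields)
    return {
        name: field.as_padded(padding_lengths[name])
        for name, field in fields.items()
    }


fields = {
    "premise": ToyIndexedField([4, 7, 9]),
    "hypothesis": ToyIndexedField([5, 2]),
}
lengths = get_padding_lengths(fields)
unpadded = as_padded_dict(fields)
padded = as_padded_dict(
    fields,
    {"premise": {"num_tokens": 5}, "hypothesis": {"num_tokens": 5}},
)
```

When batching, the lengths passed in would typically be the maximum over all instances in the batch, so that every instance produces arrays of the same shape.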
duplicate#
class Instance(Mapping[str, Field]):
| ...
| def duplicate(self) -> "Instance"