text_field
[ allennlp.data.fields.text_field ]
A TextField represents a string of text, the kind that you might want to represent with
standard word vectors, or pass through an LSTM.
TextFieldTensors#
TextFieldTensors = Dict[str, Dict[str, torch.Tensor]]
TextField#
class TextField(SequenceField[TextFieldTensors]):
 | def __init__(
 |     self,
 |     tokens: List[Token],
 |     token_indexers: Dict[str, TokenIndexer]
 | ) -> None
This Field represents a list of string tokens.  Before constructing this object, you need
to tokenize raw strings using a Tokenizer.
Because string tokens can be represented as indexed arrays in a number of ways, we also take a
dictionary of TokenIndexer
objects that will be used to convert the tokens into indices.
Each TokenIndexer could represent each token as a single ID, or a list of character IDs, or
something else.
This field will get converted into a dictionary of arrays, one for each TokenIndexer.  A
SingleIdTokenIndexer produces an array of shape (num_tokens,), while a
TokenCharactersIndexer produces an array of shape (num_tokens, num_characters).
count_vocab_items#
class TextField(SequenceField[TextFieldTensors]):
 | ...
 | @overrides
 | def count_vocab_items(self, counter: Dict[str, Dict[str, int]])
index#
class TextField(SequenceField[TextFieldTensors]):
 | ...
 | @overrides
 | def index(self, vocab: Vocabulary)
get_padding_lengths#
class TextField(SequenceField[TextFieldTensors]):
 | ...
 | @overrides
 | def get_padding_lengths(self) -> Dict[str, int]
The TextField has a list of Tokens, and each Token gets converted into arrays by
(potentially) several TokenIndexers.  This method gets the max length (over tokens)
associated with each of these arrays.
sequence_length#
class TextField(SequenceField[TextFieldTensors]):
 | ...
 | @overrides
 | def sequence_length(self) -> int
as_tensor#
class TextField(SequenceField[TextFieldTensors]):
 | ...
 | @overrides
 | def as_tensor(
 |     self,
 |     padding_lengths: Dict[str, int]
 | ) -> TextFieldTensors
empty_field#
class TextField(SequenceField[TextFieldTensors]):
 | ...
 | @overrides
 | def empty_field(self)
batch_tensors#
class TextField(SequenceField[TextFieldTensors]):
 | ...
 | @overrides
 | def batch_tensors(
 |     self,
 |     tensor_list: List[TextFieldTensors]
 | ) -> TextFieldTensors
This is creating a dict of {token_indexer_name: {token_indexer_outputs: batched_tensor}} for each token indexer used to index this field.
duplicate#
class TextField(SequenceField[TextFieldTensors]):
 | ...
 | @overrides
 | def duplicate(self)
Overrides the behavior of duplicate so that self._token_indexers won't
actually be deep-copied.
Not only would it be extremely inefficient to deep-copy the token indexers, but it also fails in many cases since some tokenizers (like those used in the 'transformers' lib) cannot actually be deep-copied.