text_field

allennlp.data.fields.text_field

A TextField represents a string of text: the kind you might want to embed with standard word vectors or pass through an LSTM.

TextFieldTensors

TextFieldTensors = Dict[str, Dict[str, torch.Tensor]]
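The nested-dict layout of this type alias can be sketched in plain Python. This is a conceptual illustration only: lists stand in for torch.Tensor so the example runs without torch, and the indexer and output names ("tokens", "token_characters") are typical conventions, not guaranteed keys.

```python
# Sketch of the TextFieldTensors layout.  The outer key is the
# token-indexer name; the inner keys name that indexer's outputs.
text_field_tensors = {
    "tokens": {                    # e.g. from a single-ID indexer
        "tokens": [2, 15, 7, 3],   # shape (num_tokens,)
    },
    "token_characters": {          # e.g. from a character-level indexer
        "token_characters": [      # shape (num_tokens, num_characters)
            [4, 5, 0],
            [6, 7, 8],
            [9, 0, 0],
            [3, 2, 1],
        ],
    },
}

# Downstream code typically iterates over indexer names, then output names:
for indexer_name, outputs in text_field_tensors.items():
    for output_name, tensor in outputs.items():
        print(indexer_name, output_name, len(tensor))
```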

TextField

class TextField(SequenceField[TextFieldTensors]):
 | def __init__(
 |     self,
 |     tokens: List[Token],
 |     token_indexers: Optional[Dict[str, TokenIndexer]] = None
 | ) -> None

This Field represents a list of string tokens. Before constructing this object, you need to tokenize raw strings using a Tokenizer.

Because string tokens can be represented as indexed arrays in a number of ways, we also take a dictionary of TokenIndexer objects that will be used to convert the tokens into indices. Each TokenIndexer might represent a token as a single ID, as a list of character IDs, or in some other way.

This field will get converted into a dictionary of arrays, one for each TokenIndexer. A SingleIdTokenIndexer produces an array of shape (num_tokens,), while a TokenCharactersIndexer produces an array of shape (num_tokens, num_characters).
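The two shapes mentioned above can be illustrated without allennlp at all. The sketch below uses made-up vocabulary mappings to show what single-ID indexing versus character indexing conceptually produce (a real TokenCharactersIndexer would additionally pad the ragged character lists to a common length).

```python
# Hypothetical vocabularies, for illustration only.
tokens = ["a", "cat"]
word_vocab = {"a": 2, "cat": 3}
char_vocab = {"a": 4, "c": 5, "t": 6}

# Single-ID indexing: one integer per token -> shape (num_tokens,)
single_ids = [word_vocab[t] for t in tokens]

# Character indexing: one integer per character
# -> shape (num_tokens, num_characters) after padding
char_ids = [[char_vocab[c] for c in t] for t in tokens]

print(single_ids)  # [2, 3]
print(char_ids)    # [[4], [5, 4, 6]]
```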

token_indexers

class TextField(SequenceField[TextFieldTensors]):
 | ...
 | @token_indexers.setter
 | def token_indexers(
 |     self,
 |     token_indexers: Dict[str, TokenIndexer]
 | ) -> None

count_vocab_items

class TextField(SequenceField[TextFieldTensors]):
 | ...
 | def count_vocab_items(self, counter: Dict[str, Dict[str, int]])

index

class TextField(SequenceField[TextFieldTensors]):
 | ...
 | def index(self, vocab: Vocabulary)

get_padding_lengths

class TextField(SequenceField[TextFieldTensors]):
 | ...
 | def get_padding_lengths(self) -> Dict[str, int]

The TextField has a list of Tokens, and each Token gets converted into arrays by (potentially) several TokenIndexers. This method gets the max length (over tokens) associated with each of these arrays.
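For a character-level array, "max length over tokens" means the longest character sequence any token produced. A rough self-contained sketch of that computation (the key names here are illustrative, not allennlp's internal padding keys):

```python
# Per-token character IDs for three tokens of differing lengths.
char_ids = [[4], [5, 4, 6], [7, 8]]

# The padding lengths a character-level indexer would conceptually report:
padding_lengths = {
    "num_tokens": len(char_ids),
    "num_token_characters": max(len(cs) for cs in char_ids),
}
print(padding_lengths)  # {'num_tokens': 3, 'num_token_characters': 3}
```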

sequence_length

class TextField(SequenceField[TextFieldTensors]):
 | ...
 | def sequence_length(self) -> int

as_tensor

class TextField(SequenceField[TextFieldTensors]):
 | ...
 | def as_tensor(
 |     self,
 |     padding_lengths: Dict[str, int]
 | ) -> TextFieldTensors

empty_field

class TextField(SequenceField[TextFieldTensors]):
 | ...
 | def empty_field(self)

batch_tensors

class TextField(SequenceField[TextFieldTensors]):
 | ...
 | def batch_tensors(
 |     self,
 |     tensor_list: List[TextFieldTensors]
 | ) -> TextFieldTensors

This creates a dict of {token_indexer_name: {token_indexer_outputs: batched_tensor}} for each token indexer used to index this field.
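The merging logic can be sketched in pure Python. This is a simplified stand-in, not allennlp's implementation: plain lists replace torch tensors, and list-of-lists replaces the actual tensor stacking.

```python
def batch_tensors_sketch(tensor_list):
    """Merge per-instance nested dicts into one dict with batched leaves."""
    batched = {}
    for indexer_name in tensor_list[0]:
        batched[indexer_name] = {}
        for output_name in tensor_list[0][indexer_name]:
            # Collect this output across all instances (stands in for torch.stack).
            batched[indexer_name][output_name] = [
                t[indexer_name][output_name] for t in tensor_list
            ]
    return batched

instance_a = {"tokens": {"tokens": [2, 3]}}
instance_b = {"tokens": {"tokens": [4, 5]}}
print(batch_tensors_sketch([instance_a, instance_b]))
# {'tokens': {'tokens': [[2, 3], [4, 5]]}}
```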

__iter__

class TextField(SequenceField[TextFieldTensors]):
 | ...
 | def __iter__(self) -> Iterator[Token]

duplicate

class TextField(SequenceField[TextFieldTensors]):
 | ...
 | def duplicate(self)

Overrides the behavior of duplicate so that self._token_indexers won't actually be deep-copied.

Not only would it be extremely inefficient to deep-copy the token indexers, but doing so also fails in many cases, since some tokenizers (like those used in the 'transformers' lib) cannot actually be deep-copied.
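The pattern described above, deep-copying the tokens while sharing the indexers, can be sketched with a toy class. The class and attribute names are illustrative only; this is not allennlp's code.

```python
import copy

class SketchField:
    """Toy field illustrating a duplicate() that shares its indexers."""

    def __init__(self, tokens, token_indexers):
        self.tokens = tokens
        self._token_indexers = token_indexers

    def duplicate(self):
        # Deep-copy the tokens, but pass the indexers through unchanged:
        # the copy shares the very same indexer objects as the original.
        return SketchField(copy.deepcopy(self.tokens), self._token_indexers)

field = SketchField(["a", "cat"], {"tokens": object()})
dup = field.duplicate()
print(dup.tokens is field.tokens)                    # False: fresh copy
print(dup._token_indexers is field._token_indexers)  # True: shared
```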

human_readable_repr

class TextField(SequenceField[TextFieldTensors]):
 | ...
 | def human_readable_repr(self) -> List[str]