text_field
allennlp.data.fields.text_field
A TextField represents a string of text, the kind that you might want to represent with standard word vectors, or pass through an LSTM.
TextFieldTensors¶
TextFieldTensors = Dict[str, Dict[str, torch.Tensor]]
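To make the nested shape of this alias concrete, here is a sketch of what a `TextFieldTensors` value looks like for a field indexed by two indexers. Plain lists stand in for `torch.Tensor` so the sketch has no dependencies, and the key names (`"tokens"`, `"token_characters"`) are illustrative, not fixed by the type alias itself: the outer key is the `TokenIndexer` name, and the inner keys name the arrays that indexer produced.

```python
# Illustrative sketch of the TextFieldTensors structure for the 3-token
# sequence ["the", "cat", "sat"], with plain lists in place of torch.Tensor.
text_field_tensors = {
    "tokens": {
        # SingleIdTokenIndexer-style output: one ID per token, shape (num_tokens,)
        "tokens": [2, 5, 7],
    },
    "token_characters": {
        # TokenCharactersIndexer-style output: shape (num_tokens, num_characters)
        "token_characters": [[3, 1, 0], [4, 4, 2], [9, 0, 0]],
    },
}
```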
TextField¶
class TextField(SequenceField[TextFieldTensors]):
| def __init__(
| self,
| tokens: List[Token],
| token_indexers: Optional[Dict[str, TokenIndexer]] = None
| ) -> None
This Field represents a list of string tokens. Before constructing this object, you need to tokenize raw strings using a Tokenizer.

Because string tokens can be represented as indexed arrays in a number of ways, we also take a dictionary of TokenIndexer objects that will be used to convert the tokens into indices. Each TokenIndexer could represent each token as a single ID, or a list of character IDs, or something else.

This field will get converted into a dictionary of arrays, one for each TokenIndexer. A SingleIdTokenIndexer produces an array of shape (num_tokens,), while a TokenCharactersIndexer produces an array of shape (num_tokens, num_characters).
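The shape difference between the two indexer styles can be sketched without allennlp itself. This is a dependency-free illustration, not the library's implementation; the ID mappings are made up for the example.

```python
# Two indexer styles applied to the same tokens, with plain lists standing
# in for arrays and hypothetical vocabularies for the ID lookups.
tokens = ["cat", "sat"]
word_ids = {"cat": 2, "sat": 5}
char_ids = {c: i for i, c in enumerate("abcdefghijklmnopqrstuvwxyz", start=1)}

# SingleIdTokenIndexer-style: one ID per token -> shape (num_tokens,)
single_id = [word_ids[t] for t in tokens]

# TokenCharactersIndexer-style: one ID per character -> shape (num_tokens, num_characters)
characters = [[char_ids[c] for c in t] for t in tokens]

print(single_id)   # [2, 5]
print(characters)  # [[3, 1, 20], [19, 1, 20]]
```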
token_indexers¶
class TextField(SequenceField[TextFieldTensors]):
| ...
| @property
| def token_indexers(self) -> Dict[str, TokenIndexer]
token_indexers¶
class TextField(SequenceField[TextFieldTensors]):
| ...
| @token_indexers.setter
| def token_indexers(
| self,
| token_indexers: Dict[str, TokenIndexer]
| ) -> None
count_vocab_items¶
class TextField(SequenceField[TextFieldTensors]):
| ...
| def count_vocab_items(self, counter: Dict[str, Dict[str, int]])
index¶
class TextField(SequenceField[TextFieldTensors]):
| ...
| def index(self, vocab: Vocabulary)
get_padding_lengths¶
class TextField(SequenceField[TextFieldTensors]):
| ...
| def get_padding_lengths(self) -> Dict[str, int]
The TextField has a list of Tokens, and each Token gets converted into arrays by (potentially) several TokenIndexers. This method gets the max length (over tokens) associated with each of these arrays.
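The idea behind this computation can be sketched in a few lines. The real key names in the returned dictionary are determined by the indexers; the ones below are illustrative, and the character IDs are made up.

```python
# Sketch of the max-over-tokens computation behind get_padding_lengths
# (not allennlp's actual code). Per-token character IDs for two tokens
# of different lengths, e.g. "cat" and "sats":
character_ids = [[3, 1, 20], [19, 1, 20, 19]]

padding_lengths = {
    "num_tokens": len(character_ids),
    "num_token_characters": max(len(ids) for ids in character_ids),
}
print(padding_lengths)  # {'num_tokens': 2, 'num_token_characters': 4}
```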
sequence_length¶
class TextField(SequenceField[TextFieldTensors]):
| ...
| def sequence_length(self) -> int
as_tensor¶
class TextField(SequenceField[TextFieldTensors]):
| ...
| def as_tensor(
| self,
| padding_lengths: Dict[str, int]
| ) -> TextFieldTensors
empty_field¶
class TextField(SequenceField[TextFieldTensors]):
| ...
| def empty_field(self)
batch_tensors¶
class TextField(SequenceField[TextFieldTensors]):
| ...
| def batch_tensors(
| self,
| tensor_list: List[TextFieldTensors]
| ) -> TextFieldTensors
__iter__¶
class TextField(SequenceField[TextFieldTensors]):
| ...
| def __iter__(self) -> Iterator[Token]
duplicate¶
class TextField(SequenceField[TextFieldTensors]):
| ...
| def duplicate(self)
Overrides the behavior of duplicate so that self._token_indexers won't actually be deep-copied.

Not only would it be extremely inefficient to deep-copy the token indexers, but it also fails in many cases since some tokenizers (like those used in the 'transformers' lib) cannot actually be deep-copied.
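The copy strategy described above, i.e. deep-copy the tokens but share the indexer objects, can be sketched with a minimal stand-in class. This is an illustration of the pattern, not allennlp's code; `SketchTextField` is a hypothetical name.

```python
import copy

class SketchTextField:
    """Minimal stand-in showing the duplicate() copy strategy."""

    def __init__(self, tokens, token_indexers):
        self.tokens = tokens
        self._token_indexers = token_indexers

    def duplicate(self):
        # Deep-copy the tokens, but reuse the same indexer objects:
        # deep-copying indexers would be slow and can outright fail for
        # indexers that wrap non-copyable tokenizers.
        return SketchTextField(copy.deepcopy(self.tokens), self._token_indexers)

field = SketchTextField(["cat", "sat"], {"tokens": object()})
clone = field.duplicate()
print(clone._token_indexers is field._token_indexers)  # True: indexers shared
print(clone.tokens is field.tokens)                    # False: tokens copied
```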
human_readable_repr¶
class TextField(SequenceField[TextFieldTensors]):
| ...
| def human_readable_repr(self) -> List[str]