allennlp.modules.text_field_embedders
A TextFieldEmbedder is a Module that takes as input the dictionary of tensors produced by a TextField and returns as output an embedded representation of the tokens in that field.
class allennlp.modules.text_field_embedders.text_field_embedder.TextFieldEmbedder [source]

Bases: torch.nn.modules.module.Module, allennlp.common.registrable.Registrable
A TextFieldEmbedder is a Module that takes as input the DataArray produced by a TextField and returns as output an embedded representation of the tokens in that field.

The DataArrays produced by TextFields are dictionaries with named representations, like "words" and "characters". When you create a TextField, you pass in a dictionary of TokenIndexer objects, telling the field how exactly the tokens in the field should be represented. This class changes the type signature of Module.forward, restricting TextFieldEmbedders to take inputs corresponding to a single TextField, which is a dictionary of tensors with the same names as were passed to the TextField.

We also add a method to the basic Module API: get_output_dim(). You might need this if you want to construct a Linear layer using the output of this embedder, for instance.

default_implementation: str = 'basic'
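As an illustration, here is a minimal sketch of using get_output_dim() to size a downstream layer. It assumes the BasicTextFieldEmbedder described below, a single indexer named "tokens", and made-up dimensions; none of these specifics come from this page.

    import torch
    from allennlp.modules.text_field_embedders import BasicTextFieldEmbedder
    from allennlp.modules.token_embedders import Embedding

    # One TokenEmbedder per named representation produced by the TextField.
    # The vocabulary size (100) and embedding size (50) are made up.
    embedder = BasicTextFieldEmbedder(
        {"tokens": Embedding(num_embeddings=100, embedding_dim=50)}
    )

    # get_output_dim() reports the size of the final dimension of forward()'s
    # output, so the Linear layer can be sized without hard-coding 50.
    projection = torch.nn.Linear(embedder.get_output_dim(), 10)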
forward(self, text_field_input: Dict[str, torch.Tensor], num_wrapping_dims: int = 0, **kwargs) → torch.Tensor [source]

Parameters:

- text_field_input : Dict[str, torch.Tensor]
  A dictionary that was the output of a call to TextField.as_tensor. Each tensor in here is assumed to have a shape roughly similar to (batch_size, sequence_length) (perhaps with an extra trailing dimension for the characters in each token).
- num_wrapping_dims : int, optional (default = 0)
  If you have a ListField[TextField] that created the text_field_input, you'll end up with tensors of shape (batch_size, wrapping_dim1, wrapping_dim2, ..., sequence_length). This parameter tells us how many wrapping dimensions there are, so that we can correctly TimeDistribute the embedding of each named representation.
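Concretely, a hedged sketch of the shapes involved for a plain (unwrapped) TextField, reusing the embedder from the sketch above; the key "tokens" is an assumed indexer name:

    # What TextField.as_tensor would produce for an indexer named "tokens":
    # a (batch_size, sequence_length) tensor of token ids.
    text_field_input = {"tokens": torch.randint(0, 100, (8, 20))}

    # num_wrapping_dims defaults to 0, so no TimeDistribute wrapping happens.
    embedded = embedder(text_field_input)
    assert embedded.shape == (8, 20, embedder.get_output_dim())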
class allennlp.modules.text_field_embedders.basic_text_field_embedder.BasicTextFieldEmbedder(token_embedders: Dict[str, allennlp.modules.token_embedders.token_embedder.TokenEmbedder], embedder_to_indexer_map: Dict[str, Union[List[str], Dict[str, str]]] = None, allow_unmatched_keys: bool = False) [source]

Bases: allennlp.modules.text_field_embedders.text_field_embedder.TextFieldEmbedder
This is a TextFieldEmbedder that wraps a collection of TokenEmbedder objects. Each TokenEmbedder embeds or encodes the representation output from one TokenIndexer. As the data produced by a TextField is a dictionary mapping names to these representations, we take TokenEmbedders with corresponding names. Each TokenEmbedder embeds its input, and the results are concatenated in an arbitrary order.

Parameters:

- token_embedders : Dict[str, TokenEmbedder], required
  A dictionary mapping token embedder names to implementations. These names should match the corresponding indexer used to generate the tensor passed to the TokenEmbedder.
- embedder_to_indexer_map : Dict[str, Union[List[str], Dict[str, str]]], optional (default = None)
  Optionally, you can provide a mapping from the name of each TokenEmbedder you are using to embed your TextField to either an ordered list of the indexer names it needs to run, or a mapping from the parameter names of that TokenEmbedder.forward to the indexer names whose tensors should be passed as those arguments. In most cases, your TokenEmbedder will only require a single tensor, because it is designed to run on the output of a single TokenIndexer. For example, the ELMo token embedder can be used in two modes, one of which requires both character ids and word ids for the same text. Note that the list of token indexer names is ordered, meaning that the tensors produced by the indexers will be passed to the embedders in the order you specify in this list. You can also use null in the configuration to set some specified parameters to None.
- allow_unmatched_keys : bool, optional (default = False)
  If True, don't enforce that the keys of the text_field_input match those in token_embedders (useful if the mapping is specified via embedder_to_indexer_map).
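A hedged construction sketch, assuming two indexers named "tokens" and "pos_tags"; both names and all dimensions are illustrative, not from this page:

    from allennlp.modules.text_field_embedders import BasicTextFieldEmbedder
    from allennlp.modules.token_embedders import Embedding

    # Keys must match the indexer names the TextField was given, unless
    # allow_unmatched_keys or embedder_to_indexer_map says otherwise.
    embedder = BasicTextFieldEmbedder({
        "tokens": Embedding(num_embeddings=10000, embedding_dim=100),
        "pos_tags": Embedding(num_embeddings=50, embedding_dim=10),
    })

    # The per-embedder outputs are concatenated: 100 + 10 = 110.
    assert embedder.get_output_dim() == 110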
forward(self, text_field_input: Dict[str, torch.Tensor], num_wrapping_dims: int = 0, **kwargs) → torch.Tensor [source]

Parameters:

- text_field_input : Dict[str, torch.Tensor]
  A dictionary that was the output of a call to TextField.as_tensor. Each tensor in here is assumed to have a shape roughly similar to (batch_size, sequence_length) (perhaps with an extra trailing dimension for the characters in each token).
- num_wrapping_dims : int, optional (default = 0)
  If you have a ListField[TextField] that created the text_field_input, you'll end up with tensors of shape (batch_size, wrapping_dim1, wrapping_dim2, ..., sequence_length). This parameter tells us how many wrapping dimensions there are, so that we can correctly TimeDistribute the embedding of each named representation.
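And a sketch of the wrapped case, assuming the two-embedder setup from the previous sketch and a ListField holding 5 TextFields per instance:

    import torch

    # From ListField[TextField], each tensor gains a wrapping dimension:
    # (batch_size, num_fields, sequence_length).
    wrapped_input = {
        "tokens": torch.randint(0, 10000, (8, 5, 20)),
        "pos_tags": torch.randint(0, 50, (8, 5, 20)),
    }

    # num_wrapping_dims=1 tells the embedder to TimeDistribute each
    # TokenEmbedder over the extra num_fields dimension.
    embedded = embedder(wrapped_input, num_wrapping_dims=1)
    assert embedded.shape == (8, 5, 20, embedder.get_output_dim())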
classmethod from_params(vocab: allennlp.data.vocabulary.Vocabulary, params: allennlp.common.params.Params) → 'BasicTextFieldEmbedder' [source]

This is the automatic implementation of from_params. Any class that subclasses FromParams (or Registrable, which itself subclasses FromParams) gets this implementation for free. If you want your class to be instantiated from params in the "obvious" way (pop off parameters and hand them to your constructor with the same names), this provides that functionality.

If you need more complex logic in your from_params method, you'll have to implement your own method that overrides this one.
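For illustration, a hedged sketch of calling from_params directly. The params keys mirror the constructor arguments above; the registered name "embedding" and all dimensions are assumptions:

    from allennlp.common import Params
    from allennlp.data import Vocabulary
    from allennlp.modules.text_field_embedders import BasicTextFieldEmbedder

    params = Params({
        "token_embedders": {
            # A hypothetical embedding config; num_embeddings is given
            # explicitly so the empty Vocabulary below is never consulted.
            "tokens": {"type": "embedding", "num_embeddings": 100, "embedding_dim": 50}
        }
    })
    embedder = BasicTextFieldEmbedder.from_params(vocab=Vocabulary(), params=params)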