allennlp.modules.text_field_embedders
A TextFieldEmbedder is a Module that takes as input the dict of tensors produced by a TextField and returns as output an embedded representation of the tokens in that field.
class allennlp.modules.text_field_embedders.text_field_embedder.TextFieldEmbedder[source]

Bases: torch.nn.modules.module.Module, allennlp.common.registrable.Registrable

A TextFieldEmbedder is a Module that takes as input the DataArray produced by a TextField and returns as output an embedded representation of the tokens in that field.

The DataArrays produced by TextFields are dictionaries with named representations, like "words" and "characters". When you create a TextField, you pass in a dictionary of TokenIndexer objects, telling the field how exactly the tokens in the field should be represented. This class changes the type signature of Module.forward, restricting TextFieldEmbedders to take inputs corresponding to a single TextField, which is a dictionary of tensors with the same names as were passed to the TextField.

We also add a method to the basic Module API: get_output_dim(). You might need this if you want to construct a Linear layer using the output of this embedder, for instance.
default_implementation: str = 'basic'
forward(self, text_field_input: Dict[str, torch.Tensor], num_wrapping_dims: int = 0, **kwargs) → torch.Tensor[source]

Parameters

text_field_input : Dict[str, torch.Tensor]
    A dictionary that was the output of a call to TextField.as_tensor. Each tensor in here is assumed to have a shape roughly similar to (batch_size, sequence_length) (perhaps with an extra trailing dimension for the characters in each token).
num_wrapping_dims : int, optional (default=0)
    If you have a ListField[TextField] that created the text_field_input, you'll end up with tensors of shape (batch_size, wrapping_dim1, wrapping_dim2, ..., sequence_length). This parameter tells us how many wrapping dimensions there are, so that we can correctly TimeDistribute the embedding of each named representation.
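The contract above can be sketched in plain PyTorch. This is an illustrative toy, not the real AllenNLP class (which adds registration and TimeDistribute support for num_wrapping_dims); the class and parameter names below are invented for the example:

```python
from typing import Dict

import torch


class ToyTextFieldEmbedder(torch.nn.Module):
    """Minimal sketch of the TextFieldEmbedder contract: a Module that maps
    a dict of tensors (as produced by TextField.as_tensor) to embeddings,
    plus a get_output_dim() method for sizing downstream layers."""

    def __init__(self, vocab_size: int, embedding_dim: int):
        super().__init__()
        self.embedding = torch.nn.Embedding(vocab_size, embedding_dim)

    def get_output_dim(self) -> int:
        # Useful when constructing, e.g., a Linear layer on top of this embedder.
        return self.embedding.embedding_dim

    def forward(self, text_field_input: Dict[str, torch.Tensor]) -> torch.Tensor:
        # text_field_input maps representation names to (batch_size, sequence_length)
        # tensors; this toy handles a single representation called "tokens".
        return self.embedding(text_field_input["tokens"])


embedder = ToyTextFieldEmbedder(vocab_size=10, embedding_dim=4)
out = embedder({"tokens": torch.tensor([[1, 2, 3]])})
print(out.shape)  # torch.Size([1, 3, 4])
```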
class allennlp.modules.text_field_embedders.basic_text_field_embedder.BasicTextFieldEmbedder(token_embedders: Dict[str, allennlp.modules.token_embedders.token_embedder.TokenEmbedder], embedder_to_indexer_map: Dict[str, Union[List[str], Dict[str, str]]] = None, allow_unmatched_keys: bool = False)[source]

Bases: allennlp.modules.text_field_embedders.text_field_embedder.TextFieldEmbedder

This is a TextFieldEmbedder that wraps a collection of TokenEmbedder objects. Each TokenEmbedder embeds or encodes the representation output from one TokenIndexer. As the data produced by a TextField is a dictionary mapping names to these representations, we take TokenEmbedders with corresponding names. Each TokenEmbedder embeds its input, and the results are concatenated in an arbitrary order.

Parameters

token_embedders : Dict[str, TokenEmbedder], required
    A dictionary mapping token embedder names to implementations. These names should match the corresponding indexer used to generate the tensor passed to the TokenEmbedder.
embedder_to_indexer_map : Dict[str, Union[List[str], Dict[str, str]]], optional (default = None)
    Optionally, you can provide a mapping between the names of the TokenEmbedders that you are using to embed your TextField and an ordered list of indexer names which are needed for running them, or a mapping between the parameters which TokenEmbedder.forward takes and the indexer names which are viewed as arguments. In most cases, your TokenEmbedder will only require a single tensor, because it is designed to run on the output of a single TokenIndexer. For example, the ELMo token embedder can be used in two modes, one of which requires both character ids and word ids for the same text. Note that the list of token indexer names is ordered, meaning that the tensors produced by the indexers will be passed to the embedders in the order you specify in this list. You can also use null in the configuration to set some specified parameters to None.
allow_unmatched_keys : bool, optional (default = False)
    If True, then don't enforce the keys of the text_field_input to match those in token_embedders (useful if the mapping is specified via embedder_to_indexer_map).
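The concatenation behavior described above can be sketched as follows. This is a hedged toy in plain PyTorch, not the real BasicTextFieldEmbedder; the representation names ("words", "pos_tags") and dimensions are invented for illustration:

```python
from typing import Dict

import torch


class ToyBasicEmbedder(torch.nn.Module):
    """Sketch of BasicTextFieldEmbedder's core idea: hold one embedder per
    named representation, embed each, and concatenate the results."""

    def __init__(self, token_embedders: Dict[str, torch.nn.Embedding]):
        super().__init__()
        # ModuleDict registers each embedder's parameters with this module.
        self.token_embedders = torch.nn.ModuleDict(token_embedders)

    def get_output_dim(self) -> int:
        # The combined output dim is the sum of the individual embedders' dims.
        return sum(e.embedding_dim for e in self.token_embedders.values())

    def forward(self, text_field_input: Dict[str, torch.Tensor]) -> torch.Tensor:
        # Embed each named representation, then concatenate along the last axis.
        embedded = [
            self.token_embedders[name](tensor)
            for name, tensor in text_field_input.items()
        ]
        return torch.cat(embedded, dim=-1)


embedder = ToyBasicEmbedder({
    "words": torch.nn.Embedding(100, 8),     # illustrative vocab size / dim
    "pos_tags": torch.nn.Embedding(20, 4),
})
inputs = {
    "words": torch.zeros(2, 5, dtype=torch.long),
    "pos_tags": torch.zeros(2, 5, dtype=torch.long),
}
print(embedder(inputs).shape)  # torch.Size([2, 5, 12])
```

Note that, as in the real class, the concatenation order is simply the iteration order of the input dictionary, so downstream layers should rely on get_output_dim() rather than on any particular ordering.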
forward(self, text_field_input: Dict[str, torch.Tensor], num_wrapping_dims: int = 0, **kwargs) → torch.Tensor[source]

Parameters

text_field_input : Dict[str, torch.Tensor]
    A dictionary that was the output of a call to TextField.as_tensor. Each tensor in here is assumed to have a shape roughly similar to (batch_size, sequence_length) (perhaps with an extra trailing dimension for the characters in each token).
num_wrapping_dims : int, optional (default=0)
    If you have a ListField[TextField] that created the text_field_input, you'll end up with tensors of shape (batch_size, wrapping_dim1, wrapping_dim2, ..., sequence_length). This parameter tells us how many wrapping dimensions there are, so that we can correctly TimeDistribute the embedding of each named representation.
classmethod from_params(vocab: allennlp.data.vocabulary.Vocabulary, params: allennlp.common.params.Params) → 'BasicTextFieldEmbedder'[source]

This is the automatic implementation of from_params. Any class that subclasses FromParams (or Registrable, which itself subclasses FromParams) gets this implementation for free. If you want your class to be instantiated from params in the "obvious" way, popping off parameters and handing them to your constructor with the same names, this provides that functionality.

If you need more complex logic in your from_params method, you'll have to implement your own method that overrides this one.
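For reference, a configuration file would typically construct this embedder from a fragment along the following lines. This is an illustrative sketch only; the exact key layout (in particular whether a "token_embedders" wrapper key is used) has varied across AllenNLP versions, so consult the documentation for your version:

```json
"text_field_embedder": {
    "token_embedders": {
        "tokens": {
            "type": "embedding",
            "embedding_dim": 100
        }
    }
}
```

Here "tokens" must match the name of the TokenIndexer used in the dataset reader, as described under the token_embedders parameter above.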