allennlp.modules.text_field_embedders

A TextFieldEmbedder is a Module that takes as input the dictionary of tensors produced by a TextField and returns as output an embedded representation of the tokens in that field.

class allennlp.modules.text_field_embedders.text_field_embedder.TextFieldEmbedder

Bases: torch.nn.modules.module.Module, allennlp.common.registrable.Registrable

A TextFieldEmbedder is a Module that takes as input the DataArray (for a TextField, a dictionary of tensors) produced by a TextField and returns as output an embedded representation of the tokens in that field.

The DataArrays produced by TextFields are dictionaries with named representations, like “words” and “characters”. When you create a TextField, you pass in a dictionary of TokenIndexer objects, telling the field exactly how the tokens in the field should be represented. This class changes the type signature of Module.forward, restricting TextFieldEmbedders to take inputs corresponding to a single TextField: a dictionary of tensors keyed by the same names that were passed to the TextField.

We also add a method to the basic Module API: get_output_dim(). You might need this if you want to construct a Linear layer using the output of this embedder, for instance.
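The contract described above (a dictionary of tensors keyed by indexer name goes in, one vector per token comes out, and get_output_dim() reports that vector's size) can be sketched in plain PyTorch. This is not AllenNLP code: ToyTextFieldEmbedder and the "tokens" key are hypothetical, and a single torch.nn.Embedding stands in for the named representations.

```python
from typing import Dict

import torch


class ToyTextFieldEmbedder(torch.nn.Module):
    """Hypothetical stand-in for a TextFieldEmbedder with one named representation."""

    def __init__(self, vocab_size: int, embedding_dim: int) -> None:
        super().__init__()
        self._embedding = torch.nn.Embedding(vocab_size, embedding_dim)
        self._embedding_dim = embedding_dim

    def forward(self, text_field_input: Dict[str, torch.Tensor]) -> torch.Tensor:
        # "tokens" is an assumed indexer name; ids have shape (batch_size, sequence_length).
        return self._embedding(text_field_input["tokens"])

    def get_output_dim(self) -> int:
        # Size of the last dimension of forward()'s output, not the full shape.
        return self._embedding_dim


embedder = ToyTextFieldEmbedder(vocab_size=100, embedding_dim=16)
# Size a downstream Linear layer from the embedder's output dimension.
projection = torch.nn.Linear(embedder.get_output_dim(), 8)

token_ids = torch.randint(0, 100, (2, 5))   # (batch_size=2, sequence_length=5)
embedded = embedder({"tokens": token_ids})  # (2, 5, 16)
projected = projection(embedded)            # (2, 5, 8)
```

Sizing the Linear layer from get_output_dim() means the projection keeps working if the embedding dimension changes.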

default_implementation: str = 'basic'
forward(self, text_field_input: Dict[str, torch.Tensor], num_wrapping_dims: int = 0, **kwargs) → torch.Tensor
Parameters
text_field_input : Dict[str, torch.Tensor]

A dictionary that was the output of a call to TextField.as_tensor. Each tensor in here is assumed to have a shape roughly similar to (batch_size, sequence_length) (perhaps with an extra trailing dimension for the characters in each token).

num_wrapping_dims : int, optional (default = 0)

If you have a ListField[TextField] that created the text_field_input, you’ll end up with tensors of shape (batch_size, wrapping_dim1, wrapping_dim2, ..., sequence_length). This parameter tells us how many wrapping dimensions there are, so that we can correctly TimeDistribute the embedding of each named representation.
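One way to picture the effect of num_wrapping_dims is a hypothetical helper, embed_with_wrapping_dims, written in plain PyTorch. It folds the batch and wrapping dimensions into a single leading dimension, embeds, and then unfolds them. This mirrors the idea behind TimeDistributed but is a sketch, not AllenNLP's implementation.

```python
import torch


def embed_with_wrapping_dims(embedding: torch.nn.Module,
                             token_ids: torch.Tensor,
                             num_wrapping_dims: int = 0) -> torch.Tensor:
    """Hypothetical helper: fold wrapping dims into the batch, embed, then unfold."""
    # Leading dims to fold: batch_size plus each wrapping dimension.
    num_leading = 1 + num_wrapping_dims
    leading_shape = token_ids.shape[:num_leading]
    # (batch_size, w1, ..., wk, sequence_length) -> (batch_size * w1 * ... * wk, sequence_length)
    flat_ids = token_ids.reshape(-1, *token_ids.shape[num_leading:])
    flat_embedded = embedding(flat_ids)
    # Restore: (batch_size, w1, ..., wk, sequence_length, embedding_dim)
    return flat_embedded.reshape(*leading_shape, *flat_embedded.shape[1:])


embedding = torch.nn.Embedding(num_embeddings=50, embedding_dim=6)
# E.g. a ListField of 3 TextFields per instance: (batch_size=2, wrapping_dim1=3, sequence_length=4)
wrapped_ids = torch.randint(0, 50, (2, 3, 4))
wrapped_embedded = embed_with_wrapping_dims(embedding, wrapped_ids, num_wrapping_dims=1)
```

torch.nn.Embedding already accepts inputs of any rank, so the folding changes nothing for it; it is used here only to keep the sketch short. Folding matters for token embedders whose forward() assumes a fixed input rank.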

get_output_dim(self) → int

Returns the dimension of the vector representing each token in the output of this TextFieldEmbedder. This is not the shape of the returned tensor, but the last element of that shape.

class allennlp.modules.text_field_embedders.basic_text_field_embedder.BasicTextFieldEmbedder(token_embedders: Dict[str, allennlp.modules.token_embedders.token_embedder.TokenEmbedder], embedder_to_indexer_map: Dict[str, Union[List[str], Dict[str, str]]] = None, allow_unmatched_keys: bool = False)

Bases: allennlp.modules.text_field_embedders.text_field_embedder.TextFieldEmbedder

This is a TextFieldEmbedder that wraps a collection of TokenEmbedder objects. Each TokenEmbedder embeds or encodes the representation output by one TokenIndexer. Because the data produced by a TextField is a dictionary mapping names to these representations, we take a dictionary of TokenEmbedders keyed by the corresponding names. Each TokenEmbedder embeds its input, and the results are concatenated in an arbitrary order.
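A minimal PyTorch sketch of the wrap-and-concatenate behavior just described. ToyBasicTextFieldEmbedder and the "words"/"pos_tags" keys are hypothetical, torch.nn.Embedding stands in for TokenEmbedder, and the sketch iterates over keys in sorted order only so that its output is reproducible; the documented class makes no ordering promise.

```python
from typing import Dict

import torch


class ToyBasicTextFieldEmbedder(torch.nn.Module):
    """Hypothetical simplification: one embedder per key, outputs concatenated on the last dim."""

    def __init__(self, token_embedders: Dict[str, torch.nn.Embedding]) -> None:
        super().__init__()
        # ModuleDict registers each embedder's parameters with this module.
        self._token_embedders = torch.nn.ModuleDict(token_embedders)

    def forward(self, text_field_input: Dict[str, torch.Tensor]) -> torch.Tensor:
        # Each embedder consumes the tensor produced by the indexer with the same name.
        embedded = [self._token_embedders[key](text_field_input[key])
                    for key in sorted(self._token_embedders.keys())]
        return torch.cat(embedded, dim=-1)

    def get_output_dim(self) -> int:
        # The concatenated vector is as wide as all embedders' outputs combined.
        return sum(e.embedding_dim for e in self._token_embedders.values())


word_embedding = torch.nn.Embedding(100, 8)
pos_embedding = torch.nn.Embedding(20, 4)
embedder = ToyBasicTextFieldEmbedder({"words": word_embedding, "pos_tags": pos_embedding})
inputs = {"words": torch.randint(0, 100, (2, 5)), "pos_tags": torch.randint(0, 20, (2, 5))}
output = embedder(inputs)  # (2, 5, 8 + 4)
```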

Parameters
token_embedders : Dict[str, TokenEmbedder], required

A dictionary mapping token embedder names to implementations. These names should match the names of the corresponding indexers used to generate the tensors passed to each TokenEmbedder.

embedder_to_indexer_map : Dict[str, Union[List[str], Dict[str, str]]], optional (default = None)

In most cases, a TokenEmbedder requires only a single tensor, because it is designed to run on the output of a single TokenIndexer. Some embedders need more than one: for example, the ELMo token embedder can be used in two modes, one of which requires both character ids and word ids for the same text. For such cases you can optionally provide, for each TokenEmbedder name, either an ordered list of the indexer names whose tensors it needs, or a mapping from the parameter names of TokenEmbedder.forward to the indexer names whose tensors should be passed as those arguments. The list form is ordered: the tensors produced by the indexers are passed to the embedder in the order you specify. You can also use null in the configuration to set specified parameters to None.

allow_unmatched_keys : bool, optional (default = False)

If True, don’t require the keys of text_field_input to match those in token_embedders (useful when the mapping is specified via embedder_to_indexer_map).
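The following sketch illustrates how embedder_to_indexer_map and allow_unmatched_keys could fit together. It is a simplified stand-in, not AllenNLP's implementation: call_token_embedder, check_token_keys, TwoInputEmbedder, the "combined" embedder name, and the "tokens"/"token_characters" indexer names are all hypothetical, and ValueError is an assumed error type.

```python
from typing import Callable, Dict, Iterable, List, Optional, Union

import torch

# Ordered list of indexer names, or a mapping from forward() parameter names to indexer names.
IndexerMap = Union[List[Optional[str]], Dict[str, Optional[str]]]


def call_token_embedder(embedder: Callable[..., torch.Tensor],
                        text_field_input: Dict[str, torch.Tensor],
                        indexer_names: IndexerMap) -> torch.Tensor:
    """Hypothetical dispatch: gather the tensors an embedder needs by indexer name."""
    if isinstance(indexer_names, list):
        # List form: tensors are passed positionally, in the listed order.
        # A None entry (null in a configuration file) passes None for that position.
        args = [text_field_input[name] if name is not None else None for name in indexer_names]
        return embedder(*args)
    # Dict form: forward() parameter name -> indexer name, passed as keyword arguments.
    kwargs = {param: text_field_input[name] if name is not None else None
              for param, name in indexer_names.items()}
    return embedder(**kwargs)


def check_token_keys(embedder_keys: Iterable[str], input_keys: Iterable[str],
                     allow_unmatched_keys: bool = False) -> None:
    """Hypothetical key check of the kind that allow_unmatched_keys=True switches off."""
    if not allow_unmatched_keys and set(embedder_keys) != set(input_keys):
        raise ValueError(f"Mismatched token keys: {sorted(embedder_keys)} vs {sorted(input_keys)}")


class TwoInputEmbedder(torch.nn.Module):
    """Hypothetical token embedder taking character ids and, optionally, word ids."""

    def __init__(self) -> None:
        super().__init__()
        self.char_embedding = torch.nn.Embedding(30, 4)
        self.word_embedding = torch.nn.Embedding(100, 4)

    def forward(self, char_ids: torch.Tensor,
                word_ids: Optional[torch.Tensor] = None) -> torch.Tensor:
        # Average over characters: (batch, seq, chars, 4) -> (batch, seq, 4).
        embedded = self.char_embedding(char_ids).mean(dim=-2)
        if word_ids is not None:
            embedded = embedded + self.word_embedding(word_ids)
        return embedded


text_field_input = {
    "token_characters": torch.randint(0, 30, (2, 5, 7)),  # (batch, seq, chars per token)
    "tokens": torch.randint(0, 100, (2, 5)),              # (batch, seq)
}
two_input = TwoInputEmbedder()
# One embedder named "combined" consumes two indexers' tensors, so keys cannot match one-to-one.
check_token_keys(["combined"], text_field_input.keys(), allow_unmatched_keys=True)
# List form: positional, in the given order.
both = call_token_embedder(two_input, text_field_input, ["token_characters", "tokens"])
# Dict form with None: word_ids is explicitly set to None.
chars_only = call_token_embedder(two_input, text_field_input,
                                 {"char_ids": "token_characters", "word_ids": None})
```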

forward(self, text_field_input: Dict[str, torch.Tensor], num_wrapping_dims: int = 0, **kwargs) → torch.Tensor
Parameters
text_field_input : Dict[str, torch.Tensor]

A dictionary that was the output of a call to TextField.as_tensor. Each tensor in here is assumed to have a shape roughly similar to (batch_size, sequence_length) (perhaps with an extra trailing dimension for the characters in each token).

num_wrapping_dims : int, optional (default = 0)

If you have a ListField[TextField] that created the text_field_input, you’ll end up with tensors of shape (batch_size, wrapping_dim1, wrapping_dim2, ..., sequence_length). This parameter tells us how many wrapping dimensions there are, so that we can correctly TimeDistribute the embedding of each named representation.

classmethod from_params(vocab: allennlp.data.vocabulary.Vocabulary, params: allennlp.common.params.Params) → 'BasicTextFieldEmbedder'

This is the automatic implementation of from_params. Any class that subclasses FromParams (or Registrable, which itself subclasses FromParams) gets this implementation for free. If you want your class to be instantiated from params in the “obvious” way – pop off parameters and hand them to your constructor with the same names – this provides that functionality.

If you need more complex logic in your from_params method, you’ll have to implement your own method that overrides this one.
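A pure-Python sketch of the “obvious” behavior described above: read the constructor's signature, pop matching entries from the parameters, and pass them by name. SimpleFromParams and Projection are hypothetical, the parameters are a plain dict rather than a Params object, and the vocab argument is ignored. The real implementation also handles cases this sketch does not, such as constructing nested objects.

```python
import inspect
from typing import Any, Dict


class SimpleFromParams:
    """Hypothetical mixin: build an instance by matching params to constructor argument names."""

    @classmethod
    def from_params(cls, params: Dict[str, Any]):
        params = dict(params)  # copy so popping does not mutate the caller's dict
        kwargs = {}
        for name, parameter in inspect.signature(cls.__init__).parameters.items():
            # Skip self and any *args / **kwargs catch-alls.
            if name == "self" or parameter.kind in (parameter.VAR_POSITIONAL,
                                                    parameter.VAR_KEYWORD):
                continue
            if name in params:
                kwargs[name] = params.pop(name)
            elif parameter.default is inspect.Parameter.empty:
                raise ValueError(f"missing required parameter: {name}")
        # Anything left over did not correspond to a constructor argument.
        if params:
            raise ValueError(f"unexpected parameters: {sorted(params)}")
        return cls(**kwargs)


class Projection(SimpleFromParams):
    def __init__(self, input_dim: int, output_dim: int, bias: bool = True) -> None:
        self.input_dim = input_dim
        self.output_dim = output_dim
        self.bias = bias


layer = Projection.from_params({"input_dim": 16, "output_dim": 8})  # bias keeps its default
```

Rejecting leftover keys catches misspelled configuration entries instead of silently ignoring them.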

get_output_dim(self) → int

Returns the dimension of the vector representing each token in the output of this TextFieldEmbedder. This is not the shape of the returned tensor, but the last element of that shape.