[ allennlp.modules.token_embedders.elmo_token_embedder ]
class ElmoTokenEmbedder(TokenEmbedder): | def __init__( | self, | options_file: str = "https://allennlp.s3.amazonaws.com/models/elmo/2x4096_512_2048cnn_2xhighway/" | + "elmo_2x4096_512_2048cnn_2xhighway_options.json", | weight_file: str = "https://allennlp.s3.amazonaws.com/models/elmo/2x4096_512_2048cnn_2xhighway/" | + "elmo_2x4096_512_2048cnn_2xhighway_weights.hdf5", | do_layer_norm: bool = False, | dropout: float = 0.5, | requires_grad: bool = False, | projection_dim: int = None, | vocab_to_cache: List[str] = None, | scalar_mix_parameters: List[float] = None | ) -> None
Compute a single layer of ELMo representations.
This class serves as a convenience when you only want to use one layer of ELMo representations at the input of your network. It's essentially a wrapper around Elmo(num_output_representations=1, ...)
Registered as a
TokenEmbedder with name "elmo_token_embedder".
- options_file :
An ELMo JSON options file.
- weight_file :
An ELMo hdf5 weight file.
- do_layer_norm :
Should we apply layer normalization (passed to
- dropout :
float, optional (default =
The dropout value to be applied to the ELMo representations.
- requires_grad :
If True, compute gradient of ELMo parameters for fine tuning.
- projection_dim :
If given, we will project the ELMo embedding down to this dimension. We recommend that you try using ELMo with a lot of dropout and no projection first, but we have found a few cases where projection helps (particularly where there is very limited training data).
- vocab_to_cache :
A list of words to pre-compute and cache character convolutions for. If you use this option, the ElmoTokenEmbedder expects that you pass word indices of shape (batch_size, timesteps) to forward, instead of character indices. If you use this option and pass a word which wasn't pre-cached, this will break.
- scalar_mix_parameters :
List[int], optional (default =
None, use these scalar mix parameters to weight the representations produced by different layers. These mixing weights are not updated during training. The mixing weights here should be the unnormalized (i.e., pre-softmax) weights. So, if you wanted to use only the 1st layer of a 2-layer ELMo, you can set this to [-9e10, 1, -9e10 ].
| def get_output_dim(self) -> int
| def forward( | self, | elmo_tokens: torch.Tensor, | word_inputs: torch.Tensor = None | ) -> torch.Tensor
- elmo_tokens :
(batch_size, timesteps, 50)of character ids representing the current batch.
- word_inputs :
If you passed a cached vocab, you can in addition pass a tensor of shape
(batch_size, timesteps), which represent word ids which have been pre-cached.
The ELMo representations for the input sequence, shape
(batch_size, timesteps, embedding_dim)