allennlp.modules.token_embedders.embedding
Embedding#
@TokenEmbedder.register("embedding")
class Embedding(TokenEmbedder):
 | def __init__(
 |     self,
 |     embedding_dim: int,
 |     num_embeddings: int = None,
 |     projection_dim: int = None,
 |     weight: torch.FloatTensor = None,
 |     padding_index: int = None,
 |     trainable: bool = True,
 |     max_norm: float = None,
 |     norm_type: float = 2.0,
 |     scale_grad_by_freq: bool = False,
 |     sparse: bool = False,
 |     vocab_namespace: str = "tokens",
 |     pretrained_file: str = None,
 |     vocab: Vocabulary = None
 | ) -> None
A more featureful embedding module than the default in PyTorch. Adds the ability to:
1. embed higher-order inputs
2. pre-specify the weight matrix
3. use a non-trainable embedding
4. project the resultant embeddings to some other dimension (which only makes sense with
   non-trainable embeddings).
Note that if you are using our data API and are trying to embed a
TextField, you should use a
TextFieldEmbedder instead of using this directly.
Registered as a TokenEmbedder with name "embedding".
Parameters
- num_embeddings : int
  Size of the dictionary of embeddings (vocabulary size).
- embedding_dim : int
  The size of each embedding vector.
- projection_dim : int, optional (default = None)
  If given, we add a projection layer after the embedding layer. This really only makes sense if trainable is False.
- weight : torch.FloatTensor, optional (default = None)
  A pre-initialised weight matrix for the embedding lookup, allowing the use of pretrained vectors.
- padding_index : int, optional (default = None)
  If given, pads the output with zeros whenever it encounters the index.
- trainable : bool, optional (default = True)
  Whether or not to optimize the embedding parameters.
- max_norm : float, optional (default = None)
  If given, will renormalize the embeddings to always have a norm less than this.
- norm_type : float, optional (default = 2)
  The p of the p-norm to compute for the max_norm option.
- scale_grad_by_freq : bool, optional (default = False)
  If given, this will scale gradients by the frequency of the words in the mini-batch.
- sparse : bool, optional (default = False)
  Whether or not the PyTorch backend should use a sparse representation of the embedding weight.
- vocab_namespace : str, optional (default = "tokens")
  In case of fine-tuning/transfer learning, the model's embedding matrix needs to be extended according to the size of the extended vocabulary. To know how much to extend the embedding matrix, it is necessary to know which vocab_namespace was used to construct it in the original training. We store the vocab_namespace used during the original training as an attribute, so that it can be retrieved during fine-tuning.
- pretrained_file : str, optional (default = None)
  Path to a file of word vectors to initialize the embedding matrix. It can be the path to a local file or a URL of a (cached) remote file. Two formats are supported:
  - hdf5 file - containing an embedding matrix in the form of a torch.Tensor;
  - text file - a utf-8 encoded text file with space-separated fields.
- vocab : Vocabulary, optional (default = None)
  Used to construct an embedding from a pretrained file. In a typical AllenNLP configuration file, this parameter does not get an entry under the "embedding"; it gets specified as a top-level parameter, then is passed in to this module separately.
 
Returns
- An Embedding module. 
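As a quick usage sketch (the sizes below are made up for illustration), the module can be constructed and called directly on a tensor of token ids:

```python
import torch
from allennlp.modules.token_embedders import Embedding

# 5-entry vocabulary embedded into 3 dimensions; index 0 is padding.
embedder = Embedding(
    embedding_dim=3,
    num_embeddings=5,
    padding_index=0,
    trainable=False,
)

token_ids = torch.tensor([[1, 2, 0], [3, 4, 0]])  # (batch_size, sequence_length)
embedded = embedder(token_ids)
print(embedded.shape)  # torch.Size([2, 3, 3])
```

In a typical configuration file you would usually give only "embedding_dim" (and possibly "pretrained_file"), letting num_embeddings be inferred from the vocabulary via vocab_namespace.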
 
get_output_dim#
class Embedding(TokenEmbedder):
 | ...
 | @overrides
 | def get_output_dim(self) -> int
forward#
class Embedding(TokenEmbedder):
 | ...
 | @overrides
 | def forward(self, tokens: torch.Tensor) -> torch.Tensor
tokens may have extra dimensions (batch_size, d1, ..., dn, sequence_length), but the embedding layer expects (batch_size, sequence_length), so tokens is passed through util.combine_initial_dims (a no-op if there are no extra dimensions). The original size is remembered so that the output can be reshaped back to (batch_size, d1, ..., dn, sequence_length, embedding_dim) with util.uncombine_initial_dims.
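For example (a minimal sketch with arbitrary shapes), an input with an extra dimension comes back with that dimension preserved:

```python
import torch
from allennlp.modules.token_embedders import Embedding

embedder = Embedding(embedding_dim=8, num_embeddings=20)

# Higher-order input, e.g. token ids for several sentences per instance:
# shape (batch_size=2, num_sentences=3, sequence_length=4).
token_ids = torch.randint(0, 20, (2, 3, 4))

embedded = embedder(token_ids)
print(embedded.shape)  # torch.Size([2, 3, 4, 8])
```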
extend_vocab#
class Embedding(TokenEmbedder):
 | ...
 | def extend_vocab(
 |     self,
 |     extended_vocab: Vocabulary,
 |     vocab_namespace: str = None,
 |     extension_pretrained_file: str = None,
 |     model_path: str = None
 | )
Extends the embedding matrix according to the extended vocabulary. If extension_pretrained_file is available, it will be used to initialize the embeddings of the new words in the extended vocabulary; otherwise we will check whether the _pretrained_file attribute is already available. If neither is available, the new embeddings are initialized with Xavier uniform. A short usage sketch follows the parameter list below.
Parameters
- extended_vocab : Vocabulary
  Vocabulary extended from the original vocabulary used to construct this Embedding.
- vocab_namespace : str, optional (default = None)
  In case you know which vocab_namespace should be used for extension, you can pass it. If not passed, it will check whether the vocab_namespace used at the time of Embedding construction is available. If so, this namespace will be used; otherwise extend_vocab will be a no-op.
- extension_pretrained_file : str, optional (default = None)
  A file containing pretrained embeddings can be specified here. It can be the path to a local file or a URL of a (cached) remote file. Check format details in from_params of the Embedding class.
- model_path : str, optional (default = None)
  Path traversing the model attributes up to this embedding module, e.g. "_text_field_embedder.token_embedder_tokens". This is only used to give a helpful error message when extend_vocab is implicitly called by train or any other command.
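A minimal sketch of the fine-tuning scenario (the tokens and sizes are invented for illustration):

```python
from allennlp.data import Vocabulary
from allennlp.modules.token_embedders import Embedding

# Original model: 5-entry vocabulary in the "tokens" namespace.
embedder = Embedding(embedding_dim=3, num_embeddings=5, vocab_namespace="tokens")

# Vocabulary extended with tokens seen in the fine-tuning data.
extended_vocab = Vocabulary()
for token in ["the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog"]:
    extended_vocab.add_token_to_namespace(token, namespace="tokens")

embedder.extend_vocab(extended_vocab, vocab_namespace="tokens")
# The weight matrix now has one row per entry in the extended namespace;
# rows for the new words are Xavier-uniform initialized here, since no
# pretrained file is given.
print(embedder.weight.shape[0] == extended_vocab.get_vocab_size("tokens"))  # True
```

In practice this method is usually invoked implicitly when a command such as train extends the model's vocabulary, rather than being called by hand.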
format_embeddings_file_uri#
def format_embeddings_file_uri(
    main_file_path_or_url: str,
    path_inside_archive: Optional[str] = None
) -> str
EmbeddingsFileURI#
class EmbeddingsFileURI(NamedTuple)
main_file_uri#
class EmbeddingsFileURI(NamedTuple):
 | ...
 | main_file_uri: str = None
path_inside_archive#
class EmbeddingsFileURI(NamedTuple):
 | ...
 | path_inside_archive: Optional[str] = None
parse_embeddings_file_uri#
def parse_embeddings_file_uri(uri: str) -> "EmbeddingsFileURI"
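A short sketch of the round trip between these two helpers (the paths are placeholders):

```python
from allennlp.modules.token_embedders.embedding import (
    format_embeddings_file_uri,
    parse_embeddings_file_uri,
)

uri = format_embeddings_file_uri("/path/to/archive.zip", "embeddings/glove.txt")
print(uri)  # (/path/to/archive.zip)#embeddings/glove.txt

parsed = parse_embeddings_file_uri(uri)
print(parsed.main_file_uri)        # /path/to/archive.zip
print(parsed.path_inside_archive)  # embeddings/glove.txt
```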
EmbeddingsTextFile#
class EmbeddingsTextFile(Iterator[str]):
 | def __init__(
 |     self,
 |     file_uri: str,
 |     encoding: str = DEFAULT_ENCODING,
 |     cache_dir: str = None
 | ) -> None
Utility class for opening embeddings text files. Handles various compression formats, as well as context management.
Parameters
- file_uri : str
  It can be:
  - a file system path or a URL of a (possibly compressed) text file, or a zip/tar archive containing a single file;
  - a URI of the form (archive_path_or_url)#file_path_inside_archive if the text file is contained in a multi-file archive.
- encoding : str
- cache_dir : str
DEFAULT_ENCODING#
class EmbeddingsTextFile(Iterator[str]):
 | ...
 | DEFAULT_ENCODING = "utf-8"
read#
class EmbeddingsTextFile(Iterator[str]):
 | ...
 | def read(self) -> str
readline#
class EmbeddingsTextFile(Iterator[str]):
 | ...
 | def readline(self) -> str
close#
class EmbeddingsTextFile(Iterator[str]):
 | ...
 | def close(self) -> None
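A minimal sketch of streaming such a file, assuming "glove.txt" is a local, space-separated embeddings text file (the path and the per-line parsing are illustrative):

```python
from allennlp.modules.token_embedders.embedding import EmbeddingsTextFile

with EmbeddingsTextFile("glove.txt") as embeddings_file:
    # The class is also an iterator over lines, so the file can be streamed.
    for line in embeddings_file:
        token, *values = line.rstrip().split(" ")
        vector = [float(v) for v in values]
        break  # only peek at the first line in this sketch
```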