embedding
[ allennlp.modules.token_embedders.embedding ]
Embedding
@TokenEmbedder.register("embedding")
class Embedding(TokenEmbedder):
 | def __init__(
 |     self,
 |     embedding_dim: int,
 |     num_embeddings: int = None,
 |     projection_dim: int = None,
 |     weight: torch.FloatTensor = None,
 |     padding_index: int = None,
 |     trainable: bool = True,
 |     max_norm: float = None,
 |     norm_type: float = 2.0,
 |     scale_grad_by_freq: bool = False,
 |     sparse: bool = False,
 |     vocab_namespace: str = "tokens",
 |     pretrained_file: str = None,
 |     vocab: Vocabulary = None
 | ) -> None
A more featureful embedding module than the default in PyTorch. Adds the ability to:
1. embed higher-order inputs
2. pre-specify the weight matrix
3. use a non-trainable embedding
4. project the resultant embeddings to some other dimension (which only makes sense with
   non-trainable embeddings).
Note that if you are using our data API and are trying to embed a TextField, you should use a TextFieldEmbedder instead of using this directly.
Registered as a TokenEmbedder with name "embedding".
Parameters
- num_embeddings : int
 Size of the dictionary of embeddings (vocabulary size).
- embedding_dim : int
 The size of each embedding vector.
- projection_dim : int, optional (default = None)
 If given, we add a projection layer after the embedding layer. This really only makes sense if trainable is False.
- weight : torch.FloatTensor, optional (default = None)
 A pre-initialised weight matrix for the embedding lookup, allowing the use of pretrained vectors.
- padding_index : int, optional (default = None)
 If given, pads the output with zeros whenever it encounters the index.
- trainable : bool, optional (default = True)
 Whether or not to optimize the embedding parameters.
- max_norm : float, optional (default = None)
 If given, any embedding vector whose norm exceeds this value is renormalized to have norm max_norm.
- norm_type : float, optional (default = 2.0)
 The p of the p-norm to compute for the max_norm option.
- scale_grad_by_freq : bool, optional (default = False)
 If True, gradients are scaled by the inverse of the frequency of the words in the mini-batch.
- sparse : bool, optional (default = False)
 Whether or not the PyTorch backend should use sparse gradients for the embedding weight matrix.
- vocab_namespace : str, optional (default = "tokens")
 In case of fine-tuning/transfer learning, the model's embedding matrix needs to be extended according to the size of the extended vocabulary. To know how much to extend the embedding matrix, it is necessary to know which vocab_namespace was used to construct it during the original training. We store the vocab_namespace used during the original training as an attribute, so that it can be retrieved during fine-tuning.
- pretrained_file : str, optional (default = None)
 Path to a file of word vectors to initialize the embedding matrix. It can be the path to a local file or a URL of a (cached) remote file. Two formats are supported:
   - hdf5 file: containing an embedding matrix in the form of a torch.Tensor;
   - text file: a UTF-8 encoded text file with space-separated fields.
- vocab : Vocabulary, optional (default = None)
 Used to construct an embedding from a pretrained file. In a typical AllenNLP configuration file, this parameter does not get an entry under "embedding"; it is specified as a top-level parameter and then passed in to this module separately.
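The text-file format described under pretrained_file (each line holds a token followed by its space-separated vector components) can be turned into a weight matrix roughly as follows. This is a minimal standard-library sketch: read_text_embeddings is a hypothetical helper, not AllenNLP's loader, and giving tokens that are missing from the file small uniform random vectors is an assumption made here for simplicity.

```python
import io
import random
from typing import Dict, List, TextIO


def read_text_embeddings(
    handle: TextIO,
    token_to_index: Dict[str, int],
    embedding_dim: int,
    seed: int = 0,
) -> List[List[float]]:
    """Hypothetical helper: build a (vocab_size, embedding_dim) weight matrix
    from lines of the form "<token> <v1> <v2> ... <vD>".

    Assumes token_to_index maps tokens to indices 0..vocab_size-1.
    """
    found: Dict[str, List[float]] = {}
    for line in handle:
        fields = line.rstrip().split(" ")
        token, values = fields[0], fields[1:]
        if token not in token_to_index:
            continue  # keep only vectors for tokens in our vocabulary
        if len(values) != embedding_dim:
            continue  # skip malformed lines or vectors of the wrong size
        found[token] = [float(v) for v in values]

    rng = random.Random(seed)
    weight: List[List[float]] = [[] for _ in range(len(token_to_index))]
    for token, index in token_to_index.items():
        if token in found:
            weight[index] = found[token]
        else:
            # Assumption: tokens absent from the file get small random vectors.
            weight[index] = [rng.uniform(-0.1, 0.1) for _ in range(embedding_dim)]
    return weight


# Usage: "dog" does not appear in the file, so its row is random.
vectors = io.StringIO("the 0.1 0.2\ncat 0.3 0.4\n")
weight = read_text_embeddings(vectors, {"the": 0, "cat": 1, "dog": 2}, embedding_dim=2)
```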
Returns
- An Embedding module. 
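To make the max_norm and norm_type parameters concrete, here is a simplified pure-Python sketch of renormalizing a single looked-up vector; renormalize is a hypothetical name. In PyTorch the rescaling is applied in place to the rows of the weight matrix that are looked up, rather than to a copy as in this sketch.

```python
from typing import List


def renormalize(vector: List[float], max_norm: float, norm_type: float = 2.0) -> List[float]:
    """Hypothetical helper: rescale vector so that its p-norm (p = norm_type)
    does not exceed max_norm. Vectors already within the limit are unchanged."""
    norm = sum(abs(x) ** norm_type for x in vector) ** (1.0 / norm_type)
    if norm <= max_norm:
        return list(vector)
    scale = max_norm / norm
    return [x * scale for x in vector]


# The 2-norm of [3, 4] is 5, so the vector is scaled by 1/5 to have norm 1.
row = renormalize([3.0, 4.0], max_norm=1.0)
```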
get_output_dim
class Embedding(TokenEmbedder):
 | ...
 | @overrides
 | def get_output_dim(self) -> int
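get_output_dim has no description in this listing. Assuming it reports the size of the last dimension of the embeddings returned by forward, it would follow from the projection_dim parameter as in this hypothetical helper:

```python
from typing import Optional


def output_dim(embedding_dim: int, projection_dim: Optional[int] = None) -> int:
    # Hypothetical helper: with a projection layer the outputs have
    # projection_dim features; otherwise they keep the raw embedding_dim.
    return projection_dim if projection_dim is not None else embedding_dim
```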
forward
class Embedding(TokenEmbedder):
 | ...
 | @overrides
 | def forward(self, tokens: torch.Tensor) -> torch.Tensor
tokens may have extra dimensions (batch_size, d1, ..., dn, sequence_length), but the embedding lookup expects (batch_size, sequence_length), so tokens are first passed to util.combine_initial_dims (a no-op if there are no extra dimensions). The original size is remembered so that the embedded output can be reshaped back to the original leading dimensions.
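The shape bookkeeping described for forward can be sketched without tensors. combined_shape and restored_shape are hypothetical helpers that only compute shapes; in the module itself the reshaping is done on the token tensor by util.combine_initial_dims and on the output by a matching reshape back.

```python
import math
from typing import Tuple


def combined_shape(shape: Tuple[int, ...]) -> Tuple[int, ...]:
    """Hypothetical helper: collapse (batch_size, d1, ..., dn, sequence_length)
    into (batch_size * d1 * ... * dn, sequence_length)."""
    if len(shape) <= 2:
        return tuple(shape)  # already (batch_size, sequence_length): no-op
    return (math.prod(shape[:-1]), shape[-1])


def restored_shape(original_shape: Tuple[int, ...], output_dim: int) -> Tuple[int, ...]:
    """Hypothetical helper: shape of the embedded output once the leading
    dimensions of the original token tensor are restored."""
    return tuple(original_shape) + (output_dim,)


# A batch of 2 instances, each holding 3 sequences of 5 token ids.
original = (2, 3, 5)
flat = combined_shape(original)         # (6, 5): what the lookup sees
embedded = restored_shape(original, 8)  # (2, 3, 5, 8) with an output dim of 8
```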
extend_vocab
class Embedding(TokenEmbedder):
 | ...
 | def extend_vocab(
 |     self,
 |     extended_vocab: Vocabulary,
 |     vocab_namespace: str = None,
 |     extension_pretrained_file: str = None,
 |     model_path: str = None
 | )
Extends the embedding matrix according to the extended vocabulary. If extension_pretrained_file is available, it is used to initialize the embeddings of the new words in the extended vocabulary; otherwise the _pretrained_file attribute is used if it is already set. If neither is available, the new embeddings are initialized with Xavier uniform initialization.
Parameters
- extended_vocab : Vocabulary
 Vocabulary extended from the original vocabulary used to construct this Embedding.
- vocab_namespace : str, optional (default = None)
 If you know which vocab_namespace should be used for the extension, you can pass it. If not, the vocab_namespace stored at the time of Embedding construction is used if available; otherwise extend_vocab is a no-op.
- extension_pretrained_file : str, optional (default = None)
 A file containing pretrained embeddings. It can be the path to a local file or a URL of a (cached) remote file. See the pretrained_file parameter of the Embedding class for format details.
- model_path : str, optional (default = None)
 Path traversing the model attributes up to this embedding module, e.g. "_text_field_embedder.token_embedder_tokens". This is only used to give a helpful error message when extend_vocab is implicitly called by train or any other command.
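A rough pure-Python sketch of the extension step: original rows are kept, and one row is appended per new token, taken from a pretrained vector when one exists and otherwise drawn from a Xavier (Glorot) uniform distribution. It assumes the new tokens receive indices after all original ones; extend_weight and its dict-based pretrained argument are hypothetical stand-ins for the file handling described above.

```python
import math
import random
from typing import Dict, List, Optional


def extend_weight(
    weight: List[List[float]],
    new_tokens: List[str],
    pretrained: Optional[Dict[str, List[float]]] = None,
    seed: int = 0,
) -> List[List[float]]:
    """Hypothetical helper: append one row per new token to a non-empty
    (num_embeddings, embedding_dim) matrix, leaving existing rows untouched."""
    embedding_dim = len(weight[0])
    num_new = len(new_tokens)
    # Xavier uniform bound for the (num_new, embedding_dim) block of added rows.
    bound = math.sqrt(6.0 / (num_new + embedding_dim)) if num_new else 0.0
    rng = random.Random(seed)
    pretrained = pretrained or {}
    extended = [list(row) for row in weight]
    for token in new_tokens:
        if token in pretrained:
            extended.append(list(pretrained[token]))  # reuse a pretrained vector
        else:
            extended.append([rng.uniform(-bound, bound) for _ in range(embedding_dim)])
    return extended


old = [[0.0, 0.0], [1.0, 1.0]]
new = extend_weight(old, ["zebra", "yak"], pretrained={"zebra": [0.5, -0.5]})
```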
format_embeddings_file_uri
def format_embeddings_file_uri(
    main_file_path_or_url: str,
    path_inside_archive: Optional[str] = None
) -> str
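format_embeddings_file_uri has no description here. Assuming it builds the (archive_path_or_url)#file_path_inside_archive form documented for EmbeddingsTextFile, a sketch could look like this; it is an illustration, not the library source.

```python
from typing import Optional


def format_embeddings_file_uri(
    main_file_path_or_url: str, path_inside_archive: Optional[str] = None
) -> str:
    # Sketch: wrap the archive in parentheses and append "#<path inside archive>";
    # without an inner path, the main file path or URL is returned unchanged.
    if path_inside_archive:
        return f"({main_file_path_or_url})#{path_inside_archive}"
    return main_file_path_or_url
```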
EmbeddingsFileURI
class EmbeddingsFileURI(NamedTuple)
main_file_uri
class EmbeddingsFileURI(NamedTuple):
 | ...
 | main_file_uri: str = None
path_inside_archive
class EmbeddingsFileURI(NamedTuple):
 | ...
 | path_inside_archive: Optional[str] = None
parse_embeddings_file_uri
def parse_embeddings_file_uri(uri: str) -> "EmbeddingsFileURI"
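A sketch of what parse_embeddings_file_uri plausibly does, assuming the (archive_path_or_url)#file_path_inside_archive form documented for EmbeddingsTextFile: a string of that shape is split into its two parts, and any other string is treated as a plain file URI. The EmbeddingsFileURI defined here is a local stand-in with the two fields listed above.

```python
import re
from typing import NamedTuple, Optional


class EmbeddingsFileURI(NamedTuple):
    # Local stand-in mirroring the two fields listed in this module.
    main_file_uri: Optional[str] = None
    path_inside_archive: Optional[str] = None


_ARCHIVE_URI = re.compile(r"\((.*)\)#(.*)")


def parse_embeddings_file_uri(uri: str) -> EmbeddingsFileURI:
    # "(archive)#inner/path" yields both parts; anything else is a plain file URI.
    match = _ARCHIVE_URI.fullmatch(uri)
    if match:
        return EmbeddingsFileURI(match.group(1), match.group(2))
    return EmbeddingsFileURI(uri, None)
```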
EmbeddingsTextFile
class EmbeddingsTextFile(Iterator[str]):
 | def __init__(
 |     self,
 |     file_uri: str,
 |     encoding: str = DEFAULT_ENCODING,
 |     cache_dir: str = None
 | ) -> None
Utility class for opening embeddings text files. Handles various compression formats, as well as context management.
Parameters
- file_uri : str
 It can be:
   - a file system path or a URL of a possibly compressed text file, or of a zip/tar archive containing a single file;
   - a URI of the form (archive_path_or_url)#file_path_inside_archive, if the text file is contained in a multi-file archive.
- encoding : str, optional (default = DEFAULT_ENCODING)
 The character encoding of the text file.
- cache_dir : str, optional (default = None)
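EmbeddingsTextFile is described as handling various compression formats. A minimal standard-library sketch of that dispatch for local files follows; open_embeddings_text is hypothetical, the extension list is an assumption, and remote URLs, caching, tar archives and the (archive)#path form are not handled.

```python
import bz2
import gzip
import io
import lzma
import zipfile
from typing import TextIO


def open_embeddings_text(path: str, encoding: str = "utf-8") -> TextIO:
    """Hypothetical helper: open a local, possibly compressed embeddings text
    file for reading, choosing a decompressor from the file extension."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding=encoding)
    if path.endswith(".bz2"):
        return bz2.open(path, "rt", encoding=encoding)
    if path.endswith((".xz", ".lzma")):
        return lzma.open(path, "rt", encoding=encoding)
    if path.endswith(".zip"):
        with zipfile.ZipFile(path) as archive:
            members = archive.namelist()
            if len(members) != 1:
                raise ValueError(f"expected a single file in {path}, found {len(members)}")
            member = archive.open(members[0])
        # The member stream holds its own reference to the underlying file,
        # so it stays readable after the ZipFile itself is closed.
        return io.TextIOWrapper(member, encoding=encoding)
    return open(path, "rt", encoding=encoding)
```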
DEFAULT_ENCODING
class EmbeddingsTextFile(Iterator[str]):
 | ...
 | DEFAULT_ENCODING = "utf-8"
read
class EmbeddingsTextFile(Iterator[str]):
 | ...
 | def read(self) -> str
readline
class EmbeddingsTextFile(Iterator[str]):
 | ...
 | def readline(self) -> str
close
class EmbeddingsTextFile(Iterator[str]):
 | ...
 | def close(self) -> None
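The listing shows EmbeddingsTextFile as an Iterator[str] exposing read, readline and close, and its description mentions context management. A hypothetical stand-in with that shape over any text stream is sketched here; whether the real class keeps trailing newlines on the lines it yields is not stated, and this sketch keeps them.

```python
import io
from typing import Iterator, TextIO


class TextLinesFile(Iterator[str]):
    """Hypothetical stand-in mirroring the read/readline/close interface."""

    def __init__(self, handle: TextIO) -> None:
        self._handle = handle

    def read(self) -> str:
        return self._handle.read()

    def readline(self) -> str:
        return self._handle.readline()

    def close(self) -> None:
        self._handle.close()

    def __next__(self) -> str:
        line = self._handle.readline()
        if not line:  # empty string means end of stream
            raise StopIteration
        return line

    def __enter__(self) -> "TextLinesFile":
        return self

    def __exit__(self, *exc_info) -> None:
        self.close()  # the stream is closed even if iteration raised


with TextLinesFile(io.StringIO("the 0.1\ncat 0.2\n")) as lines:
    collected = list(lines)
```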