bag_of_word_counts_token_embedder

allennlp.modules.token_embedders.bag_of_word_counts_token_embedder


BagOfWordCountsTokenEmbedder

@TokenEmbedder.register("bag_of_word_counts")
class BagOfWordCountsTokenEmbedder(TokenEmbedder):
 | def __init__(
 |     self,
 |     vocab: Vocabulary,
 |     vocab_namespace: str = "tokens",
 |     projection_dim: int = None,
 |     ignore_oov: bool = False
 | ) -> None

Represents a sequence of tokens as a bag of (discrete) word ids, as was done in the pre-neural days.

Each sequence gets a vector of length vocabulary size, where the i-th entry in the vector corresponds to the number of times the i-th token in the vocabulary appears in the sequence.

By default, we ignore padding tokens.

Registered as a TokenEmbedder with name "bag_of_word_counts".

Parameters

  • vocab : Vocabulary
    The vocabulary; the size of its vocab_namespace determines the length of the bag-of-words vectors.
  • vocab_namespace : str, optional (default = "tokens")
    Namespace of the vocabulary to embed.
  • projection_dim : int, optional (default = None)
    If specified, the resulting bag-of-words representation is projected to this dimension.
  • ignore_oov : bool, optional (default = False)
    If true, we also ignore the OOV token when counting.

get_output_dim

class BagOfWordCountsTokenEmbedder(TokenEmbedder):
 | ...
 | def get_output_dim(self)

forward

class BagOfWordCountsTokenEmbedder(TokenEmbedder):
 | ...
 | def forward(self, inputs: torch.Tensor) -> torch.Tensor

Parameters

  • inputs : torch.Tensor
    Shape (batch_size, timesteps, sequence_length) of word ids representing the current batch.

Returns

  • torch.Tensor
    The bag-of-words representations for the input sequence, shape (batch_size, vocab_size), or (batch_size, projection_dim) if a projection was specified.
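
A minimal forward-pass sketch, continuing from the embedder constructed above and assuming the inputs are word ids from a single_id token indexer, shaped (batch_size, sequence_length); the ids are illustrative:

    import torch

    # Padding positions (id 0) are ignored when counting; with ignore_oov=True,
    # positions holding the OOV id are ignored as well.
    token_ids = torch.tensor([[2, 3, 2, 0],
                              [4, 5, 0, 0]])
    bow = embedder(token_ids)
    print(bow.shape)  # (batch_size, vocab_size), or (batch_size, projection_dim) with a projection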