

[ allennlp.modules.token_embedders.bag_of_word_counts_token_embedder ]


class BagOfWordCountsTokenEmbedder(TokenEmbedder):
 | def __init__(
 |     self,
 |     vocab: Vocabulary,
 |     vocab_namespace: str = "tokens",
 |     projection_dim: int = None,
 |     ignore_oov: bool = False
 | ) -> None

Represents a sequence of tokens as a bag of (discrete) word ids, as was done in the pre-neural days.

Each sequence gets a vector whose length is the vocabulary size, where the i'th entry is the number of times the i'th token in the vocabulary appears in the sequence.

By default, we ignore padding tokens.
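
The counting described above can be sketched in plain Python. This is an illustration only, not the library's implementation: the helper name `count_tokens` is hypothetical, and the sketch assumes the padding token sits at index 0.

```python
from typing import List

PAD_ID = 0  # assumption: index 0 is the padding token


def count_tokens(token_ids: List[int], vocab_size: int) -> List[int]:
    """Return a vocabulary-sized count vector, skipping padding."""
    counts = [0] * vocab_size
    for token_id in token_ids:
        if token_id != PAD_ID:
            counts[token_id] += 1
    return counts


# A padded sequence over a hypothetical 6-entry vocabulary.
bow = count_tokens([4, 4, 2, 0, 0], vocab_size=6)
print(bow)  # [0, 0, 1, 0, 2, 0]
```

Token order is discarded: only how often each id occurs survives in the vector.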

Registered as a TokenEmbedder with name "bag_of_word_counts".


  • vocab : Vocabulary
    The vocabulary that maps tokens to ids; the size of its vocab_namespace sets the length of the count vector.
  • vocab_namespace : str, optional (default = "tokens")
    Namespace of the vocabulary to embed.
  • projection_dim : int, optional (default = None)
    If specified, the resulting bag-of-words representation is projected to this dimension.
  • ignore_oov : bool, optional (default = False)
    If true, occurrences of the OOV (out-of-vocabulary) token are not counted.
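
A rough sketch of how `ignore_oov` and `projection_dim` change the result, again in plain Python rather than the library's own code. The function name, the padding and OOV indices (0 and 1), and the hand-written weight matrix are all assumptions made for illustration; a learned projection would supply its own weights.

```python
from typing import List, Optional

PAD_ID = 0  # assumption: padding index
OOV_ID = 1  # assumption: out-of-vocabulary index


def bag_of_word_counts(
    token_ids: List[int],
    vocab_size: int,
    ignore_oov: bool = False,
    projection: Optional[List[List[float]]] = None,
) -> List[float]:
    """Count token ids, then optionally project the count vector.

    `projection` is a hypothetical (projection_dim x vocab_size) weight
    matrix standing in for a learned linear layer.
    """
    counts = [0.0] * vocab_size
    for token_id in token_ids:
        if token_id == PAD_ID:
            continue  # padding is always skipped
        if ignore_oov and token_id == OOV_ID:
            continue  # OOV is skipped only when requested
        counts[token_id] += 1.0
    if projection is None:
        return counts
    # Matrix-vector product: one output value per row of the matrix.
    return [sum(w * c for w, c in zip(row, counts)) for row in projection]


ids = [2, 1, 1, 3, 0]
print(bag_of_word_counts(ids, vocab_size=4))                   # [0.0, 2.0, 1.0, 1.0]
print(bag_of_word_counts(ids, vocab_size=4, ignore_oov=True))  # [0.0, 0.0, 1.0, 1.0]

weights = [[1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 2.0, 0.0]]
print(bag_of_word_counts(ids, vocab_size=4, projection=weights))  # [4.0, 2.0]
```

With a projection, the output length is the number of rows in the weight matrix instead of the vocabulary size.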


class BagOfWordCountsTokenEmbedder(TokenEmbedder):
 | ...
 | def get_output_dim(self)
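
The method is listed without a description. Judging from the constructor parameters, a plausible reading is that the output width is the vocabulary size unless a projection dimension is given. The helper below is hypothetical and only illustrates that reading.

```python
from typing import Optional


def expected_output_dim(vocab_size: int, projection_dim: Optional[int] = None) -> int:
    """Hypothetical helper: width of the vector produced per sequence."""
    # Without a projection the vector has one entry per vocabulary item.
    return vocab_size if projection_dim is None else projection_dim


print(expected_output_dim(vocab_size=6))                     # 6
print(expected_output_dim(vocab_size=6, projection_dim=50))  # 50
```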


class BagOfWordCountsTokenEmbedder(TokenEmbedder):
 | ...
 | def forward(self, inputs: torch.Tensor) -> torch.Tensor


  • inputs : torch.Tensor
    Word ids representing the current batch, with shape (batch_size, timesteps, sequence_length).


  • torch.Tensor
    The bag-of-words representations for the input sequences, with shape (batch_size, vocab_size), or (batch_size, projection_dim) if projection_dim is specified.
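
To make the shapes concrete, here is a minimal plain-Python sketch of the batch mapping. It is not the library's forward pass: it assumes each batch element is a flat list of word ids padded with index 0, uses nested lists in place of tensors, and the function name is hypothetical.

```python
from typing import List

PAD_ID = 0  # assumption: padding index


def batch_bag_of_word_counts(batch: List[List[int]], vocab_size: int) -> List[List[int]]:
    """Map a batch of padded id sequences to one count vector per sequence."""
    output = []
    for sequence in batch:
        counts = [0] * vocab_size
        for token_id in sequence:
            if token_id != PAD_ID:
                counts[token_id] += 1
        output.append(counts)
    return output


batch = [[2, 0], [3, 0], [4, 4]]  # batch_size = 3, two timesteps each
result = batch_bag_of_word_counts(batch, vocab_size=6)
for row in result:
    print(row)
# [0, 0, 1, 0, 0, 0]
# [0, 0, 0, 1, 0, 0]
# [0, 0, 0, 0, 2, 0]
```

The result has one row per sequence and one column per vocabulary entry, matching the (batch_size, vocab_size) shape above.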