bag_of_word_counts_token_embedder
allennlp.modules.token_embedders.bag_of_word_counts_token_embedder
BagOfWordCountsTokenEmbedder#
@TokenEmbedder.register("bag_of_word_counts")
class BagOfWordCountsTokenEmbedder(TokenEmbedder):
| def __init__(
| self,
| vocab: Vocabulary,
| vocab_namespace: str = "tokens",
| projection_dim: int = None,
| ignore_oov: bool = False
| ) -> None
Represents a sequence of tokens as a bag of (discrete) word ids, as was done in the pre-neural days. Each sequence gets a vector of length vocabulary size, where the i'th entry in the vector corresponds to the number of times the i'th token in the vocabulary appears in the sequence. By default, we ignore padding tokens.
Registered as a TokenEmbedder with name "bag_of_word_counts".
Parameters
- vocab : Vocabulary
    The vocabulary containing the namespace to count over.
- vocab_namespace : str, optional (default = "tokens")
    Namespace of the vocabulary to embed.
- projection_dim : int, optional (default = None)
    If specified, the resulting bag-of-words representation is projected to this dimension.
- ignore_oov : bool, optional (default = False)
    If true, we ignore the OOV token.
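A minimal construction sketch, assuming allennlp is importable; the vocabulary and token strings below are illustrative and not part of the original documentation:

```python
from allennlp.data import Vocabulary
from allennlp.modules.token_embedders import BagOfWordCountsTokenEmbedder

# A tiny vocabulary in the default "tokens" namespace; AllenNLP also reserves
# entries for padding and OOV tokens in this namespace.
vocab = Vocabulary()
vocab.add_tokens_to_namespace(["the", "cat", "sat"], namespace="tokens")

# Plain bag of counts over the "tokens" namespace. Without a projection,
# get_output_dim() is the size of that namespace.
embedder = BagOfWordCountsTokenEmbedder(vocab, vocab_namespace="tokens")
print(embedder.get_output_dim())

# Optionally project the count vector to a fixed size and drop OOV counts;
# get_output_dim() is then projection_dim.
projected = BagOfWordCountsTokenEmbedder(
    vocab, vocab_namespace="tokens", projection_dim=16, ignore_oov=True
)
print(projected.get_output_dim())  # 16
```

In a configuration file, the same embedder is selected via its registered name, i.e. a token embedder entry with "type": "bag_of_word_counts".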
get_output_dim#
class BagOfWordCountsTokenEmbedder(TokenEmbedder):
| ...
| def get_output_dim(self)
forward#
class BagOfWordCountsTokenEmbedder(TokenEmbedder):
| ...
| def forward(self, inputs: torch.Tensor) -> torch.Tensor
Parameters
- inputs : torch.Tensor
    Shape (batch_size, timesteps, sequence_length) of word ids representing the current batch.
Returns
- torch.Tensor
    The bag-of-words representations for the input sequence, shape (batch_size, vocab_size).
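A minimal sketch of a forward call, using the same illustrative vocabulary as above; the word ids here are assumptions (the real tokens start at id 2 only because the vocabulary reserves ids for padding and OOV):

```python
import torch
from allennlp.data import Vocabulary
from allennlp.modules.token_embedders import BagOfWordCountsTokenEmbedder

vocab = Vocabulary()
vocab.add_tokens_to_namespace(["the", "cat", "sat"], namespace="tokens")
embedder = BagOfWordCountsTokenEmbedder(vocab)

# A batch of two sequences of word ids, padded with id 0; padding positions
# are ignored when counting by default.
word_ids = torch.tensor([[2, 3, 3, 0],
                         [4, 2, 0, 0]])

counts = embedder(word_ids)
print(counts.shape)  # (batch_size, vocab_size); each row holds per-token counts
```

Each row of the output is the count vector for one sequence, e.g. the first row counts one occurrence of id 2 and two of id 3, matching the bag-of-word-counts description above.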