sampled_softmax_loss
allennlp.modules.sampled_softmax_loss
https://github.com/tensorflow/tensorflow/blob/r1.4/tensorflow/python/ops/nn_impl.py#L885
SampledSoftmaxLoss#
class SampledSoftmaxLoss(torch.nn.Module):
| def __init__(
| self,
| num_words: int,
| embedding_dim: int,
| num_samples: int,
| sparse: bool = False,
| unk_id: int = None,
| use_character_inputs: bool = True,
| use_fast_sampler: bool = False
| ) -> None
Based on the default log_uniform_candidate_sampler in tensorflow.
Note
num_words DOES NOT include the padding id.
Note
In all cases except (tie_embeddings=True and use_character_inputs=False), the weights are dimensioned as num_words and do not include an entry for the padding (0) id. In the (tie_embeddings=True and use_character_inputs=False) case, the embeddings DO include the extra 0 padding, to be consistent with the word embedding layer.
Parameters
num_words : int, required
    The number of words in the vocabulary.
embedding_dim : int, required
    The dimension to softmax over.
num_samples : int, required
    During training, take this many samples. Must be less than num_words.
sparse : bool, optional (default = False)
    If this is true, we use a sparse embedding matrix.
unk_id : int, optional (default = None)
    If provided, the id that represents unknown characters.
use_character_inputs : bool, optional (default = True)
    Whether to use character inputs.
use_fast_sampler : bool, optional (default = False)
    Whether to use the fast cython sampler.
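As a rough usage sketch of the constructor (the vocabulary size, embedding dimension, and sample count below are made-up illustrative values, not defaults):

```python
from allennlp.modules.sampled_softmax_loss import SampledSoftmaxLoss

# Illustrative sizes only; pick values that match your vocabulary and model.
num_words = 10000      # vocabulary size, NOT including the padding id
embedding_dim = 128    # dimension to softmax over
num_samples = 256      # must be less than num_words

loss_fn = SampledSoftmaxLoss(
    num_words=num_words,
    embedding_dim=embedding_dim,
    num_samples=num_samples,
    sparse=False,
    use_character_inputs=True,
)
```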
initialize_num_words#
class SampledSoftmaxLoss(torch.nn.Module):
| ...
| def initialize_num_words(self)
forward#
class SampledSoftmaxLoss(torch.nn.Module):
| ...
| def forward(
| self,
| embeddings: torch.Tensor,
| targets: torch.Tensor,
| target_token_embedding: torch.Tensor = None
| ) -> torch.Tensor
embeddings is size (n, embedding_dim).
targets is (n_words, ) with the index of the actual target.
When tying weights, target_token_embedding is required; it is size (n_words, embedding_dim).
Returns the log likelihood loss (batch_size, ).
Does not do any count normalization / divide by batch size.
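A minimal sketch of calling forward, continuing the constructor example above (the batch size of 32 and the random tensors are assumptions for illustration; use the ids produced by your own vocabulary):

```python
import torch

n = 32  # number of target positions in this batch (illustrative)

# Context embeddings for each target position and the gold target word ids.
embeddings = torch.randn(n, embedding_dim)      # (n, embedding_dim)
targets = torch.randint(0, num_words, (n,))     # indices of the actual targets

loss = loss_fn(embeddings, targets)

# The module does not normalize by the number of targets, so divide
# yourself if you want a per-token average.
per_token_loss = loss.sum() / n
```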
log_uniform_candidate_sampler#
class SampledSoftmaxLoss(torch.nn.Module):
| ...
| def log_uniform_candidate_sampler(
| self,
| targets,
| choice_func=_choice
| )
Algorithm: keep track of the number of tries when doing sampling; then the expected count is -expm1(num_tries * log1p(-p)) = (1 - (1 - p)^num_tries), where p is self._probs[id].
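As a hedged numerical check of the expected-count identity above (the probability p and num_tries are made-up values, and this stands alone rather than reading the module's internal self._probs):

```python
import torch

p = torch.tensor(0.01)   # illustrative probability of one id under the log-uniform distribution
num_tries = 50           # illustrative number of sampling tries

# Numerically stable form used in the formula above.
expected_count = -torch.expm1(num_tries * torch.log1p(-p))

# Same quantity written without the expm1/log1p form.
assert torch.allclose(expected_count, 1 - (1 - p) ** num_tries)
```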