sampled_softmax_loss
allennlp.modules.sampled_softmax_loss
https://github.com/tensorflow/tensorflow/blob/r1.4/tensorflow/python/ops/nn_impl.py#L885
SampledSoftmaxLoss#
class SampledSoftmaxLoss(torch.nn.Module):
| def __init__(
| self,
| num_words: int,
| embedding_dim: int,
| num_samples: int,
| sparse: bool = False,
| unk_id: int = None,
| use_character_inputs: bool = True,
| use_fast_sampler: bool = False
| ) -> None
Based on the default log_uniform_candidate_sampler in tensorflow.
Note
num_words DOES NOT include the padding id.
Note
In all cases except (tie_embeddings=True and use_character_inputs=False), the weights are dimensioned as num_words and do not include an entry for the padding (0) id. In the (tie_embeddings=True and use_character_inputs=False) case, the embeddings DO include the extra 0 padding, to be consistent with the word embedding layer.
Parameters
num_words : int, required
    The number of words in the vocabulary.
embedding_dim : int, required
    The dimension to softmax over.
num_samples : int, required
    During training, take this many samples. Must be less than num_words.
sparse : bool, optional (default = False)
    If this is true, we use a sparse embedding matrix.
unk_id : int, optional (default = None)
    If provided, the id that represents unknown characters.
use_character_inputs : bool, optional (default = True)
    Whether to use character inputs.
use_fast_sampler : bool, optional (default = False)
    Whether to use the fast cython sampler.
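As a rough usage sketch of the constructor (the vocabulary size, embedding dimension, and sample count below are made-up illustrative values, not defaults):

```python
from allennlp.modules.sampled_softmax_loss import SampledSoftmaxLoss

# Illustrative sizes only; pick values that match your vocabulary and model.
num_words = 10000      # vocabulary size, NOT including the padding id
embedding_dim = 128    # dimension to softmax over
num_samples = 256      # must be less than num_words

loss_fn = SampledSoftmaxLoss(
    num_words=num_words,
    embedding_dim=embedding_dim,
    num_samples=num_samples,
    sparse=False,
    use_character_inputs=True,
)
```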
initialize_num_words#
class SampledSoftmaxLoss(torch.nn.Module):
| ...
| def initialize_num_words(self)
forward#
class SampledSoftmaxLoss(torch.nn.Module):
| ...
| def forward(
| self,
| embeddings: torch.Tensor,
| targets: torch.Tensor,
| target_token_embedding: torch.Tensor = None
| ) -> torch.Tensor
embeddings is size (n, embedding_dim).
targets is (n_words, ) with the index of the actual target.
When tying weights, target_token_embedding is required; it is size (n_words, embedding_dim).
Returns the log likelihood loss (batch_size, ).
Does not do any count normalization / divide by batch size.
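A minimal sketch of calling forward, continuing the constructor example above (the batch size of 32 and the random tensors are assumptions for illustration; use the ids produced by your own vocabulary):

```python
import torch

n = 32  # number of target positions in this batch (illustrative)

# Context embeddings for each target position and the gold target word ids.
embeddings = torch.randn(n, embedding_dim)      # (n, embedding_dim)
targets = torch.randint(0, num_words, (n,))     # indices of the actual targets

loss = loss_fn(embeddings, targets)

# The module does not normalize by the number of targets, so divide
# yourself if you want a per-token average.
per_token_loss = loss.sum() / n
```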
log_uniform_candidate_sampler#
class SampledSoftmaxLoss(torch.nn.Module):
| ...
| def log_uniform_candidate_sampler(
| self,
| targets,
| choice_func=_choice
| )
Algorithm: keep track of the number of tries when doing sampling; then the expected count is -expm1(num_tries * log1p(-p)) = (1 - (1 - p)^num_tries), where p is self._probs[id].
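As a hedged numerical check of the expected-count identity above (the probability p and num_tries are made-up values, and this stands alone rather than reading the module's internal self._probs):

```python
import torch

p = torch.tensor(0.01)   # illustrative probability of one id under the log-uniform distribution
num_tries = 50           # illustrative number of sampling tries

# Numerically stable form used in the formula above.
expected_count = -torch.expm1(num_tries * torch.log1p(-p))

# Same quantity written without the expm1/log1p form.
assert torch.allclose(expected_count, 1 - (1 - p) ** num_tries)
```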