single_id_token_indexer

allennlp.data.token_indexers.single_id_token_indexer

SingleIdTokenIndexer#

@TokenIndexer.register("single_id")
class SingleIdTokenIndexer(TokenIndexer):
 | def __init__(
 |     self,
 |     namespace: Optional[str] = "tokens",
 |     lowercase_tokens: bool = False,
 |     start_tokens: List[str] = None,
 |     end_tokens: List[str] = None,
 |     feature_name: str = "text",
 |     default_value: str = _DEFAULT_VALUE,
 |     token_min_padding_length: int = 0
 | ) -> None

This TokenIndexer represents tokens as single integers.

Registered as a TokenIndexer with name "single_id".

Parameters

  • namespace : Optional[str], optional (default = "tokens")
    We will use this namespace in the Vocabulary to map strings to indices. If you explicitly pass in None here, we will skip indexing and vocabulary lookups. This means that the feature_name you use must correspond to an integer value (like text_id, for instance, which gets set by some tokenizers, such as when using byte encoding).
  • lowercase_tokens : bool, optional (default = False)
    If True, we will call token.lower() before getting an index for the token from the vocabulary.
  • start_tokens : List[str], optional (default = None)
    These are prepended to the tokens provided to tokens_to_indices.
  • end_tokens : List[str], optional (default = None)
    These are appended to the tokens provided to tokens_to_indices.
  • feature_name : str, optional (default = "text")
    We will use the Token attribute with this name as input. This is potentially useful, e.g., for using NER tags instead of (or in addition to) surface forms as your inputs (passing ent_type_ here would do that). If you use a non-default value here, you almost certainly want to also change the namespace parameter, and you might want to give a default_value.
  • default_value : str, optional
    When you use a non-default feature_name, you sometimes want a default value to go with it, e.g., for tokens that happen to have no NER tag. This value will get used if we don't find a value for feature_name on a token. If this is not given, we will crash when a token doesn't have a value for the given feature_name, so that you don't get weird, silent errors by default.
  • token_min_padding_length : int, optional (default = 0)
    See TokenIndexer.
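
A minimal usage sketch of the constructor parameters above; the namespace name and default tag value below are illustrative choices, not part of the API:

from allennlp.data.token_indexers import SingleIdTokenIndexer

# Default behavior: index each token's surface form in the "tokens" namespace,
# lowercasing before the vocabulary lookup.
indexer = SingleIdTokenIndexer(lowercase_tokens=True)

# Index NER tags instead of surface forms. A separate namespace keeps tag ids
# apart from word ids, and default_value covers tokens with no entity type so
# indexing doesn't crash on them.
ner_indexer = SingleIdTokenIndexer(
    namespace="ner_tags",
    feature_name="ent_type_",
    default_value="NONE",
)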

count_vocab_items#

class SingleIdTokenIndexer(TokenIndexer):
 | ...
 | @overrides
 | def count_vocab_items(
 |     self,
 |     token: Token,
 |     counter: Dict[str, Dict[str, int]]
 | )
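
A sketch of how vocabulary counting is typically driven; the tokens and resulting counts are illustrative, and the counter layout follows the Dict[str, Dict[str, int]] type in the signature above:

from collections import defaultdict

from allennlp.data.tokenizers import Token
from allennlp.data.token_indexers import SingleIdTokenIndexer

indexer = SingleIdTokenIndexer(lowercase_tokens=True)
counter = defaultdict(lambda: defaultdict(int))  # namespace -> token text -> count

for token in [Token("The"), Token("the"), Token("dog")]:
    indexer.count_vocab_items(token, counter)

# With lowercase_tokens=True, "The" and "the" accumulate on the same entry,
# so counter["tokens"] ends up as {"the": 2, "dog": 1}.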

tokens_to_indices#

class SingleIdTokenIndexer(TokenIndexer):
 | ...
 | @overrides
 | def tokens_to_indices(
 |     self,
 |     tokens: List[Token],
 |     vocabulary: Vocabulary
 | ) -> Dict[str, List[int]]
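
A sketch of indexing against a small hand-built Vocabulary; the words are illustrative, and the returned dict is assumed to use a "tokens" key with one integer per token (plus any start_tokens / end_tokens):

from allennlp.data import Vocabulary
from allennlp.data.tokenizers import Token
from allennlp.data.token_indexers import SingleIdTokenIndexer

vocab = Vocabulary()
dog_id = vocab.add_token_to_namespace("dog", namespace="tokens")
barks_id = vocab.add_token_to_namespace("barks", namespace="tokens")

indexer = SingleIdTokenIndexer()
indexed = indexer.tokens_to_indices([Token("dog"), Token("barks")], vocab)
# indexed == {"tokens": [dog_id, barks_id]}; words missing from the vocabulary
# fall back to the vocabulary's OOV index.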

get_empty_token_list#

class SingleIdTokenIndexer(TokenIndexer):
 | ...
 | @overrides
 | def get_empty_token_list(self) -> IndexedTokenList
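
For reference, a sketch of the value this indexer is expected to return for an empty token list (presumably consumed by the padding/batching machinery):

from allennlp.data.token_indexers import SingleIdTokenIndexer

indexer = SingleIdTokenIndexer()
empty = indexer.get_empty_token_list()
# empty == {"tokens": []}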