single_id_token_indexer
allennlp.data.token_indexers.single_id_token_indexer
SingleIdTokenIndexer
@TokenIndexer.register("single_id")
class SingleIdTokenIndexer(TokenIndexer):
| def __init__(
| self,
| namespace: Optional[str] = "tokens",
| lowercase_tokens: bool = False,
| start_tokens: List[str] = None,
| end_tokens: List[str] = None,
| feature_name: str = "text",
| default_value: str = _DEFAULT_VALUE,
| token_min_padding_length: int = 0
| ) -> None
This TokenIndexer represents tokens as single integers.

Registered as a TokenIndexer with name "single_id".
Parameters

- namespace : Optional[str], optional (default = "tokens")
  We will use this namespace in the Vocabulary to map strings to indices. If you explicitly pass in None here, we will skip indexing and vocabulary lookups. This means that the feature_name you use must correspond to an integer value (like text_id, for instance, which gets set by some tokenizers, such as when using byte encoding).
- lowercase_tokens : bool, optional (default = False)
  If True, we will call token.lower() before getting an index for the token from the vocabulary.
- start_tokens : List[str], optional (default = None)
  These are prepended to the tokens provided to tokens_to_indices.
- end_tokens : List[str], optional (default = None)
  These are appended to the tokens provided to tokens_to_indices.
- feature_name : str, optional (default = "text")
  We will use the Token attribute with this name as input. This is potentially useful, e.g., for using NER tags instead of (or in addition to) surface forms as your inputs (passing ent_type_ here would do that). If you use a non-default value here, you almost certainly want to also change the namespace parameter, and you might want to give a default_value.
- default_value : str, optional
  When you want to use a non-default feature_name, you sometimes want a default value to go with it, e.g., in case you don't have an NER tag for a particular token for some reason. This value will be used if we don't find a value in feature_name. If this is not given, we will crash if a token doesn't have a value for the given feature_name, so that you don't get weird, silent errors by default.
- token_min_padding_length : int, optional (default = 0)
  See TokenIndexer.
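The interplay between feature_name and default_value can be sketched as follows. This is a hedged, self-contained stand-in (the Token class and get_feature_value helper below are illustrative stand-ins, not the allennlp implementation): read the named attribute from the token, fall back to default_value when it is missing, and fail loudly when neither is available.

```python
from typing import Optional


class Token:
    # Stand-in for allennlp's Token; only the attributes used below.
    def __init__(self, text: str, ent_type_: Optional[str] = None):
        self.text = text
        self.ent_type_ = ent_type_


def get_feature_value(token: Token, feature_name: str = "text",
                      default_value: Optional[str] = None) -> str:
    value = getattr(token, feature_name)
    if value is None:
        if default_value is not None:
            return default_value
        # Crash loudly rather than silently indexing a missing feature.
        raise ValueError(f"Token {token.text!r} has no value for {feature_name!r}")
    return value


print(get_feature_value(Token("Paris", ent_type_="GPE"), "ent_type_", "NONE"))  # GPE
print(get_feature_value(Token("the"), "ent_type_", "NONE"))                     # NONE
```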
count_vocab_items
class SingleIdTokenIndexer(TokenIndexer):
| ...
| @overrides
| def count_vocab_items(
| self,
| token: Token,
| counter: Dict[str, Dict[str, int]]
| )
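The counting step can be sketched like this (a hedged stand-in, not the allennlp code): bump a per-namespace count for each token's text, lowercased when lowercase_tokens is set, so the Vocabulary can later assign ids by frequency. Note that the real indexer skips counting entirely when namespace is None.

```python
from collections import defaultdict
from typing import Dict


def count_vocab_item(
    text: str,
    counter: Dict[str, Dict[str, int]],
    namespace: str = "tokens",
    lowercase_tokens: bool = False,
) -> None:
    # Increment the count for this token's text under its namespace.
    if lowercase_tokens:
        text = text.lower()
    counter[namespace][text] += 1


counter: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
for word in ["The", "the", "cat"]:
    count_vocab_item(word, counter, lowercase_tokens=True)
print(dict(counter["tokens"]))  # {'the': 2, 'cat': 1}
```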
tokens_to_indices
class SingleIdTokenIndexer(TokenIndexer):
| ...
| @overrides
| def tokens_to_indices(
| self,
| tokens: List[Token],
| vocabulary: Vocabulary
| ) -> Dict[str, List[int]]
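The indexing step described by the parameters above can be sketched as follows (a hedged stand-in, not the allennlp code; the dict-of-dicts vocabulary is an assumption standing in for the real Vocabulary class): prepend start_tokens, append end_tokens, optionally lowercase, then map each token to its id in the given namespace.

```python
from typing import Dict, List, Sequence


def tokens_to_indices_sketch(
    tokens: List[str],
    vocab: Dict[str, Dict[str, int]],  # namespace -> token text -> id
    namespace: str = "tokens",
    lowercase_tokens: bool = False,
    start_tokens: Sequence[str] = (),
    end_tokens: Sequence[str] = (),
) -> Dict[str, List[int]]:
    all_tokens = list(start_tokens) + tokens + list(end_tokens)
    if lowercase_tokens:
        all_tokens = [t.lower() for t in all_tokens]
    mapping = vocab[namespace]
    return {"tokens": [mapping[t] for t in all_tokens]}


vocab = {"tokens": {"@start@": 0, "@end@": 1, "the": 2, "cat": 3}}
print(tokens_to_indices_sketch(
    ["The", "cat"], vocab,
    lowercase_tokens=True,
    start_tokens=["@start@"], end_tokens=["@end@"],
))  # {'tokens': [0, 2, 3, 1]}
```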
get_empty_token_list
class SingleIdTokenIndexer(TokenIndexer):
| ...
| @overrides
| def get_empty_token_list(self) -> IndexedTokenList