token
[ allennlp.data.tokenizers.token ]
Token
@dataclass(init=False, repr=False)
class Token:
 | def __init__(
 |     self,
 |     text: str = None,
 |     idx: int = None,
 |     idx_end: int = None,
 |     lemma_: str = None,
 |     pos_: str = None,
 |     tag_: str = None,
 |     dep_: str = None,
 |     ent_type_: str = None,
 |     text_id: int = None,
 |     type_id: int = None
 | ) -> None
A simple token representation that keeps track of the token's text, its offset in the passage it was taken from, its POS tag, its dependency relation, and similar information. These fields match spaCy's exactly, so a spaCy token can be used in its place.
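Because the field names mirror spaCy's, code that only reads these attributes by name works on either kind of token. The sketch below assumes a hypothetical helper, `describe`, and a trimmed hypothetical stand-in, `MiniToken`, with three of `Token`'s fields. `types.SimpleNamespace` plays the role of a spaCy token so the snippet runs without spaCy or AllenNLP installed.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Optional, Protocol


class TokenLike(Protocol):
    # The only attributes describe() reads; spaCy tokens and Token both provide them.
    text: Optional[str]
    idx: Optional[int]
    pos_: Optional[str]


@dataclass
class MiniToken:
    # Hypothetical stand-in for allennlp's Token, trimmed to three fields.
    text: Optional[str] = None
    idx: Optional[int] = None
    pos_: Optional[str] = None


def describe(token: TokenLike) -> str:
    # Hypothetical helper: accesses attributes by name only and never checks the type.
    return f"{token.text}@{token.idx}/{token.pos_}"


allen_style = MiniToken(text="dog", idx=4, pos_="NOUN")
spacy_style = SimpleNamespace(text="dog", idx=4, pos_="NOUN")  # stands in for a spaCy token
print(describe(allen_style))  # dog@4/NOUN
print(describe(spacy_style))  # dog@4/NOUN
```

This relies on duck typing: nothing in `describe` depends on which class produced the token.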
Parameters
- text : str, optional
 The original text represented by this token.
- idx : int, optional
 The character offset of this token into the tokenized passage.
- idx_end : int, optional
 The character offset one past the last character of this token in the tokenized passage.
- lemma_ : str, optional
 The lemma of this token.
- pos_ : str, optional
 The coarse-grained part of speech of this token.
- tag_ : str, optional
 The fine-grained part of speech of this token.
- dep_ : str, optional
 The dependency relation for this token.
- ent_type_ : str, optional
 The entity type (i.e., the NER tag) for this token.
- text_id : int, optional
 If your tokenizer returns integers instead of strings (e.g., because you're doing byte encoding, or some hash-based embedding), set this with the integer. If this is set, we will bypass the vocabulary when indexing this token, regardless of whether `text` is also set. You can also set `text` to the original text, if you want, so that you can still use a character-level representation in addition to a hash-based word embedding.
- type_id : int, optional
 Token type ID used by some pretrained language models, such as the original BERT, to tell apart the segments of a paired input. The other fields on `Token` follow the fields on spaCy's `Token` object; this is one we added, similar to spaCy's `lex_id`.
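The parameter descriptions above imply two invariants: `idx` and `idx_end` form a half-open span, so slicing the passage with them recovers the token text, and a set `text_id` takes precedence over any vocabulary lookup. A minimal sketch of both, assuming a hypothetical stand-in `MiniToken` and hypothetical helpers `whitespace_tokenize`, `byte_tokenize`, and `index_token`. This is not AllenNLP's tokenizer or indexer API, and the out-of-vocabulary ID of 1 is an arbitrary choice.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class MiniToken:
    # Hypothetical stand-in for allennlp's Token with only the fields used here.
    text: Optional[str] = None
    idx: Optional[int] = None
    idx_end: Optional[int] = None
    text_id: Optional[int] = None


def whitespace_tokenize(passage: str) -> List[MiniToken]:
    # idx is the offset of the first character and idx_end is one past the last,
    # so passage[idx:idx_end] == text for every token.
    return [
        MiniToken(text=m.group(), idx=m.start(), idx_end=m.end())
        for m in re.finditer(r"\S+", passage)
    ]


def byte_tokenize(passage: str) -> List[MiniToken]:
    # One token per UTF-8 byte, with text_id holding the byte value. text is also set
    # (chr(b) decodes as Latin-1, which is the real character only for ASCII bytes).
    return [MiniToken(text=chr(b), text_id=b) for b in passage.encode("utf-8")]


def index_token(token: MiniToken, vocab: Dict[str, int], oov_id: int = 1) -> int:
    # Hypothetical indexer following the documented rule: a set text_id bypasses
    # the vocabulary even when text is also present.
    if token.text_id is not None:
        return token.text_id
    return vocab.get(token.text or "", oov_id)


words = whitespace_tokenize("the cat sat")
print([(t.text, t.idx, t.idx_end) for t in words])
# [('the', 0, 3), ('cat', 4, 7), ('sat', 8, 11)]
print([index_token(t, {"the": 2, "cat": 3}) for t in words])  # [2, 3, 1]
print([index_token(t, {"h": 99}) for t in byte_tokenize("hi")])  # [104, 105]
```

In the last line, "h" is in the vocabulary, yet its byte value is returned. That is the bypass behavior described for `text_id`.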
text
class Token:
 | ...
 | text: Optional[str] = None
idx
class Token:
 | ...
 | idx: Optional[int] = None
idx_end
class Token:
 | ...
 | idx_end: Optional[int] = None
lemma_
class Token:
 | ...
 | lemma_: Optional[str] = None
pos_
class Token:
 | ...
 | pos_: Optional[str] = None
tag_
class Token:
 | ...
 | tag_: Optional[str] = None
dep_
class Token:
 | ...
 | dep_: Optional[str] = None
ent_type_
class Token:
 | ...
 | ent_type_: Optional[str] = None
text_id
class Token:
 | ...
 | text_id: Optional[int] = None
type_id
class Token:
 | ...
 | type_id: Optional[int] = None
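Each field above is a dataclass field defaulting to `None`. Under `@dataclass(init=False, repr=False)`, the decorator generates neither `__init__` nor `__repr__`, so the class supplies its own constructor. The default `eq=True` still generates `__eq__`, which compares instances field by field in declaration order. A minimal sketch using a hypothetical two-field class, `MiniToken`, declared the same way (this illustrates standard `dataclasses` behavior, not AllenNLP code):

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(init=False, repr=False)
class MiniToken:
    # Hypothetical stand-in declared like Token: Optional fields defaulting to None.
    text: Optional[str] = None
    idx: Optional[int] = None

    def __init__(self, text: Optional[str] = None, idx: Optional[int] = None) -> None:
        # Hand-written constructor; the decorator generates none because init=False.
        self.text = text
        self.idx = idx


a = MiniToken("cat", idx=4)
b = MiniToken("cat", idx=4)
c = MiniToken("cat", idx=9)
print(a == b)  # True: the generated __eq__ compares (text, idx)
print(a == c)  # False: same text, different offset
print([f.name for f in fields(MiniToken)])  # ['text', 'idx']
```

The practical consequence is that two tokens with the same text but different offsets or tags are not equal.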
ensure_text
class Token:
 | ...
 | def ensure_text(self) -> str
Return the text field, raising an exception if it's None.
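`ensure_text` turns an `Optional[str]` into a plain `str`, which matters for tokens built only from a `text_id`. A minimal sketch of the pattern on a hypothetical stand-in, `MiniToken`. Raising `ValueError` is this sketch's choice, and the message is illustrative; neither is a statement about the exact exception AllenNLP raises.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MiniToken:
    # Hypothetical stand-in; tokens created from a text_id may leave text unset.
    text: Optional[str] = None
    text_id: Optional[int] = None

    def ensure_text(self) -> str:
        # Fail loudly instead of letting None flow into string-only code.
        if self.text is None:
            raise ValueError("token has no text")
        return self.text


print(MiniToken(text="cat").ensure_text().upper())  # CAT
try:
    MiniToken(text_id=42).ensure_text()
except ValueError as err:
    print(f"error: {err}")  # error: token has no text
```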
show_token
def show_token(token: Token) -> str