bias_utils
allennlp.fairness.bias_utils
load_words
def load_words(
fname: Union[str, PathLike],
tokenizer: Tokenizer,
vocab: Optional[Vocabulary] = None,
namespace: str = "tokens",
all_cases: bool = True
) -> List[torch.Tensor]
This function loads a list of words from a file, tokenizes each word into subword tokens, and converts the tokens into IDs.
Parameters

- fname : Union[str, PathLike]
  Name of file containing list of words to load.
- tokenizer : Tokenizer
  Tokenizer to tokenize words in file.
- vocab : Vocabulary, optional (default = None)
  Vocabulary of tokenizer. If None, assumes tokenizer is of type PreTrainedTokenizer and uses tokenizer's vocab attribute.
- namespace : str
  Namespace of vocab to use when tokenizing.
- all_cases : bool, optional (default = True)
  Whether to tokenize lower, title, and upper cases of each word.
Returns

- word_ids : List[torch.Tensor]
  List of tensors containing the IDs of subword tokens for each word in the file.
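A minimal usage sketch, not part of the reference above: the file name "word_list.json", its contents, and the choice of bert-base-uncased are assumptions for illustration only.

from allennlp.data.tokenizers import PretrainedTransformerTokenizer
from allennlp.fairness.bias_utils import load_words

tokenizer = PretrainedTransformerTokenizer("bert-base-uncased")

# Hypothetical word-list file; substitute whichever file your bias-mitigation setup uses.
# With vocab left as None, token IDs fall back to the pretrained tokenizer's own vocab.
word_ids = load_words("word_list.json", tokenizer)

# Each element is a 1-D tensor of subword-token IDs; with all_cases=True (the default),
# lower-, title-, and upper-cased variants of each word are included.
print(len(word_ids), word_ids[0])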
load_word_pairs
def load_word_pairs(
fname: Union[str, PathLike],
tokenizer: Tokenizer,
vocab: Optional[Vocabulary] = None,
namespace: str = "token",
all_cases: bool = True
) -> Tuple[List[torch.Tensor], List[torch.Tensor]]
This function loads a list of pairs of words from a file, tokenizes each word into subword tokens, and converts the tokens into IDs.
Parameters

- fname : Union[str, PathLike]
  Name of file containing list of pairs of words to load.
- tokenizer : Tokenizer
  Tokenizer to tokenize words in file.
- vocab : Vocabulary, optional (default = None)
  Vocabulary of tokenizer. If None, assumes tokenizer is of type PreTrainedTokenizer and uses tokenizer's vocab attribute.
- namespace : str
  Namespace of vocab to use when tokenizing.
- all_cases : bool, optional (default = True)
  Whether to tokenize lower, title, and upper cases of each word.
Returns

- word_ids : Tuple[List[torch.Tensor], List[torch.Tensor]]
  Pair of lists of tensors containing the IDs of subword tokens for words in the file.
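A minimal usage sketch, not part of the reference above: the file name "definitional_pairs.json" and the bert-base-uncased model are placeholders, and the sketch assumes the two returned lists line up entry-for-entry, one entry per cased variant of each word pair.

from allennlp.data.tokenizers import PretrainedTransformerTokenizer
from allennlp.fairness.bias_utils import load_word_pairs

tokenizer = PretrainedTransformerTokenizer("bert-base-uncased")

# Hypothetical file of word pairs (e.g. "man"/"woman"); substitute your own list.
# vocab is again left as None, so IDs come from the pretrained tokenizer's vocab.
ids_a, ids_b = load_word_pairs("definitional_pairs.json", tokenizer)

# ids_a[i] and ids_b[i] should hold the subword-token IDs for the two members
# of the i-th (cased) pair.
for a, b in zip(ids_a[:3], ids_b[:3]):
    print(a, b)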