allennlp.commands.elmo
The elmo subcommand allows you to make bulk ELMo predictions.
Given a pre-processed input text file, this command outputs the internal layers used to compute ELMo representations to a single (potentially large) file.
The input file must contain pre-tokenized, whitespace-separated text, one sentence per line. The output is an HDF5 file (<https://h5py.readthedocs.io/en/latest/>) in which, with the --all flag, each sentence is stored as an array of shape (3, num_tokens, 1024) holding the biLM representations.
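As a rough illustration of the file formats described above, the following sketch writes tokenized sentences in the one-sentence-per-line layout and reads arrays back from the output file. The helper names `write_elmo_input` and `read_elmo_output` are hypothetical and not part of AllenNLP. Reading assumes the third-party h5py package is installed, and the sketch skips the "sentence_to_index" entry that the command normally stores alongside the embeddings.

```python
from pathlib import Path


def write_elmo_input(sentences, path):
    """Write pre-tokenized sentences: one sentence per line, tokens joined by spaces.

    Hypothetical helper, not part of AllenNLP. `sentences` is a list of token lists.
    """
    lines = []
    for tokens in sentences:
        if not tokens:
            # An empty sentence would produce a blank line with nothing to embed;
            # rejected here as a precaution.
            raise ValueError("empty sentence")
        for tok in tokens:
            if not tok or any(ch.isspace() for ch in tok):
                # Whitespace inside a token would change the token count on re-reading.
                raise ValueError(f"invalid token: {tok!r}")
        lines.append(" ".join(tokens))
    Path(path).write_text("".join(line + "\n" for line in lines), encoding="utf-8")
    return lines


def read_elmo_output(path):
    """Yield (key, array) pairs from the HDF5 output file.

    Assumes h5py is installed; it is imported lazily so that the writer
    above works without it.
    """
    import h5py

    with h5py.File(path, "r") as f:
        for key in f.keys():
            if key == "sentence_to_index":  # metadata entry, not an embedding
                continue
            yield key, f[key][...]  # [...] loads the dataset as a NumPy array
```

With --all, each array yielded by `read_elmo_output` would be expected to have shape (3, num_tokens, 1024), matching the description above.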
For more information, see “Deep contextualized word representations”, Peters et al., 2018. https://arxiv.org/abs/1802.05365
$ allennlp elmo --help
usage: allennlp elmo [-h] (--all | --top | --average)
                     [--vocab-path VOCAB_PATH] [--options-file OPTIONS_FILE]
                     [--weight-file WEIGHT_FILE] [--batch-size BATCH_SIZE]
                     [--file-friendly-logging] [--cuda-device CUDA_DEVICE]
                     [--forget-sentences] [--use-sentence-keys]
                     [--include-package INCLUDE_PACKAGE]
                     input_file output_file

Create word vectors using ELMo.

positional arguments:
  input_file            The path to the input file.
  output_file           The path to the output file.

optional arguments:
  -h, --help            show this help message and exit
  --all                 Output all three ELMo vectors.
  --top                 Output the top ELMo vector.
  --average             Output the average of the ELMo vectors.
  --vocab-path VOCAB_PATH
                        A path to a vocabulary file to generate.
  --options-file OPTIONS_FILE
                        The path to the ELMo options file. (default = https://
                        allennlp.s3.amazonaws.com/models/elmo/2x4096_512_2048c
                        nn_2xhighway/elmo_2x4096_512_2048cnn_2xhighway_options
                        .json)
  --weight-file WEIGHT_FILE
                        The path to the ELMo weight file. (default = https://a
                        llennlp.s3.amazonaws.com/models/elmo/2x4096_512_2048cn
                        n_2xhighway/elmo_2x4096_512_2048cnn_2xhighway_weights.
                        hdf5)
  --batch-size BATCH_SIZE
                        The batch size to use. (default = 64)
  --file-friendly-logging
                        outputs tqdm status on separate lines and slows tqdm
                        refresh rate.
  --cuda-device CUDA_DEVICE
                        The cuda_device to run on. (default = -1)
  --forget-sentences    If this flag is specified, and --use-sentence-keys is
                        not, remove the string serialized JSON dictionary that
                        associates sentences with their line number (its HDF5
                        key) that is normally placed in the
                        "sentence_to_index" HDF5 key.
  --use-sentence-keys   Normally a sentence's line number is used as the HDF5
                        key for its embedding. If this flag is specified, the
                        sentence itself will be used as the key.
  --include-package INCLUDE_PACKAGE
                        additional packages to include
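
The options above can also be assembled programmatically. The sketch below builds an argument list for `allennlp elmo` using only flags shown in the help output, and decodes the JSON string that is normally stored under the "sentence_to_index" key. The function names `build_elmo_command` and `parse_sentence_index` are hypothetical and not part of AllenNLP, and the example assumes the `allennlp` executable is available on the PATH.

```python
import json

# Exactly one of these output modes is required, per the usage line.
OUTPUT_MODES = ("all", "top", "average")


def build_elmo_command(input_file, output_file, mode="all", batch_size=None,
                       cuda_device=None, use_sentence_keys=False,
                       forget_sentences=False):
    """Return an argv list for `allennlp elmo` (hypothetical helper)."""
    if mode not in OUTPUT_MODES:
        raise ValueError(f"mode must be one of {OUTPUT_MODES}, got {mode!r}")
    cmd = ["allennlp", "elmo", f"--{mode}"]
    if batch_size is not None:
        cmd += ["--batch-size", str(batch_size)]
    if cuda_device is not None:
        cmd += ["--cuda-device", str(cuda_device)]
    if use_sentence_keys:
        cmd.append("--use-sentence-keys")
    if forget_sentences:
        cmd.append("--forget-sentences")
    # Positional arguments come last: input file, then output file.
    cmd += [str(input_file), str(output_file)]
    return cmd


def parse_sentence_index(raw):
    """Decode the string-serialized JSON dictionary from "sentence_to_index".

    Accepts either bytes or str, since the type returned when reading an
    HDF5 string value can differ between h5py versions.
    """
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")
    return json.loads(raw)
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Omitting `batch_size` or `cuda_device` leaves the defaults listed in the help output (64 and -1) in effect.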