allennlp.commands.elmo

The elmo subcommand allows you to make bulk ELMo predictions.

Given a pre-processed input text file, this command outputs the internal layers used to compute ELMo representations to a single (potentially large) file.

The input file is pre-tokenized, whitespace-separated text, one sentence per line. The output is an HDF5 file (<https://h5py.readthedocs.io/en/latest/>) where, with the --all flag, each sentence is stored as an array of shape (3, num_tokens, 1024) holding the biLM representations.
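
The input format described above can be produced with a few lines of Python. The following is a minimal sketch, not part of AllenNLP: the regex tokenizer, the helper names tokenize and write_elmo_input, and the file names in the usage note are all hypothetical, and a real pipeline would likely substitute its own tokenizer.

```python
import re
from pathlib import Path

# Hypothetical stand-in tokenizer: runs of word characters, or single
# punctuation marks. Swap in whatever tokenizer your pipeline uses.
_TOKEN = re.compile(r"\w+|[^\w\s]")


def tokenize(sentence):
    """Split a raw sentence into a list of tokens."""
    return _TOKEN.findall(sentence)


def write_elmo_input(sentences, path):
    """Write one whitespace-separated, tokenized sentence per line.

    Sentences that produce no tokens are skipped so that every line
    of the file holds exactly one sentence. Returns the lines written.
    """
    lines = []
    for sentence in sentences:
        tokens = tokenize(sentence)
        if tokens:
            lines.append(" ".join(tokens))
    Path(path).write_text("".join(line + "\n" for line in lines), encoding="utf-8")
    return lines
```

With a file written this way (say sentences.txt), the command would be invoked along the lines of `allennlp elmo sentences.txt elmo_layers.hdf5 --all`, following the usage string shown in the help output.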

For more information, see “Deep contextualized word representations”, Peters et al., 2018. https://arxiv.org/abs/1802.05365

$ allennlp elmo --help
 usage: allennlp elmo [-h] (--all | --top | --average)
                      [--vocab-path VOCAB_PATH] [--options-file OPTIONS_FILE]
                      [--weight-file WEIGHT_FILE] [--batch-size BATCH_SIZE]
                      [--file-friendly-logging] [--cuda-device CUDA_DEVICE]
                      [--forget-sentences] [--use-sentence-keys]
                      [--include-package INCLUDE_PACKAGE]
                      input_file output_file

 Create word vectors using ELMo.

 positional arguments:
   input_file            The path to the input file.
   output_file           The path to the output file.

 optional arguments:
   -h, --help            show this help message and exit
   --all                 Output all three ELMo vectors.
   --top                 Output the top ELMo vector.
   --average             Output the average of the ELMo vectors.
   --vocab-path VOCAB_PATH
                         A path to a vocabulary file to generate.
   --options-file OPTIONS_FILE
                         The path to the ELMo options file. (default = https://
                         allennlp.s3.amazonaws.com/models/elmo/2x4096_512_2048c
                         nn_2xhighway/elmo_2x4096_512_2048cnn_2xhighway_options
                         .json)
   --weight-file WEIGHT_FILE
                         The path to the ELMo weight file. (default = https://a
                         llennlp.s3.amazonaws.com/models/elmo/2x4096_512_2048cn
                         n_2xhighway/elmo_2x4096_512_2048cnn_2xhighway_weights.
                         hdf5)
   --batch-size BATCH_SIZE
                         The batch size to use. (default = 64)
   --file-friendly-logging
                         outputs tqdm status on separate lines and slows tqdm
                         refresh rate.
   --cuda-device CUDA_DEVICE
                         The cuda_device to run on. (default = -1)
   --forget-sentences    If this flag is specified, and --use-sentence-keys is
                         not, remove the string serialized JSON dictionary that
                         associates sentences with their line number (its HDF5
                         key) that is normally placed in the
                         "sentence_to_index" HDF5 key.
   --use-sentence-keys   Normally a sentence's line number is used as the HDF5
                         key for its embedding. If this flag is specified, the
                         sentence itself will be used as the key.
   --include-package INCLUDE_PACKAGE
                         additional packages to include
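
Reading the output file back might look like the sketch below. It uses h5py (a third-party package) and assumes the default layout described in the help text: one dataset per sentence, keyed by its line number, plus a "sentence_to_index" key holding a JSON string. The helper names are hypothetical, and the assumption that the JSON string sits in the first element of a one-element dataset should be checked against your own output file. The sketch does not handle files written with --use-sentence-keys.

```python
import json

# Key name taken from the --forget-sentences help text.
METADATA_KEY = "sentence_to_index"


def embedding_keys(keys):
    """Return per-sentence keys in line order, skipping the metadata key.

    Assumes the default behaviour where each key is a line number
    stored as a string.
    """
    return sorted((k for k in keys if k != METADATA_KEY), key=int)


def load_elmo_output(path):
    """Load all embeddings into memory as {line_number: numpy array}."""
    import h5py  # third-party; imported lazily so the helper above needs only the stdlib

    with h5py.File(path, "r") as f:
        # dataset[...] reads the full array; with --all its shape is
        # (3, num_tokens, 1024) according to the description above.
        embeddings = {int(k): f[k][...] for k in embedding_keys(f.keys())}

        sentence_to_index = None
        if METADATA_KEY in f:
            # Assumption: the JSON string is the first element of the dataset.
            sentence_to_index = json.loads(f[METADATA_KEY][0])
    return embeddings, sentence_to_index
```

Because the output can be large, loading everything at once may not be practical; iterating over embedding_keys and reading one dataset at a time is a reasonable alternative.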