allennlp.modules.openai_transformer

An implementation of the OpenAI Transformer Language Model.

Mostly just a slightly modified version of https://github.com/huggingface/pytorch-openai-transformer-lm, so thanks to them!

Some of these modules duplicate code elsewhere in AllenNLP, but the serialized weights depend on the exact parameter setup here, so it’s easiest to just reimplement them.

class allennlp.modules.openai_transformer.Attention(nx: int, n_ctx: int, config: allennlp.modules.openai_transformer.TransformerConfig, scale: bool = False)[source]

Bases: torch.nn.modules.module.Module

forward(self, x: torch.Tensor) → torch.Tensor[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.

merge_heads(self, x: torch.Tensor)[source]
split_heads(self, x: torch.Tensor, k: bool = False)[source]
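
``split_heads`` and ``merge_heads`` convert between a single ``(batch, sequence, features)`` tensor and a per-head layout. A minimal sketch of that reshaping (not the literal implementation), assuming ``num_heads`` divides the feature dimension evenly; the ``k`` flag puts keys in a layout that can be matrix-multiplied against queries without a further transpose:

```python
import torch

def split_heads(x: torch.Tensor, num_heads: int, k: bool = False) -> torch.Tensor:
    # (batch, seq, features) -> (batch, heads, seq, head_dim),
    # or (batch, heads, head_dim, seq) for keys.
    batch, seq, features = x.size()
    x = x.view(batch, seq, num_heads, features // num_heads)
    return x.permute(0, 2, 3, 1) if k else x.permute(0, 2, 1, 3)

def merge_heads(x: torch.Tensor) -> torch.Tensor:
    # (batch, heads, seq, head_dim) -> (batch, seq, heads * head_dim)
    x = x.permute(0, 2, 1, 3).contiguous()
    batch, seq, heads, head_dim = x.size()
    return x.view(batch, seq, heads * head_dim)
```
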
class allennlp.modules.openai_transformer.Block(n_ctx: int, config: allennlp.modules.openai_transformer.TransformerConfig, scale: bool = False)[source]

Bases: torch.nn.modules.module.Module

forward(self, x: torch.Tensor) → torch.Tensor[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
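
A ``Block`` chains the attention, layer norm, and MLP submodules with residual connections in the post-norm GPT style. A rough sketch of the data flow, with the submodules passed in explicitly (the ``attn`` / ``ln_1`` / ``mlp`` / ``ln_2`` names match the weight names listed under ``load_weights`` below):

```python
def block_forward(x, attn, ln_1, mlp, ln_2):
    # Sketch of one transformer block, not the literal implementation.
    a = attn(x)          # masked multi-head self-attention
    n = ln_1(x + a)      # residual connection, then layer norm
    m = mlp(n)           # position-wise feed-forward network
    return ln_2(n + m)   # second residual connection and layer norm
```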

class allennlp.modules.openai_transformer.Conv1D(nf: int, rf: int, nx: int)[source]

Bases: torch.nn.modules.module.Module

forward(self, x: torch.Tensor) → torch.Tensor[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
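
In the upstream huggingface code this "convolution" is only used with ``rf = 1``, where it reduces to an affine map over the last dimension (``nx`` input features to ``nf`` output features). A minimal sketch of that case, assuming a weight of shape ``(nx, nf)`` and a bias of shape ``(nf,)``:

```python
import torch

def conv1d_rf1(x: torch.Tensor, w: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # x: (..., nx), w: (nx, nf), b: (nf,)  ->  (..., nf)
    size_out = x.size()[:-1] + (w.size(-1),)
    out = torch.addmm(b, x.reshape(-1, x.size(-1)), w)
    return out.view(*size_out)
```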

class allennlp.modules.openai_transformer.LayerNorm(n_state, e=1e-05)[source]

Bases: torch.nn.modules.module.Module

Construct a layernorm module in the OpenAI style (epsilon inside the square root).

forward(self, x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
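
The difference from the standard formulation is where epsilon is added: the variance and ``e`` are summed before taking the square root. A minimal sketch of that computation, assuming a learnable gain ``g`` and bias ``b`` over the last dimension:

```python
import torch

def openai_layer_norm(x: torch.Tensor, g: torch.Tensor, b: torch.Tensor, e: float = 1e-5) -> torch.Tensor:
    u = x.mean(-1, keepdim=True)               # per-position mean
    s = (x - u).pow(2).mean(-1, keepdim=True)  # per-position variance
    x = (x - u) / torch.sqrt(s + e)            # epsilon inside the square root
    return g * x + b
```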

class allennlp.modules.openai_transformer.MLP(n_state: int, config: allennlp.modules.openai_transformer.TransformerConfig)[source]

Bases: torch.nn.modules.module.Module

forward(self, x: torch.Tensor) → torch.Tensor[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
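
The MLP is the position-wise feed-forward part of each block: a projection up to ``n_state`` features, the configured activation, a projection back down, and dropout. A rough sketch, with the submodules passed in explicitly (``c_fc`` / ``c_proj`` match the weight names listed under ``load_weights`` below):

```python
def mlp_forward(x, c_fc, c_proj, activation, dropout):
    # Sketch of the feed-forward sublayer, not the literal implementation.
    h = activation(c_fc(x))  # project up and apply the activation (gelu by default)
    h = c_proj(h)            # project back down to the embedding dimension
    return dropout(h)        # residual dropout
```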

class allennlp.modules.openai_transformer.OpenaiTransformer(vocab_size: int = 40478, n_ctx: int = 512, embedding_dim: int = 768, num_heads: int = 12, num_layers: int = 12, embedding_dropout_probability: float = 0.1, attention_dropout_probability: float = 0.1, residual_dropout_probability: float = 0.1, activation_function: str = 'gelu', model_path: str = None, requires_grad: bool = False, n_special: int = -1)[source]

Bases: torch.nn.modules.module.Module, allennlp.common.from_params.FromParams

The OpenAI transformer, as per https://blog.openai.com/language-unsupervised/. Default parameters are the ones for their pretrained model.

Parameters
vocab_size: ``int`` (optional, default: 40478)

The size of the vocabulary (number of byte pair embeddings), excluding the n_special embeddings (if any) and the positional embeddings.

n_ctx: ``int`` (optional, default: 512)

The number of positional encodings to use for evaluation.

embedding_dim: ``int`` (optional, default: 768)

The dimension of the output embeddings.

num_heads: ``int`` (optional, default: 12)

How many “heads” the attention has.

num_layers: ``int`` (optional, default: 12)

How many layers of “blocks” the transformer has.

embedding_dropout_probability: ``float`` (optional, default: 0.1)

Dropout for the embedding.

attention_dropout_probability: ``float`` (optional, default: 0.1)

Dropout for attention.

residual_dropout_probability: ``float`` (optional, default: 0.1)

Dropout for the residual connections.

activation_function: ``str`` (optional, default: ``'gelu'``)

Activation function for the multi-layer perceptron.

model_path: ``str`` (optional, default: ``None``)

A tar.gz file containing serialized model weights. If supplied, the weights will be loaded from that file.

requires_grad: ``bool`` (optional, default: ``False``)

If true, the transformer will be fine-tunable.

n_special: ``int`` (optional, default: ``-1``)

The number of special tokens added to the byte pair vocabulary (via OpenaiTransformerBytePairIndexer).

dump_weights(self, output_dir: str, num_pieces: int = 10) → None[source]
forward(self, x: torch.Tensor) → List[torch.Tensor][source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
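
A minimal usage sketch (not taken from the library's own examples). It assumes the upstream huggingface input convention: byte-pair indices and position indices (offset past the vocabulary, since both share one embedding table) stacked in a trailing dimension of size 2, with the sequence filling the full ``n_ctx`` window. In practice the ``OpenaiTransformerEmbedder`` token embedder builds this tensor for you.

```python
import torch
from allennlp.modules.openai_transformer import OpenaiTransformer

# Randomly initialised model with the default (pretrained-size) hyperparameters;
# pass model_path to load serialized weights instead.
transformer = OpenaiTransformer()

batch_size, num_timesteps, vocab_size = 2, 512, 40478
token_indices = torch.randint(0, vocab_size, (batch_size, num_timesteps))
# Position ids start right after the byte-pair ids (huggingface convention;
# the exact offset may differ if special tokens were added).
position_indices = torch.arange(
    vocab_size, vocab_size + num_timesteps
).unsqueeze(0).expand(batch_size, num_timesteps)

x = torch.stack([token_indices, position_indices], dim=-1)  # (batch, time, 2)
layer_activations = transformer(x)  # one tensor of shape (batch, time, 768) per layer
```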

load_weights(self, transformer_model_path: str, n_ctx: int = -1, n_special: int = -1, n_transfer: int = 12, n_embd: int = 768, names: List[str] = ['model/we:0', 'model/h0/attn/c_attn/w:0', 'model/h0/attn/c_attn/b:0', 'model/h0/attn/c_proj/w:0', 'model/h0/attn/c_proj/b:0', 'model/h0/ln_1/g:0', 'model/h0/ln_1/b:0', 'model/h0/mlp/c_fc/w:0', 'model/h0/mlp/c_fc/b:0', 'model/h0/mlp/c_proj/w:0', 'model/h0/mlp/c_proj/b:0', 'model/h0/ln_2/g:0', 'model/h0/ln_2/b:0', 'model/h1/attn/c_attn/w:0', 'model/h1/attn/c_attn/b:0', 'model/h1/attn/c_proj/w:0', 'model/h1/attn/c_proj/b:0', 'model/h1/ln_1/g:0', 'model/h1/ln_1/b:0', 'model/h1/mlp/c_fc/w:0', 'model/h1/mlp/c_fc/b:0', 'model/h1/mlp/c_proj/w:0', 'model/h1/mlp/c_proj/b:0', 'model/h1/ln_2/g:0', 'model/h1/ln_2/b:0', 'model/h2/attn/c_attn/w:0', 'model/h2/attn/c_attn/b:0', 'model/h2/attn/c_proj/w:0', 'model/h2/attn/c_proj/b:0', 'model/h2/ln_1/g:0', 'model/h2/ln_1/b:0', 'model/h2/mlp/c_fc/w:0', 'model/h2/mlp/c_fc/b:0', 'model/h2/mlp/c_proj/w:0', 'model/h2/mlp/c_proj/b:0', 'model/h2/ln_2/g:0', 'model/h2/ln_2/b:0', 'model/h3/attn/c_attn/w:0', 'model/h3/attn/c_attn/b:0', 'model/h3/attn/c_proj/w:0', 'model/h3/attn/c_proj/b:0', 'model/h3/ln_1/g:0', 'model/h3/ln_1/b:0', 'model/h3/mlp/c_fc/w:0', 'model/h3/mlp/c_fc/b:0', 'model/h3/mlp/c_proj/w:0', 'model/h3/mlp/c_proj/b:0', 'model/h3/ln_2/g:0', 'model/h3/ln_2/b:0', 'model/h4/attn/c_attn/w:0', 'model/h4/attn/c_attn/b:0', 'model/h4/attn/c_proj/w:0', 'model/h4/attn/c_proj/b:0', 'model/h4/ln_1/g:0', 'model/h4/ln_1/b:0', 'model/h4/mlp/c_fc/w:0', 'model/h4/mlp/c_fc/b:0', 'model/h4/mlp/c_proj/w:0', 'model/h4/mlp/c_proj/b:0', 'model/h4/ln_2/g:0', 'model/h4/ln_2/b:0', 'model/h5/attn/c_attn/w:0', 'model/h5/attn/c_attn/b:0', 'model/h5/attn/c_proj/w:0', 'model/h5/attn/c_proj/b:0', 'model/h5/ln_1/g:0', 'model/h5/ln_1/b:0', 'model/h5/mlp/c_fc/w:0', 'model/h5/mlp/c_fc/b:0', 'model/h5/mlp/c_proj/w:0', 'model/h5/mlp/c_proj/b:0', 'model/h5/ln_2/g:0', 'model/h5/ln_2/b:0', 'model/h6/attn/c_attn/w:0', 'model/h6/attn/c_attn/b:0', 'model/h6/attn/c_proj/w:0', 'model/h6/attn/c_proj/b:0', 'model/h6/ln_1/g:0', 'model/h6/ln_1/b:0', 'model/h6/mlp/c_fc/w:0', 'model/h6/mlp/c_fc/b:0', 'model/h6/mlp/c_proj/w:0', 'model/h6/mlp/c_proj/b:0', 'model/h6/ln_2/g:0', 'model/h6/ln_2/b:0', 'model/h7/attn/c_attn/w:0', 'model/h7/attn/c_attn/b:0', 'model/h7/attn/c_proj/w:0', 'model/h7/attn/c_proj/b:0', 'model/h7/ln_1/g:0', 'model/h7/ln_1/b:0', 'model/h7/mlp/c_fc/w:0', 'model/h7/mlp/c_fc/b:0', 'model/h7/mlp/c_proj/w:0', 'model/h7/mlp/c_proj/b:0', 'model/h7/ln_2/g:0', 'model/h7/ln_2/b:0', 'model/h8/attn/c_attn/w:0', 'model/h8/attn/c_attn/b:0', 'model/h8/attn/c_proj/w:0', 'model/h8/attn/c_proj/b:0', 'model/h8/ln_1/g:0', 'model/h8/ln_1/b:0', 'model/h8/mlp/c_fc/w:0', 'model/h8/mlp/c_fc/b:0', 'model/h8/mlp/c_proj/w:0', 'model/h8/mlp/c_proj/b:0', 'model/h8/ln_2/g:0', 'model/h8/ln_2/b:0', 'model/h9/attn/c_attn/w:0', 'model/h9/attn/c_attn/b:0', 'model/h9/attn/c_proj/w:0', 'model/h9/attn/c_proj/b:0', 'model/h9/ln_1/g:0', 'model/h9/ln_1/b:0', 'model/h9/mlp/c_fc/w:0', 'model/h9/mlp/c_fc/b:0', 'model/h9/mlp/c_proj/w:0', 'model/h9/mlp/c_proj/b:0', 'model/h9/ln_2/g:0', 'model/h9/ln_2/b:0', 'model/h10/attn/c_attn/w:0', 'model/h10/attn/c_attn/b:0', 'model/h10/attn/c_proj/w:0', 'model/h10/attn/c_proj/b:0', 'model/h10/ln_1/g:0', 'model/h10/ln_1/b:0', 'model/h10/mlp/c_fc/w:0', 'model/h10/mlp/c_fc/b:0', 'model/h10/mlp/c_proj/w:0', 'model/h10/mlp/c_proj/b:0', 'model/h10/ln_2/g:0', 'model/h10/ln_2/b:0', 'model/h11/attn/c_attn/w:0', 'model/h11/attn/c_attn/b:0', 
'model/h11/attn/c_proj/w:0', 'model/h11/attn/c_proj/b:0', 'model/h11/ln_1/g:0', 'model/h11/ln_1/b:0', 'model/h11/mlp/c_fc/w:0', 'model/h11/mlp/c_fc/b:0', 'model/h11/mlp/c_proj/w:0', 'model/h11/mlp/c_proj/b:0', 'model/h11/ln_2/g:0', 'model/h11/ln_2/b:0', 'model/clf/w:0', 'model/clf/b:0']) → None[source]
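
Both methods are mostly used internally (the constructor loads ``model_path`` at construction time), but they can be called directly. A sketch of the documented signatures with placeholder paths:

```python
# Hypothetical paths, shown only to illustrate the signatures above.
transformer.load_weights('/path/to/openai-transformer-lm.tar.gz')
transformer.dump_weights('/tmp/openai-transformer-pieces', num_pieces=10)
```
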
class allennlp.modules.openai_transformer.TransformerConfig[source]

Bases: tuple

The transformer has to pass a bunch of parameters to its submodules; this bundles them together to make things easier. (See the construction sketch after the field aliases below.)

property activation_function

Alias for field number 5

property attention_dropout_probability

Alias for field number 3

property embedding_dim

Alias for field number 0

property embedding_dropout_probability

Alias for field number 2

property num_heads

Alias for field number 1

property residual_dropout_probability

Alias for field number 4
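
``TransformerConfig`` is a named tuple; the "field number" aliases above give the positional order of its fields. A construction sketch matching that order:

```python
from allennlp.modules.openai_transformer import TransformerConfig

config = TransformerConfig(
    embedding_dim=768,                  # field 0
    num_heads=12,                       # field 1
    embedding_dropout_probability=0.1,  # field 2
    attention_dropout_probability=0.1,  # field 3
    residual_dropout_probability=0.1,   # field 4
    activation_function='gelu',         # field 5
)
```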

allennlp.modules.openai_transformer.gelu(x: torch.Tensor) → torch.Tensor[source]
allennlp.modules.openai_transformer.swish(x: torch.Tensor) → torch.Tensor[source]
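
These are the two activation choices for the MLP. The upstream huggingface implementation defines them (approximately) as the tanh approximation of the Gaussian Error Linear Unit and as ``x`` times its own sigmoid; a sketch of those formulas:

```python
import math
import torch

def gelu(x: torch.Tensor) -> torch.Tensor:
    # Tanh approximation of the Gaussian Error Linear Unit.
    return 0.5 * x * (1 + torch.tanh(math.sqrt(2 / math.pi) * (x + 0.044715 * torch.pow(x, 3))))

def swish(x: torch.Tensor) -> torch.Tensor:
    # x times its own sigmoid.
    return x * torch.sigmoid(x)
```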