allennlp.modules.openai_transformer
An implementation of the OpenAI Transformer Language Model.
Mostly just a slightly modified version of https://github.com/huggingface/pytorch-openai-transformer-lm, so thanks to them!
Some of these modules duplicate code elsewhere in AllenNLP, but the serialized weights depend on the exact parameter setup here, so it’s easiest to just reimplement them.
class allennlp.modules.openai_transformer.Attention(nx: int, n_ctx: int, config: allennlp.modules.openai_transformer.TransformerConfig, scale: bool = False)[source]
Bases: torch.nn.modules.module.Module

forward(self, x: torch.Tensor) → torch.Tensor[source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note: Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class allennlp.modules.openai_transformer.Block(n_ctx: int, config: allennlp.modules.openai_transformer.TransformerConfig, scale: bool = False)[source]
Bases: torch.nn.modules.module.Module

forward(self, x: torch.Tensor) → torch.Tensor[source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note: Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class allennlp.modules.openai_transformer.Conv1D(nf: int, rf: int, nx: int)[source]
Bases: torch.nn.modules.module.Module

forward(self, x: torch.Tensor) → torch.Tensor[source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note: Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class allennlp.modules.openai_transformer.LayerNorm(n_state, e=1e-05)[source]
Bases: torch.nn.modules.module.Module

Construct a layernorm module in the OpenAI style (epsilon inside the square root).

forward(self, x)[source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note: Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
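
"OpenAI style" here means that the epsilon term is added to the variance before the square root is taken, rather than being added to the standard deviation afterwards. A minimal sketch of that computation (the gain and bias names g and b mirror the serialized weight names used by load_weights below, but treat this as illustrative rather than the exact implementation):

    import torch

    def openai_layer_norm(x: torch.Tensor, g: torch.Tensor, b: torch.Tensor, e: float = 1e-5) -> torch.Tensor:
        u = x.mean(-1, keepdim=True)               # mean over the last dimension
        s = (x - u).pow(2).mean(-1, keepdim=True)  # variance over the last dimension
        x = (x - u) / torch.sqrt(s + e)            # epsilon sits inside the square root
        return g * x + b                           # learned gain and bias
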
class allennlp.modules.openai_transformer.MLP(n_state: int, config: allennlp.modules.openai_transformer.TransformerConfig)[source]
Bases: torch.nn.modules.module.Module

forward(self, x: torch.Tensor) → torch.Tensor[source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note: Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class allennlp.modules.openai_transformer.OpenaiTransformer(vocab_size: int = 40478, n_ctx: int = 512, embedding_dim: int = 768, num_heads: int = 12, num_layers: int = 12, embedding_dropout_probability: float = 0.1, attention_dropout_probability: float = 0.1, residual_dropout_probability: float = 0.1, activation_function: str = 'gelu', model_path: str = None, requires_grad: bool = False, n_special: int = -1)[source]
Bases: torch.nn.modules.module.Module, allennlp.common.from_params.FromParams

OpenAI transformer, as per https://blog.openai.com/language-unsupervised/. Default parameters are the ones for their pretrained model.

Parameters:
- vocab_size : ``int`` (optional, default: 40478)
  The size of the vocabulary (number of byte pair embeddings), excluding the n_special embeddings (if any) and the positional embeddings.
- n_ctx : ``int`` (optional, default: 512)
  The number of positional encodings to use for evaluation.
- embedding_dim : ``int`` (optional, default: 768)
  The dimension of the output embeddings.
- num_heads : ``int`` (optional, default: 12)
  How many “heads” the attention has.
- num_layers : ``int`` (optional, default: 12)
  How many layers of “blocks” the transformer has.
- embedding_dropout_probability : ``float`` (optional, default: 0.1)
  Dropout for the embedding.
- attention_dropout_probability : ``float`` (optional, default: 0.1)
  Dropout for attention.
- residual_dropout_probability : ``float`` (optional, default: 0.1)
  Dropout for the residual connections.
- activation_function : ``str`` (optional, default: ``'gelu'``)
  Activation function for the multi-layer perceptron.
- model_path : ``str`` (optional, default: ``None``)
  A tar.gz file containing serialized model weights. If supplied, the weights will be loaded from that file.
- requires_grad : ``bool`` (optional, default: ``False``)
  If true, the transformer will be fine-tunable.
- n_special : ``int`` (optional, default: ``-1``)
  The number of special tokens added to the byte pair vocabulary (via OpenaiTransformerBytePairIndexer).

forward(self, x: torch.Tensor) → List[torch.Tensor][source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note: Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
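
As a rough usage sketch, the module can be constructed with its documented defaults and called on a tensor of indices, returning a list of layer activations. The input layout used here (byte-pair ids stacked with positional indices offset by the vocabulary size) reflects how OpenaiTransformerBytePairIndexer and OpenaiTransformerEmbedder normally drive this module, and should be read as an assumption rather than a guaranteed contract:

    import torch
    from allennlp.modules.openai_transformer import OpenaiTransformer

    # Defaults match the pretrained OpenAI model; weights are only loaded if
    # model_path is supplied (or load_weights is called afterwards).
    transformer = OpenaiTransformer(requires_grad=False)

    batch_size, num_timesteps = 2, 7
    byte_pair_ids = torch.randint(0, 40478, (batch_size, num_timesteps))

    # Assumption: positional embeddings follow the byte-pair (and any special)
    # embeddings in the embedding table, so positional indices are offset by the
    # vocabulary size and stacked alongside the byte-pair ids.
    positions = torch.arange(num_timesteps).expand(batch_size, num_timesteps) + 40478
    stacked = torch.stack([byte_pair_ids, positions], dim=-1)  # (batch, timesteps, 2)

    layer_activations = transformer(stacked)  # a list of per-layer activation tensors
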
load_weights(self, transformer_model_path: str, n_ctx: int = -1, n_special: int = -1, n_transfer: int = 12, n_embd: int = 768, names: List[str] = ['model/we:0', 'model/h0/attn/c_attn/w:0', 'model/h0/attn/c_attn/b:0', 'model/h0/attn/c_proj/w:0', 'model/h0/attn/c_proj/b:0', 'model/h0/ln_1/g:0', 'model/h0/ln_1/b:0', 'model/h0/mlp/c_fc/w:0', 'model/h0/mlp/c_fc/b:0', 'model/h0/mlp/c_proj/w:0', 'model/h0/mlp/c_proj/b:0', 'model/h0/ln_2/g:0', 'model/h0/ln_2/b:0', 'model/h1/attn/c_attn/w:0', 'model/h1/attn/c_attn/b:0', 'model/h1/attn/c_proj/w:0', 'model/h1/attn/c_proj/b:0', 'model/h1/ln_1/g:0', 'model/h1/ln_1/b:0', 'model/h1/mlp/c_fc/w:0', 'model/h1/mlp/c_fc/b:0', 'model/h1/mlp/c_proj/w:0', 'model/h1/mlp/c_proj/b:0', 'model/h1/ln_2/g:0', 'model/h1/ln_2/b:0', 'model/h2/attn/c_attn/w:0', 'model/h2/attn/c_attn/b:0', 'model/h2/attn/c_proj/w:0', 'model/h2/attn/c_proj/b:0', 'model/h2/ln_1/g:0', 'model/h2/ln_1/b:0', 'model/h2/mlp/c_fc/w:0', 'model/h2/mlp/c_fc/b:0', 'model/h2/mlp/c_proj/w:0', 'model/h2/mlp/c_proj/b:0', 'model/h2/ln_2/g:0', 'model/h2/ln_2/b:0', 'model/h3/attn/c_attn/w:0', 'model/h3/attn/c_attn/b:0', 'model/h3/attn/c_proj/w:0', 'model/h3/attn/c_proj/b:0', 'model/h3/ln_1/g:0', 'model/h3/ln_1/b:0', 'model/h3/mlp/c_fc/w:0', 'model/h3/mlp/c_fc/b:0', 'model/h3/mlp/c_proj/w:0', 'model/h3/mlp/c_proj/b:0', 'model/h3/ln_2/g:0', 'model/h3/ln_2/b:0', 'model/h4/attn/c_attn/w:0', 'model/h4/attn/c_attn/b:0', 'model/h4/attn/c_proj/w:0', 'model/h4/attn/c_proj/b:0', 'model/h4/ln_1/g:0', 'model/h4/ln_1/b:0', 'model/h4/mlp/c_fc/w:0', 'model/h4/mlp/c_fc/b:0', 'model/h4/mlp/c_proj/w:0', 'model/h4/mlp/c_proj/b:0', 'model/h4/ln_2/g:0', 'model/h4/ln_2/b:0', 'model/h5/attn/c_attn/w:0', 'model/h5/attn/c_attn/b:0', 'model/h5/attn/c_proj/w:0', 'model/h5/attn/c_proj/b:0', 'model/h5/ln_1/g:0', 'model/h5/ln_1/b:0', 'model/h5/mlp/c_fc/w:0', 'model/h5/mlp/c_fc/b:0', 'model/h5/mlp/c_proj/w:0', 'model/h5/mlp/c_proj/b:0', 'model/h5/ln_2/g:0', 'model/h5/ln_2/b:0', 'model/h6/attn/c_attn/w:0', 'model/h6/attn/c_attn/b:0', 'model/h6/attn/c_proj/w:0', 'model/h6/attn/c_proj/b:0', 'model/h6/ln_1/g:0', 'model/h6/ln_1/b:0', 'model/h6/mlp/c_fc/w:0', 'model/h6/mlp/c_fc/b:0', 'model/h6/mlp/c_proj/w:0', 'model/h6/mlp/c_proj/b:0', 'model/h6/ln_2/g:0', 'model/h6/ln_2/b:0', 'model/h7/attn/c_attn/w:0', 'model/h7/attn/c_attn/b:0', 'model/h7/attn/c_proj/w:0', 'model/h7/attn/c_proj/b:0', 'model/h7/ln_1/g:0', 'model/h7/ln_1/b:0', 'model/h7/mlp/c_fc/w:0', 'model/h7/mlp/c_fc/b:0', 'model/h7/mlp/c_proj/w:0', 'model/h7/mlp/c_proj/b:0', 'model/h7/ln_2/g:0', 'model/h7/ln_2/b:0', 'model/h8/attn/c_attn/w:0', 'model/h8/attn/c_attn/b:0', 'model/h8/attn/c_proj/w:0', 'model/h8/attn/c_proj/b:0', 'model/h8/ln_1/g:0', 'model/h8/ln_1/b:0', 'model/h8/mlp/c_fc/w:0', 'model/h8/mlp/c_fc/b:0', 'model/h8/mlp/c_proj/w:0', 'model/h8/mlp/c_proj/b:0', 'model/h8/ln_2/g:0', 'model/h8/ln_2/b:0', 'model/h9/attn/c_attn/w:0', 'model/h9/attn/c_attn/b:0', 'model/h9/attn/c_proj/w:0', 'model/h9/attn/c_proj/b:0', 'model/h9/ln_1/g:0', 'model/h9/ln_1/b:0', 'model/h9/mlp/c_fc/w:0', 'model/h9/mlp/c_fc/b:0', 'model/h9/mlp/c_proj/w:0', 'model/h9/mlp/c_proj/b:0', 'model/h9/ln_2/g:0', 'model/h9/ln_2/b:0', 'model/h10/attn/c_attn/w:0', 'model/h10/attn/c_attn/b:0', 'model/h10/attn/c_proj/w:0', 'model/h10/attn/c_proj/b:0', 'model/h10/ln_1/g:0', 'model/h10/ln_1/b:0', 'model/h10/mlp/c_fc/w:0', 'model/h10/mlp/c_fc/b:0', 'model/h10/mlp/c_proj/w:0', 'model/h10/mlp/c_proj/b:0', 'model/h10/ln_2/g:0', 'model/h10/ln_2/b:0', 'model/h11/attn/c_attn/w:0', 'model/h11/attn/c_attn/b:0', 
'model/h11/attn/c_proj/w:0', 'model/h11/attn/c_proj/b:0', 'model/h11/ln_1/g:0', 'model/h11/ln_1/b:0', 'model/h11/mlp/c_fc/w:0', 'model/h11/mlp/c_fc/b:0', 'model/h11/mlp/c_proj/w:0', 'model/h11/mlp/c_proj/b:0', 'model/h11/ln_2/g:0', 'model/h11/ln_2/b:0', 'model/clf/w:0', 'model/clf/b:0']) → None[source]
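
For instance, pretrained weights could be loaded into the module constructed above from a tar.gz archive of serialized weights (the path below is illustrative):

    transformer.load_weights('/path/to/openai-transformer-weights.tar.gz')
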
class allennlp.modules.openai_transformer.TransformerConfig[source]
Bases: tuple

The transformer has to pass a bunch of params to its submodules; this bundles them together to make things easier.

property activation_function
Alias for field number 5

property attention_dropout_probability
Alias for field number 3

property embedding_dim
Alias for field number 0

property embedding_dropout_probability
Alias for field number 2

property num_heads
Alias for field number 1

property residual_dropout_probability
Alias for field number 4
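
Since the alias numbers above fix the tuple's field order, a config can be constructed positionally or by keyword. A small sketch, using the OpenaiTransformer defaults for the values:

    from allennlp.modules.openai_transformer import TransformerConfig

    config = TransformerConfig(
        embedding_dim=768,
        num_heads=12,
        embedding_dropout_probability=0.1,
        attention_dropout_probability=0.1,
        residual_dropout_probability=0.1,
        activation_function='gelu',
    )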