allennlp.modules.openai_transformer

An implementation of the OpenAI Transformer Language Model.

This is mostly a slightly modified version of https://github.com/huggingface/pytorch-openai-transformer-lm, so thanks to them!

Some of these modules duplicate code elsewhere in AllenNLP, but the serialized weights depend on the exact parameter setup here, so it's easiest to just reimplement them.
class allennlp.modules.openai_transformer.Attention(nx: int, n_ctx: int, config: allennlp.modules.openai_transformer.TransformerConfig, scale: bool = False)

    Bases: torch.nn.modules.module.Module

    forward(self, x: torch.Tensor) → torch.Tensor

        Defines the computation performed at every call. Should be overridden by all subclasses.

        Note: Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
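A minimal usage sketch (not part of the official docs): it assumes the TransformerConfig field names documented at the bottom of this page, that nx should equal the embedding dimension and be divisible by num_heads, and that the input is a (batch_size, sequence_length, embedding_dim) tensor:

    import torch
    from allennlp.modules.openai_transformer import Attention, TransformerConfig

    # Field names taken from the TransformerConfig properties documented below.
    config = TransformerConfig(
        embedding_dim=768,
        num_heads=12,
        embedding_dropout_probability=0.1,
        attention_dropout_probability=0.1,
        residual_dropout_probability=0.1,
        activation_function='gelu',
    )

    # nx is assumed to be the model dimension; n_ctx bounds the sequence length.
    attention = Attention(nx=768, n_ctx=512, config=config, scale=True)

    x = torch.randn(2, 16, 768)   # (batch_size, sequence_length, embedding_dim)
    out = attention(x)            # expected to have the same shape as x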
class allennlp.modules.openai_transformer.Block(n_ctx: int, config: allennlp.modules.openai_transformer.TransformerConfig, scale: bool = False)

    Bases: torch.nn.modules.module.Module

    forward(self, x: torch.Tensor) → torch.Tensor

        Defines the computation performed at every call. Should be overridden by all subclasses.

        Note: Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
class allennlp.modules.openai_transformer.Conv1D(nf: int, rf: int, nx: int)

    Bases: torch.nn.modules.module.Module

    forward(self, x: torch.Tensor) → torch.Tensor

        Defines the computation performed at every call. Should be overridden by all subclasses.

        Note: Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
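Judging by the upstream huggingface implementation, a receptive field rf of 1 makes this a position-wise affine projection of the last dimension from nx input features to nf output features; a hedged sketch with illustrative sizes:

    import torch
    from allennlp.modules.openai_transformer import Conv1D

    # Believed to act like a linear layer over the last dimension when rf == 1.
    conv = Conv1D(nf=3072, rf=1, nx=768)

    x = torch.randn(2, 16, 768)   # (batch_size, sequence_length, nx)
    out = conv(x)                 # expected shape: (2, 16, 3072)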
class allennlp.modules.openai_transformer.LayerNorm(n_state, e=1e-05)

    Bases: torch.nn.modules.module.Module

    Construct a layernorm module in the OpenAI style (epsilon inside the square root).

    forward(self, x)

        Defines the computation performed at every call. Should be overridden by all subclasses.

        Note: Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
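The "OpenAI style" detail is that epsilon is added inside the square root, i.e. the input is normalized by sqrt(var + e) rather than by sqrt(var) + e. A pure-PyTorch sketch of the computation this module is believed to perform, where g and b stand in for the module's learnable gain and bias:

    import torch

    def openai_layer_norm(x, g, b, e=1e-5):
        # Normalize over the last dimension, with epsilon inside the square root.
        mean = x.mean(-1, keepdim=True)
        var = (x - mean).pow(2).mean(-1, keepdim=True)
        return g * (x - mean) / torch.sqrt(var + e) + b

    x = torch.randn(2, 16, 768)
    g = torch.ones(768)    # learnable gain (a Parameter in the real module)
    b = torch.zeros(768)   # learnable bias (a Parameter in the real module)
    y = openai_layer_norm(x, g, b)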
class allennlp.modules.openai_transformer.MLP(n_state: int, config: allennlp.modules.openai_transformer.TransformerConfig)

    Bases: torch.nn.modules.module.Module

    forward(self, x: torch.Tensor) → torch.Tensor

        Defines the computation performed at every call. Should be overridden by all subclasses.

        Note: Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
class allennlp.modules.openai_transformer.OpenaiTransformer(vocab_size: int = 40478, n_ctx: int = 512, embedding_dim: int = 768, num_heads: int = 12, num_layers: int = 12, embedding_dropout_probability: float = 0.1, attention_dropout_probability: float = 0.1, residual_dropout_probability: float = 0.1, activation_function: str = 'gelu', model_path: str = None, requires_grad: bool = False, n_special: int = -1)

    Bases: torch.nn.modules.module.Module, allennlp.common.from_params.FromParams

    OpenAI transformer, as per https://blog.openai.com/language-unsupervised/. The default parameters are the ones for their pretrained model.

    Parameters

    - vocab_size: ``int`` (optional, default: 40478)
        The size of the vocabulary (number of byte pair embeddings), excluding the n_special embeddings (if any) and the positional embeddings.
    - n_ctx: ``int`` (optional, default: 512)
        The number of positional encodings to use for evaluation.
    - embedding_dim: ``int`` (optional, default: 768)
        The dimension of the output embeddings.
    - num_heads: ``int`` (optional, default: 12)
        How many "heads" the attention has.
    - num_layers: ``int`` (optional, default: 12)
        How many layers of "blocks" the transformer has.
    - embedding_dropout_probability: ``float`` (optional, default: 0.1)
        Dropout for the embedding.
    - attention_dropout_probability: ``float`` (optional, default: 0.1)
        Dropout for the attention.
    - residual_dropout_probability: ``float`` (optional, default: 0.1)
        Dropout for the residual connections.
    - activation_function: ``str`` (optional, default: ``'gelu'``)
        Activation function for the multi-layer perceptron.
    - model_path: ``str`` (optional, default: ``None``)
        A tar.gz file containing serialized model weights. If supplied, the weights will be loaded from that file.
    - requires_grad: ``bool`` (optional, default: ``False``)
        If true, the transformer will be fine-tuneable.
    - n_special: ``int`` (optional, default: ``-1``)
        The number of special tokens added to the byte pair vocabulary (via OpenaiTransformerBytePairIndexer).
    forward(self, x: torch.Tensor) → List[torch.Tensor]

        Defines the computation performed at every call. Should be overridden by all subclasses.

        Note: Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
    load_weights(self, transformer_model_path: str, n_ctx: int = -1, n_special: int = -1, n_transfer: int = 12, n_embd: int = 768, names: List[str] = <default TensorFlow variable names>) → None

        The default for ``names`` is the full list of TensorFlow variable names in the released checkpoint: 'model/we:0'; then, for each block h0 through h11, 'model/h{i}/attn/c_attn/w:0', 'model/h{i}/attn/c_attn/b:0', 'model/h{i}/attn/c_proj/w:0', 'model/h{i}/attn/c_proj/b:0', 'model/h{i}/ln_1/g:0', 'model/h{i}/ln_1/b:0', 'model/h{i}/mlp/c_fc/w:0', 'model/h{i}/mlp/c_fc/b:0', 'model/h{i}/mlp/c_proj/w:0', 'model/h{i}/mlp/c_proj/b:0', 'model/h{i}/ln_2/g:0', 'model/h{i}/ln_2/b:0'; and finally 'model/clf/w:0' and 'model/clf/b:0'.
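An end-to-end sketch, with two assumptions taken from the upstream implementation rather than from this page: the input stacks byte-pair ids and positional indices (offset by the vocabulary size) along a trailing dimension of size 2, and forward returns one activation tensor per layer:

    import torch
    from allennlp.modules.openai_transformer import OpenaiTransformer

    # With model_path=None the weights are randomly initialized; pass a path to
    # the released tar.gz archive to load the pretrained weights instead.
    transformer = OpenaiTransformer(model_path=None, requires_grad=False)

    batch_size, sequence_length = 2, 16
    vocab_size = 40478   # default byte pair vocabulary size

    byte_pair_ids = torch.randint(0, vocab_size, (batch_size, sequence_length))
    # Positional indices are assumed to index into the same embedding table,
    # offset past the byte pair vocabulary.
    position_ids = torch.arange(vocab_size, vocab_size + sequence_length)
    position_ids = position_ids.expand(batch_size, sequence_length)

    x = torch.stack([byte_pair_ids, position_ids], dim=-1)   # (batch, seq, 2)

    layer_activations = transformer(x)       # List[torch.Tensor]
    assert len(layer_activations) == 12      # one output per layer (num_layers)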
class allennlp.modules.openai_transformer.TransformerConfig

    Bases: tuple

    The transformer has to pass a bunch of parameters to its submodules; this bundles them together to make things easier.

    property activation_function
        Alias for field number 5

    property attention_dropout_probability
        Alias for field number 3

    property embedding_dim
        Alias for field number 0

    property embedding_dropout_probability
        Alias for field number 2

    property num_heads
        Alias for field number 1

    property residual_dropout_probability
        Alias for field number 4
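Going by the field-number aliases above, this appears to be a named tuple whose positional order is (embedding_dim, num_heads, embedding_dropout_probability, attention_dropout_probability, residual_dropout_probability, activation_function); a small sketch:

    from allennlp.modules.openai_transformer import TransformerConfig

    # Positional construction in the documented field order (0 through 5).
    config = TransformerConfig(768, 12, 0.1, 0.1, 0.1, 'gelu')

    assert config.embedding_dim == 768            # field 0
    assert config.num_heads == 12                 # field 1
    assert config.activation_function == 'gelu'   # field 5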