model_card

allennlp.common.model_card

A specification for defining model cards as described in Model Cards for Model Reporting (Mitchell et al., 2019).

The descriptions of the fields and some examples are taken from the paper.

The specification is provided to prompt model developers to think about the various aspects that should ideally be reported. The information should be filled in according to the spirit of transparency rather than the letter; i.e., fields should not be filled merely for the sake of being filled. If the information cannot be inferred, it should be left empty.

get_description#

def get_description(model_class)

Returns the model's description from the docstring.
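The lookup can be sketched with a plain docstring read; `DemoModel` below is a hypothetical class, and the real implementation may normalize the docstring differently.

```python
def get_description(model_class):
    # Sketch: use the class docstring as the model's description.
    return (model_class.__doc__ or "").strip()

class DemoModel:
    """A demo reading comprehension model."""

print(get_description(DemoModel))  # → A demo reading comprehension model.
```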

ModelCardInfo#

class ModelCardInfo(FromParams)

to_dict#

class ModelCardInfo(FromParams):
 | ...
 | def to_dict(self)

Only the non-empty attributes are returned, to minimize empty values.
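The filtering can be illustrated with a minimal stdlib dataclass; `Info` is a hypothetical stand-in for a `ModelCardInfo` subclass, not the library's implementation.

```python
from dataclasses import dataclass, fields
from typing import Optional

@dataclass(frozen=True)
class Info:
    description: Optional[str] = None
    version: Optional[str] = None

    def to_dict(self):
        # Keep only the attributes that were actually set.
        return {
            f.name: getattr(self, f.name)
            for f in fields(self)
            if getattr(self, f.name) is not None
        }

print(Info(description="A demo model.").to_dict())  # → {'description': 'A demo model.'}
```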

ModelDetails#

@dataclass(frozen=True)
class ModelDetails(ModelCardInfo)

This provides the basic information about the model.

Parameters

  • description : str
    A high-level overview of the model. Eg. The model implements a reading comprehension model patterned after the proposed model in Devlin et al., 2018, with improvements borrowed from the SQuAD model in the transformers project. It predicts start tokens and end tokens with a linear layer on top of word piece embeddings.

  • short_description : str
    A one-line description of the model. Eg. A reading comprehension model patterned after RoBERTa, with improvements borrowed from the SQuAD model in the transformers project.

  • developed_by : str
    Person/organization that developed the model. This can be used by all stakeholders to infer details pertaining to model development and potential conflicts of interest.

  • contributed_by : str
    Person that contributed the model to the repository.

  • date : str
    The date on which the model was contributed. This is useful for all stakeholders to become further informed on what techniques and data sources were likely to be available during model development. Format example: 2020-09-23

  • version : str
    The version of the model, and how it differs from previous versions. This is useful for all stakeholders to track whether the model is the latest version, associate known bugs to the correct model versions, and aid in model comparisons.

  • model_type : str
    The type of the model; the basic architecture. This is likely to be particularly relevant for software and model developers, as well as individuals knowledgeable about machine learning, to highlight what kinds of assumptions are encoded in the system. Eg. Naive Bayes Classifier.

  • paper : str
    The paper on which the model is based. Format example: Model Cards for Model Reporting (Mitchell et al., 2019)

  • citation : str
    The BibTeX entry for the paper.

  • license : str
    License information for the model.

  • contact : str
    The email address to reach out to the relevant developers/contributors for questions/feedback about the model.

  • training_config : str
    Link to training configuration.
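Since `ModelDetails` is a frozen dataclass of optional string fields, constructing one is straightforward. The snippet below uses a trimmed stdlib stand-in with the same field names (the real class also inherits `to_dict` from `ModelCardInfo`); the field values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Trimmed stand-in for allennlp.common.model_card.ModelDetails.
@dataclass(frozen=True)
class ModelDetails:
    description: Optional[str] = None
    short_description: Optional[str] = None
    developed_by: Optional[str] = None
    date: Optional[str] = None
    version: Optional[str] = None

details = ModelDetails(
    short_description="A reading comprehension model patterned after RoBERTa.",
    date="2020-09-23",
    version="1",
)
print(details.date)  # → 2020-09-23
```

Unset fields stay `None` and are dropped by `to_dict` in the real class.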

description#

class ModelDetails(ModelCardInfo):
 | ...
 | description: Optional[str] = None

short_description#

class ModelDetails(ModelCardInfo):
 | ...
 | short_description: Optional[str] = None

developed_by#

class ModelDetails(ModelCardInfo):
 | ...
 | developed_by: Optional[str] = None

contributed_by#

class ModelDetails(ModelCardInfo):
 | ...
 | contributed_by: Optional[str] = None

date#

class ModelDetails(ModelCardInfo):
 | ...
 | date: Optional[str] = None

version#

class ModelDetails(ModelCardInfo):
 | ...
 | version: Optional[str] = None

model_type#

class ModelDetails(ModelCardInfo):
 | ...
 | model_type: Optional[str] = None

paper#

class ModelDetails(ModelCardInfo):
 | ...
 | paper: Optional[str] = None

citation#

class ModelDetails(ModelCardInfo):
 | ...
 | citation: Optional[str] = None

license#

class ModelDetails(ModelCardInfo):
 | ...
 | license: Optional[str] = None

contact#

class ModelDetails(ModelCardInfo):
 | ...
 | contact: Optional[str] = None

training_config#

class ModelDetails(ModelCardInfo):
 | ...
 | training_config: Optional[str] = None

IntendedUse#

@dataclass(frozen=True)
class IntendedUse(ModelCardInfo)

This determines what the model should and should not be used for.

Parameters

  • primary_uses : str
    Details the primary intended uses of the model; whether it was developed for general or specific tasks. Eg. The toxic text identifier model was developed to identify toxic comments on online platforms. An example use case is to provide feedback to comment authors.

  • primary_users : str
    The primary intended users. For example, was the model developed for entertainment purposes, for hobbyists, or enterprise solutions? This helps users gain insight into how robust the model may be to different kinds of inputs.

  • out_of_scope_use_cases : str
    Highlights the technology that the model might easily be confused with, or related contexts that users could try to apply the model to. Eg. the toxic text identifier model is not intended for fully automated moderation, or to make judgments about specific individuals.

    Also recommends a related or similar model that was designed to better meet a particular need, where possible. Eg. not for use on text examples longer than 100 tokens; please use the bigger-toxic-text-identifier instead.

primary_uses#

class IntendedUse(ModelCardInfo):
 | ...
 | primary_uses: Optional[str] = None

primary_users#

class IntendedUse(ModelCardInfo):
 | ...
 | primary_users: Optional[str] = None

out_of_scope_use_cases#

class IntendedUse(ModelCardInfo):
 | ...
 | out_of_scope_use_cases: Optional[str] = None

Factors#

@dataclass(frozen=True)
class Factors(ModelCardInfo)

This provides a summary of relevant factors such as demographics, instrumentation used, etc. for which the model performance may vary.

Parameters

  • relevant_factors : str
    The foreseeable salient factors for which model performance may vary, and how these were determined. Eg. the model performance may vary for variations in dialects of English.

  • evaluation_factors : str
    Mentions the factors that are being reported, and the reasons for why they were chosen. Also includes the reasons for choosing different evaluation factors than relevant factors.

    Eg. While dialect variation is a relevant factor, dialect-specific annotations were not available, and hence, the performance was not evaluated on different dialects.

relevant_factors#

class Factors(ModelCardInfo):
 | ...
 | relevant_factors: Optional[str] = None

evaluation_factors#

class Factors(ModelCardInfo):
 | ...
 | evaluation_factors: Optional[str] = None

Metrics#

@dataclass(frozen=True)
class Metrics(ModelCardInfo)

This lists the reported metrics and the reasons for choosing them.

Parameters

  • model_performance_measures : str
    Which model performance measures were selected and the reasons for selecting them.
  • decision_thresholds : str
    If decision thresholds are used, what are they, and the reasons for choosing them.
  • variation_approaches : str
    How are the measurements and estimations of these metrics calculated? Eg. standard deviation, variance, confidence intervals, KL divergence. Details of how these values are approximated should also be included. Eg. average of 5 runs, 10-fold cross-validation, etc.

model_performance_measures#

class Metrics(ModelCardInfo):
 | ...
 | model_performance_measures: Optional[str] = None

decision_thresholds#

class Metrics(ModelCardInfo):
 | ...
 | decision_thresholds: Optional[str] = None

variation_approaches#

class Metrics(ModelCardInfo):
 | ...
 | variation_approaches: Optional[str] = None

EvaluationData#

@dataclass(frozen=True)
class EvaluationData(ModelCardInfo)

This provides information about the evaluation data.

Parameters

  • dataset : str
    The name(s) (and link(s), if available) of the dataset(s) used to evaluate the model. Optionally, provide a link to the relevant datasheet(s) as well.
  • motivation : str
    The reasons for selecting the dataset(s). Eg. For the BERT model, document-level corpora were used rather than a shuffled sentence-level corpus in order to extract long contiguous sequences.
  • preprocessing : str
    How was the data preprocessed for evaluation? Eg. tokenization of sentences, filtering of paragraphs by length, etc.

dataset#

class EvaluationData(ModelCardInfo):
 | ...
 | dataset: Optional[str] = None

motivation#

class EvaluationData(ModelCardInfo):
 | ...
 | motivation: Optional[str] = None

preprocessing#

class EvaluationData(ModelCardInfo):
 | ...
 | preprocessing: Optional[str] = None

to_dict#

class EvaluationData(ModelCardInfo):
 | ...
 | def to_dict(self)

TrainingData#

@dataclass(frozen=True)
class TrainingData(ModelCardInfo)

This provides information about the training data. If the model was initialized from pretrained weights, a link to the pretrained model's model card/training data can additionally be provided, if available. Any relevant definitions should also be included.

Parameters

  • dataset : str
    The name(s) (and link(s), if available) of the dataset(s) used to train the model. Optionally, provide a link to the relevant datasheet(s) as well. Eg.
      * Proprietary data from Perspective API; includes comments from online forums such as Wikipedia and New York Times, with crowdsourced labels of whether the comment is "toxic".
      * "Toxic" is defined as "a rude, disrespectful, or unreasonable comment that is likely to make you leave a discussion."
  • motivation : str
    The reasons for selecting the dataset(s). Eg. For the BERT model, document-level corpora were used rather than a shuffled sentence-level corpus in order to extract long contiguous sequences.
  • preprocessing : str
    Eg. Only the text passages were extracted from English Wikipedia; lists, tables, and headers were ignored.

dataset#

class TrainingData(ModelCardInfo):
 | ...
 | dataset: Optional[str] = None

motivation#

class TrainingData(ModelCardInfo):
 | ...
 | motivation: Optional[str] = None

preprocessing#

class TrainingData(ModelCardInfo):
 | ...
 | preprocessing: Optional[str] = None

to_dict#

class TrainingData(ModelCardInfo):
 | ...
 | def to_dict(self)

QuantitativeAnalyses#

@dataclass(frozen=True)
class QuantitativeAnalyses(ModelCardInfo)

This provides disaggregated evaluation of how the model performed based on chosen metrics, with confidence intervals, if possible. Links to plots/figures showing the metrics can also be provided.

Parameters

  • unitary_results : str
    The performance of the model with respect to each chosen factor.
  • intersectional_results : str
    The performance of the model with respect to the intersection of the evaluated factors.

unitary_results#

class QuantitativeAnalyses(ModelCardInfo):
 | ...
 | unitary_results: Optional[str] = None

intersectional_results#

class QuantitativeAnalyses(ModelCardInfo):
 | ...
 | intersectional_results: Optional[str] = None

EthicalConsiderations#

@dataclass(frozen=True)
class EthicalConsiderations(ModelCardInfo)

This highlights any ethical considerations to keep in mind when using the model. Eg. Is the model intended to be used for informing decisions on human life? Does it use sensitive data? What kind of risks are possible, and what mitigation strategies were used to address them? Eg. The model does not take into account user history when making judgments about toxicity, due to privacy concerns.

ethical_considerations#

class EthicalConsiderations(ModelCardInfo):
 | ...
 | ethical_considerations: Optional[str] = None

CaveatsAndRecommendations#

@dataclass(frozen=True)
class CaveatsAndRecommendations(ModelCardInfo)

This lists any additional concerns. For instance, were any relevant groups not present in the evaluation data? Eg. The evaluation data is synthetically designed to be representative of common use cases and concerns, but may not be comprehensive.

caveats_and_recommendations#

class CaveatsAndRecommendations(ModelCardInfo):
 | ...
 | caveats_and_recommendations: Optional[str] = None

ModelCard#

class ModelCard(ModelCardInfo):
 | def __init__(
 |     self,
 |     id: str,
 |     registered_model_name: Optional[str] = None,
 |     model_class: Optional[Callable[..., Model]] = None,
 |     registered_predictor_name: Optional[str] = None,
 |     display_name: Optional[str] = None,
 |     task_id: Optional[str] = None,
 |     archive_file: Optional[str] = None,
 |     overrides: Optional[Dict] = None,
 |     model_details: Optional[Union[str, ModelDetails]] = None,
 |     intended_use: Optional[Union[str, IntendedUse]] = None,
 |     factors: Optional[Union[str, Factors]] = None,
 |     metrics: Optional[Union[str, Metrics]] = None,
 |     evaluation_data: Optional[Union[str, EvaluationData]] = None,
 |     training_data: Optional[Union[str, TrainingData]] = None,
 |     quantitative_analyses: Optional[Union[str, QuantitativeAnalyses]] = None,
 |     ethical_considerations: Optional[Union[str, EthicalConsiderations]] = None,
 |     caveats_and_recommendations: Optional[Union[str, CaveatsAndRecommendations]] = None
 | )

The model card stores the recommended attributes for model reporting.

Parameters

  • id : str
    The model's id, following the task-model-relevant-details convention. Example: rc-bidaf-elmo for a reading comprehension BiDAF model using ELMo embeddings.
  • registered_model_name : str, optional
    The model's registered name. If model_class is not given, this will be used to find any available Model registered with this name.
  • model_class : type, optional
    If given, the ModelCard will pull some default information from the class.
  • registered_predictor_name : str, optional
    The registered name of the corresponding predictor.
  • display_name : str, optional
    The pretrained model's display name.
  • task_id : str, optional
    The id of the task for which the model was built.
  • archive_file : str, optional
    The location of the model's pretrained weights.
  • overrides : Dict, optional
    Optional overrides for the model's architecture.
  • model_details : Union[ModelDetails, str], optional
  • intended_use : Union[IntendedUse, str], optional
  • factors : Union[Factors, str], optional
  • metrics : Union[Metrics, str], optional
  • evaluation_data : Union[EvaluationData, str], optional
  • training_data : Union[TrainingData, str], optional
  • quantitative_analyses : Union[QuantitativeAnalyses, str], optional
  • ethical_considerations : Union[EthicalConsiderations, str], optional
  • caveats_and_recommendations : Union[CaveatsAndRecommendations, str], optional

Note

For all the fields that are Union[ModelCardInfo, str], a str input will be treated as the first argument of the relevant constructor.
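The coercion rule can be sketched as follows; `coerce` is a hypothetical helper illustrating the behavior, using a trimmed `IntendedUse` stand-in whose first field is `primary_uses`.

```python
from dataclasses import dataclass
from typing import Optional, Union

@dataclass(frozen=True)
class IntendedUse:
    primary_uses: Optional[str] = None
    primary_users: Optional[str] = None

def coerce(value: Union[str, IntendedUse]) -> IntendedUse:
    # A plain str becomes the first constructor argument.
    return IntendedUse(value) if isinstance(value, str) else value

use = coerce("Identifying toxic comments on online platforms.")
print(use.primary_uses)  # → Identifying toxic comments on online platforms.
```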

to_dict#

class ModelCard(ModelCardInfo):
 | ...
 | def to_dict(self) -> Dict[str, Any]

Converts the ModelCard to a flat dictionary object. This can be converted to JSON and passed to any front-end.
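The flattening can be sketched as merging each nested section's non-empty attributes into a single top-level dict. `Card` and its fields below are hypothetical stdlib stand-ins, not the library code.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, Dict, Optional

@dataclass(frozen=True)
class ModelDetails:
    short_description: Optional[str] = None
    version: Optional[str] = None

@dataclass(frozen=True)
class Card:
    id: str = ""
    display_name: Optional[str] = None
    model_details: Optional[ModelDetails] = None

    def to_dict(self) -> Dict[str, Any]:
        # Merge nested sections into one flat, JSON-serializable dict.
        info: Dict[str, Any] = {}
        for f in fields(self):
            value = getattr(self, f.name)
            if value is None:
                continue
            if is_dataclass(value):
                info.update(
                    {g.name: getattr(value, g.name)
                     for g in fields(value)
                     if getattr(value, g.name) is not None}
                )
            else:
                info[f.name] = value
        return info

card = Card(id="rc-demo", model_details=ModelDetails(version="1"))
print(card.to_dict())  # → {'id': 'rc-demo', 'version': '1'}
```

Because nested sections are merged, field names are assumed unique across sections.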