Skip to content

model_card

allennlp.common.model_card

[SOURCE]


A specification for defining model cards as described in Model Cards for Model Reporting (Mitchell et al, 2019)

The descriptions of the fields and some examples are taken from the paper.

The specification is provided to prompt model developers to think about the various aspects that should ideally be reported. The information filled should adhere to the spirit of transparency rather than the letter; i.e., it should not be filled for the sake of being filled. If the information cannot be inferred, it should be left empty.

get_description

def get_description(model_class)

Returns the model's description from the docstring.

ModelCardInfo

class ModelCardInfo(FromParams)

to_dict

class ModelCardInfo(FromParams):
 | ...
 | def to_dict(self)

Only the non-empty attributes are returned, to minimize empty values.

Paper

@dataclass(frozen=True)
class Paper(ModelCardInfo)

This provides information about the paper.

Parameters

  • title : str
    The name of the paper.

  • url : str
    A web link to the paper.

  • citation : str
    The BibTex for the paper.

title

class Paper(ModelCardInfo):
 | ...
 | title: Optional[str] = None

url

class Paper(ModelCardInfo):
 | ...
 | url: Optional[str] = None

citation

class Paper(ModelCardInfo):
 | ...
 | citation: Optional[str] = None

ModelDetails

class ModelDetails(ModelCardInfo):
 | def __init__(
 |     self,
 |     description: Optional[str] = None,
 |     short_description: Optional[str] = None,
 |     developed_by: Optional[str] = None,
 |     contributed_by: Optional[str] = None,
 |     date: Optional[str] = None,
 |     version: Optional[str] = None,
 |     model_type: Optional[str] = None,
 |     paper: Optional[Union[str, Dict, Paper]] = None,
 |     license: Optional[str] = None,
 |     contact: Optional[str] = None
 | )

This provides the basic information about the model.

Parameters

  • description : str
    A high-level overview of the model. Eg. The model implements a reading comprehension model patterned after the proposed model in Devlin et al, 2018, with improvements borrowed from the SQuAD model in the transformers project. It predicts start tokens and end tokens with a linear layer on top of word piece embeddings.

  • short_description : str
    A one-line description of the model. Eg. A reading comprehension model patterned after RoBERTa, with improvements borrowed from the SQuAD model in the transformers project.

  • developed_by : str
    Person/organization that developed the model. This can be used by all stakeholders to infer details pertaining to model development and potential conflicts of interest.

  • contributed_by : str
    Person that contributed the model to the repository.

  • date : str
    The date on which the model was contributed. This is useful for all stakeholders to become further informed on what techniques and data sources were likely to be available during model development. Format example: 2020-09-23

  • version : str
    The version of the model, and how it differs from previous versions. This is useful for all stakeholders to track whether the model is the latest version, associate known bugs to the correct model versions, and aid in model comparisons.

  • model_type : str
    The type of the model; the basic architecture. This is likely to be particularly relevant for software and model developers, as well as individuals knowledgeable about machine learning, to highlight what kinds of assumptions are encoded in the system. Eg. Naive Bayes Classifier.

  • paper : Union[str, Dict, Paper]
    The paper on which the model is based. Format example: { "title": "Model Cards for Model Reporting (Mitchell et al, 2019)", "url": "https://api.semanticscholar.org/CorpusID:52946140", "citation": "", }

  • license : str
    License information for the model.

  • contact : str
    The email address to reach out to the relevant developers/contributors for questions/feedback about the model.

IntendedUse

@dataclass(frozen=True)
class IntendedUse(ModelCardInfo)

This determines what the model should and should not be used for.

Parameters

  • primary_uses : str
    Details the primary intended uses of the model; whether it was developed for general or specific tasks. Eg. The toxic text identifier model was developed to identify toxic comments on online platforms. An example use case is to provide feedback to comment authors.

  • primary_users : str
    The primary intended users. For example, was the model developed for entertainment purposes, for hobbyists, or enterprise solutions? This helps users gain insight into how robust the model may be to different kinds of inputs.

  • out_of_scope_use_cases : str
    Highlights the technology that the model might easily be confused with, or related contexts that users could try to apply the model to. Eg. the toxic text identifier model is not intended for fully automated moderation, or to make judgements about specific individuals.

    Also recommends a related or similar model that was designed to better meet a particular need, where possible. Eg. not for use on text examples longer than 100 tokens; please use the bigger-toxic-text-identifier instead.

primary_uses

class IntendedUse(ModelCardInfo):
 | ...
 | primary_uses: Optional[str] = None

primary_users

class IntendedUse(ModelCardInfo):
 | ...
 | primary_users: Optional[str] = None

out_of_scope_use_cases

class IntendedUse(ModelCardInfo):
 | ...
 | out_of_scope_use_cases: Optional[str] = None

Factors

@dataclass(frozen=True)
class Factors(ModelCardInfo)

This provides a summary of relevant factors such as demographics, instrumentation used, etc. for which the model performance may vary.

Parameters

  • relevant_factors : str
    The foreseeable salient factors for which model performance may vary, and how these were determined. Eg. the model performance may vary for variations in dialects of English.

  • evaluation_factors : str
    Mentions the factors that are being reported, and the reasons for why they were chosen. Also includes the reasons for choosing different evaluation factors than relevant factors.

    Eg. While dialect variation is a relevant factor, dialect-specific annotations were not available, and hence, the performance was not evaluated on different dialects.

relevant_factors

class Factors(ModelCardInfo):
 | ...
 | relevant_factors: Optional[str] = None

evaluation_factors

class Factors(ModelCardInfo):
 | ...
 | evaluation_factors: Optional[str] = None

Metrics

@dataclass(frozen=True)
class Metrics(ModelCardInfo)

This lists the reported metrics and the reasons for choosing them.

Parameters

  • model_performance_measures : str
    Which model performance measures were selected and the reasons for selecting them.
  • decision_thresholds : str
    If decision thresholds are used, what are they, and the reasons for choosing them.
  • variation_approaches : str
    How are the measurements and estimations of these metrics calculated? Eg. standard deviation, variance, confidence intervals, KL divergence. Details of how these values are approximated should also be included. Eg. average of 5 runs, 10-fold cross-validation, etc.

model_performance_measures

class Metrics(ModelCardInfo):
 | ...
 | model_performance_measures: Optional[str] = None

decision_thresholds

class Metrics(ModelCardInfo):
 | ...
 | decision_thresholds: Optional[str] = None

variation_approaches

class Metrics(ModelCardInfo):
 | ...
 | variation_approaches: Optional[str] = None

Dataset

@dataclass(frozen=True)
class Dataset(ModelCardInfo)

This provides basic information about the dataset.

Parameters

  • name : str
    The name of the dataset.

  • url : str
    A web link to the dataset information/datasheet.

  • processed_url : str
    A web link to a downloadable/directly usable version of the dataset, if available.

  • notes : str
    Any other notes on downloading/processing the data.

name

class Dataset(ModelCardInfo):
 | ...
 | name: Optional[str] = None

url

class Dataset(ModelCardInfo):
 | ...
 | url: Optional[str] = None

processed_url

class Dataset(ModelCardInfo):
 | ...
 | processed_url: Optional[str] = None

notes

class Dataset(ModelCardInfo):
 | ...
 | notes: Optional[str] = None

EvaluationData

class EvaluationData(ModelCardInfo):
 | def __init__(
 |     self,
 |     dataset: Optional[Union[str, Dict, Dataset]] = None,
 |     motivation: Optional[str] = None,
 |     preprocessing: Optional[str] = None
 | )

This provides information about the evaluation data.

Parameters

  • dataset : Union[str, Dict, Dataset]
    The name(s) (and link(s), if available) of the dataset(s) used to evaluate the model. Optionally, provide a link to the relevant datasheet(s) as well.
  • motivation : str
    The reasons for selecting the dataset(s). Eg. For the BERT model, document-level corpora were used rather than a shuffled sentence-level corpus in order to extract long contiguous sequences.
  • preprocessing : str
    How was the data preprocessed for evaluation? Eg. tokenization of sentences, filtering of paragraphs by length, etc.

to_dict

class EvaluationData(ModelCardInfo):
 | ...
 | def to_dict(self)

TrainingData

class TrainingData(ModelCardInfo):
 | def __init__(
 |     self,
 |     dataset: Optional[Union[str, Dict, Dataset]] = None,
 |     motivation: Optional[str] = None,
 |     preprocessing: Optional[str] = None
 | )

This provides information about the training data. If the model was initialized from pretrained weights, a link to the pretrained model's model card/training data can additionally be provided, if available. Any relevant definitions should also be included.

Parameters

  • dataset : Union[str, Dict, Dataset]
    The name(s) (and link(s), if available) of the dataset(s) used to train the model. Optionally, provide a link to the relevant datasheet(s) as well. Eg. * Proprietary data from Perspective API; includes comments from online forums such as Wikipedia and New York Times, with crowdsourced labels of whether the comment is "toxic". * "Toxic" is defined as "a rude, disrespectful, or unreasonable comment that is likely to make you leave a discussion."
  • motivation : str
    The reasons for selecting the dataset(s). Eg. For the BERT model, document-level corpora were used rather than a shuffled sentence-level corpus in order to extract long contiguous sequences.
  • preprocessing : str
    Eg. Only the text passages were extracted from English Wikipedia; lists, tables, and headers were ignored.

to_dict

class TrainingData(ModelCardInfo):
 | ...
 | def to_dict(self)

QuantitativeAnalyses

@dataclass(frozen=True)
class QuantitativeAnalyses(ModelCardInfo)

This provides disaggregated evaluation of how the model performed based on chosen metrics, with confidence intervals, if possible. Links to plots/figures showing the metrics can also be provided.

Parameters

  • unitary_results : str
    The performance of the model with respect to each chosen factor.
  • intersectional_results : str
    The performance of the model with respect to the intersection of the evaluated factors.

unitary_results

class QuantitativeAnalyses(ModelCardInfo):
 | ...
 | unitary_results: Optional[str] = None

intersectional_results

class QuantitativeAnalyses(ModelCardInfo):
 | ...
 | intersectional_results: Optional[str] = None

ModelEthicalConsiderations

@dataclass(frozen=True)
class ModelEthicalConsiderations(ModelCardInfo)

This highlights any ethical considerations to keep in mind when using the model. Eg. Is the model intended to be used for informing decisions on human life? Does it use sensitive data? What kind of risks are possible, and what mitigation strategies were used to address them? Eg. The model does not take into account user history when making judgments about toxicity, due to privacy concerns.

ethical_considerations

class ModelEthicalConsiderations(ModelCardInfo):
 | ...
 | ethical_considerations: Optional[str] = None

ModelCaveatsAndRecommendations

@dataclass(frozen=True)
class ModelCaveatsAndRecommendations(ModelCardInfo)

This lists any additional concerns. For instance, were any relevant groups not present in the evaluation data? Eg. The evaluation data is synthetically designed to be representative of common use cases and concerns, but may not be comprehensive.

caveats_and_recommendations

class ModelCaveatsAndRecommendations(ModelCardInfo):
 | ...
 | caveats_and_recommendations: Optional[str] = None

ModelUsage

class ModelUsage(ModelCardInfo):
 | def __init__(
 |     self,
 |     archive_file: Optional[str] = None,
 |     training_config: Optional[str] = None,
 |     install_instructions: Optional[str] = None,
 |     overrides: Optional[Dict] = None
 | )

archive_file : str, optional The location of model's pretrained weights. training_config : str, optional A url to the training config. install_instructions : str, optional Any additional instructions for installations. overrides : Dict, optional Optional overrides for the model's architecture.

ModelCard

class ModelCard(ModelCardInfo):
 | def __init__(
 |     self,
 |     id: str,
 |     registered_model_name: Optional[str] = None,
 |     model_class: Optional[Callable[..., Model]] = None,
 |     registered_predictor_name: Optional[str] = None,
 |     display_name: Optional[str] = None,
 |     task_id: Optional[str] = None,
 |     model_usage: Optional[Union[str, ModelUsage]] = None,
 |     model_details: Optional[Union[str, ModelDetails]] = None,
 |     intended_use: Optional[Union[str, IntendedUse]] = None,
 |     factors: Optional[Union[str, Factors]] = None,
 |     metrics: Optional[Union[str, Metrics]] = None,
 |     evaluation_data: Optional[Union[str, EvaluationData]] = None,
 |     training_data: Optional[Union[str, TrainingData]] = None,
 |     quantitative_analyses: Optional[Union[str, QuantitativeAnalyses]] = None,
 |     model_ethical_considerations: Optional[Union[str, ModelEthicalConsiderations]] = None,
 |     model_caveats_and_recommendations: Optional[
 |             Union[str, ModelCaveatsAndRecommendations]
 |         ] = None
 | )

The model card stores the recommended attributes for model reporting.

Parameters

  • id : str
    Model's id, following the convention of task-model-relevant-details. Example: rc-bidaf-elmo for a reading comprehension BiDAF model using ELMo embeddings.
  • registered_model_name : str, optional
    The model's registered name. If model_class is not given, this will be used to find any available Model registered with this name.
  • model_class : type, optional
    If given, the ModelCard will pull some default information from the class.
  • registered_predictor_name : str, optional
    The registered name of the corresponding predictor.
  • display_name : str, optional
    The pretrained model's display name.
  • task_id : str, optional
    The id of the task for which the model was built.
  • model_usage : Union[ModelUsage, str], optional
  • model_details : Union[ModelDetails, str], optional
  • intended_use : Union[IntendedUse, str], optional
  • factors : Union[Factors, str], optional
  • metrics : Union[Metrics, str], optional
  • evaluation_data : Union[EvaluationData, str], optional
  • quantitative_analyses : Union[QuantitativeAnalyses, str], optional
  • ethical_considerations : Union[ModelEthicalConsiderations, str], optional
  • caveats_and_recommendations : Union[ModelCaveatsAndRecommendations, str], optional

Note

For all the fields that are Union[ModelCardInfo, str], a str input will be treated as the first argument of the relevant constructor.

to_dict

class ModelCard(ModelCardInfo):
 | ...
 | def to_dict(self) -> Dict[str, Any]

Converts the ModelCard to a flat dictionary object. This can be converted to json and passed to any front-end.