0.1a1

This document describes the Algorithm Management Toolkit (AMT) Reporting Standard.

For reproducibility, governance, auditing and sharing of algorithmic systems it is essential to have a reporting standard so that information about an algorithmic system can be shared. This reporting standard describes how information about the different phases of an algorithm's life cycle can be reported. It contains, among other things, descriptive information combined with information about the technical tests and assessments applied.

Disclaimer

The AMT Reporting Standard is work in progress. This means that the current standard is probably suboptimal and will change significantly in future versions.

Introduction

Inspired by Model Cards for Model Reporting and Papers with Code Model Index this standard almost ¹ ² ³ ⁴ extends the Hugging Face model card metadata specification to allow for:

More fine-grained information on performance metrics, by extending the metrics_field from the Hugging Face metadata specification.
Capturing additional measurements on fairness and bias, which can be partitioned into bar plot like measurements (such as mean absolute SHAP values) and graph plot like measurements (such as partial dependence). This is achieved by defining a new field measurements.
Capturing assessments (such as IAMA and ALTAI). This is achieved by defining a new field assessments.

Following Hugging Face, this proposed standard will be written in yaml.

This standard does not contain all fields present in the Hugging Face metadata specification. The fields that are optional in the Hugging Face specification and are specific to the Hugging Face interface are omitted.

Another difference is that we divide our implementation into three separate parts.

system_card, containing information about a group of ML-models which accomplish a specific task.
model_card, containing information about a specific data science model.
assessment_card, containing information about a regulatory assessment.

Include statements

These model_cards and assessment_cards can be included verbatim into a system_card, or referenced with an !include statement, allowing for minimal cards to be compact in a single file. Extensive cards can be split up for readability and maintainability. Our standard allows for the !include to be used anywhere.

Specification of the standard

The standard will be written in yaml. Example yaml files are given in the next section. The standard defines three cards: a system_card, a model_card and an assessment_card. A system_card contains information about an algorithmic system. It can have multiple models and each of these models should have a model_card. Regulatory assessments can be processed in an assessment_card. Note that model_card's and assessment_card's can be included directly into the system_card or can be included as separate yaml files with help of a yaml-include mechanism. For clarity the latter is preferred and is also used in the examples in the next section.

`system_card`

A system_card contains the following information.

schema_version (REQUIRED, string). Version of the schema used, for example "0.1a1".
name (OPTIONAL, string). Name used to describe the system.
upl (OPTIONAL, string). If this algorithm is part of a product offered by the Dutch Government, it should contain a URI from the Uniform Product List.
owners (list). There can be multiple owners. For each owner the following fields are present.
1. oin (OPTIONAL, string). If applicable the Organisatie-identificatienummer (OIN).
2. organization (OPTIONAL, string). Name of the organization that owns the model. If ion is NOT provided this field is REQUIRED.
3. name (OPTIONAL, string). Name of a contact person within the organization.
4. email (OPTIONAL, string). Email address of the contact person or organization.
5. role (OPTIONAL, string). Role of the contact person. This field should only be set when the name field is set.
description (OPTIONAL, string). A short description of the system.
labels (OPTIONAL, list). This fields allows to store meta information about a system. There can be multiple labels. For each label the following fields are present.
1. name (OPTIONAL, string). Name of the label.
2. value (OPTIONAL, string). Value of the label.
status (OPTIONAL, string). The status of the system. For example the status can be "production".
publication_category (OPTIONAL, enum[string]). The publication category of the algorithm should be chosen from ["high_risk", other"].
begin_date (OPTIONAL, string). The first date the system was used. Date should be given in ISO 8601 format, i.e. YYYY-MM-DD.
end_date (OPTIONAL, string). The last date the system was used. Date should be given in ISO 8601 format, i.e. YYYY-MM-DD.
goal_and_impact (OPTIONAL, string). The purpose of the system and the impact it has on citizens and companies.
considerations (OPTIONAL, string). The pro's and con's of using the system.
risk_management (OPTIONAL, string). Description of the risks associated with the system.
human_intervention (OPTIONAL, string). A description to want extend there is human involvement in the system.
legal_base (OPTIONAL, list). If there exists a legal base for the process the system is embedded in, this field can be filled in with the relevant laws. There can be multiple legal bases. For each legal base the following fields are present.
1. name (OPTIONAL, string). Name of the law.
2. link (OPTIONAL, string). URI pointing towards the contents of the law.
used_data (OPTIONAL, string). An overview of the data that is used in the system.
technical_design (OPTIONAL, string). Description on how the system works.
external_providers (OPTIONAL, list[string]). Name of an external provider, if relevant. There can be multiple external providers.
references (OPTIONAL, list[string]). Additional reference URI's that point information about the system and are relevant.

1. Models

models (OPTIONAL, list[ModelCard]). A list of model cards (as defined below) or !includes of a yaml file containing a model card. This model card can for example be a model card described in the next section or a model card from Hugging Face. There can be multiple model cards, meaning multiple models are used.

2. Assessments

assessments (OPTIONAL, list[AssessmentCard]). A list of assessment cards (as defined below) or !includes of a yaml file containing a assessment card. This assessment card is an assessment card described in the next section. There can be multiple assessment cards, meaning multiple assessment were performed.

`model_card`

A model_card contains the following information.

language (OPTIONAL, list[string]). If relevant, the natural languages the model supports in ISO 639. There can be multiple languages.
license(REQUIRED, string). Any license from the open source license list ¹. If the license is NOT present in the license list this field must be set to 'other' and the following two fields will be REQUIRED.
1. license_name (string). An id for the license.
2. license_link (string). A link to a file of that name inside the repo, or a URL to a remote file containing the license contents.
tags (OPTIONAL, list[string]). Tags with keywords to describe the project. There can be multiple tags.
owners (list). There can be multiple owners. For each owner the following fields are present.
1. oin (OPTIONAL, string). If applicable the Organisatie-identificatienummer (OIN).
2. organization (OPTIONAL, string). Name of the organization that owns the model. If ion is NOT provided this field is REQUIRED.
3. name (OPTIONAL, string). Name of a contact person within the organization.
4. email (OPTIONAL, string). Email address of the contact person or organization.
5. role (OPTIONAL, string). Role of the contact person. This field should only be set when the name field is set.

1. Model Index

There can be multiple models. For each model the following fields are present.

name (REQUIRED, string). The name of the model.
model (REQUIRED, string). A URI pointing to a repository containing the model file.
artifacts (OPTIONAL, list[string]). A list of URI's where each URI refers to a relevant model artifact, that cannot be captured by any other field, but are relevant to model.
parameters (list). There can be multiple parameters. For each parameter the following fields are present.
1. name (REQUIRED, string). The name of the parameter, for example "epochs".
2. dtype (OPTIONAL, string). The datatype of the parameter, for example "int".
3. value (OPTIONAL, string). The value of the parameter, for example 100.
4. labels (list). This field allows to store meta information about a parameter. There can be multiple labels. For each label the following fields are present.
  1. name (OPTIONAL, string). The name of the label.
  2. dtype (OPTIONAL, string). The datatype of the feature. If name is set, this field is REQUIRED.
  3. value (OPTIONAL, string). The value of the feature. If name is set, this field is REQUIRED.
results (list). There can be multiple results. For each result the following fields are present.
1. task (OPTIONAL, list).
  1. task_type (REQUIRED, string). The task of the model, for example "object-classification".
  2. task_name (OPTIONAL, string). A pretty name for the model tasks, for example "Object Classification".
2. datasets (list). There can be multiple datasets ². For each dataset the following fields are present.
  1. type (REQUIRED, string). The type of the dataset, can be a dataset id from Hugging Face datasets or any other link to a repository containing the dataset³, for example "common_voice".
  2. name (REQUIRED, string). Name pretty name for the dataset, for example "Common Voice (French)".
  3. split (OPTIONAL, string). The split of the dataset, for example "train".
  4. features (OPTIONAL, list[string]). List of feature names.
  5. revision (OPTIONAL, string). Version of the dataset, for example 5503434ddd753f426f4b38109466949a1217c2bb.
3. metrics (list). There can be multiple metrics. For each metric the following fields are present.
  1. type (REQUIRED, string). A metric-id from Hugging Face metrics⁴, for example accuracy.
  2. name (REQUIRED, string). A descriptive name of the metric. For example "false positive rate" is not a descriptive name, but "training false positive rate w.r.t class x" is.
  3. dtype (REQUIRED, string). The data type of the metric, for example float.
  4. value (REQUIRED, string). The value of the metric.
  5. labels (list). This field allows to store meta information about a metric. For example, metrics can be computed for example on subgroups of specific features. For example, one can compute the accuracy for examples where the feature "gender" is set to "male". There can be multiple subgroups, which means that the metric is computed on the intersection of those subgroups. There can be multiple labels. For each label the following fields are present.
    1. name (OPTIONAL, string). The name of the feature. For example: "gender".
    2. type (OPTIONAL, string). The type of the label. Can for example be set to "feature" or "output_class". If name is set, this field is REQUIRED.
    3. dtype (OPTIONAL, string). The datatype of the feature, for example float. If name is set, this field is REQUIRED.
    4. value (OPTIONAL, string). The value of the feature. If name is set, this field is REQUIRED. For example: "male".
4. measurements.
  1. bar_plots (list). The purpose of this field is to capture bar plot like measurements, for example SHAP values. There can be multiple bar plots. For each bar plot the following fields are present.
    1. type (REQUIRED, string). The type of bar plot, for example "SHAP".
    2. name (OPTIONAL, string). A pretty name for the plot, for example "Mean Absolute SHAP Values".
    3. results (list). The contents of the bar plot. A result represents a bar. There can be multiple results. For each result the following fields are present.
      1. name (REQUIRED, string). The name of bar.
      2. value (REQUIRED, float). The value of the corresponding bar.
  2. graph_plots (list). The purpose of this field is to capture graph plot like measurements, such as partial dependence plots. There can be multiple graph plots. For each graph plot the following fields are present.
    1. type (REQUIRED, string). The type of the graph plot, for example "partial_dependence".
    2. name (OPTIONAL, string). A pretty name of the graph, for example "Partial Dependence Plot".
    3. results (list). Results contains the graph plot data. Each graph can depend on a specific output class and feature. There can be multiple results. For each result the following fields are present.
      1. class (OPTIONAL, string/int/float/bool). The output class name that the graph corresponds to. This field is not always present.
      2. feature (REQUIRED, string). The feature the graph corresponds to. This is required, since all relevant graphs are dependent on features.
      3. data (list)
        
        x_value (REQUIRED, float). The $x$-value of the graph.
        
        y_value (REQUIRED, float). The $y$-value of the graph.

`assessment_card`

An assessment_card contains the following information.

name (REQUIRED, string). The name of the assessment.
date (REQUIRED, string). The date at which the assessment is completed. Date should be given in ISO 8601 format, i.e. YYYY-MM-DD.
contents (list). There can be multiple items in contents. For each item the following fields are present:
1. question (REQUIRED, string). A question.
2. answer (REQUIRED, string). An answer.
3. remarks (OPTIONAL, string). A field to put relevant discussion remarks in.
4. authors. There can be multiple names. For each name the following field is present.
  1. name (OPTIONAL, string). The name of the author of the question.
5. timestamp (OPTIONAL, string). A timestamp of the date and time of the answer.

Example

System Card

version: {system_card_version}                          # Optional. Example: "0.1a1"
name: {system_name}                                     # Optional. Example: "AangifteVertrekBuitenland"
upl: {upl_uri}                                          # Optional. Example: https://standaarden.overheid.nl/owms/terms/AangifteVertrekBuitenland
owners:
- oin: {oin}                                            # Optional. Example: 00000001003214345000
  organization: {organization_name}                     # Optional if oin is provided, Required otherwise. Example: BZK
  name: {owner_name}                                    # Optional. Example: John Doe
  email: {owner_email}                                  # Optional. Example: johndoe@email.com
  role: {owner_role}                                    # Optional. Example: Data Scientist.
description: {system_description}                       # Optional. Short description of the system.
labels:                                                 # Optional. Labels to store metadata about the system.
- name: {label_name}                                    # Optional.
  value: {label_value}                                  # Optional.
status: {system_status}                                 # Optional. Example "production".
publication_category: {system_publication_cat}          # Optional. Example: "impactful_algorithm".
begin_date: {system_begin_date}                         # Optional. Example: 2025-1-1.
end_date: {system_end_date}                             # Optional. Example: 2025-12-1.
goal_and_impact: {system_goal_and_impact}               # Optional. Goal and impact of the system.
considerations: {system_considerations}                 # Optional. Considerations about the system.
risk_management: {system_risk_management}               # Optional. Description of risks associated with the system.
human_intervention: {system_human_intervention}         # Optional. Description of human involvement in the system.
legal_base:
- name: {law_name}                                      # Optional. Example: "AVG".
  link: {law_uri}                                       # Optional. Example: "https://eur-lex.europa.eu/legal-content/NL/TXT/HTML/?uri=CELEX:31995L0046".
used_data: {system_used_data}                           # Optional. Description of the data used by the system.
technical_design: {technical_design}                    # Optional. Description of the technical design of the system.
external_providers:
- {system_external_provider}                            # Optional. Reference to used external providers.
references:
- {reference_uri}                                       # Optional. Example: URI to codebase.

models:
- !include {model_card_uri}                             # Optional. Example: cat_classifier_model.yaml.

assessments:
- !include {assessment_card_uri}                        # Required. Example: iama.yaml.

Model Card

language:
  - {lang_0}                                            # Optional. Example nl.
license: {license}                                      # Required. Example: Apache-2.0 or any license SPDX ID from https://opensource.org/license or "other".
license_name: {license_name}                            # Optional if license != other, Required otherwise. Example: 'my-license-1.0'
license_link: {license_link}                            # Optional if license != other, Required otherwise. Specify "LICENSE" or "LICENSE.md" to link to a file of that name inside the repo, or a URL to a remote file.
tags:
- {tag_0}                                               # Optional. Example: audio
- {tag_1}                                               # Optional. Example: automatic-speech-recognition
owners:
- organization: {organization_name}                     # Required. Example: BZK
  oin: {oin}                                            # Optional. Example: 00000001003214345000
  name: {owner_name}                                    # Optional. Example: John Doe
  email: {owner_email}                                  # Optional. Example: johndoe@email.com
  role: {owner_role}                                    # Optional. Example: Data Scientist.

model-index:
- name: {model_id}                                      # Required. Example: CatClassifier.
  model: {model_uri}                                    # Required. URI to a repository containing the model file.
  artifacts:
  - {model_artifact}                                    # Optional. URI to relevant model artifacts, if applicable.
  parameters:
  - name: {parameter_name}                              # Optional. Example: "epochs".
    dtype: {parameter_dtype}                            # Optional. Example: "int".
    value: {parameter_value}                            # Optional. Example: 100.
    labels:
      - name: {label_name}                              # Optional. Example: "gender".
        dtype: {label_type}                             # Optional. Example: "string".
        value: {label_value}                            # Optional. Example: "female".
  results:
  - task:
      type: {task_type}                                 # Required. Example: image-classification.
      name: {task_name}                                 # Optional. Example: Image Classification.
    datasets:
      - type: {dataset_type}                            # Required. Example: common_voice. Link to a repository containing the dataset
        name: {dataset_name}                            # Required. Example: "Common Voice (French)". A pretty name for the dataset.
        split: {split}                                  # Optional. Example: "train".
        features:
         - {feature_name}                               # Optional. Example: "gender".
        revision: {dataset_version}                     # Optional. Example: 5503434ddd753f426f4b38109466949a1217c2bb
    metrics:
    - type: {metric_type}                               # Required. Example: false-positive-rate. Use metric id from https://hf.co/metrics.
      name: {metric_name}                               # Required. Example: "FPR wrt class 0 restricted to feature gender:0 and age:21".
      dtype: {metric_dtype}                             # Required. Example: "float".
      value: {metric_value}                             # Required. Example: 0.75.
      labels:
        - name: {label_name}                            # Optional. Example: "gender".
          type: {label_type}                            # Optional. Example: "feature".
          dtype: {label_type}                           # Optional. Example: "string".
          value: {label_value}                          # Optional. Example: "female".
    measurements:
      # Bar plots should be able to capture SHAP and Robustness Toolbox from AI Verify.
      bar_plots:
      - type: {measurement_type}                        # Required. Example: "SHAP".
        name: {measurement_name}                        # Optional. Example: "Mean Absolute Shap Values".
        results:
        - name: {bar_name}                              # Required. The name of a bar.
          value: {bar_value}                            # Required. The corresponding value.
      # Graph plots should be able to capture graph based measurements such as partial dependence and accumulated local effect.
      graph_plots:
      - type: {measurement_type}                        # Required. Example: "partial_dependence".
        name: {measurement_name}                        # Optional. Example: "Partial Dependence Plot".
        # Results store the graph plot data. So far all plots are dependent on a combination of a specific class (sometimes) and feature (always).
        # For example partial dependence plots are made for each feature and class.
        results:
         - class: {class_name}                          # Optional. Name of the output class the graph depends on.
           feature: {feature_name}                      # Required. Name of the feature the graph depends on.
           data:
            - x_value: {x_value}                        # Required. The x value of the graph data.
              y_value: {y_value}                        # Required. The y value of the graph data.

Assessment Card

name: {assessment_name}                               # Required. Example: IAMA.
date: {assessment_date}                               # Required. Example: 25-03-2025.
contents:
  - question: {question_text}                         # Required. Example: "Question 1: ...".
    answer: {answer_text}                             # Required. Example: "Answer: ...".
    remarks: {remarks_text}                           # Optional. Example: "Remarks: ...".
    authors:                                          # Optional. Example: "['John', 'Peter']".
      - name: {author_name}
    timestamp: {timestamp}                            # Optional. Example: 1711630721.

Schema

JSON schema will be added when we publish the first beta version.

Deviation from the Hugging Face specification is in the License field. Hugging Face only accepts dataset id's from Hugging Face license list while we accept any license from Open Source License List. ↩↩
Deviation from the Hugging Face specification is in the model_index:results:dataset field. Hugging Face only accepts one dataset, while we accept a list of datasets. ↩↩
Deviation from the Hugging Face specification is in the Dataset Type field. Hugging Face only accepts dataset id's from Hugging Face datasets while we also allow for any url pointing to the dataset. ↩↩
For this extension to work relevant metrics (such as for example false positive rate) have to be added to the Hugging Face metrics, possibly this can be done in our organizational namespace. ↩↩