0.1a4
This document describes the Algorithm Management Toolkit (AMT) Reporting Standard.
For reproducibility, governance, auditing and sharing of algorithmic systems it is essential to have a reporting standard so that information about an algorithmic system can be shared. This reporting standard describes how information about the different phases of an algorithm's life cycle can be reported. It contains, among other things, descriptive information combined with information about the technical tests and assessments applied.
Disclaimer
The AMT Reporting Standard is work in progress. This means that the current standard is probably suboptimal and will change significantly in future versions.
Introduction
Inspired by Model Cards for Model Reporting and Papers with Code Model Index this standard almost 1 2 3 4 extends the Hugging Face model card metadata specification to allow for:
- More fine-grained information on performance metrics, by extending the
metrics_field
from the Hugging Face metadata specification. - Capturing additional measurements on fairness and bias, which can be partitioned into bar plot like
measurements (such as mean absolute SHAP values) and graph plot like measurements (such as partial dependence). This is achieved
by defining a new field
measurements
. - Capturing assessments (such as IAMA
and ALTAI).
This is achieved by defining a new field
assessments
.
Following Hugging Face, this proposed standard will be written in yaml.
This standard does not contain all fields present in the Hugging Face metadata specification. The fields that are optional in the Hugging Face specification and are specific to the Hugging Face interface are omitted.
Another difference is that we divide our implementation into three separate parts.
system_card
, containing information about a group of ML-models which accomplish a specific task.model_card
, containing information about a specific data science model.assessment_card
, containing information about a regulatory assessment.
Include statements
These model_card
s and assessment_card
s can be included verbatim into a system_card
,
or referenced with an !include
statement, allowing for minimal cards to be compact in a single
file. Extensive cards can be split up for readability and maintainability. Our standard allows for
the !include
to be used anywhere.
Specification of the standard
The standard will be written in yaml. Example yaml files are given in the next section. The standard defines
three cards: a system_card
, a model_card
and an assessment_card
. A system_card
contains information
about an algorithmic system. It can have multiple models and each of these models should have a model_card
.
Regulatory assessments can be processed in an assessment_card
. Note that model_card
's and
assessment_card
's can be included directly into the system_card
or can be included as separate yaml
files with help of a yaml-include mechanism. For clarity the latter is preferred and is also used in
the examples in the next section.
system_card
A system_card
contains the following information.
schema_version
(REQUIRED, string). Version of the schema used, for example "0.1a2".-
provenance
(OPTIONAL). In case this System Card is generated from another source file, this field can capture the historical context of the contents of this System Card.git_commit_hash
(OPTIONAL, string). Git commit hash of the commit which contains the transformation file used to create this card.timestamp
(OPTIONAL, string). A timestamp of the date, time and timezone of generation of this System Card. Timestamp should be given, preferably in UTC (represented asZ
), in ISO 8601 format, i.e.2024-04-16T16:48:14Z
.uri
(OPTIONAL, string). URI to the tool that was used to perform the transformations.author
(OPTIONAL, string). Name of person that initiated the transformations.
-
name
(OPTIONAL, string). Name used to describe the system. upl
(OPTIONAL, string). If this algorithm is part of a product offered by the Dutch Government, it should contain a URI from the Uniform Product List.-
owners
(list). There can be multiple owners. For each owner the following fields are present.oin
(OPTIONAL, string). If applicable the Organisatie-identificatienummer (OIN).organization
(OPTIONAL, string). Name of the organization that owns the model. Ifion
is NOT provided this field is REQUIRED.name
(OPTIONAL, string). Name of a contact person within the organization.email
(OPTIONAL, string). Email address of the contact person or organization.role
(OPTIONAL, string). Role of the contact person. This field should only be set when thename
field is set.
-
description
(OPTIONAL, string). A short description of the system. -
labels
(OPTIONAL, list). This fields allows to store meta information about a system. There can be multiple labels. For each label the following fields are present.name
(OPTIONAL, string). Name of the label.value
(OPTIONAL, string). Value of the label.
-
status
(OPTIONAL, string). The status of the system. For example the status can be "production". publication_category
(OPTIONAL, enum[string]). The publication category of the algorithm should be chosen from["high_risk", other"]
.begin_date
(OPTIONAL, string). The first date the system was used. Date should be given in ISO 8601 format, i.e.YYYY-MM-DD
.end_date
(OPTIONAL, string). The last date the system was used. Date should be given in ISO 8601 format, i.e.YYYY-MM-DD
.goal_and_impact
(OPTIONAL, string). The purpose of the system and the impact it has on citizens and companies.considerations
(OPTIONAL, string). The pro's and con's of using the system.risk_management
(OPTIONAL, string). Description of the risks associated with the system.human_intervention
(OPTIONAL, string). A description to want extend there is human involvement in the system.legal_base
(OPTIONAL, list). If there exists a legal base for the process the system is embedded in, this field can be filled in with the relevant laws. There can be multiple legal bases. For each legal base the following fields are present.name
(OPTIONAL, string). Name of the law.link
(OPTIONAL, string). URI pointing towards the contents of the law.
used_data
(OPTIONAL, string). An overview of the data that is used in the system.technical_design
(OPTIONAL, string). Description on how the system works.external_providers
(OPTIONAL, list[string]). Name of an external provider, if relevant. There can be multiple external providers.references
(OPTIONAL, list[string]). Additional reference URI's that point information about the system and are relevant.
1. Models
models
(OPTIONAL, list[ModelCard]). A list of model cards (as defined below) or!include
s of a yaml file containing a model card. This model card can for example be a model card described in the next section or a model card from Hugging Face. There can be multiple model cards, meaning multiple models are used.
2. Assessments
assessments
(OPTIONAL, list[AssessmentCard]). A list of assessment cards (as defined below) or!include
s of a yaml file containing a assessment card. This assessment card is an assessment card described in the next section. There can be multiple assessment cards, meaning multiple assessment were performed.
model_card
A model_card
contains the following information.
-
provenance
(OPTIONAL). In case this Model Card is generated from another source file, this field can capture the historical context of the contents of this Model Card.git_commit_hash
(OPTIONAL, string). Git commit hash of the commit which contains the transformation file used to create this card.timestamp
(OPTIONAL, string). A timestamp of the date, time and timezone of generation of this System Card. Timestamp should be given, preferably in UTC (represented asZ
), in ISO 8601 format, i.e.2024-04-16T16:48:14Z
.uri
(OPTIONAL, string). URI to the tool that was used to perform the transformations.author
(OPTIONAL, string). Name of person that initiated the transformations.
-
language
(OPTIONAL, list[string]). If relevant, the natural languages the model supports in ISO 639. There can be multiple languages. -
license
(REQUIRED, string). Any license from the open source license list 1. If the license is NOT present in the license list this field must be set to 'other' and the following two fields will be REQUIRED.license_name
(string). An id for the license.license_link
(string). A link to a file of that name inside the repo, or a URL to a remote file containing the license contents.
-
tags
(OPTIONAL, list[string]). Tags with keywords to describe the project. There can be multiple tags. -
owners
(list). There can be multiple owners. For each owner the following fields are present.oin
(OPTIONAL, string). If applicable the Organisatie-identificatienummer (OIN).organization
(OPTIONAL, string). Name of the organization that owns the model. Ifion
is NOT provided this field is REQUIRED.name
(OPTIONAL, string). Name of a contact person within the organization.email
(OPTIONAL, string). Email address of the contact person or organization.role
(OPTIONAL, string). Role of the contact person. This field should only be set when thename
field is set.
1. Model Index
There can be multiple models. For each model the following fields are present.
name
(REQUIRED, string). The name of the model.model
(REQUIRED, string). A URI pointing to a repository containing the model file.-
artifacts
(OPTIONAL, list). A list of artifactsuri
(OPTIONAL, string) URI refers to a relevant model artifactcontent-type
(OPTIONAL, string) Optional type, follow the Content-Type. Recognized values are "application/onnx"", to refer to an ONNX representation of the model.md5-checksum
(OPTIONAL, string) Optional checksum for the content of the file.
-
parameters
(list). There can be multiple parameters. For each parameter the following fields are present.name
(REQUIRED, string). The name of the parameter, for example "epochs".dtype
(OPTIONAL, string). The datatype of the parameter, for example "int".value
(OPTIONAL, string). The value of the parameter, for example 100.-
labels
(list). This field allows to store meta information about a parameter. There can be multiple labels. For each label the following fields are present.name
(OPTIONAL, string). The name of the label.dtype
(OPTIONAL, string). The datatype of the feature. Ifname
is set, this field is REQUIRED.value
(OPTIONAL, string). The value of the feature. Ifname
is set, this field is REQUIRED.
-
results
(list). There can be multiple results. For each result the following fields are present.-
task
(OPTIONAL, list).task_type
(REQUIRED, string). The task of the model, for example "object-classification".task_name
(OPTIONAL, string). A pretty name for the model tasks, for example "Object Classification".
-
datasets
(list). There can be multiple datasets 2. For each dataset the following fields are present.type
(REQUIRED, string). The type of the dataset, can be a dataset id from Hugging Face datasets or any other link to a repository containing the dataset3, for example "common_voice".name
(REQUIRED, string). Name pretty name for the dataset, for example "Common Voice (French)".split
(OPTIONAL, string). The split of the dataset, for example "train".features
(OPTIONAL, list[string]). List of feature names.revision
(OPTIONAL, string). Version of the dataset, for example 5503434ddd753f426f4b38109466949a1217c2bb.
-
metrics
(list). There can be multiple metrics. For each metric the following fields are present.type
(REQUIRED, string). A metric-id from Hugging Face metrics4, for example accuracy.name
(REQUIRED, string). A descriptive name of the metric. For example "false positive rate" is not a descriptive name, but "training false positive rate w.r.t class x" is.dtype
(REQUIRED, string). The data type of the metric, for examplefloat
.value
(REQUIRED, string). The value of the metric.-
labels
(list). This field allows to store meta information about a metric. For example, metrics can be computed for example on subgroups of specific features. For example, one can compute the accuracy for examples where the feature "gender" is set to "male". There can be multiple subgroups, which means that the metric is computed on the intersection of those subgroups. There can be multiple labels. For each label the following fields are present.name
(OPTIONAL, string). The name of the feature. For example: "gender".type
(OPTIONAL, string). The type of the label. Can for example be set to "feature" or "output_class". Ifname
is set, this field is REQUIRED.dtype
(OPTIONAL, string). The datatype of the feature, for examplefloat
. Ifname
is set, this field is REQUIRED.value
(OPTIONAL, string). The value of the feature. Ifname
is set, this field is REQUIRED. For example: "male".
-
measurements
.-
bar_plots
(list). The purpose of this field is to capture bar plot like measurements, for example SHAP values. There can be multiple bar plots. For each bar plot the following fields are present.type
(REQUIRED, string). The type of bar plot, for example "SHAP".name
(OPTIONAL, string). A pretty name for the plot, for example "Mean Absolute SHAP Values".results
(list). The contents of the bar plot. A result represents a bar. There can be multiple results. For each result the following fields are present.name
(REQUIRED, string). The name of bar.value
(REQUIRED, float). The value of the corresponding bar.
-
graph_plots
(list). The purpose of this field is to capture graph plot like measurements, such as partial dependence plots. There can be multiple graph plots. For each graph plot the following fields are present.type
(REQUIRED, string). The type of the graph plot, for example "partial_dependence".name
(OPTIONAL, string). A pretty name of the graph, for example "Partial Dependence Plot".results
(list). Results contains the graph plot data. Each graph can depend on a specific output class and feature. There can be multiple results. For each result the following fields are present.class
(OPTIONAL, string/int/float/bool). The output class name that the graph corresponds to. This field is not always present.feature
(REQUIRED, string). The feature the graph corresponds to. This is required, since all relevant graphs are dependent on features.data
(list)x_value
(REQUIRED, float). The $x$-value of the graph.y_value
(REQUIRED, float). The $y$-value of the graph.
-
-
assessment_card
An assessment_card
contains the following information.
-
provenance
(OPTIONAL). In case this Assessment Card is generated from another source file, this field can capture the historical context of the contents of this Assessment Card.git_commit_hash
(OPTIONAL, string). Git commit hash of the commit which contains the transformation file used to create this card.timestamp
(OPTIONAL, string). A timestamp of the date, time and timezone of generation of this System Card. Timestamp should be given, preferably in UTC (represented asZ
), in ISO 8601 format, i.e.2024-04-16T16:48:14Z
.uri
(OPTIONAL, string). URI to the tool that was used to perform the transformations.author
(OPTIONAL, string). Name of person that initiated the transformations.
-
name
(REQUIRED, string). The name of the assessment. date
(REQUIRED, string). The date at which the assessment is completed. Date should be given in ISO 8601 format, i.e.YYYY-MM-DD
.-
contents
(list). There can be multiple items in contents. For each item the following fields are present:question
(REQUIRED, string). A question.answer
(REQUIRED, string). An answer.remarks
(OPTIONAL, string). A field to put relevant discussion remarks in.authors
. There can be multiple names. For each name the following field is present.name
(OPTIONAL, string). The name of the author of the question.
timestamp
(OPTIONAL, string). A timestamp of the date, time and timezone of the answer. Timestamp should be given, preferably in UTC (represented asZ
), in ISO 8601 format, i.e.2024-04-16T16:48:14Z
.
Example
System Card
version: {system_card_version} # Optional. Example: "0.1a1"
provenance: # Optional.
git_commit_hash: {git_commit_hash} # Optional. Example: 5503434ddd753f426f4b38109466949a1217c2bb
timestamp: {modification_timestamp} # Optional. Example: 2024-04-16T16:48:14Z.
uri: {modification_uri} # Optional. Example: https://github.com/MinBZK/tad-conversion-tool
author: {modification_author} # Optional. Example: John Doe
name: {system_name} # Optional. Example: "AangifteVertrekBuitenland"
upl: {upl_uri} # Optional. Example: https://standaarden.overheid.nl/owms/terms/AangifteVertrekBuitenland
owners:
- oin: {oin} # Optional. Example: 00000001003214345000
organization: {organization_name} # Optional if oin is provided, Required otherwise. Example: BZK
name: {owner_name} # Optional. Example: John Doe
email: {owner_email} # Optional. Example: johndoe@email.com
role: {owner_role} # Optional. Example: Data Scientist.
description: {system_description} # Optional. Short description of the system.
labels: # Optional. Labels to store metadata about the system.
- name: {label_name} # Optional.
value: {label_value} # Optional.
status: {system_status} # Optional. Example: "production".
publication_category: {system_publication_cat} # Optional. Example: "impactful_algorithm".
begin_date: {system_begin_date} # Optional. Example: 2025-1-1.
end_date: {system_end_date} # Optional. Example: 2025-12-1.
goal_and_impact: {system_goal_and_impact} # Optional. Goal and impact of the system.
considerations: {system_considerations} # Optional. Considerations about the system.
risk_management: {system_risk_management} # Optional. Description of risks associated with the system.
human_intervention: {system_human_intervention} # Optional. Description of human involvement in the system.
legal_base:
- name: {law_name} # Optional. Example: "AVG".
link: {law_uri} # Optional. Example: "https://eur-lex.europa.eu/legal-content/NL/TXT/HTML/?uri=CELEX:31995L0046".
used_data: {system_used_data} # Optional. Description of the data used by the system.
technical_design: {technical_design} # Optional. Description of the technical design of the system.
external_providers:
- {system_external_provider} # Optional. Reference to used external providers.
references:
- {reference_uri} # Optional. Example: URI to codebase.
models:
- !include {model_card_uri} # Optional. Example: cat_classifier_model.yaml.
assessments:
- !include {assessment_card_uri} # Required. Example: iama.yaml.
Model Card
provenance: # Optional.
git_commit_hash: {git_commit_hash} # Optional. Example: 5503434ddd753f426f4b38109466949a1217c2bb
timestamp: {modification_timestamp} # Optional. Example: 2024-04-16T16:48:14Z.
uri: {modification_uri} # Optional. Example: https://github.com/MinBZK/tad-conversion-tool
author: {modification_author} # Optional. Example: John Doe
language:
- {lang_0} # Optional. Example nl.
license: {license} # Required. Example: Apache-2.0 or any license SPDX ID from https://opensource.org/license or "other".
license_name: {license_name} # Optional if license != other, Required otherwise. Example: 'my-license-1.0'
license_link: {license_link} # Optional if license != other, Required otherwise. Specify "LICENSE" or "LICENSE.md" to link to a file of that name inside the repo, or a URL to a remote file.
tags:
- {tag_0} # Optional. Example: audio
- {tag_1} # Optional. Example: automatic-speech-recognition
owners:
- organization: {organization_name} # Required. Example: BZK
oin: {oin} # Optional. Example: 00000001003214345000
name: {owner_name} # Optional. Example: John Doe
email: {owner_email} # Optional. Example: johndoe@email.com
role: {owner_role} # Optional. Example: Data Scientist.
model-index:
- name: {model_id} # Required. Example: CatClassifier.
model: {model_uri} # Required. URI to a repository containing the model file.
artifacts:
- uri: {model_artifact_uri} # Optional. Example: "https://github.com/MinBZK/poc-kijkdoos-wasm-models/raw/main/logres_iris/logreg_iris.onnx"
- content-type: {model_artifact_type} # Optional. Example: "application/onnx".
- md5-checksum: {md5_checksum} # Optional. Example: "120EA8A25E5D487BF68B5F7096440019"
parameters:
- name: {parameter_name} # Optional. Example: "epochs".
dtype: {parameter_dtype} # Optional. Example: "int".
value: {parameter_value} # Optional. Example: 100.
labels:
- name: {label_name} # Optional. Example: "gender".
dtype: {label_type} # Optional. Example: "string".
value: {label_value} # Optional. Example: "female".
results:
- task:
type: {task_type} # Required. Example: image-classification.
name: {task_name} # Optional. Example: Image Classification.
datasets:
- type: {dataset_type} # Required. Example: common_voice. Link to a repository containing the dataset
name: {dataset_name} # Required. Example: "Common Voice (French)". A pretty name for the dataset.
split: {split} # Optional. Example: "train".
features:
- {feature_name} # Optional. Example: "gender".
revision: {dataset_version} # Optional. Example: 5503434ddd753f426f4b38109466949a1217c2bb
metrics:
- type: {metric_type} # Required. Example: false-positive-rate. Use metric id from https://hf.co/metrics.
name: {metric_name} # Required. Example: "FPR wrt class 0 restricted to feature gender:0 and age:21".
dtype: {metric_dtype} # Required. Example: "float".
value: {metric_value} # Required. Example: 0.75.
labels:
- name: {label_name} # Optional. Example: "gender".
type: {label_type} # Optional. Example: "feature".
dtype: {label_type} # Optional. Example: "string".
value: {label_value} # Optional. Example: "female".
measurements:
# Bar plots should be able to capture SHAP and Robustness Toolbox from AI Verify.
bar_plots:
- type: {measurement_type} # Required. Example: "SHAP".
name: {measurement_name} # Optional. Example: "Mean Absolute Shap Values".
results:
- name: {bar_name} # Required. The name of a bar.
value: {bar_value} # Required. The corresponding value.
# Graph plots should be able to capture graph based measurements such as partial dependence and accumulated local effect.
graph_plots:
- type: {measurement_type} # Required. Example: "partial_dependence".
name: {measurement_name} # Optional. Example: "Partial Dependence Plot".
# Results store the graph plot data. So far all plots are dependent on a combination of a specific class (sometimes) and feature (always).
# For example partial dependence plots are made for each feature and class.
results:
- class: {class_name} # Optional. Name of the output class the graph depends on.
feature: {feature_name} # Required. Name of the feature the graph depends on.
data:
- x_value: {x_value} # Required. The x value of the graph data.
y_value: {y_value} # Required. The y value of the graph data.
Assessment Card
provenance: # Optional.
git_commit_hash: {git_commit_hash} # Optional. Example: 5503434ddd753f426f4b38109466949a1217c2bb
timestamp: {modification_timestamp} # Optional. Example: 2024-04-16T16:48:14Z.
uri: {modification_uri} # Optional. Example: https://github.com/MinBZK/tad-conversion-tool
author: {modification_author} # Optional. Example: John Doe
name: {assessment_name} # Required. Example: IAMA.
date: {assessment_date} # Required. Example: 25-03-2025.
contents:
- question: {question_text} # Required. Example: "Question 1: ...".
answer: {answer_text} # Required. Example: "Answer: ...".
remarks: {remarks_text} # Optional. Example: "Remarks: ...".
authors: # Optional. Example: "['John', 'Peter']".
- name: {author_name}
timestamp: {timestamp} # Optional. Example: 2024-04-16T16:48:14Z.
Schema
JSON schema will be added when we publish the first beta version.
Changelog
- 0.1a4: adds data provenance
- 0.1a3: require ISO 8601 timestamp
- 0.1a2: introduces typed artifacts
- 0.1a1: initial draft version of this reporting standard
-
Deviation from the Hugging Face specification is in the License field. Hugging Face only accepts dataset id's from Hugging Face license list while we accept any license from Open Source License List. ↩↩
-
Deviation from the Hugging Face specification is in the
model_index:results:dataset
field. Hugging Face only accepts one dataset, while we accept a list of datasets. ↩↩ -
Deviation from the Hugging Face specification is in the Dataset Type field. Hugging Face only accepts dataset id's from Hugging Face datasets while we also allow for any url pointing to the dataset. ↩↩
-
For this extension to work relevant metrics (such as for example false positive rate) have to be added to the Hugging Face metrics, possibly this can be done in our organizational namespace. ↩↩