Comparison of tools for transparency of algorithmic decision making
We have researched a few tools that we want to investigate further; this document is the next step in that investigation. We created a checklist of requirements to compare the tools against. For each requirement, the Fulfilled column gives a value between 0 and 1 that indicates how far the requirement is fulfilled. The score for a requirement is the Fulfilled value multiplied by the priority, where the priority is translated to a numerical value as follows: {M:4, S:3, C:2, W:-1}.
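
The scoring rule described above can be sketched in Python as follows. This is a minimal illustration, not the spreadsheet or script actually used for the comparison: the function names and the example rows are hypothetical, the reading of M, S, C, W as MoSCoW priorities (Must, Should, Could, Won't) is an assumption, and so is the idea that a category score is the sum of its requirement scores.

```python
# Priority weights as given in the text. Reading the letters as MoSCoW
# priorities (Must, Should, Could, Won't) is an assumption.
PRIORITY_WEIGHTS = {"M": 4, "S": 3, "C": 2, "W": -1}


def requirement_score(fulfilled: float, priority: str) -> float:
    """Return the Fulfilled value times the priority weight for one requirement."""
    if not 0.0 <= fulfilled <= 1.0:
        raise ValueError("fulfilled must be between 0 and 1")
    return fulfilled * PRIORITY_WEIGHTS[priority]


def category_score(requirements: list[tuple[float, str]]) -> float:
    """Sum the requirement scores in one category (assumed aggregation)."""
    return sum(requirement_score(f, p) for f, p in requirements)


# Hypothetical checklist rows (fulfilled, priority); not taken from the comparison.
example_rows = [(1.0, "M"), (0.5, "S"), (0.0, "C"), (1.0, "W")]
print(category_score(example_rows))  # 4 + 1.5 + 0 - 1 = 4.5
```

Note that a fulfilled W requirement lowers the score, because its weight is negative.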
Summary of the comparison
Requirement | AIVerify | VerifyML | IBM 360 Research Toolkit | Holistic AI | AI Assessment Tool |
---|---|---|---|---|---|
Functionality | 36 | 42 | 20 | 17 | 22.85 |
Reliability | 13 | 4 | 16 | 16 | 15.4 |
Usability | 9.4 | 0 | 0 | 0 | 13 |
Help & Documentation | 2.8 | 1.5 | 6.4 | 1.6 | 0.55 |
Performance Efficiency | 7.5 | 11 | 11 | 11 | 11 |
Maintainability | 15.8 | 24.5 | 29 | 23.5 | 25.6 |
Security | 8.3 | 2 | 2 | 2 | 7.5 |
Compatibility | 12.5 | 14 | 14 | 10 | 11 |
Accessibility | 0 | 0 | 0 | 0 | 0.3 |
Portability | 10.5 | 4.5 | 5.1 | 7.5 | 11.4 |
Deployment | 1.5 | 0.6 | 1.2 | 3.6 | 3 |
Legal & Compliance | 19 | 16 | 16 | 16 | 19 |
Total | 136.3 | 120.1 | 120.7 | 108.2 | 140.6 |
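
As a cross-check, the Total row can be recomputed from the category rows. The sketch below copies the category scores from the table above and sums them per tool; the variable and function names are illustrative only, and the rounding to two decimals is there purely to avoid floating-point noise.

```python
TOOLS = [
    "AIVerify",
    "VerifyML",
    "IBM 360 Research Toolkit",
    "Holistic AI",
    "AI Assessment Tool",
]

# Category scores copied from the summary table, in the column order of TOOLS.
CATEGORY_SCORES = {
    "Functionality": [36, 42, 20, 17, 22.85],
    "Reliability": [13, 4, 16, 16, 15.4],
    "Usability": [9.4, 0, 0, 0, 13],
    "Help & Documentation": [2.8, 1.5, 6.4, 1.6, 0.55],
    "Performance Efficiency": [7.5, 11, 11, 11, 11],
    "Maintainability": [15.8, 24.5, 29, 23.5, 25.6],
    "Security": [8.3, 2, 2, 2, 7.5],
    "Compatibility": [12.5, 14, 14, 10, 11],
    "Accessibility": [0, 0, 0, 0, 0.3],
    "Portability": [10.5, 4.5, 5.1, 7.5, 11.4],
    "Deployment": [1.5, 0.6, 1.2, 3.6, 3],
    "Legal & Compliance": [19, 16, 16, 16, 19],
}


def totals_per_tool(scores: dict[str, list[float]]) -> dict[str, float]:
    """Sum each tool's column across all categories."""
    columns = zip(*scores.values())
    return {tool: round(sum(col), 2) for tool, col in zip(TOOLS, columns)}


totals = totals_per_tool(CATEGORY_SCORES)

# Print the tools from highest to lowest total score.
for tool, total in sorted(totals.items(), key=lambda item: item[1], reverse=True):
    print(f"{tool}: {total}")
```

The computed totals match the Total row, with AI Assessment Tool scoring highest and Holistic AI lowest.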
Notable differences between the tools
AIVerify notes:
- Technical tests are supported, but they can be quite slow because of the tool's overhead.
- More flexibility would need to be built in before people could use the technical tests.
- If a model has many variables, they cannot all be shown in the PDF.
- The error messages explaining why technical tests fail on a model are not user-friendly.
VerifyML notes:
- This tool is no longer actively developed; the parties involved have shifted their focus to AIVerify.
- This tool does not support assessments.
IBM 360 Research Toolkit notes:
- The toolkit has strong backing from industry and the community.
- It includes many technical tests from the latest research and also supports mitigation algorithms.
- It is aimed purely at developers and therefore has no support for assessments.
Holistic AI notes:
- Like the IBM 360 Research Toolkit, it distinguishes between different types of technical assessment, such as bias and explainability, but it is less extensive than the IBM toolkit.
- Holistic AI is ambitious: it wants to cover efficacy, robustness, and privacy tests as well.
- It is developed by a private company from the United Kingdom that has open-sourced part of its tool.
AI Assessment Tool notes:
- This tool does not have any technical tests, but it outshines the others with its option to discuss assessments.
- It is also very performant.
Summary per tool in one sentence
- AIVerify is a tool with a UI to execute both assessments and technical tests.
- VerifyML is a Python package to generate Model Cards.
- Holistic AI is a Python package to test for and mitigate Bias in your model.
- IBM 360 Research Toolkit is a Python and R package to test for Fairness & Explainability of your model.
- AI Assessment Tool is a tool with a UI to execute assessments and log discussions.