AI Verify
See the introduction
Functionality
Requirement | Priority | Fulfilled | Comments |
---|---|---|---|
The tool allows users to conduct technical tests on algorithms or models, including assessments of performance, bias, and fairness. To facilitate these tests, users can input relevant datasets, | M | 1 | This is core functionality of AIVerify |
The tool allows users to choose which tests to perform. | M | 1 | This is core functionality of AIVerify |
The tool allows users to fill out questionnaires to conduct impact assessments for AI. For example IAMA or ALTAI. | M | 1 | This is core functionality of AIVerify, however work is needed to add extra impact assessments |
The tool can generate a human readable report. | M | 1 | This is core functionality of AIVerify |
The tools works with a standardized report format, that it can read, write, and update. | M | 0 | The outputted format is a PDF format, so this cannot be updated, or easily read by a machine. |
The tool supports plugin functionality so additional tests can be added easily. | S | 0.5 | One can add a test as a plugin, it can however be a bit too technical still for many people. |
The tool allows to create custom reports based on components. | S | 1 | One can slide the technical tests results and the assessment test results into a report which will be placed into a PDF |
It is possible to add custom components for reports. | S | 1 | It is possible, but just like with tests can be hard for non-technical people |
The tool provides detailed logging, including tracking of different model versions, changes in impact assessments, and technical test results for individual runs. | S | 0.5 | There are versions of models when uploaded, and the report itself is the technical test result of a run. Changes to impact assessments are not logged (only when a report is generated) |
The tool supports saving progress. | S | 1 | Reports can be saved, while it is being constructed |
The tool can be used on an isolated system without an internet connection. | S | 1 | Locally the docker container can be build and ran |
The tool offers options to discuss and document conversations. For example, to converse about technical tests or to collaborate on impact assessments. | C | 0 | Only the end-result will be logged into the report |
The tool operates with complete data privacy; it does not share any data or logging information. | C | 1 | The application is a docker application and does not do this |
The tool allows extension of report formats functionality. | C | 1 | We could program this functionality in the tool and submit a PR |
The tool can be integrated in a CI/CD flow. | C | 0.5 | It is possible, but would be very heavy to do so. The build time is quite large, and only the technical tests could be ran in an automated fashion |
The tool can be offered as a (cloud) service where no local installation is required. | C | 0 | AIVerify is currently not doing this, we could however offer it as a cloud service |
It is possible to define and automate workflows for repetitive tasks. | C | 0 | As this tool is focused on UI, this is not possible |
The tool offers pre-built connectors or low-code/no-code integration options to simplify the integration process. | C | 0 | This is not included |
total_score = 36
Reliability
Requirement | Priority | Fulfilled | Comments |
---|---|---|---|
The tool operates consistently and reliably, meaning it delivers the same expected results every time you use it. | M | 1 | The tool did not break down a single time while we were coding a plugin (only threw errors) |
The tool recovers automatically from common failures. | S | 1 | Common failures like missing datasets or models are not breaking |
The tool recovers from failures quickly, minimizing data loss, for example by automatically saving intermediate test progress results. | S | 0.5 | The assessments you need to manually save otherwise it will be lost, but over different sessions the data will be stored persistent even if the containers go down. Test results are only stored in the generated report |
The tool handles errors gracefully and informs users of any issues. | S | 1 | When failed to generate a report the tool will log the error messages, otherwise when loading in data that is non existing the application (while not being very clear in error message) just continues with an error |
The tool provides clear error messages and instructions for troubleshooting. | S | 0.5 | The test-engine-core is a dependency that is installed as a package, and therefore the error message will not contain error in that package |
total_score = 13
Usability
Requirement | Priority | Fulfilled | Comments |
---|---|---|---|
The tool possess a clean, intuitive, and visually appealing UI that follows industry standards. | S | 1 | The tool does follow the material design principles for example when you hover over items they will respond to user input |
The tool provides clear and consistent navigation, making it easy for users to find what they need. | S | 0.5 | It is not completely clear where in the tool you are when interacting with it and sometimes you could go back to home but not always |
The tool is responsive and provides instant feedback. | S | 1 | Even for jobs like generating tests and the report, it scheduled jobs and will notify you when it is done |
The user interface is multilingual and supports at least English. | S | 0.5 | Currently it only supports english |
The tool offers keyboard shortcuts for efficient interaction. | C | 0 | It is mainly UI and therefore no keyboard shortcuts |
The user interface can easily be translated into other languages. | C | 0.2 | It would need quite some refactoring when adding support for the Dutch Language (especially the more technical words like Warning or the metadata on all the plugins |
total_score = 9.4
Help & Documentation
Requirement | Priority | Fulfilled | Comments |
---|---|---|---|
The tool provides comprehensive online help documentation with searchable functionalities. | S | 0.8 | From the end-user perspective yes, from the development perspective no (for example that you need to rebuild packages like the test-engine-core |
The tool offers context-sensitive help within the application. | C | 0 | Not included in the tool |
The online documentation includes video tutorials and training materials for ease of learning. | C | 0 | Although it contains many images |
The project provides readily available customer support through various channels (e.g., email, phone, online chat) to address user inquiries and troubleshoot issues. | C | 0.2 | Just email, which they do not respond to very quickly |
total_score = 2.8
Performance Efficiency
Requirement | Priority | Fulfilled | Comments |
---|---|---|---|
The tool operates efficiently and minimize resource utilization. | M | 0.5 | The tool is efficient, minimal waiting and no lag although it uses up quite some resources which could be optimized |
The tool responds to user actions instantly. | M | 1 | Instantaneous response time |
The tool is scalable to accommodate increased user base and data volume. | S | 0.5 | As it is built into a container it can be made scalable with Kubernetes, but the the tool itself can become very slow when generating results for a large dataset and model (because of the extra overhead) |
total_score = 7.5
Maintainability
Requirement | Priority | Fulfilled | Comments |
---|---|---|---|
The tool is easy to modify and maintain. | M | 0.2 | Adding a new plugin for a model type was quite hard, other plugins however are more easier |
The tool adheres to industry coding standards and best practices to ensure code quality and maintainability. | M | 0.2 | The docker side of the project could have a big improvement |
The code is written in a common, widely adopted and supported and actively used and maintained programming language. | M | 1 | Backend in Python, Frontend in NextJs |
The project provides version control for code changes and rollback capabilities. | M | 0.8 | The code is stored on Github, but the container itself not and also the packages which the tools depend on not |
The project is open source. | M | 1 | Github link |
It is possible to contribute to the source. | S | 0.5 | It is possible, although with our three features it takes a while for them to dedicated time for integration |
The system is modular, allowing for easy modification of individual components. | S | 0.5 | The technical tests and assessments are easy to adjust, other core features not |
Diagnostic tools are available to identify and troubleshoot issues. | S | 0 | Diagnosing some parts of the system took us quite some time as we couldn't properly debug in the containers |
total_score = 15.8
Security
Requirement | Priority | Fulfilled | Comments |
---|---|---|---|
The tool must protect data and system from unauthorized access, use, disclosure, disruption, modification, or destruction. | M | 0.5 | This managed by that the data is stored in MongoDB however, it currently only has 1 user support |
Regular security audits and penetration testing are conducted. | S | 0.1 | We are unaware of the security audits but they do have a security policy here |
The tool enforce authorization controls based on user roles and permissions, restricting access to sensitive data and functionalities. | C | 0 | Currently only 1 user can use the system and see all the data |
Data encryption is used for sensitive information at rest and in transit. | C | 1 | When data is transferred to mongoDB, a secure connection is set-up and also in the DB it is encrypted by MongoDB, also you have an SSL connection with the tool |
The project allows for regular security audits and penetration testing to identify vulnerabilities and ensure system integrity. | C | 1 | As you can install it locally, this is possible |
The tool implements backup functionality to ensure data availability in case of incidents. | C | 1 | Data is stored persistent, so even if the tool breaks the data will be in volumes |
total_score = 8.3
Compatibility
Requirement | Priority | Fulfilled | Comments |
---|---|---|---|
The tool is compatible with existing systems and infrastructure. | M | 1 | As it is a container it can run on Kubernetes and therefore at Digilab |
The tool supports industry-standard data formats and protocols. | M | 1 | Most Datasets and Models are supported by the tool |
The tool operates seamlessly on supported operating systems and hardware platforms. | S | 1 | As it runs in a container it is able to run on all the major OS'es if you have Docker Desktop or use a cloud version managed by yourself |
The tool supports commonly used data formats (e.g., CSV, Excel, JSON) for easy data exchange with other systems and tools. | S | 0.5 | As input many types are accepted, but only as export there is a PDF report |
The tool integrates with existing security solutions. | C | 0 | It does not integrate with security solutions |
total_score = 12.5
Accessibility
Requirement | Priority | Fulfilled | Comments |
---|---|---|---|
The tool is accessible to users with disabilities, following relevant accessibility standards (e.g., WCAG). | S | 0 | It is not clear what the tool actually does with one look, also the color change when hovering over elements is not a large difference compared to the original color (the purple and pink) |
total_score = 0
Portability
Requirement | Priority | Fulfilled | Comments |
---|---|---|---|
The tool support a range of operating systems (e.g., Windows, macOS, Linux) commonly used within an organization. | S | 1 | It is containerized |
The tool minimizes dependencies on specific hardware or software configurations, promoting flexibility. | S | 1 | This is all containerized |
The tool offers a cloud-based deployment option or be compatible with cloud environments for scalability and accessibility. | S | 1 | As it is containerized we could host this ourselves in a cloud environment |
The tool adheres to relevant cloud security standards and best practices. | S | 0.5 | The making of the container it self is lacking some best practices, otherwise the cloud security standards are not applicable as it is a self-hosted tool |
total_score = 10.5
Deployment
Requirement | Priority | Fulfilled | Comments |
---|---|---|---|
The tool has an easy and user-friendly installation and configuration process. | S | 0.5 | You need to be technical to be able to install and deploy, but then it is relatively easy |
The tool has on-premise or cloud-based deployment options to cater to different organizational needs and infrastructure. | S | 0 | The tool does not promise on-prem or cloud-based managed deployments |
total_score = 1.5
Legal & Compliance
Requirement | Priority | Fulfilled | Comments |
---|---|---|---|
It is clear how the tool is funded to avoid improper influence due to conflicts of interest | M | 1 | On the website it is stated, that many commercial partners fund this project |
The tool is compliant with relevant legal and regulatory requirements. | S | 1 | |
The tool adheres to (local) data privacy regulations like GDPR, ensuring the protection of user data. | S | 1 | |
The tool implements appropriate security measures to comply with industry regulations and standards. | S | 1 | |
The tool is licensed for use within the organization according to the terms and conditions of the license agreement. | S | 1 | Apache 2.0 license |
The tool respects intellectual property rights and avoid copyright infringement issues. | S | 1 |
total_score = 19