TestifAI: Tomography-Based Testing for Deep Learning Systems

Arif, Arooj, Hartung, Tobias, Botoeva, Elena and Koliousis, Alexandros (2025) TestifAI: Tomography-Based Testing for Deep Learning Systems. In: 2026 IEEE/ACM 48th International Conference on Software Engineering, April 12–18, 2026, Rio de Janeiro, Brazil. (In Press)

Abstract

As AI systems are increasingly deployed in safety-critical application domains (e.g., autonomous driving), the associated risks grow as well. Deep learning models underlying modern AI systems must therefore undergo thorough testing to ensure their correct behaviour. A single robustness test involves thousands of inferences to empirically verify whether a model's outputs remain stable under a bounded perturbation of its inputs. However, existing testing frameworks lack the means to systematically explore and summarise robustness across a combinatorial space of perturbations. We propose TestifAI, a deep learning testing framework for efficient and accurate estimation of robustness against combinations of perturbations. TestifAI enables users to specify operational conditions as structured spaces of semantic input perturbations (e.g., image blur, brightness and zoom) and discrete severity levels (e.g., low, medium and high). Users can then query model robustness for any combination (e.g., "low blur, high brightness, and medium zoom"). To achieve efficiency and accuracy, TestifAI introduces partial model tomography, a novel approach to reconstructing model behaviour in a multi-perturbation space from tests that apply only a small number of perturbations (lower-order projections). To estimate robustness against three or more perturbations, TestifAI trains an auxiliary model on the results of tests involving at most two perturbations, avoiding the execution of an exponential number of tests. Our experiments on five image and language classification tasks show that TestifAI can predict higher-order (3 and 4 perturbations) test outcomes from low-order (1 and 2 perturbations) observations with an aggregate robustness estimation error of less than 7%, while reducing the number of inferences by 60–80%.
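
To illustrate the size of the combinatorial space described above, the following minimal sketch enumerates every operational condition of a given order, where "order" is the number of distinct perturbations applied together and each perturbation is assigned one severity level. It assumes four perturbations (blur, brightness and zoom from the abstract, plus a hypothetical "contrast") and the three severity levels low, medium and high. The helper names `enumerate_conditions` and `count_conditions` are hypothetical illustrations, not part of TestifAI's API, and the sketch counts conditions rather than inferences.

```python
from itertools import combinations, product
from math import comb


def enumerate_conditions(perturbations, levels, order):
    """Yield every condition with exactly `order` distinct perturbations,
    each assigned one severity level, as a tuple of (perturbation, level) pairs."""
    for subset in combinations(perturbations, order):
        for assignment in product(levels, repeat=order):
            yield tuple(zip(subset, assignment))


def count_conditions(k, s, order):
    """Closed form: choose `order` of the k perturbations, then one of s levels for each."""
    return comb(k, order) * s ** order


# Assumed setup: "contrast" is a hypothetical fourth perturbation.
perturbations = ["blur", "brightness", "zoom", "contrast"]
levels = ["low", "medium", "high"]

# Number of distinct conditions at each order 1..4.
counts = {r: sum(1 for _ in enumerate_conditions(perturbations, levels, r))
          for r in range(1, len(perturbations) + 1)}
low_order = counts[1] + counts[2]    # conditions with at most two perturbations
high_order = counts[3] + counts[4]   # conditions with three or four perturbations

print(counts)                                       # {1: 12, 2: 54, 3: 108, 4: 81}
print(low_order, high_order, low_order + high_order)  # 66 189 255
```

Under this assumed setup, the order-3 and order-4 conditions (189, including a query such as "low blur, high brightness, and medium zoom") outnumber the order-1 and order-2 conditions (66) almost three to one. In general, k perturbations at s levels give (s+1)^k - 1 non-empty conditions, so the full space grows exponentially in k while its low-order part grows only quadratically. These figures count conditions, not inferences, and are not meant to reproduce the savings reported in the paper.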

Item Type:	Conference or Workshop Item (Paper)
Disciplines:	Computing, Maths, Engineering and Natural Sciences > Computing and Information Systems; Computing, Maths, Engineering and Natural Sciences > Data Science & AI
Depositing User:	Alexandros Koliousis
Date Deposited:	16 Feb 2026 14:32
Last Modified:	16 Feb 2026 14:32
URI:	https://nul.repository.guildhe.ac.uk/id/eprint/2663
