Evaluation-Strategy Gap in Fault Diagnosis of Deep Learning Programs

Jahan, Sigma

Computer Science > Software Engineering

arXiv:2606.26492 (cs)

[Submitted on 25 Jun 2026]

Title: Evaluation-Strategy Gap in Fault Diagnosis of Deep Learning Programs

Authors: Sigma Jahan


Abstract: Deep Learning (DL) programs can fail during training for many reasons, and diagnosing the cause is a costly and time-consuming maintenance task. Techniques for diagnosing such failures are commonly assessed using within-program cross-validation, which may be inadequate for deployment settings involving previously unseen programs. It is therefore necessary to measure how performance differs between these settings and to identify the causes of any performance gap in established fault diagnosis techniques for DL. We investigate this gap using DynFault, a corpus of 5,542 fault-injected training traces from 38 real-world DL programs. For existing fault diagnosis techniques, we found a gap of 0.190 in balanced accuracy between within-program cross-validation and evaluation on held-out whole programs. We also found that the gap stems from program-level structure in the features. This led us to examine two runtime feature sets, curvature features and optimizer features, and their behavior on unseen programs. We found that curvature features are useful for instability detection on unseen programs, while optimizer and activation features help only on programs seen during training.
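
To make the two evaluation strategies concrete, here is a minimal Python sketch; it is not the paper's DynFault pipeline. It assumes each training trace carries a program identifier and approximates within-program evaluation by dealing each program's traces round-robin across k folds, so every test fold contains traces from programs that also appear in training. Leave-one-program-out instead holds out all traces of one program at a time. The function names (`within_program_folds`, `leave_one_program_out`, `shared_programs`, `balanced_accuracy`) are hypothetical, and balanced accuracy is computed as the mean of per-class recall.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Set, Tuple

# A split is (train indices, test indices) over a list of traces.
Split = Tuple[List[int], List[int]]


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean per-class recall, so each true class counts equally regardless of size."""
    hits: Dict[int, int] = defaultdict(int)
    totals: Dict[int, int] = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


def within_program_folds(program_ids: Sequence[str], k: int) -> Iterator[Split]:
    """Hypothetical within-program CV: deal each program's traces round-robin into k folds."""
    fold_of: List[int] = []
    seen: Dict[str, int] = defaultdict(int)
    for p in program_ids:
        fold_of.append(seen[p] % k)  # the n-th trace of a program goes to fold n mod k
        seen[p] += 1
    for f in range(k):
        test = [i for i, g in enumerate(fold_of) if g == f]
        train = [i for i, g in enumerate(fold_of) if g != f]
        yield train, test


def leave_one_program_out(program_ids: Sequence[str]) -> Iterator[Split]:
    """Hold out every trace of one program at a time; test programs are never trained on."""
    for prog in sorted(set(program_ids)):
        test = [i for i, p in enumerate(program_ids) if p == prog]
        train = [i for i, p in enumerate(program_ids) if p != prog]
        yield train, test


def shared_programs(program_ids: Sequence[str], split: Split) -> Set[str]:
    """Programs that appear on both the training and the test side of a split."""
    train, test = split
    return {program_ids[i] for i in train} & {program_ids[i] for i in test}


if __name__ == "__main__":
    # Toy data (not from DynFault): 12 traces from 3 programs.
    programs = ["p1"] * 4 + ["p2"] * 4 + ["p3"] * 4
    print([sorted(shared_programs(programs, s)) for s in within_program_folds(programs, k=3)])
    # [['p1', 'p2', 'p3'], ['p1', 'p2', 'p3'], ['p1', 'p2', 'p3']]
    print([sorted(shared_programs(programs, s)) for s in leave_one_program_out(programs)])
    # [[], [], []]
    print(balanced_accuracy([0, 0, 1, 1], [0, 0, 1, 0]))  # 0.75
```

The `shared_programs` check shows the difference between the two strategies: under within-program folds every test program was also seen in training, so a classifier can exploit program-specific patterns in the features, whereas leave-one-program-out removes that opportunity. Comparing balanced accuracy under the two split functions is one way to measure the kind of evaluation-strategy gap described above.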

Comments:	Accepted as a research track paper at the 42nd IEEE International Conference on Software Maintenance and Evolution (ICSME 2026)
Subjects:	Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2606.26492 [cs.SE]
	(or arXiv:2606.26492v1 [cs.SE] for this version)
	https://doi.org/10.48550/arXiv.2606.26492

Submission history

From: Sigma Jahan
[v1] Thu, 25 Jun 2026 00:52:22 UTC (1,162 KB)