Is it useful to combine just-in-time (JIT) software defect prediction with conformal prediction?

Conformal Software Defect Prediction (CDP)

The goal

  • Students will apply and systematically evaluate conformal prediction (CP) as a
    rigorous uncertainty quantification approach on top of state-of-the-art JIT defect predictors.

What students learn

  • basic knowledge of just-in-time (JIT) defect prediction
  • quantifying uncertainties for artificial intelligence (AI) systems
  • performing experiments on a modern GPU server
  • scientific work

Modern software development approaches, such as DevOps, facilitate continuous integration and delivery (CI/CD). CI/CD can be supported by just-in-time (JIT) defect prediction techniques, which provide feedback on whether a code change committed to the software repository is likely to contain defects. This immediate feedback allows practitioners to make timely decisions regarding potential defects. However, a prediction model may deliver false predictions that negatively affect practitioners' decisions. False positive predictions lead to resources being spent unnecessarily on investigating clean code changes, while false negative predictions may result in defective changes being overlooked. Knowing how uncertain a defect prediction is would help practitioners avoid wrong decisions.

One potential way to quantify prediction uncertainty is conformal prediction (CP). CP can be combined with any prediction model that provides some heuristic notion of uncertainty, such as prediction probabilities. CP uses a small amount of additional calibration data to convert this heuristic notion of uncertainty into a rigorous one. Instead of outputting a single label, CP generates prediction sets that are guaranteed to contain the true label with probability at least 1-α, where α is a user-chosen error rate and the calibration and test data are assumed to be exchangeable. In the optimal case, a prediction set consists of a single label.
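The calibration step described above can be sketched as split conformal prediction for a binary defect classifier. This is a minimal illustration and not the method prescribed for the project: the function names `conformal_threshold` and `prediction_set`, the nonconformity score (one minus the predicted probability of the label), and the toy calibration data are all assumptions made for this example.

```python
import math


def conformal_threshold(cal_probs, cal_labels, alpha):
    """Compute the score threshold q_hat from calibration data.

    cal_probs[i]  : predicted probability that calibration change i is defective
    cal_labels[i] : true label of change i (1 = defective, 0 = clean)
    alpha         : tolerated error rate (target coverage is 1 - alpha)

    Assumed nonconformity score: 1 - predicted probability of the true label.
    """
    scores = sorted(
        1.0 - (p if y == 1 else 1.0 - p)
        for p, y in zip(cal_probs, cal_labels)
    )
    n = len(scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: include every label.
        return float("inf")
    return scores[rank - 1]


def prediction_set(p_defective, q_hat):
    """Return all labels whose nonconformity score is at most q_hat."""
    label_probs = {0: 1.0 - p_defective, 1: p_defective}
    return {label for label, p in label_probs.items() if 1.0 - p <= q_hat}


# Toy calibration data (hypothetical values, for illustration only).
cal_probs = [0.9, 0.8, 0.2, 0.1, 0.7, 0.3, 0.95, 0.05, 0.6]
cal_labels = [1, 1, 0, 0, 1, 0, 1, 0, 0]

q_hat = conformal_threshold(cal_probs, cal_labels, alpha=0.2)
print(prediction_set(0.95, q_hat))  # confident prediction
print(prediction_set(0.50, q_hat))  # uncertain prediction
```

With a binary predictor, a prediction set can contain one label, both labels, or no label. Only the single-label sets correspond to the optimal case mentioned above.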

The goal of this project group is to apply and systematically evaluate conformal prediction (CP) as a rigorous uncertainty quantification approach on top of state-of-the-art JIT defect predictors. The students will first review the literature on JIT defect prediction and select suitable state-of-the-art approaches to work with. They will then apply CP to the selected JIT approaches, using large-scale real-world defect datasets (e.g., from the OpenStack and Apache open-source projects). Provided with access to a modern GPU server (equipped with four NVIDIA GeForce RTX 4090 GPUs), they will perform experiments to analyze: (1) how often CP can provide guarantees for JIT defect predictions; and (2) how many false JIT defect predictions CP can filter out. Based on an in-depth analysis of the experiment results, the students will discuss potential directions for enhancing the performance of conformal JIT defect prediction.
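One possible way to turn the two experiment questions into measurable quantities is sketched below. The mapping is an interpretation, not something fixed by the project description: question (1) is read as the share of code changes that receive a single-label prediction set, and question (2) as the share of wrong point predictions whose prediction set is not a single label and could therefore be flagged as uncertain. The function name `evaluate_conformal` and the returned keys are hypothetical.

```python
def evaluate_conformal(pred_sets, point_preds, true_labels):
    """Summarize conformal JIT defect predictions on a test set.

    pred_sets[i]   : set of labels returned by CP for change i
    point_preds[i] : label predicted by the underlying JIT model
    true_labels[i] : ground-truth label (1 = defective, 0 = clean)
    """
    n = len(true_labels)

    # Empirical coverage: how often the prediction set contains the truth.
    coverage = sum(y in s for s, y in zip(pred_sets, true_labels)) / n

    # Assumed reading of question (1): share of single-label sets.
    singleton_rate = sum(len(s) == 1 for s in pred_sets) / n

    # Assumed reading of question (2): among wrong point predictions,
    # the share whose set is empty or contains both labels.
    wrong = [i for i in range(n) if point_preds[i] != true_labels[i]]
    if wrong:
        filtered_false_rate = sum(len(pred_sets[i]) != 1 for i in wrong) / len(wrong)
    else:
        filtered_false_rate = None  # no false predictions to filter

    return {
        "coverage": coverage,
        "singleton_rate": singleton_rate,
        "filtered_false_rate": filtered_false_rate,
    }


# Hypothetical toy results for six code changes.
result = evaluate_conformal(
    pred_sets=[{1}, {0}, {0, 1}, set(), {1}, {0}],
    point_preds=[1, 0, 1, 0, 1, 0],
    true_labels=[1, 0, 0, 1, 0, 0],
)
print(result)
```

Reporting empirical coverage alongside the other two numbers helps check whether the 1-α guarantee is actually met on the chosen datasets.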

[1] Yunhua Zhao, Kostadin Damevski, and Hui Chen. 2023. A Systematic Survey of Just-in-Time Software Defect Prediction. ACM Comput. Surv. 55, 10, Article 201 (October 2023), 35 pages. https://doi.org/10.1145/3567550

[2] Jalaj Pachouly, Swati Ahirrao, Ketan Kotecha, Ganeshsree Selvachandran, Ajith Abraham. A systematic literature review on software defect prediction using artificial intelligence: Datasets, Data Validation Methods, Approaches, and Tools, Engineering Applications of Artificial Intelligence, Volume 111, 2022, 104773, ISSN 0952-1976, https://doi.org/10.1016/j.engappai.2022.104773

[3] Angelopoulos, Anastasios N., and Stephen Bates. "A gentle introduction to conformal prediction and distribution-free uncertainty quantification." arXiv preprint arXiv:2107.07511 (2021)

[4] Shafer, Glenn, and Vladimir Vovk. "A tutorial on conformal prediction." Journal of Machine Learning Research 9 (2008). 10.1145/1390681.1390693

Other info

Registration for the Master's project group

The registration phase for the project groups of the summer semester 2024 starts at the beginning of January 2024. Registration is handled centrally by the Software Engineering department of the Faculty of Computer Science. Further information will follow shortly.