A study on identifying, finding and classifying software bugs using data mining methods
DOI: https://doi.org/10.63053/ijset.1
Keywords: Software Bugs Tracking, Data Mining Techniques, Software Quality Assurance, Bug Tracking, Bug Classification
Abstract
A software bug is an error, flaw or fault in the design, development, or operation of computer software that causes it to produce an incorrect or unexpected result, or to behave in unintended ways. The process of finding and correcting bugs is termed "debugging" and often uses formal techniques or tools to pinpoint them. Since the 1950s, some computer systems have been designed to deter, detect or auto-correct various bugs during operation. Bugs can arise from mistakes made in interpreting and extracting users' requirements, planning a program's design, writing its source code, and from interaction with humans, hardware and other programs, such as operating systems or libraries. A program with many, or serious, bugs is often described as buggy. Bugs can trigger errors that may have ripple effects. The effects of bugs may be subtle, such as unintended text formatting, through to more obvious effects such as causing a program to crash, freezing the computer, or damaging hardware. Other bugs qualify as security bugs and might, for example, enable a malicious user to bypass access controls in order to obtain unauthorized privileges. Some software bugs have been linked to disasters. Bugs in the code that controlled the Therac-25 radiation therapy machine were directly responsible for patient deaths in the 1980s. In 1996, the European Space Agency's US$1 billion prototype Ariane 5 rocket was destroyed less than a minute after launch due to a bug in the on-board guidance computer program. In 1994, an RAF Chinook helicopter crashed, killing 29; this was initially blamed on pilot error, but was later thought to have been caused by a software bug in the engine-control computer. Buggy software caused the early 21st-century British Post Office scandal, the most widespread miscarriage of justice in British legal history. In 2002, a study commissioned by the US Department of Commerce's National Institute of Standards and Technology concluded that "software bugs, or errors, are so prevalent and so detrimental that they cost the US economy an estimated $59 billion annually, or about 0.6 percent of the gross domestic product". In this paper, a software defect detection and classification method is proposed that integrates data mining techniques to identify and classify defects drawn from large software repositories.
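As an illustration of the kind of data-mining pipeline the abstract refers to, the sketch below classifies bug-report summaries with a TF-IDF text representation and a naive Bayes classifier using scikit-learn. The sample reports, the bug/enhancement label set, and the choice of classifier are assumptions made for illustration only; they are not the specific method proposed in the paper.

# Illustrative sketch only: a minimal text-classification pipeline for bug reports,
# assuming scikit-learn is installed. Data, labels, and classifier are hypothetical
# examples, not the paper's proposed technique.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny hand-made corpus of bug-report summaries (hypothetical examples).
reports = [
    "application crashes with null pointer exception on startup",
    "memory leak when parsing large XML files",
    "please add dark mode to the settings page",
    "typo in the installation instructions of the user guide",
    "security: session token is not invalidated after logout",
    "request: export results as CSV in addition to PDF",
]
# Labels distinguishing defect reports from enhancement requests.
labels = ["bug", "bug", "enhancement", "bug", "bug", "enhancement"]

# TF-IDF features (unigrams and bigrams) feeding a naive Bayes classifier.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), MultinomialNB())
model.fit(reports, labels)

# Classify a new, unseen report.
new_report = ["segmentation fault while saving the project file"]
print(model.predict(new_report))  # expected output: ['bug']

In practice such a classifier would be trained on reports mined from a bug tracker or version-control history rather than a hand-written list, and the label set could be extended to defect categories (e.g. memory, concurrency, security) as needed.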
License
Copyright (c) 2023 International journal of Modern Achievement in Science, Engineering and Technology
This work is licensed under a Creative Commons Attribution 4.0 International License.