Function call stacks are the raw data produced when a computer program encounters an unanticipated error. Each call stack records the state of the program execution at the point the program crashed. Large volumes of call stacks are produced in software vulnerability research via a technique known as “fuzzing”. This technique induces software exceptions by generating randomized inputs with the goal of finding unknown software vulnerabilities. To correct the program code efficiently, we need to identify groups of call stacks that are likely to have arisen from the same underlying programming error.
Mathematically, this task requires metrics for comparing call stacks and methods for visualizing the resulting clusters. This project will study methods that adapt a standard string-edit distance to call stack data. The data will then be clustered and visualized using “Mapper”, a recent tool from Topological Data Analysis.
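As a rough illustration of what adapting a string-edit distance to call stacks could look like, the sketch below treats each stack frame (rather than each character) as one symbol and computes the ordinary Levenshtein distance between two stacks. This is a minimal sketch under stated assumptions, not the metric the project will settle on: the function names `stack_edit_distance` and `normalized_distance`, the unit costs, the length normalization, and the example frame names are all hypothetical.

```python
from typing import Sequence


def stack_edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance in which each symbol is a whole stack frame.

    Assumption: insertions, deletions and substitutions all cost 1.
    A real metric might weight frames by their position in the stack.
    """
    # prev[j] holds the distance between the first i-1 frames of a
    # and the first j frames of b.
    prev = list(range(len(b) + 1))
    for i, frame_a in enumerate(a, start=1):
        curr = [i]
        for j, frame_b in enumerate(b, start=1):
            cost = 0 if frame_a == frame_b else 1
            curr.append(min(prev[j] + 1,          # delete frame_a
                            curr[j - 1] + 1,      # insert frame_b
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def normalized_distance(a: Sequence[str], b: Sequence[str]) -> float:
    """Edit distance scaled to [0, 1] by the longer stack's length."""
    longest = max(len(a), len(b))
    return stack_edit_distance(a, b) / longest if longest else 0.0


# Hypothetical call stacks, outermost frame first.
s1 = ["main", "parse_input", "read_header", "memcpy"]
s2 = ["main", "parse_input", "read_body", "memcpy"]
s3 = ["main", "render", "draw_glyph"]

print(stack_edit_distance(s1, s2))  # stacks differing in one frame
print(stack_edit_distance(s1, s3))  # stacks sharing only the entry point
```

Pairwise distances of this kind would form the input to a clustering step or to a Mapper construction; the choice of costs and normalization is exactly the sort of design question the project is meant to investigate.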
Key references are:
K. Bartz. Finding similar failures using call-stack similarity. Third Conference on Computer Systems Problems in Machine Learning, San Diego, California, 2008.
P. Lum, et al. Extracting insights from the shape of complex data using topology, Scientific Reports, 2013.
The supervisory team includes Vanessa Robins (RSPE), Katharine Turner (MSI), Rajeev Gore (CS), and software vulnerability researchers from the Australian public service.
NOTE: This project has generous student funding associated with it for suitably qualified candidates.