Delta debugging framework
Contents
Project Name
Delta debugging framework
Project Description
Delta debugging is an automated approach to debugging that isolates failures systematically. Given a failing test that can be mechanically verified (including a browser crash), delta debugging is a way of automatically isolating the change that introduced the failure. Having a framework in place to pull builds from CVS, bisect by date and change set (using bonsai data -- remember, CVS doesn't have changesets!), and report results would let computers make developers more productive.
Project Leader(s)
Name(s) of primary people working on the project. If you want to join a project as leader, discuss with other leaders first. Include links to personal pages within wiki.
Project Contributor(s)
Name(s) of people casually working on the project, or who have contributed significant help. Include links to personal pages within wiki. NOTE: only Project Leader(s) should add names here. You can’t add your own name to the Contributor list.
Project Details
Provides more depth than the Project Description. This is the place for technical discussions, project specs, or other details. If this gets very long, you might consider breaking this part into multiple pages and linking to them.
[WORK IN PROGRESS: This post will be updated once I gather all of my thoughts. Stay tuned...]
Based on the papers on delta debugging that I've sifted through (with From Automated Testing to Automated Debugging by Andreas Zeller being the most understandable to non-math wizards, and Yesterday, my program worked. Today, it does not. Why? also by Andreas Zeller probably being the most relevant to our project), here is an outline of my understanding of delta debugging, a summary of the concepts that must be taken into account while working on the delta debugging framework project, and an outline of the conceptual stages of developing the delta debugging framework.
Delta debugging is an algorithm that can automatically and systematically isolate/narrow down the failure-inducing circumstances that are necessary to produce the bug. Delta debugging can be applied to isolating various types of failure-inducing circumstances, including:
- program input
- user interaction (key presses, button presses, mouse clicks, etc.)
- program code modification (adding / updating / deleting variables, functions, classes, etc.)
How? From my understanding, given a known bug and a known set of circumstances that can reproduce the bug, we can continually execute tests that vary the circumstances until a minimal subset of circumstances that can reproduce the bug is left. That is, for each test, a circumstance is removed, and if the bug is still present, than that circumstance can theoretically be eliminated as the cause of the bug from the set of circumstances. In this project's case, the circumstance will be certain change(s) to the program code that caused a regression in the program.
[Note: The rest of this post contains mathematical concepts that I may not fully understand but am trying to put into terms understandable to non-math wizards. A dangerous combination.]
Before we continue, here are some important terms to understand.
- Configuration. A subset of all changes made to source code since the last known good version and the version with the regression.
- Test. A test case or function that can determine whether a configuration contains the failure-inducing change (Fails), doesn't contain the failure-inducing change (Passes), or produces indeterminate results (possibly because of interference or inconsistency).
- Subset. Given two sets (A and B), A is a subset of B if and only if all of the elements of set A are in set B.
- Superset. Given two sets (A and B), B is a superset of A if and only if B contains all of the elements of A.
- Union. Given two sets (A and B), the union of A and B contain all elements of A and B.
- Intersection. Given two sets (A and B), the intersection of A and B are the elements that are in both A and B.
With regards to the delta debugging algorithm, we must take into account these possible concepts that may complicate it:
- Interference. Each individual change may not cause the program to regress, but applying a combination of changes together causes the regression. Thus, there must be a method to identify the set of changes that cause the regression.
In the case of interference, we must recursively test both halves with all changes in one half of the configuration remaining applied.
- Inconsistency. Changes in the source code may be dependent on other changes in the source code and without them the program cannot be compiled and tested successfully. Thus, there must be a method to identify and handle dependencies (such as a dependency graph).
[Wild, far left field thought: How can we ensure that when applying a configuration, it is consistent? Well, in the compiler process, there is a lexical analysis step which breaks the source code into tokens, then there is a dependence analysis step which produces constraints/dependencies between the tokens. Theoretically, if we can harness parts of the compilation process, we could have a method of knowing the dependencies between the changes in the source code.]
- Granularity. A single logical change may consist of hundreds of lines of code, yet only a couple lines of the change may be responsible for the regression. Thus, There must be a method to break the change into smaller manageable chunks.
- Monotony. If a change causes a failure, any configuration that includes this change fails as well (makes sense to me). If a configuration does not cause a failure, then all subsets of the configuration do not cause a failure, thus they can be eliminated from the change set (makes some sense to me. However, if my understanding is correct, while a subset of the configuration does not cause the failure by itself, the concept of interference suggests that the subset combined with another configuration may cause the regression).
- Unambiguity. A failure is caused by only one configuration (No interference). Thus, for efficiency, we do not have to search the other configuration for more failure-inducing changes. Whether or not a configuration is ambiguous or unambiguous, a failure-inducing change will be produced. However, for completeness with regard to finding all failure-inducing changes, it is preferred to search both configurations.
Now that we are aware of the different concepts that we must take into account with regards to delta debugging, the next section will outline some facts and assumptions that are being made, and attempt to define the vision and process of the delta debugging framework.
Facts and Assumptions:
- The source tree for the Mozilla project is HUGE. With many different source file types (C++, JS, XUL) in many different directories.
- Failure-inducing change(s) will unlikely be localized to a single directory and file. Failure-inducing change(s) may be spread across many different directories and source files.
- The source files could be of the same type (C++), mixed type (C++, JS), same directory, different directory. It shouldn't matter. The framework should be source type and location agnostic.
- The failure-inducing change(s) may not be localized to a single developer. The failure-inducing change(s) may have been caused by another developer's change(s) to a source file they were working on. That is, a single developer's source scope may not be encapsulated but interconnected and interdependent on other developers source code.
- The developer's current version of the source code contains the regression.
- The developer has a test case that can be used indicate whether the test passes/fails/is indeterminate.
- The developer knows the last known good version number or the date of last known good version. The latter will most likely be simpler and more user friendly.
- Bonsai is a tool that can produce a list of differences between versions of a source file. (Bonsai's functionality has not been examined closely yet but will have to as it may be a key component to the framework)
Possible Vision of the Delta Debugging Framework:
(subject to change based on stakeholder consultation/feedback, feasibility study)
- Since the last time a developer executed a test case that passed, the developer modified some source files. The source files may be of the same type or mixed type, same directory or different directory. It shouldn't matter. The framework should be source type and location agnostic. Upon executing the test case again, the result is now a failure. The developer panics. It's only days before the deadline to submit bug patches before the source tree is supposed to be closed for release and the bug is a blocker. The developer doesn't want to be shamed for delaying the release, and the source code is too complex to find the bug in time, so what should they do? Use the delta debugging framework! that's what. How? you may ask. Well keep reading to find out. * scenario may vary.
- The delta debugging framework may require the developer to input two pieces of information. One, the test case/function that used to pass but now fails. It will be used to determine whether the source files with progressive changes passes/fails the test. Two, the date of the last known good version that passed the test case. Because of assumption 4 made above, the source directory and files or name of the developer may not be required.
- Once the developer has inputted the two pieced of information, it will use Bonsai to query the source tree and compile a list of all of the changes to the source files since the inputted date.
- (If there was a method of determining change dependencies so as to eliminate the possibility of inconsistencies, it would be done in this step. One possible way of reducing the possibility of inconsistencies is to logically group changes by location or check in time.)
- This step would be where the delta debugging algorithm would come into play. The algorithm should basically:
- Recursively, incrementally remove changes from the source code with the regression.
- Recompile the source tree.
- Execute the test case. There may be 3 outcomes:
- The test case passes. We know that the failure-inducing change(s) are in the change(s) that were removed.
- The test case fails. We know that the failure-inducing change(s) are not exclusively in the change(s) that were removed. I say not exclusively because of the concept of Interference (described above).
- The test case is indeterminate. There were some inconsistencies.
[WORK IN PROGRESS: This post will be updated once I gather all of my thoughts. Stay tuned...]
Project References
NewsForge: An Introduction to Delta Debugging Delta debugging simplifies debugging process in a program by automating the process and continually splitting the program to smaller chunks called deltas. This technique is useful in three circumstances:
- Error occurs due to user inputs (e.g. keypress, file I/O). Delta debugging is used to eliminate user actions irrelevant to the nature of the error and pinpoint the cause of the error.
- Error occurs due to recent changes to the code. In this situation, deltas are retrieved from the net differences from both codes.
- Multithreading environment. Delta debugging can track down the exact order of operations originating from multiple threads that caused the error.
We need to focus on one of these circumstances. Judging from the project description, we should work on the first case, while perhaps opening the door to future expansion.
Project News
Sept 26, 2006
- I read through your documentation here, and it is looking good. I also spoke to Shaver by phone this morning, and we chatted briefly about this project. He suggests that you start your work by looking for a suitable Crash Case, one that happens reliably. Then you need at what would be necessary in order to bisect the change set (e.g., bonsai data) in order to get closer to the change that introduced the bug. Shaver suggested that robc (Rob Campbell) might be a good person to help you brainstorm on this.