
CHAPTER 7

7.2 Discovery

The most significant advantage of employing the CSIS approach described here is that it makes discovery easier. The main task in discovery is search, and this section presents several discovery scenarios in the KBEDS application in which a maintainer searches the code for information. The KBEDS CSIS is then compared to other discovery methods.

7.2.1 Scenarios

The following scenarios depict a maintainer engaged in real discovery tasks on the KBEDS application. The scenarios themselves are drawn from common problems that programmers face when trying to understand programs [Brooks, 1983] [Lampert, et al., 1988] [Letovsky, 1986] [Shneiderman and Carroll, 1988] [Soloway, et al., 1986].

7.2.1.1 Scenario 1: Delocalized Plans

Delocalized plans have long been recognized as a significant obstacle to understanding programs [Soloway and Letovsky, 1986] [Lampert, et al., 1988]. A delocalized plan occurs when a particular goal of a programmer is implemented by lines of code that appear in spatially disparate areas of the program. The delocalized plan, whose elements appear in different modules, violates the traditional view of a module as representing a single plan or sub-plan. It has been shown, in fact, that the likelihood of a maintainer correctly recognizing a plan or intention in a program decreases as the lines of code that realize it are spread out, or delocalized, in the text of the program [Soloway and Letovsky, 1986]. The reason is that finding all the parts of the plan and working out what they do is too time-consuming, so the maintainer attempts to understand the code from purely local information.

There are several well-known examples of delocalized plans, two of which are debugging and dirty-bit. In the debugging plan, information indicating certain internal states is printed if some global variable is set to a specific value. In the dirty-bit plan, a global variable records whether some data held in memory has changed since it was last read from disk, and therefore whether the data should be written back to disk before being erased from memory. The former plan is fairly simple to recognize, but the latter is not [Soloway and Letovsky, 1986]. Every procedure that changes the data contains a single line, the one that sets the dirty bit, that seems to have nothing to do with accomplishing the goal of that procedure.

The KBEDS CSIS provides various levels of support for detecting delocalized plans. First, most delocalized plans involve global variables, and the representation provides immediate access to all uses of any variable. Second, a library of recognition procedures can be built that recognize the elements of known delocalized plans. Currently the library can recognize the debugging, dirty-bit, and error-logging plans.
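To make the idea of a recognition library concrete, here is a minimal Python sketch. It assumes a toy encoding of the representation as (subject, role, filler) triples; the role names `reads`, `has-case`, and `case-action`, the registry `RECOGNIZERS`, and the procedure `recognize_debugging` are hypothetical illustrations, not the KBEDS CSIS implementation. The debugging recognizer encodes one plausible heuristic: a global variable that is read only by switches having a case that prints.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, role, filler); a toy encoding

# Hypothetical registry: plan name -> recognition procedure over the KB.
RECOGNIZERS: Dict[str, Callable[[List[Triple]], Set[str]]] = {}


def recognizer(plan: str):
    """Decorator that adds a recognition procedure to the library."""
    def register(fn):
        RECOGNIZERS[plan] = fn
        return fn
    return register


def fillers(kb: List[Triple], subject: str, role: str) -> Set[str]:
    return {f for s, r, f in kb if s == subject and r == role}


def uses_of(kb: List[Triple], variable: str) -> Dict[str, Set[str]]:
    """Every action that mentions a variable, grouped by the role it uses."""
    uses: Dict[str, Set[str]] = defaultdict(set)
    for s, r, f in kb:
        if f == variable:
            uses[r].add(s)
    return dict(uses)


def guards_print(kb: List[Triple], switch: str) -> bool:
    """True if some case of the switch starts with a print action."""
    return any("print" in fillers(kb, action, "is-a")
               for case in fillers(kb, switch, "has-case")
               for action in fillers(kb, case, "case-action"))


@recognizer("debugging")
def recognize_debugging(kb: List[Triple]) -> Set[str]:
    """Heuristic: a global that is read only by switches guarding a print."""
    found = set()
    for var in {s for s, r, f in kb if r == "is-a" and f == "global-variable"}:
        readers = uses_of(kb, var).get("reads", set())
        if readers and all("switch" in fillers(kb, a, "is-a") and guards_print(kb, a)
                           for a in readers):
            found.add(var)
    return found


kb: List[Triple] = [
    ("debug-level", "is-a", "global-variable"),
    ("kb-changed", "is-a", "global-variable"),
    ("switch-1", "is-a", "switch"),
    ("switch-1", "reads", "debug-level"),
    ("switch-1", "has-case", "case-1"),
    ("case-1", "case-action", "print-1"),
    ("print-1", "is-a", "print"),
    ("switch-2", "is-a", "switch"),
    ("switch-2", "reads", "kb-changed"),
    ("switch-2", "has-case", "case-2"),
    ("case-2", "case-action", "send-1"),
    ("send-1", "is-a", "message"),
]

for plan, recognize in RECOGNIZERS.items():
    print(plan, sorted(recognize(kb)))  # debugging ['debug-level']
```

New plans are added by registering another decorated procedure; the dirty-bit and error-logging recognizers would sit alongside this one in the same registry.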

The dirty-bit plan has several attributes by which it can be recognized. A global variable, typically a boolean, governs its behavior. The variable is initialized to false and is typically set to false in only one procedure, one that writes to a file. The variable is set to true in any number of procedures, all of which make other global changes. The code that implements a search for a dirty-bit variable is fairly complex, and it should be noted that this search finds global variables that are potential dirty-bit variables: it is heuristic, not conclusive.
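The attributes listed above translate fairly directly into a filter over global variables. The following Python sketch is illustrative only: the `GlobalVar` and `Procedure` records, their fields, and the toy procedures are assumptions standing in for the CSIS representation, and "other global changes" is approximated as an assignment to some other global. Like the search it mimics, it reports candidates, not conclusions.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class GlobalVar:
    name: str
    type: str
    initial: object


@dataclass
class Procedure:
    name: str
    assigns: List[Tuple[str, object]] = field(default_factory=list)  # (global, value)
    writes_file: bool = False


def dirty_bit_candidates(globals_: List[GlobalVar], procs: List[Procedure]) -> Set[str]:
    """Flag globals that look like dirty bits. Heuristic, not conclusive."""
    found = set()
    for var in globals_:
        # A boolean global initialized to false.
        if var.type != "boolean" or var.initial is not False:
            continue
        clearers = [p for p in procs if (var.name, False) in p.assigns]
        setters = [p for p in procs if (var.name, True) in p.assigns]
        # Set back to false in exactly one procedure, and that procedure writes a file.
        if len(clearers) != 1 or not clearers[0].writes_file:
            continue
        # Set to true somewhere, and every such procedure also changes another global
        # (approximated here as an assignment to some other global).
        if setters and all(any(g != var.name for g, _ in p.assigns) for p in setters):
            found.add(var.name)
    return found


# Toy data; not the actual KBEDS procedures.
globals_ = [
    GlobalVar("kb-changed", "boolean", False),
    GlobalVar("debug", "boolean", False),
    GlobalVar("kb-container", "container", None),
]
procs = [
    # The first assignment stands in for sending Add New Element to the container.
    Procedure("add-item-to-kb", [("kb-container", "item"), ("kb-changed", True)]),
    Procedure("save-kb", [("kb-changed", False)], writes_file=True),
    Procedure("toggle-debug", [("debug", True)]),
    Procedure("reset-debug", [("debug", False)]),
]
print(dirty_bit_candidates(globals_, procs))  # {'kb-changed'}
```

The debug flag is rejected because the procedure that clears it does not write a file, which is exactly the kind of distinction the heuristic relies on.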

The dirty-bit plan is the only non-trivial delocalized plan in the KBEDS application. The scenario that follows illustrates how a maintainer would proceed in discovering a delocalized plan for which there was no pre-defined recognition function.

The maintainer is attempting to understand the method add-item-to-KB, which is a method of the data-type knowledge-base. A good place to start is the textual description of the method:

[METHOD-63], Add Item to KB, is a method attached to 
[DATA-TYPE-2: KBEDS Knowledge-Base], and all its subclasses: ()

The method has 1 parameter: ([PARAMETER-63: Item to Add]),
no local variables, and returns a Void.

The method is implemented as follows:
Sends [METHOD-10: Add New Element] to [SLOT-3: Container].
Assigns [SLOT-23: KB-Changed?] <- [CONSTANT-62: True].
Returns [CONSTANT-55: VOID].

The key here is that a maintainer will normally attempt to form an understanding of a section of code based on purely local information [Soloway and Letovsky, 1986]. With the KBEDS CSIS, the amount of local information available at the click of a mouse button is expanded tremendously. At this point in the analysis, it is clear what the message does, but not what the assignment does or, more accurately, why the assignment is necessary. What is the function of the KB-Changed? slot? Clearly it is true if the KB has changed and false otherwise, but why does the program need to know this? The purpose of the slot is not obvious from this section of code because the goal of the assignment statement is implemented elsewhere.

To determine the purpose of the assignment statement, the maintainer must find all the places where the variable is used. It may seem that the next step would be to check the accessed-by role of the slot, but consider what this role represents: it is the aggregation of every action in which the slot is passed as a parameter, changed in an assignment, read, or sent a message. Knowing where it is passed as a parameter probably will not reveal much, though a check would show that it is not used in this way. Inspecting the places where a variable is changed often reveals what the variable represents, but not why the program needs it. In this case the maintainer has probably already assumed that the variable is set to True whenever the KB is changed. The question to be answered is, "Why does the program need to know when the KB is changed?"
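The difference between accessed-by and its more specific counterparts can be shown with a small Python sketch. The role names `passed-by` and `sent-message-by`, and all the action names, are hypothetical; the point is only that accessed-by is the union of the four kinds of access, while read-by isolates the action (in this toy data, a single one) that can explain why the slot exists.

```python
from typing import Dict, Set

# Hypothetical names for the four specific access roles of a slot.
ACCESS_ROLES = ("passed-by", "changed-by", "read-by", "sent-message-by")


def accessed_by(slot_roles: Dict[str, Set[str]]) -> Set[str]:
    """accessed-by aggregates every action that touches the slot in any way."""
    result: Set[str] = set()
    for role in ACCESS_ROLES:
        result |= slot_roles.get(role, set())
    return result


# Toy fillers for the KB-Changed? slot (action names are made up).
kb_changed = {
    "changed-by": {"assign-1", "assign-2", "assign-3"},
    "read-by": {"switch-1"},
}

print(len(accessed_by(kb_changed)))   # 4 actions to sift through
print(sorted(kb_changed["read-by"]))  # ['switch-1'], the one that answers "why?"
```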


Typically the answers to "why?" questions lie in the places where the slot is read. The maintainer brings up a role-filler window for the slot, shown in Figure 7.6. Realizing that the read-by role of the slot is the most relevant, the maintainer clicks on that role in the window, and the role-filler window for the only action that reads the slot pops up, as shown in Figure 7.7.


The maintainer sees that this action is a switch in the method Save KB, and then gets a description of that method:

[METHOD-25], Save KB, is a method attached to 
[DATA-TYPE-2: KBEDS Knowledge-Base], and all its subclasses: ()

The method has no parameters, no local variables,
and returns a Void.

The method is implemented as follows:
Switch on [SLOT-23: KB-Changed?]:
Case [CONSTANT-63: False]:
     Returns [CONSTANT-55: VOID].

Case [CONSTANT-62: True]:
     Sends [METHOD-24: Internal Dump KB to File] to [SLOT-24: File].
     Returns [CONSTANT-55: VOID].

When the slot is True, the KB is written to a file; when it is False, it is not. The purpose of the slot has been revealed with a few mouse clicks. The information necessary to understand delocalized plans is, in effect, localized by the representation.

7.2.1.2 Scenario 2: Vestigial Code

Another obstacle to program understanding is vestigial code: code left over in a program that serves no purpose, because the plan it was once part of has been removed or altered so that the code no longer does anything. Vestigial code tends to persist because maintainers do not know what it does and so do not want to touch it. Its presence has an effect similar to that of a delocalized plan: because it is not really part of the module's plan, it can confuse a maintainer who tries to understand the module by incorporating the vestigial code.

The most common form of vestigial code is a change to a global variable that is not used to realize any plan. As discussed in the previous section, the realization of a plan can be found by inspecting the places where the variable is read. A vestigial variable, then, is one that is never read, and all variables that are not read can be found with the Classic expression (and changeable-instance (at-most 0 read-by)). Vestigial code is therefore any assignment statement that changes a vestigial variable: (and assignment (all changes (and changeable-instance (at-most 0 read-by)))).
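The two Classic expressions can be read as predicates over individuals. The Python sketch below evaluates them over a toy knowledge base; the helper names (`prim`, `and_`, `at_most`, `all_`) and the individuals are assumptions, and this is not Classic's classifier. The sketch treats each listed set of role fillers as complete, a closed-world reading; Classic itself makes an open-world assumption, so an individual is generally recognized as satisfying an at-most restriction only once the role has been closed.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

Concept = Callable[["Individual", Dict[str, "Individual"]], bool]


@dataclass
class Individual:
    name: str
    concepts: Set[str]
    roles: Dict[str, Set[str]] = field(default_factory=dict)  # role -> fillers


def prim(name: str) -> Concept:
    return lambda ind, kb: name in ind.concepts


def and_(*parts: Concept) -> Concept:
    return lambda ind, kb: all(p(ind, kb) for p in parts)


def at_most(n: int, role: str) -> Concept:
    # Closed-world reading: the listed fillers are taken to be all the fillers.
    return lambda ind, kb: len(ind.roles.get(role, set())) <= n


def all_(role: str, concept: Concept) -> Concept:
    # Vacuously true when the role has no fillers.
    return lambda ind, kb: all(concept(kb[f], kb) for f in ind.roles.get(role, set()))


# (and changeable-instance (at-most 0 read-by))
VESTIGIAL_VARIABLE = and_(prim("changeable-instance"), at_most(0, "read-by"))
# (and assignment (all changes (and changeable-instance (at-most 0 read-by))))
VESTIGIAL_CODE = and_(prim("assignment"), all_("changes", VESTIGIAL_VARIABLE))


def instances(concept: Concept, kb: Dict[str, Individual]) -> List[str]:
    return sorted(name for name, ind in kb.items() if concept(ind, kb))


# Toy individuals; not taken from KBEDS.
kb = {ind.name: ind for ind in [
    Individual("kb-changed", {"changeable-instance"}, {"read-by": {"switch-1"}}),
    Individual("last-query", {"changeable-instance"}),
    Individual("assign-1", {"assignment"}, {"changes": {"kb-changed"}}),
    Individual("assign-2", {"assignment"}, {"changes": {"last-query"}}),
    Individual("switch-1", {"switch"}),
]}
print(instances(VESTIGIAL_VARIABLE, kb))  # ['last-query']
print(instances(VESTIGIAL_CODE, kb))      # ['assign-2']
```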

7.2.1.3 Scenario 3: Debugging

In this scenario, a maintainer is trying to discover the source of the following bug reported by a KBEDS user: "I have a filter to save mail from a particular colleague in a special file. He has two email addresses, and when he uses one of them the filter works, but when he uses the other the filter is ignored. I checked the knowledge-base and the entry for him does have both his email addresses."

Let us assume the KBEDS maintainer assigned this task is a novice to KBEDS. A logical first question to ask is, "What is a filter?" This is domain information, so the maintainer queries the KBEDS CSIS for all data-types that have the word "filter" in their name role. (This can be done through the user interface by selecting Find Individual from the CSIS menu; because this scenario involves many interactions with the user interface, not every step is shown.) This query turns up two data-types, information-filter and filter-action, and the trees resulting from the search are shown in Figure 7.8. The maintainer inspects the first data-type (by clicking on it in the tree window) and finds that it has two slots: antecedent, whose value is an instance of the data-type mail-message, and consequent, whose value is an instance of the data-type filter-action. The maintainer also sees that while information-filter has no subclasses, filter-action has four.
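The Find Individual query used here amounts to a substring match on the name role of every data-type. A minimal Python sketch follows; the `DataType` record and the case-insensitive match are assumptions, and the slot and subclass data are limited to what the scenario reports (the four subclass names of filter-action are placeholders, since the text gives only their number).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataType:
    name: str
    slots: Dict[str, str] = field(default_factory=dict)  # slot -> value data-type
    subclasses: List[str] = field(default_factory=list)


def find_data_types(types: List[DataType], word: str) -> List[DataType]:
    """Data-types whose name contains the word, ignoring case."""
    return [t for t in types if word.lower() in t.name.lower()]


types = [
    DataType("information-filter",
             {"antecedent": "mail-message", "consequent": "filter-action"}),
    # The four subclass names are placeholders.
    DataType("filter-action",
             subclasses=["subclass-1", "subclass-2", "subclass-3", "subclass-4"]),
    DataType("mail-message", {"sender": "valid-mail-sender"}),
]

for t in find_data_types(types, "filter"):
    print(t.name, t.slots, len(t.subclasses))
```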

From this information and the initial bug report, the maintainer has probably concluded that an information filter is an object that, under some conditions, causes a mail message to be delivered in a special way. The next question to be answered is, "How does a filter get processed?" This is code-level information, and the maintainer sees that each of the two data-types has a method attached to it: information-filter has a method called check-for-activation, and filter-action has a method called fire-filter. Recalling that the bug report concerned a filter for saving a message to a file that was not being activated, the maintainer decides to inspect the check-for-activation method.

Understanding methods is covered in greater detail in Section 6.4. There are many choices and paths a maintainer can take, and what seems informative or intuitive to one may seem obscure to another [Redmiles, 1993]. This maintainer inspects the method and finds that all it does is compare the values in the slots of the mail message being processed to those of the mail message that is the filter antecedent (essentially a series of switches). If any slots match, the method fire-filter is called. The bug report mentioned that mail from a particular user was not being filtered correctly, and it is clear from the method that if the Sender slots of the two messages match, the filter should fire. The problem, then, must lie in the way one of the two Sender slots gets its value.

Finding the places where a slot or variable gets a value is, of course, one of the key features of the KBEDS CSIS. The maintainer asks, "What are all the methods that change the sender slot?" (this simple query is really just a path: (changed-by implementation-of)). There is only one such method, register-sender.
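A path such as (changed-by implementation-of) is a composition of roles: each role is applied to every individual reached so far. The Python sketch below assumes a toy graph in which each action fills implementation-of with its enclosing method; the action names `assign-7` and `switch-3`, and the fact that check-for-activation reads the sender slot, are made up for illustration.

```python
from typing import Dict, Iterable, Set

Graph = Dict[str, Dict[str, Set[str]]]  # individual -> role -> fillers


def follow(graph: Graph, start: Iterable[str], path: Iterable[str]) -> Set[str]:
    """Apply each role in turn to the whole current set of individuals."""
    current = set(start)
    for role in path:
        current = {f for ind in current for f in graph.get(ind, {}).get(role, set())}
    return current


# Toy fragment; the action names are made up.
graph: Graph = {
    "sender": {"changed-by": {"assign-7"}, "read-by": {"switch-3"}},
    "assign-7": {"implementation-of": {"register-sender"}},
    "switch-3": {"implementation-of": {"check-for-activation"}},
}

print(follow(graph, {"sender"}, ["changed-by", "implementation-of"]))
# {'register-sender'}
```

Swapping changed-by for read-by in the same path would instead answer "Which methods read the sender slot?", which is why paths are a convenient way to phrase such questions.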

The maintainer now asks, "How does the sender slot get its value in this method?" The answer to this question is simply the filler for the new-value role of the assignment statement that changes the slot, and this is a message invoking the method find-sender-by-email-addr.

The maintainer now asks, "What does that method return?" This is a question that can be answered at increasing levels of detail. The simplest answer is the data-type of what is returned, which is the filler for the method's has-return-data-type role. In this case the method returns an instance of valid-mail-sender (there is an example on page 17 of a maintainer using the domain model to understand this concept). The maintainer may well have already assumed that the sender slot of a mail message is going to be filled with a valid mail sender, so the next level of detail is to retrieve all the return values themselves, which can be accessed with the path (has-implementation return-value)[4]. There are two such values: the constant void and the variable current-kb-entry. Neither of these gives the maintainer any clues, so the next step is to examine the return statements themselves.


One way to do this is to pop up a role-filler window. Figure 7.9 shows the process up to this point, with the role-filler window for the return in the forefront. Here the maintainer sees something interesting. The return statement that returns current-kb-entry is the case-action of a switch. In other words, it is only executed under some condition, and the maintainer now asks, "What is that condition?"

The answer to this question is found in the fillers of the case-condition and switch-on-value roles of the switch-case and switch, respectively. When these two values are equal, the case is activated and the control flow beginning with the case's case-action (which is the return statement the maintainer is trying to understand) is executed. The two values are the parameter email-address-string (which was passed into the method) and a message that invokes the method get-email-address on the variable current-kb-entry. The maintainer concludes that the current KB entry is returned only if its email address is the same as the email address of the sender of the message.
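The switch semantics described above can be stated as a tiny interpreter. In this Python sketch, which is an assumption about the semantics rather than KBEDS code, `switch_on_value` and `case_condition` mirror the two roles, and the switch value stands in for the get-email-address message. Because activation depends only on equality, the result does not depend on which of the two values fills which role.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass
class SwitchCase:
    case_condition: Any
    case_action: Callable[[], Any]  # start of this case's control flow


@dataclass
class Switch:
    switch_on_value: Callable[[], Any]
    cases: List[SwitchCase]


def execute(switch: Switch) -> Optional[Any]:
    """Run the case-action of the first case whose condition equals the switch value."""
    value = switch.switch_on_value()
    for case in switch.cases:
        if case.case_condition == value:
            return case.case_action()
    return None  # no case was activated


def sender_switch(email_address_string: str, entry_address: str) -> Switch:
    """Toy rendering of the switch in find-sender-by-email-addr."""
    return Switch(
        # Stands in for sending get-email-address to current-kb-entry.
        switch_on_value=lambda: entry_address,
        cases=[SwitchCase(email_address_string, lambda: "current-kb-entry")],
    )


print(execute(sender_switch("pat@example.org", "pat@example.org")))   # current-kb-entry
print(execute(sender_switch("pat@work.example", "pat@example.org")))  # None
```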

The next step depends on the maintainer remembering that the problem had to do with a mail sender who had multiple email addresses; the name of the method get-email-address seems to imply that it considers only one address. The maintainer seeks to verify this implication by asking, "What does get-email-address return?" This time the answer is very useful: a message that invokes the method first on the slot email-address-list. In other words, when a person has more than one email address, the find-sender-by-email-addr method checks only the first. The bug has been found.
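The bug can be reproduced in a few lines of Python. This sketch assumes that find-sender-by-email-addr iterates over the KB entries, which the scenario does not state; `KBEntry`, `find_sender_fixed`, and the sample addresses are hypothetical. The repaired version is only one possible fix: it compares the incoming address against every element of email-address-list instead of the first.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class KBEntry:
    name: str
    email_address_list: List[str] = field(default_factory=list)

    def get_email_address(self) -> str:
        # Mirrors what the scenario uncovered: only the first address is returned.
        return self.email_address_list[0] if self.email_address_list else ""


def find_sender_by_email_addr(kb: List[KBEntry], email_address_string: str) -> Optional[KBEntry]:
    """Buggy lookup: a sender matches only through its first address."""
    for current_kb_entry in kb:
        if current_kb_entry.get_email_address() == email_address_string:
            return current_kb_entry
    return None  # plays the role of returning VOID


def find_sender_fixed(kb: List[KBEntry], email_address_string: str) -> Optional[KBEntry]:
    """One possible repair: compare against every address in the list."""
    for current_kb_entry in kb:
        if email_address_string in current_kb_entry.email_address_list:
            return current_kb_entry
    return None


kb = [KBEntry("colleague", ["colleague@univ.example", "colleague@home.example"])]
print(find_sender_by_email_addr(kb, "colleague@univ.example").name)  # colleague
print(find_sender_by_email_addr(kb, "colleague@home.example"))       # None
print(find_sender_fixed(kb, "colleague@home.example").name)          # colleague
```

The second call reproduces the user's report: mail from the second address finds no sender, so the filter comparing Sender slots never fires.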

This scenario may seem tedious and drawn out, but consider how much of the work was done by the KBEDS CSIS. Each question the maintainer asked had an immediate answer, often available at the click of a button. Without the KBEDS CSIS, the maintainer would have had to search through multiple source code files to find each of the methods being considered; tracing variable usage is very hard with conventional tools, and tracing only the places where a variable is changed is nearly impossible. Every part of the software was at the maintainer's fingertips, and the complexity of some of the searches, such as "What are the names of all the methods that change the sender slot?", is beyond any text-based search and beyond the capabilities of previous SISs.

7.2.2 Comparison to other methods

The value of Software Information Systems over other discovery techniques has been established [Brachman, et al., 1990], and the improvements of the KBEDS CSIS over an SIS such as LaSSIE [Devanbu, Selfridge, and Brachman, 1990] were discussed in Chapter 3. The rest of this document, culminating with the scenarios in the previous section, has shown that the improvements proposed in Chapter 3 were achieved: the KBEDS CSIS does far more to assist discovery than previous SISs. What remains to be said is where the KBEDS CSIS does not measure up to LaSSIE.

The overwhelming majority of software discovery problems occur in systems that are very old, very large, and written in languages like COBOL, FORTRAN, and Assembler. Making use of the KBEDS CSIS requires starting over from scratch, whereas LaSSIE generated its knowledge base, albeit a simple one, automatically from a huge C code repository. This is without a doubt the most serious criticism of the CSIS approach: a software system must be re-engineered into this representation in order to reap the benefits, and it is questionable whether the gain would outweigh the cost. This topic is covered in more detail in Section 8.2.1, and there is some hope. The KBEDS CSIS representation is the necessary first step: it defines an ontology in which the code and domain models can be merged.


Chris Welty - Dissertation - 17 SEP 1996
