CHAPTER 6

6.1 Localizing Information

One of the principal obstacles to software understanding, and thus discovery, is the fact that programming languages delocalize a lot of information. The best known and most closely studied example of this is the delocalized plan [Soloway and Letovsky, 1986], in which some abstract goal is achieved by lines of code that are spread out through the source program (probably across different source files). The problem is more than just delocalized plans, however; it is delocalized information in general. From a maintainer's perspective, the essence of the programming language representation is simply text in a file, and in text the only information that is local is control flow: the next line of the file typically indicates the next action in the control flow.

Consider, for example, a maintainer engaged in discovery on a large C program. The maintainer is looking at a particular line in one of the source files for this program, and sees a variable. Most information about this variable will not be local; that is, it will not be contained in the next or previous line of the source text. In many cases, the information won't even be visible on the screen - again, only control flow is local. Information about the variable, such as its data-type, the slots, methods, or superclasses of that data-type, and the places where the variable is accessed or changed, is someplace else. Thus we say this information is delocalized in the programming language source code.

In the KBEDS CSIS representation, specifically the code-level ontology, all this information can be made local. In the source text only control flow is local, but in the code-level ontology control flow is just another role. The data-type of a variable, the references to the variable, the changes to it, and so on are all represented in such a way that they are as easy to find as the next action in the control flow.

This chapter will cover the parts of the ontology that serve to further localize useful information, and the ways in which the user interface can facilitate access to this information. Specific scenarios in which this information is used to recognize vestigial code and a delocalized plan are found in Section 7.2.1.1 and Section 7.2.1.2.

It is important to note that all this localization comes as a result of the representation, though a developer is not required to specify any more information than would be represented in a conventional programming language.

6.1.1 Rules for Discovery

The code-level ontology provides numerous rules for further localizing information and thus helping discovery. Many were discussed in the previous chapter, and several more will be discussed in other sections of this chapter. This section describes the rules for inferring the decomposition, the data-type of a self-variable, and the descriptions of actions.

6.1.1.1 Functional Decomposition

One important and useful kind of information about a software system is the functional decomposition. In traditional top-down, modular design, this view of the software is typically generated early, before the actual implementation [Parnas, 1972]. Once the program is written, however, the initial decomposition plan will likely have changed, perhaps even dramatically [Shneiderman and Carroll, 1988], and recovery of the actual decomposition must be done through a painstaking process of analyzing the code. Consider what must be done: the decomposition begins at the top level with the program, and each call to a procedure within the top-level program constitutes an element of the second level of decomposition, etc. Decomposition is another example of information that is extremely delocalized in code.

For the code-level ontology it is important to note that what belongs to the decomposition is not the invocation of a procedure but the invoked procedure itself. Therefore the same procedure may appear in the decomposition many times. The procedures invoked within the implementations of the second-level procedures make up the decomposition of those procedures, and form the third level of decomposition of the program.

Clearly, determining the actual decomposition of a system would be a tedious process for a maintainer. If a software system has been represented using the code-level ontology described here, the decomposition of a program or of any method is computed automatically with one simple rule:

Any message sent to an object (sending a message is a method invocation) as part of the implementation of a method or program implies that the method invoked by the message is part of the decomposition of that code block. The rules for finding hidden methods discussed in Section 4.2.2 will ensure that all messages that appear in a method will be fillers for the has-implementation role.

This rule may seem odd, however, because there are many kinds of code-level actions that could fill the has-implementation role: assignments, switches, returns. None of these kinds of actions invoke methods themselves (they may do so indirectly, by using the return value of a message as a value, but this is covered in the discussion of hidden methods in Section 4.2.2). We need a way to say "get only the fillers for the has-implementation role that are individuals of message, and for each of those get the fillers of their call-method role." This is taken care of by the path facility (discussed in Section 2.2.4.2), since every code-level concept in the ontology except message has an (at-most 0 call-method) restriction. When the path facility gets to an intermediate individual in a path that has no value for the next role in the path, it simply ignores that individual. This rule, then, will ignore individuals of other actions because there can be no fillers for the call-method roles of any actions other than messages.
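The behavior of this path can be illustrated with a small Python sketch. It is not the Classic rule itself, only an analogue under stated assumptions: the Individual class, the follow_path function, and the sample individuals are hypothetical, and only the role names has-implementation and call-method come from the ontology. The point it shows is that individuals with no filler for the next role simply drop out of the path.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)  # compare individuals by identity, not by content
class Individual:
    name: str
    concept: str                                 # e.g. "message", "assignment"
    roles: dict = field(default_factory=dict)    # role name -> list of fillers

def follow_path(start, path):
    """Follow a chain of roles from start, collecting the final fillers.

    An intermediate individual with no filler for the next role
    contributes nothing and is silently ignored.
    """
    frontier = [start]
    for role in path:
        next_frontier = []
        for individual in frontier:
            for filler in individual.roles.get(role, []):
                if filler not in next_frontier:
                    next_frontier.append(filler)
        frontier = next_frontier
    return frontier

# Hypothetical individuals: two messages and one assignment in a method body.
find_field = Individual("Find Recipient Field in String", "method")
classify_individual = Individual("Classify Individual", "method")
message_a = Individual("message-a", "message", {"call-method": [find_field]})
message_b = Individual("message-b", "message", {"call-method": [classify_individual]})
assignment_a = Individual("assignment-a", "assignment")  # no call-method filler
classify_message = Individual(
    "Classify Message", "method",
    {"has-implementation": [message_a, assignment_a, message_b]},
)

decomposition = follow_path(classify_message, ["has-implementation", "call-method"])
print([method.name for method in decomposition])
```

The assignment contributes nothing to the result because it has no call-method filler, which mirrors the effect of the (at-most 0 call-method) restriction described above.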

This rule will generate one level of decomposition. The complete decomposition, or more accurately, a list of every method in the decomposition of a code-block, could be derived by adding an immediate and inherited subrole of has-decomposition, modifying the above rule to infer has-immediate-decomposition instead of has-decomposition, and adding:

This has not been implemented in the KBEDS CSIS, because decomposition is not intrinsically useful as a list of methods, but as a way of visualizing the software system in a hierarchical manner. To visualize the decomposition of a code-block, only the first level decomposition of each code-block is needed by the user interface (discussed in Section 6.2).
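For readers who do want the complete list, the idea of an inherited decomposition amounts to a transitive closure over the one-level decomposition. The following Python sketch illustrates that idea only; it is not part of the KBEDS CSIS, and the function name, the dictionary representation, and the sample procedure names are all hypothetical.

```python
def full_decomposition(immediate, block):
    """Return every method reachable from block through repeated
    one-level decomposition.

    immediate maps a code-block name to the list of methods it invokes
    directly (its first-level decomposition).
    """
    seen = []
    pending = list(immediate.get(block, []))
    while pending:
        method = pending.pop()
        if method in seen:
            continue  # already listed; also guards against recursion
        seen.append(method)
        pending.extend(immediate.get(method, []))
    return seen

# Hypothetical one-level decompositions.
immediate = {
    "main": ["parse", "report"],
    "parse": ["read-token"],
    "report": ["format", "parse"],
    "read-token": [],
}

everything = full_decomposition(immediate, "main")
print(sorted(everything))
```

Because visited methods are skipped, the sketch terminates even when methods invoke each other recursively.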

6.1.1.2 Data-Types of a Self-Variable

Another useful bit of information is the data-type of a self-variable in a method. A method is part of a data-type, but through superclass inheritance may be attached to instances of any subclasses of that data-type. A self-variable, then, may be an instance of any of the data-types in the hierarchy. There is no automatic way to narrow it down further until run time. The following rule can provide the range of possible data-types of a self-variable:

There is a rule of thumb which can apply in some circumstances, however. In Section 6.1.2 the mechanism for determining every call to a method is described. Using this it is possible to find every message in the system which invokes the method. Further, each of these messages is sent to an instance (a slot, variable, or constant), and each of these instances has an immediate data-type. The set of all these immediate data-types may be smaller than the set of data-types generated by the above rule, unless one of the messages is sent to another self-variable. This rule of thumb is too complex to be expressed as a path, or any other declarative facility in Classic, but Classic does provide the ability to create complex rules whose antecedents are a role and a LISP function. When the rule fires on an individual, the LISP function is called with the individual as a parameter, and should return a list of individuals which will become new derived fillers for the role (actually, the path facility is implemented as a macro which generates such a rule). The rule of thumb for narrowing the data-type of a self-variable is implemented as one of these complex rules.
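The rule of thumb can be sketched in Python as a function from an individual to a list of derived fillers, which is the shape of a complex rule as described above. This is an illustration under assumptions, not the LISP function used in the KBEDS CSIS: the function names, the tuple representation of message receivers, and the sample receivers are hypothetical, and falling back to the full range when no narrowing is possible is one reasonable reading of the text.

```python
def possible_types(data_type, subclasses):
    """The data-type a method is attached to, plus all of its subclasses."""
    result = [data_type]
    for sub in subclasses.get(data_type, []):
        for candidate in possible_types(sub, subclasses):
            if candidate not in result:
                result.append(candidate)
    return result

def narrowed_self_types(method_type, subclasses, receivers):
    """Narrow the data-types of a method's self-variable.

    receivers holds one (kind, immediate_data_type) pair for every
    message in the system that invokes the method.
    """
    full_range = possible_types(method_type, subclasses)
    # A message sent to another self-variable tells us nothing new,
    # so no narrowing is attempted in that case.
    if not receivers or any(kind == "self-variable" for kind, _ in receivers):
        return full_range
    narrowed = []
    for _, immediate_type in receivers:
        if immediate_type not in narrowed:
            narrowed.append(immediate_type)
    return narrowed

subclasses = {"Mail Message": ["Message to Group", "Message to Person"]}

# Hypothetical callers: every invoking message goes to a Message to Person.
receivers = [("variable", "Message to Person"), ("slot", "Message to Person")]
narrow = narrowed_self_types("Mail Message", subclasses, receivers)

# One caller is itself a self-variable, so the full range is returned.
receivers_with_self = receivers + [("self-variable", "Mail Message")]
wide = narrowed_self_types("Mail Message", subclasses, receivers_with_self)

print(narrow)
print(wide)
```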

Again, we see an example of how localizing some information helps to improve discovery. In this case we are dealing with information that would be extremely delocalized in source code (every message that invokes a method).

6.1.1.3 Descriptions of Actions

Each type of action can be described in terms of what it does. What each action does is completely determined by the values that fill certain important roles.

For example, an assignment statement can be described as giving a new value to the variable (or slot) that fills its changes role. That new value is whatever fills its new-value role. Again, as with the rule of thumb for self-variable data-types, this is too complex for a simple rule or path expression, because it involves the combination of two different paths into one filler. In other words, for assignment we want to fill the description role with a string of the form: "Assigns <variable> <- <new-value>," where:

The two values, which are obtained from these two different paths, are then merged with other characters into the required string.


A more concrete example of this inference is shown in Figure 6.1. The assignment assignment-01 changes variable-11, giving it the value of a constant (constant-08). The name of variable-11 is "counter" and the name of constant-08 is "zero". These two strings are concatenated together to form the string "Assigns counter <- zero", which is derived by the function to be the filler for the description role of the assignment.
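The same derivation can be written as a short Python sketch. It is only an analogue of the LISP function in Appendix D, which is not reproduced here: the function name and the dictionary representation are hypothetical, while the role names, individual names, and the resulting string are those of the example above.

```python
def describe_assignment(assignment, names):
    """Build the description string for an assignment from two paths:
    the name of the filler of changes and the name of the filler of
    new-value."""
    variable = names[assignment["changes"]]
    new_value = names[assignment["new-value"]]
    return f"Assigns {variable} <- {new_value}"

# Names of the individuals in the example.
names = {"variable-11": "counter", "constant-08": "zero"}
assignment_01 = {"changes": "variable-11", "new-value": "constant-08"}

description = describe_assignment(assignment_01, names)
print(description)  # Assigns counter <- zero
```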

The Classic code for the general rules that derive the descriptions of each kind of action is given in Appendix C, and the LISP code for the functions that actually derive these description strings is shown in Appendix D. Briefly, this is what the functions compute for each action type:

The filters in the rule specifications prevent these description rules from firing until all the relevant roles are closed.

6.1.1.4 Descriptions of Methods

Once these descriptions are derived for actions, full text outlines of a method can be generated easily. The LISP code that generates these textual descriptions of methods is shown in Appendix E. Note that, although the code may seem long, it is actually quite simple since most of it is format statements. The logic is basically to print the name, the parameters, the local variables, and the type of value returned. Then the describe-implementation function takes over and simply prints the description of each statement in the implementation of the method, in order of the control flow. When it reaches a switch, it pursues each resulting branch in the control flow to the end before proceeding to the next. When it reaches a return or detects a loop, it terminates the current branch. The simplicity of these functions is made possible by the representation, a point that will be stressed again in Section 6.2. Below is the sample output of the describe-method function when applied to a method in the KBEDS Application (a full description of the domain and the meaning of the objects in this description are given in Chapter 5). Even without a complete explanation, the description reads fairly clearly:

[METHOD-3], Classify Message, is a method attached to [DATA-TYPE-6: Mail Message],
and all its subclasses: ([DATA-TYPE-76: Message to Group] [DATA-TYPE-57: Message 
to Person]).

The method has 1 parameter: ([PARAMETER-12: Knowledge Base]), 2 local variables: 
([SELF-VARIABLE-14] [VARIABLE-1: Recipient email string]), and returns a Void.

The method is implemented as follows:
Assigns [VARIABLE-1: Recipient email string] <- 
        [MESSAGE-11: Sends [METHOD-57: Find Recipient Field in String] to 
                           [SLOT-12: Header]].
Assigns [SLOT-57: Recipient] <- 
        [MESSAGE-13: Sends [METHOD-13: Find Recipient by email] to 
                           [PARAMETER-12: Knowledge Base]].
Switch on [MESSAGE-14: Sends [METHOD-58: Void?] to [SLOT-57: Recipient]]:
  Case [CONSTANT-63: False]:
     Sends [METHOD-59: Classify Individual] to [SELF-VARIABLE-14].
     Switch on [MESSAGE-16: Sends [METHOD-4: Valid Message?] to [SELF-VARIABLE-14]]:
       Case [CONSTANT-62: True]:
          Sends [METHOD-5: Process Message] to [SELF-VARIABLE-14].
          Returns [CONSTANT-55: VOID].

       Case [CONSTANT-63: False]:
          Returns [CONSTANT-55: VOID].

  Case [CONSTANT-62: True]:
     Assigns [SLOT-14: Error] <- [CONSTANT-14: No Such Recipient].
     Returns [CONSTANT-55: VOID].
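
The traversal performed by describe-implementation can be sketched in Python as follows. This is a simplified illustration rather than the LISP code of Appendix E: the dictionary representation, the key names, and the abbreviated descriptions are hypothetical. It follows the logic stated above: print each statement in control-flow order, pursue each branch of a switch to its end, and stop the current branch at a return or when a loop is detected.

```python
def describe_implementation(action, actions, indent=0, on_path=()):
    """Return the outline lines for the control flow starting at action."""
    lines = []
    while action is not None and action not in on_path:  # stop on a loop
        on_path = on_path + (action,)
        node = actions[action]
        pad = "  " * indent
        if node["kind"] == "switch":
            lines.append(f"{pad}Switch on {node['on']}:")
            for case_value, first_action in node["cases"]:
                lines.append(f"{pad}  Case {case_value}:")
                # Pursue this branch to its end before the next case.
                lines.extend(
                    describe_implementation(first_action, actions, indent + 2, on_path)
                )
            return lines
        lines.append(f"{pad}{node['description']}")
        if node["kind"] == "return":
            break  # a return terminates the current branch
        action = node.get("next")
    return lines

# Hypothetical, abbreviated version of a method body.
actions = {
    "a1": {"kind": "assignment", "next": "s1",
           "description": "Assigns Recipient <- Find Recipient by email"},
    "s1": {"kind": "switch", "on": "Void?",
           "cases": [("False", "m1"), ("True", "a2")]},
    "m1": {"kind": "message", "next": "r1",
           "description": "Sends Process Message to self"},
    "a2": {"kind": "assignment", "next": "r1",
           "description": "Assigns Error <- No Such Recipient"},
    "r1": {"kind": "return", "description": "Returns VOID"},
}

outline = describe_implementation("a1", actions)
print("\n".join(outline))
```

The function needs no parsing or searching: it only follows roles that are already present on each action, which is why it stays short.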

6.1.2 Role Inferences for Discovery

The role hierarchy and role inverses provide the simplest inference mechanism in Classic. In the list of all roles given in Appendix B, it can be observed that every role in the code-level ontology, excepting only the roles in the data-slot hierarchy, has an inverse. Figure 3.6 on page 66 shows the seven role hierarchies in the ontology. While any of the role inverses can potentially provide information that is useful during discovery, the most significant of these hierarchies is accessed-by. The accessed-by roles are all the inverses of the roles that link code-level actions to software values. This is at once an immensely simple and immensely powerful notion, and is the place where the most localization of information takes place.

There are basically four kinds of access to a software-value: reading the value, passing a value to a parameter of a method, changing the value (not all individuals of software-value can be changed, only individuals of changeable-instance), and sending a message to the value (only individuals of data-type-instance).

Reading a software value, represented by the reads sub-hierarchy (the inverse of read-by), occurs when the value is used in an action with any of these roles: case-condition, return-value, new-value, switch-on-value, or argument-value. These roles must be filled in by the developer in order for the various actions to be complete. The inverses of these roles are the roles listed at the bottom of the read-by hierarchy shown in Figure 3.6 on page 66.

An example is shown in Figure 6.2. The developer has implemented a method and has filled the roles represented by solid lines, and Classic has derived all the rest through inverses and the role hierarchy. Consider the significance of all this derived information during discovery - Classic automatically keeps track of every access to a value. For a particular variable, a maintainer simply has to retrieve the values that fill the accessed-by role to see every place the variable is used. To restrict this list to only the places where the variable is changed, the maintainer retrieves the fillers for the changed-by role in the variable, as shown in Figure 6.2. Similarly, the read-by role records every place the variable's value is used.

All this added information, all this localization of information, comes essentially for free in this representation - that is, no extra work is required on the part of the developer to provide this cross-referencing, yet it is tremendously useful. Understanding a variable in a program involves determining how it is used, and the first step in that determination is finding where it is used and in what ways (read, change, etc.). In source code, especially when dealing with global variables, information about how a variable is used is very delocalized, and this is typically the source of delocalized plans.

Most derived information by far comes from role inverses and the role hierarchy, and that should be clear from looking at Figure 6.2 (and all the other figures in this document where these role inferences are shown), where each told role results in five derived roles. Rules typically result in the addition of a single link.
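The way a single told role fans out into derived roles can be illustrated with a small Python sketch. It is not Classic and does not reproduce the hierarchy of Figure 3.6: the tables below cover only a fragment, the role names accesses and new-value-of are hypothetical stand-ins, and the knowledge base is reduced to a dictionary. The role names reads, changes, new-value, read-by, changed-by, and accessed-by are taken from the text.

```python
# Fragment of a role hierarchy: each role maps to its parent role.
PARENT = {
    "new-value": "reads",
    "reads": "accesses",
    "changes": "accesses",
}
# Inverse of each role in the fragment.
INVERSE = {
    "new-value": "new-value-of",
    "reads": "read-by",
    "changes": "changed-by",
    "accesses": "accessed-by",
}

def tell(kb, subject, role, filler):
    """Record a told link and derive every link implied by the role
    hierarchy and by role inverses."""
    while role is not None:
        kb.setdefault((subject, role), set()).add(filler)
        inverse = INVERSE.get(role)
        if inverse is not None:
            kb.setdefault((filler, inverse), set()).add(subject)
        role = PARENT.get(role)  # climb to the parent role, if any

kb = {}
# The developer states only these three links.
tell(kb, "assignment-01", "changes", "variable-11")
tell(kb, "assignment-01", "new-value", "constant-08")
tell(kb, "assignment-02", "new-value", "variable-11")

print(sorted(kb[("variable-11", "accessed-by")]))
print(sorted(kb[("variable-11", "changed-by")]))
print(sorted(kb[("variable-11", "read-by")]))
```

A maintainer's question such as "where is this variable changed?" then becomes a single lookup on the variable rather than a search through the source text.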

6.1.3 Superclass Inheritance and Discovery

Superclass inheritance, while a powerful and frequently used tool for modeling and setting up data-type hierarchies, actually causes significant problems during maintenance because the specification of a data-type, and consequently of each of its instances, is not all in one place [Huitt and Wilde, 1992]. Maintainers of object-oriented programs actually spend a great deal of time engaged in the rather tedious task of trying to find all the slots, methods, and data-types of individual variables [Meyers, Reiss and Lejter, 1992]. This occurs because object-oriented languages are also typically text-based representations, and the information inherited by an object from its superclasses isn't shown in the immediate text, and is thus delocalized.

It should be clear that the KBEDS CSIS representation solves this problem and localizes inherited information. Each data-type of a variable (there can be only one immediate data-type, but there may be many inherited data-types) is in the role has-data-type. Examining the immediate data-type of a variable (again, there is only one) will show all the slots (including the inherited ones) in the has-slots role, and all the methods in the has-methods role. In addition, the next section describes mechanisms which can put all this information in graphical form on the screen for any object at the click of a button.
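As a rough illustration of what this localization buys, the Python sketch below collects the inherited data-types and slots of a data-type in one place. It is an analogue only: in the KBEDS CSIS these fillers are derived by Classic, not computed by a function like this one. The function name, the dictionary representation, the assignment of slots to data-types, and the slot named Person are hypothetical.

```python
def localized_view(data_type, superclasses, own_slots):
    """Return (data-types, slots) for data_type, including everything
    inherited from its superclasses."""
    data_types, slots = [], []
    pending = [data_type]
    while pending:
        current = pending.pop(0)
        if current in data_types:
            continue
        data_types.append(current)
        for slot in own_slots.get(current, []):
            if slot not in slots:
                slots.append(slot)
        pending.extend(superclasses.get(current, []))
    return data_types, slots

# Hypothetical hierarchy fragment.
superclasses = {"Message to Person": ["Mail Message"]}
own_slots = {
    "Mail Message": ["Header", "Recipient", "Error"],
    "Message to Person": ["Person"],
}

has_data_type, has_slots = localized_view("Message to Person", superclasses, own_slots)
print(has_data_type)
print(has_slots)
```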


Chris Welty - Dissertation - 17 SEP 1996