CHAPTER 8
Automatic reverse engineering is, however, the Holy Grail of the legacy software community. Most companies with investments in large-scale software systems believe that the cost of re-engineering from scratch would greatly outweigh their current maintenance costs, and that only something that works automatically from the existing artifacts will suffice. From this perspective, the KBEDS CSIS would seem flawed; the following points, however, make it less so.
The simple fact is that, in large systems, the maintenance phase not only persists but is the most expensive phase. While good or even great design may help reduce maintenance problems, there is no way to design a system for all the changes it will undergo during its lifetime, and most design and development tools do little to make maintenance easier (some even make it harder).
This research focused on the problems that occur during maintenance, not during requirements elicitation or design, and presents some ideas for how these problems can be alleviated (not eliminated) up front. In other words, while the CSIS itself must be built in the early stages, it stresses processes and representations that will make discovery, and hence maintenance, easier.
The next step in this research is to bring the ontology up to a point where it completely subsumes Smalltalk (the obstacles to completing this as part of the Ph.D. research were given in Section 7.1.1.2, the most time-consuming of which would be filling out the Smalltalk Class Library). Once this is complete, it should be possible to translate any Smalltalk program into the CSIS. The domain model would still need to be expanded to support discovery, however.
The ability to translate a Smalltalk program into a CSIS is not as academic as it may seem. Smalltalk is believed by some to be as prominent in industry as C++ [Green, 1994], and the N.Y. Times' help wanted section posted more jobs calling for Smalltalk experience than for C++ experience over the month of February, 1995[5]. While the popularity of Smalltalk relative to C++ is controversial, the simple point is that Smalltalk is used in industry, and not on a small scale. A CSIS can greatly enhance the Smalltalk environment, which, while advanced, still poses significant maintenance challenges [Huitt and Wilde, 1992] that the KBEDS CSIS addresses directly.
After handling Smalltalk, the representation could then be modified to represent a language in which significant software artifacts have been implemented, such as COBOL or C.
The challenge for a language like C would be handling pointers, which is also the challenge in programming and understanding that language. The inferences and search heuristics for automatically finding information about side-effects are far more complex, since in the compile-time environment (which is where the KBEDS CSIS operates) every memory access through a pointer can have a multitude of meanings. The basic structure of the ontology would be the same, but a new section dealing specifically with pointers would have to be added.
Research into partitioning knowledge-bases and sharing ontologies has indicated that it may be possible to deal with smaller, computationally manageable parts of very large knowledge-bases at a time [Gruber, 1993]. This may restrict some of the inference, and thus the information provided by the CSIS, but would still be an improvement over existing techniques. Current large domain models, such as those used by the Armed Forces, are specified on paper in manuals, and these manuals tend toward the hundreds of thousands of pages. Even if a CSIS could work with only a small part of its model at a time, it would be much more usable and useful.
The issues involved with scalability can only be realistically addressed by actually attempting to scale the approach up. It is clear that current approaches to dealing with scale in software engineering are not working, and it seems that almost anything would be an improvement.
There are certain features missing from the user interface that would make it a lot easier to use, both from the perspective of development and discovery.
The most important meta-information is contextual. During the development of KBEDS, for example, almost all the individuals were created through another individual. In other words, most individuals were created to fill a role in another individual. For example, a developer might click on the next-action role of an action with the intention of creating a new action to fill that role. In this case, the interface should not display a line for the role previous-action, because that is the inverse of the role that will be filled with the new individual. An action need not, however, be created through the next-action role of another action; it can also be created as a result of trying to fill the has-implementation role of a method. In this case, the interface should not display a line for the implementation-of role, since that is the inverse of has-implementation. Both the previous-action and the implementation-of roles, however, are important roles of any software action, so they are currently displayed in both cases.
The point here is that if the interface kept track of the context of the creation (or the display) of a new individual, the key-role, important roles, and hidden roles could be more flexible.
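The context-tracking idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the KBEDS interface: the `INVERSE` table, the `IMPORTANT_ACTION_ROLES` list, and the function `roles_to_display` are hypothetical names, and only the role names themselves are taken from the text above.

```python
# Hypothetical inverse-role table; the role names come from the text,
# the table itself is an assumption made for this sketch.
INVERSE = {
    "next-action": "previous-action",
    "previous-action": "next-action",
    "has-implementation": "implementation-of",
    "implementation-of": "has-implementation",
}

# The two roles the text names as important roles of any software action.
IMPORTANT_ACTION_ROLES = ["previous-action", "implementation-of"]


def roles_to_display(important_roles, created_through=None):
    """Return the important roles, minus the inverse of the creating role.

    created_through is the role (on some other individual) that the new
    individual is being created to fill, or None when there is no context.
    """
    hidden = INVERSE.get(created_through)
    return [role for role in important_roles if role != hidden]


# Created by clicking next-action on another action: hide previous-action.
print(roles_to_display(IMPORTANT_ACTION_ROLES, "next-action"))
# Created by clicking has-implementation on a method: hide implementation-of.
print(roles_to_display(IMPORTANT_ACTION_ROLES, "has-implementation"))
# No creation context: show both important roles.
print(roles_to_display(IMPORTANT_ACTION_ROLES))
```

The design choice in this sketch is that the hidden roles are computed from the creation context rather than stored per concept, so the same action concept can present different lines depending on how the individual came to be created.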
Another feature that would significantly speed up development would be for the interface to have a (likewise context-dependent) understanding of the order in which the roles of each kind of individual are typically filled. For example, during the development of KBEDS, new slots had their roles filled starting with name, followed by has-data-type, default-value, etc. It would have been faster if the interface had automatically prompted the developer to fill in these three roles in turn, rather than requiring each role to be clicked on.
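One way to picture this feature is a table of typical fill orders that the interface consults to decide which role to prompt for next. The following Python sketch is an assumption-laden illustration: `TYPICAL_FILL_ORDER` and `next_role_to_fill` are hypothetical names, and only the slot ordering (name, has-data-type, default-value) comes from the text. A context-dependent version could key the table on the creation context as well as the kind of individual.

```python
# Hypothetical table of typical fill orders, keyed by kind of individual.
# Only the "slot" ordering is taken from the text.
TYPICAL_FILL_ORDER = {
    "slot": ["name", "has-data-type", "default-value"],
}


def next_role_to_fill(kind, filled_roles):
    """Return the first role in the typical order not yet filled, or None."""
    for role in TYPICAL_FILL_ORDER.get(kind, []):
        if role not in filled_roles:
            return role
    return None


# Simulate the interface prompting for each role of a new slot in turn.
filled = {}
while (role := next_role_to_fill("slot", filled)) is not None:
    filled[role] = f"<value entered for {role}>"
print(list(filled))
```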
The interface itself supports only queries of the kind "find all individuals of concept x that have v1 as a filler for role r1, v2 as a filler for role r2..." There are a variety of other kinds of queries that Classic supports, including those that use role cardinality, inter-role relationships, paths, and even LISP functions, and these must be made in Classic directly, not through the user interface.
The interface should supply a lot more support for the most tedious aspect of programming with a CSIS: closing roles. A "compilation mode" could be added, giving the developer the option to say "this section of the implementation is finished" and let the interface figure out which roles need to be closed and in what order. This would be very computationally intensive, but better than requiring the developer to do it, and do it correctly.
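The one query form the interface does support amounts to a conjunction of role-filler tests over the individuals of a concept. The Python sketch below illustrates that form over a toy in-memory table. It is not Classic's API and performs no description-logic inference; the data layout, the function `find_individuals`, and the sample individuals are all hypothetical.

```python
def find_individuals(individuals, concept, fillers):
    """Find individuals of `concept` having each given value as a role filler.

    individuals: name -> {"concepts": set of concept names,
                          "roles": role name -> set of fillers}
    fillers:     role name -> required filler (the r1/v1, r2/v2 pairs)
    """
    matches = []
    for name, ind in individuals.items():
        if concept not in ind["concepts"]:
            continue
        if all(value in ind["roles"].get(role, set())
               for role, value in fillers.items()):
            matches.append(name)
    return sorted(matches)


# Toy data, invented for illustration only.
toy_kb = {
    "balance": {"concepts": {"slot"},
                "roles": {"has-data-type": {"Number"}, "default-value": {0}}},
    "owner": {"concepts": {"slot"},
              "roles": {"has-data-type": {"String"}}},
    "deposit": {"concepts": {"action"},
                "roles": {"has-data-type": {"Number"}}},
}

print(find_individuals(toy_kb, "slot", {"has-data-type": "Number"}))
print(find_individuals(toy_kb, "slot", {"has-data-type": "Number", "default-value": 1}))
```

Queries over role cardinality, inter-role relationships, or paths do not fit this filler-matching shape, which is why they currently have to be issued in Classic directly.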
The interface should also support another "compilation" feature for doing batch retraction (and addition, but retraction is more important). Batch retraction is currently supported by Classic, but there is no way to do it from the interface. Currently the easiest way to do retraction in Classic is to dump the knowledge-base to a file, edit that file with a text editor to make the changes, and then re-load the data.
The discovery aspects of the KBEDS CSIS are well suited for working in groups. In fact, a fairly simple extension to the ontology, the addition of a representation of the organizational structure of the maintenance group, could add a lot more meaning. For example, each method or data-type could have a maintained-by role, which would be filled with an individual representing the person responsible for the data-type or method. Another role, perhaps called knows-about, might link a person to the domain objects that person understands. These simple representations can assist a developer with the age-old "Who knows what?" problem [Terveen and Selfridge, 1994].
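As a rough illustration of how the two proposed roles could answer the "Who knows what?" question, the Python sketch below stores maintained-by and knows-about fillers in plain dictionaries. Everything here is hypothetical: the people, methods, and domain objects are invented, and the functions `maintainer_of` and `who_knows_about` are not part of the KBEDS CSIS.

```python
# Hypothetical fillers for the proposed maintained-by role:
# method or data-type -> person responsible for it.
maintained_by = {
    "Account>>deposit:": "alice",
    "Account>>withdraw:": "bob",
    "Ledger": "bob",
}

# Hypothetical fillers for the proposed knows-about role:
# person -> domain objects that person understands.
knows_about = {
    "alice": {"account"},
    "bob": {"account", "ledger"},
}


def maintainer_of(artifact):
    """Return the person filling maintained-by for the artifact, or None."""
    return maintained_by.get(artifact)


def who_knows_about(domain_object):
    """Return, in sorted order, everyone whose knows-about role includes the object."""
    return sorted(person for person, objects in knows_about.items()
                  if domain_object in objects)


print(maintainer_of("Ledger"))
print(who_knows_about("account"))
```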
There is no support in the KBEDS CSIS for multiple maintainers making changes to the knowledge-base. In fact, this is currently not supported by Classic at all. There is currently no way to effectively "partition" the representation so that developers could work on independent sections. Clearly a range of multiple-user issues would need to be addressed in order for a CSIS to be used on a large scale.
It should be noted, however, that this is more a problem of the representation system than of the representation, and that efforts to address it are being made. KIF, the Knowledge Interchange Format [Genesereth, 1991], and KQML, the Knowledge Query and Manipulation Language [Finin, McKay, and Fritzson, 1992], are examples of efforts to develop knowledge systems that support multiple users and multiple knowledge bases. There have been successes adding this technology to the knowledge representation language LOOM [MacGregor and Brill, 1992], and then applying it to multi-user domains [Knoblock, Arens, and Hsu, 1994]. Since LOOM is, like Classic, a descendant of KL-ONE, this would seem to indicate that it will be possible to use Classic in this way as well.