Epistemology for Software Representations



3 Representing Software

This work is the result of studying Software Information Systems [Devanbu, Selfridge, and Brachman, 1990] in order to determine how to make them more effective [Welty, 1995]. A Software Information System (SIS) is a knowledge-based system that makes software maintenance less time-consuming by providing faster, more intelligent access to the software.

An SIS contains two representational parts: a code model and a domain model [Selfridge, 1990]. The code model represents objects in the code (the software domain) and so facilitates access to those objects. The domain model represents objects in the application domain and so helps maintainers understand that domain.

3.1 Objects in the Application Domain

Knowledge of the application domain has long been recognized as a critical part of software maintenance [Curtis, Iscoe, and Krasner, 1988]. Representing some of this knowledge in a domain model is fairly common practice, as an aid to understanding during any phase of the software lifecycle [Iscoe, 1991].

Modeling a domain requires building an ontology for that domain, the specifics of which are dependent on the representation system being used. Typical elements of a domain ontology are classes, a class hierarchy, links, and rules that can infer links between instances.

An example class hierarchy from the application domain of email distribution is shown in Figure 1. An instance of mail-message would be an electronic mail message from some instance of mail-sender to some instance of mail-recipient, where from and to are links. The goal of a domain model in software engineering is to provide an accurate description of what the software "knows" about the objects in the domain.
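The ontology elements listed above (classes, a class hierarchy, and links between instances) can be sketched as plain C data structures. This is only an illustration. It assumes a hypothetical root class mail-object, and Figure 1 may be organized differently. The instance names msg-1, alice and bob are invented. Each class points to its superclass, each instance points to its class, and the from and to links are stored as labeled pairs of instances.

```c
#include <stddef.h>
#include <string.h>

/* A domain class: a name and an optional superclass (the class hierarchy). */
typedef struct DomainClass {
    const char *name;
    const struct DomainClass *super;   /* NULL for the root class */
} DomainClass;

/* A domain instance belongs to exactly one class. */
typedef struct {
    const char *name;
    const DomainClass *cls;
} DomainInstance;

/* A link is a labeled binary relation between instances, e.g. "from" or "to". */
typedef struct {
    const char *label;
    const DomainInstance *subject;
    const DomainInstance *object;
} Link;

/* Returns 1 if c is ancestor or lies below it in the class hierarchy. */
int is_subclass(const DomainClass *c, const DomainClass *ancestor) {
    for (; c != NULL; c = c->super)
        if (c == ancestor) return 1;
    return 0;
}

/* Hypothetical fragment of the email-distribution hierarchy;
   "mail-object" is an assumed root class. */
static const DomainClass mail_object    = { "mail-object", NULL };
static const DomainClass mail_message   = { "mail-message", &mail_object };
static const DomainClass mail_sender    = { "mail-sender", &mail_object };
static const DomainClass mail_recipient = { "mail-recipient", &mail_object };

/* Hypothetical instances. */
static const DomainInstance msg1  = { "msg-1", &mail_message };
static const DomainInstance alice = { "alice", &mail_sender };
static const DomainInstance bob   = { "bob",   &mail_recipient };

/* msg-1 is from alice and to bob. */
static const Link links[] = {
    { "from", &msg1, &alice },
    { "to",   &msg1, &bob   },
};

/* Follows the first link with the given label from subject; NULL if none. */
const DomainInstance *follow(const DomainInstance *subject, const char *label) {
    for (size_t i = 0; i < sizeof links / sizeof links[0]; i++)
        if (links[i].subject == subject && strcmp(links[i].label, label) == 0)
            return links[i].object;
    return NULL;
}
```

For example, follow(&msg1, "to") yields a pointer to bob, whose class is mail-recipient.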

3.2 Objects in the Software Domain

The software domain contains objects like functions, data-types, and variables. It also contains assignment statements, for and while loops, if statements, parameters, etc. These are all the constructs defined by the programming language in which the software to be represented is written, and they are some of the classes in an ontology for code-level knowledge. Part of a possible class hierarchy for this domain is shown in Figure 2.
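A fragment of such a code-level hierarchy can be sketched in C as a table of (class, parent) name pairs with a simple subsumption test. The leaf classes are the ones named in the paragraph above. The grouping classes code-object, statement and loop, and the choice to treat parameter as a kind of variable, are assumptions and need not match Figure 2.

```c
#include <stddef.h>
#include <string.h>

/* One entry per class: its name and its parent's name (NULL for the root). */
typedef struct {
    const char *name;
    const char *parent;
} ClassEntry;

/* Hypothetical code-level hierarchy. Leaf classes come from the text;
   "code-object", "statement" and "loop" are assumed grouping classes. */
static const ClassEntry code_classes[] = {
    { "code-object",          NULL },
    { "function",             "code-object" },
    { "data-type",            "code-object" },
    { "variable",             "code-object" },
    { "parameter",            "variable" },
    { "statement",            "code-object" },
    { "assignment-statement", "statement" },
    { "if-statement",         "statement" },
    { "loop",                 "statement" },
    { "for-loop",             "loop" },
    { "while-loop",           "loop" },
};

/* Parent of the named class; NULL for the root or an unknown class. */
static const char *parent_of(const char *name) {
    for (size_t i = 0; i < sizeof code_classes / sizeof code_classes[0]; i++)
        if (strcmp(code_classes[i].name, name) == 0)
            return code_classes[i].parent;
    return NULL;
}

/* Returns 1 if cls equals ancestor or lies below it in the hierarchy. */
int isa(const char *cls, const char *ancestor) {
    for (const char *c = cls; c != NULL; c = parent_of(c))
        if (strcmp(c, ancestor) == 0) return 1;
    return 0;
}
```

With this table, isa("while-loop", "statement") holds but isa("function", "statement") does not.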

Instances of these classes would be lines of code, variables, and the aggregation of variables and lines of code into functions, etc. For example, consider the following function in C:

void deliver_message_to_group (message, group)
MAIL_MESSAGE message;
GROUP group;
{ LIST members;

  members = get_members(group);
  while (! empty(members)) {
    deliver_message_to_person(message,
                              first(members));
    members = butfirst(members); 
  }
}
This entire function can be described completely as instances of the classes in Figure 2, since each C statement has a rigid form that can be represented as links. For example, an assignment statement always has a variable that is changed (the left-hand side of the "=") and a new value (the right-hand side), given by an expression such as another variable or a function call. An assignment statement, then, has two links: one relating it to the variable being changed, and one relating it to the variable, function call, or other expression that supplies the new value. A semantic network view of the C function above, represented in this way, is shown in Figure 3.
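The encoding just described can be approximated by a list of subject-label-object triples. In the sketch below, the node names (assign-1, call-1, while-1, and so on) and the link labels (changes, new-value, calls, argument, body, condition) are hypothetical and are not necessarily those used in Figure 3. The point is that each statement of deliver_message_to_group becomes a node with a small, fixed set of links.

```c
#include <stddef.h>
#include <string.h>

/* One semantic-network edge: subject --label--> object. */
typedef struct {
    const char *subject;
    const char *label;
    const char *object;
} Triple;

/* Hypothetical encoding of deliver_message_to_group. */
static const Triple net[] = {
    /* the function, its parameters, its local variable, and their types */
    { "deliver_message_to_group", "instance-of",   "function" },
    { "deliver_message_to_group", "has-parameter", "message" },
    { "deliver_message_to_group", "has-parameter", "group" },
    { "deliver_message_to_group", "has-local",     "members" },
    { "message", "has-type", "MAIL_MESSAGE" },
    { "group",   "has-type", "GROUP" },
    { "members", "has-type", "LIST" },
    /* members = get_members(group); */
    { "deliver_message_to_group", "body", "assign-1" },
    { "assign-1", "instance-of", "assignment-statement" },
    { "assign-1", "changes",     "members" },
    { "assign-1", "new-value",   "call-1" },
    { "call-1",   "calls",       "get_members" },
    { "call-1",   "argument",    "group" },
    /* while (! empty(members)) { ... } */
    { "deliver_message_to_group", "body", "while-1" },
    { "while-1",  "instance-of", "while-loop" },
    { "while-1",  "condition",   "not-1" },
    { "not-1",    "negates",     "call-2" },
    { "call-2",   "calls",       "empty" },
    { "call-2",   "argument",    "members" },
    /* deliver_message_to_person(message, first(members)); */
    { "while-1",  "body",        "call-3" },
    { "call-3",   "calls",       "deliver_message_to_person" },
    { "call-3",   "argument",    "message" },
    { "call-3",   "argument",    "call-4" },
    { "call-4",   "calls",       "first" },
    { "call-4",   "argument",    "members" },
    /* members = butfirst(members); */
    { "while-1",  "body",        "assign-2" },
    { "assign-2", "instance-of", "assignment-statement" },
    { "assign-2", "changes",     "members" },
    { "assign-2", "new-value",   "call-5" },
    { "call-5",   "calls",       "butfirst" },
    { "call-5",   "argument",    "members" },
};
#define NET_SIZE (sizeof net / sizeof net[0])

/* Object of the first edge with this subject and label, or NULL. */
const char *lookup(const char *subject, const char *label) {
    for (size_t i = 0; i < NET_SIZE; i++)
        if (strcmp(net[i].subject, subject) == 0 &&
            strcmp(net[i].label, label) == 0)
            return net[i].object;
    return NULL;
}

/* Number of edges with this label that point at the given object. */
size_t count_into(const char *label, const char *object) {
    size_t n = 0;
    for (size_t i = 0; i < NET_SIZE; i++)
        if (strcmp(net[i].label, label) == 0 &&
            strcmp(net[i].object, object) == 0)
            n++;
    return n;
}
```

Asking which statements change members, for instance, amounts to collecting the subjects of changes edges whose object is members. count_into("changes", "members") reports that there are two such statements.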

Providing this level of representation for a program offers significant benefits to maintainers who are trying to understand it [Welty, 1995]. For example, rules and other forms of inference can be employed to make information in the program more accessible.
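The following sketch illustrates one such rule, under stated assumptions. It derives an invokes relation from a containment-and-calls fragment of the network: a node invokes a function if it calls that function directly, or if it contains a node that does. The labels contains and calls and the node names are assumptions made for this example, and the negation node in the loop condition is omitted for brevity.

```c
#include <stddef.h>
#include <string.h>

/* One network edge: subject --label--> object. */
typedef struct {
    const char *subject;
    const char *label;
    const char *object;
} Triple;

/* Hypothetical containment view of deliver_message_to_group. */
static const Triple facts[] = {
    { "deliver_message_to_group", "contains", "assign-1" },
    { "deliver_message_to_group", "contains", "while-1" },
    { "assign-1", "contains", "call-1" },
    { "call-1",   "calls",    "get_members" },
    { "while-1",  "contains", "call-2" },
    { "call-2",   "calls",    "empty" },
    { "while-1",  "contains", "call-3" },
    { "call-3",   "calls",    "deliver_message_to_person" },
    { "call-3",   "contains", "call-4" },
    { "call-4",   "calls",    "first" },
    { "while-1",  "contains", "assign-2" },
    { "assign-2", "contains", "call-5" },
    { "call-5",   "calls",    "butfirst" },
};
#define NFACTS (sizeof facts / sizeof facts[0])

/* 1 if the edge subject --label--> object is stored explicitly. */
static int holds(const char *subject, const char *label, const char *object) {
    for (size_t i = 0; i < NFACTS; i++)
        if (strcmp(facts[i].subject, subject) == 0 &&
            strcmp(facts[i].label, label) == 0 &&
            strcmp(facts[i].object, object) == 0)
            return 1;
    return 0;
}

/* Rule: invokes(x, f) if x calls f directly, or x contains some node n
   with invokes(n, f). Containment forms a tree here, so recursion ends. */
int invokes(const char *x, const char *f) {
    if (holds(x, "calls", f))
        return 1;
    for (size_t i = 0; i < NFACTS; i++)
        if (strcmp(facts[i].subject, x) == 0 &&
            strcmp(facts[i].label, "contains") == 0 &&
            invokes(facts[i].object, f))
            return 1;
    return 0;
}
```

invokes("deliver_message_to_group", "butfirst") returns 1, even though no stored edge connects the two directly. The fact is inferred rather than recorded.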

3.3 Integrating the Code and Domain Models

At this point it is desirable to link the domain and code models. For example, the objects mail-message and group appear both in the domain hierarchy shown in Figure 1 and in the function representation shown in Figure 3. Conceptually, they refer to the same things: groups and mail messages are objects in the domain that the program deals with directly. It would be very useful to indicate that each such pair of represented objects denotes the same concept. It would also better serve the goal of making the program understandable if some mechanism could ensure that the domain model objects accurately portray their counterparts in the code model.

There is a problem in linking these pairs, however. While they clearly do represent the same concepts, in the domain model they are classes and in the code model they are instances of the class data-type. A class cannot be linked to an instance in a first order representation. It may seem that making the domain model objects instances would solve this problem. This would allow the pairs to be linked, but what are the domain objects instances of, and what becomes of their instances? All the people, groups, and mail messages in the domain model would become instances of instances, which is also not allowed in a first order representation.
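The dilemma can be made concrete with a toy first-order store in which every object is either a class or an individual. The sketch below is a hypothetical illustration and does not model any particular representation system. In it, instance-of may only run from an individual to a class, and links (here a same-as link) may only join individuals. The name engineering-staff is invented.

```c
/* In a first-order model every object is either a class or an individual. */
typedef enum { CLASS, INDIVIDUAL } Kind;

typedef struct {
    const char *name;
    Kind kind;
} Obj;

/* instance-of may only relate an individual to a class. */
int can_be_instance_of(const Obj *x, const Obj *c) {
    return x->kind == INDIVIDUAL && c->kind == CLASS;
}

/* Links such as a hypothetical "same-as" may only relate two individuals. */
int can_link(const Obj *a, const Obj *b) {
    return a->kind == INDIVIDUAL && b->kind == INDIVIDUAL;
}

/* Code model: GROUP is an individual, an instance of the class data-type. */
static const Obj data_type  = { "data-type", CLASS };
static const Obj code_group = { "GROUP",     INDIVIDUAL };

/* Domain model, option 1: group is a class, so it can have instances. */
static const Obj group_as_class = { "group", CLASS };

/* Domain model, option 2: group is made an individual so it can be linked. */
static const Obj group_as_individual = { "group", INDIVIDUAL };

/* A particular group of people (hypothetical). */
static const Obj staff = { "engineering-staff", INDIVIDUAL };
```

Each option fails one check. Under option 1, staff can be an instance of group, but group cannot be linked to GROUP. Under option 2, the link is allowed, but staff can no longer be an instance of group.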

Numerous combinations and representation "hacks" can be (and have been) attempted to address this problem, but there actually is no first order solution. The reason is simply that software representations are second order. The correct representation is to make the domain objects instances of the class data-type, and allow these instances to have instances of their own.
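By contrast, a second-order store drops the class/individual distinction, so any object may be the target of instance-of. The minimal sketch below uses hypothetical names: engineering-staff and msg-1 are invented. It makes group and mail-message instances of data-type while still giving them instances of their own, so the type of a type can be queried directly.

```c
#include <stddef.h>

/* Any object may serve as a class: instance_of may point at any object. */
typedef struct Obj {
    const char *name;
    const struct Obj *instance_of;   /* NULL at the top of the hierarchy */
} Obj;

/* Code-model class, and domain objects that are its instances. */
static const Obj data_type    = { "data-type",    NULL };
static const Obj group        = { "group",        &data_type };
static const Obj mail_message = { "mail-message", &data_type };

/* Domain individuals that are instances of those instances. */
static const Obj staff = { "engineering-staff", &group };
static const Obj msg1  = { "msg-1",             &mail_message };

static const Obj *const all_objects[] = {
    &data_type, &group, &mail_message, &staff, &msg1
};

/* The type of x's type, or NULL when x's type is not itself an instance. */
const Obj *meta_type(const Obj *x) {
    return x->instance_of ? x->instance_of->instance_of : NULL;
}

/* Number of stored objects that are direct instances of c. */
size_t count_instances(const Obj *c) {
    size_t n = 0;
    for (size_t i = 0; i < sizeof all_objects / sizeof all_objects[0]; i++)
        if (all_objects[i]->instance_of == c) n++;
    return n;
}
```

Here group is at once an instance of data-type and a class with an instance of its own, which is the arrangement a first-order representation forbids.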

At first glance, it would seem that this fits into the Smalltalk meta-class structure described in Section 2.4.1, but it does not, for two reasons.

First, Smalltalk does no inference with meta-classes. The whole purpose of representing the code-level knowledge this way was to employ inference to make information about the program more accessible to a maintainer trying to understand it.

Second, data-types are not the only second order objects in the software domain. Functions, for example, can be represented in the domain model as plans [Devanbu and Litman, 1991].

Classic meta-individuals, described in Section 2.4.2, are also inadequate for this second order representation problem. Each class in the domain model could have a meta-individual that is linked to the corresponding instance of data-type in the code model. This does provide part of the desired representation, but, again, there is no inference. There is a strong relationship between the domain model and code model objects, and one goal of integrating the two models is to verify that corresponding objects accurately portray each other: when the program changes, the domain model must reflect that change.
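One small piece of the verification described above can be sketched as a check run after the program changes. For each domain class recorded as corresponding to a code data-type, the check confirms that the data-type still exists in the code model. This is a hypothetical illustration with invented names, such as a GROUP type renamed to DISTRIBUTION_LIST. A full integration would rely on inference over both models rather than on name lookup alone.

```c
#include <stddef.h>
#include <string.h>

/* A recorded correspondence between a domain class and a code data-type. */
typedef struct {
    const char *domain_class;  /* e.g. "group" in the domain model */
    const char *code_type;     /* e.g. "GROUP" in the code model */
} Correspondence;

/* Returns 1 if name is among the n data-types currently in the code model. */
static int type_exists(const char *name, const char *const types[], size_t n) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(types[i], name) == 0) return 1;
    return 0;
}

/* Stores in stale[] (up to max entries) the domain classes whose code
   counterpart no longer exists, and returns how many were stored. */
size_t find_stale(const Correspondence corr[], size_t ncorr,
                  const char *const types[], size_t ntypes,
                  const char *stale[], size_t max) {
    size_t k = 0;
    for (size_t i = 0; i < ncorr && k < max; i++)
        if (!type_exists(corr[i].code_type, types, ntypes))
            stale[k++] = corr[i].domain_class;
    return k;
}
```

Suppose the correspondences are mail-message to MAIL_MESSAGE and group to GROUP. After GROUP is renamed to DISTRIBUTION_LIST in the code model, find_stale reports group as the one domain class that no longer matches its counterpart.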


Epistemology for Software Representations - 01 MAY 95
