[Next] [Previous] [Up] [Top] [Contents]

CHAPTER 2

2.2 Knowledge Representation

Knowledge Representation (KR) has long been considered one of the principal elements of Artificial Intelligence, and a critical part of all problem solving [Newell, 1982]. The subfields of KR range from the purely philosophical aspects of epistemology to the more practical problems of handling huge amounts of data. This diversity is unified by the central problem of encoding human knowledge - in all its various forms - in such a way that the knowledge can be used. This goal is perhaps best summarized in the Knowledge Representation Hypothesis:

Any mechanically embodied intelligent process will be comprised of structural ingredients that a) we as external observers naturally take to represent a propositional account of the knowledge that the overall process exhibits, and b) independent of such external semantical attribution, play a formal but causal and essential role in engendering the behavior that manifests that knowledge [Smith, 1982].

A successful representation of some knowledge must, then, be in a form that is understandable by humans, and must cause the system using the knowledge to behave as if it knows it. The "structural ingredients" that accomplish these goals are typically found in the languages for KR, both implemented and theoretical, that have been developed over the years.

It has already been established in Section 2.1.4 that programming is a process of representing knowledge. The medium used for this representation, programming languages, do not satisfy the criteria set down by the KR Hypothesis very well. They do effect a behavior that is consistent with the knowledge represented, but they are well known to be difficult to understand.

2.2.1 Knowledge Representation Languages

William Woods defines the properties of a KR Language as follows:

A KR language must unambiguously represent any interpretation of a sentence (logical adequacy), have a method for translating from natural language to that representation, and must be usable for reasoning [Woods, 1975].

Wood's definition is merely a simplification of the KR Hypothesis where "reasoning" is the only method of "engendering the behavior that manifests that knowledge." Reasoning is essential to KR, and especially to KR languages, yet even simple reasoning capabilities can lead to serious tractability problems [Brachman and Levesque, 1987], and thus must be well understood and used carefully.

One of the most important developments in the application of KR in the past 20 years has been the proposal [Minsky, 1981], study [Woods, 1975] [Brachman, 1977] [Brachman, 1979], and development [Brachman and Schmolze, 1985] [Fox, Wright, and Adam, 1985] [Bobrow and Winograd, 1985] of frame-based KR languages. While frame-based KR languages differ in varying degrees from each other, the central tenet of these systems is a notation based on the specification of objects (concepts) and their relationships to each other. The main features of such a language are:

1. Object-orientedness. All the information about a specific concept is stored with that concept, as opposed, for example, to rule-based systems where information about one concept may be scattered throughout the rule base.

2. Generalization/Specialization. Long recognized as a key aspect of human cognition [Minsky, 1981], KR languages provide a natural way to group concepts in hierarchies in which higher level concepts represent more general, shared attributes of the concepts below.

3. Reasoning. The ability to state in a formal way that the existence of some piece of knowledge implies the existence of some other, previously unknown piece of knowledge, is important to KR. Each KR language provides a different approach to reasoning.

4. Classification. Given an abstract description of a concept, most KR languages provide the ability to determine if a concept fits that description, this is actually a common special form of reasoning.

Object orientation and generalization help to make the represented knowledge more understandable to humans, reasoning and classification help make a system behave as if it knows what is represented. Frame-based systems thus meet the goals of the KR Hypothesis.

It is important to realize both the capabilities and limitations of frame-based representations, especially as compared to other formalisms. To begin with, all symbolic KR techniques are derived in one way or another from First Order Logic (FOL), and as a result are suited for representing knowledge that doesn't change. Different KR systems may be able to deal with non-monotonic changes in the knowledge being represented, but the basic assumption has been that change, if present, is the exception rather than the rule.

Two other major declarative KR formalisms are production systems and database systems. Production systems allow for the simple and natural expression of if-then rules. However, these systems have been shown to be quite restrictive when applied to large problems, as there is no ordering of the rules, and inferences cannot be constrained away from those dealing only with the objects of interest. Production systems are subsumed by frame-based systems, which additionally provide natural inference capabilities like classification and inheritance, as well as knowledge-structuring techniques such as generalization and object orientation.

Database systems provide only for the representation of simple assertions, without inference. Rules of inference are important pieces of knowledge about a domain. For example, consider the bibliographic database in Figure 2.6. If someone were interested in article-10 and wanted to know where it was, that person would have to be smart enough to realize that an article can be found in the location of the journal in which it is published. That sentence is a rule, it is knowledge. It is knowledge that cannot be expressed in a database system. The person doing the retrieval of the information in the database must have that knowledge in order to type the SQL statement that will get the proper location. In a frame-based system that knowledge can be expressed as a rule that will fire when article-10 is accessed, thus the user is not required to know it.

Frame-based systems are currently severely limited when dealing with procedural knowledge [Winograd, 1975]. An example of procedural knowledge would be Newton's Law of Gravitation - the attraction between two masses is inversely proportional to the square of their distances from each other. Given two frames representing the two bodies, with slots holding their positions and mass, the value of the gravitational attraction between them cannot be inferred declaratively using the standard reasoning mechanisms available in frame-based KR languages, though a function or procedure in a programming language could represent the mechanism for performing this "inference" quite well. Frame-based systems that can deal with this kind of knowledge do so by adding a procedural language to its representation. The knowledge is not being represented in a frame-based way, it is being represented as C or (more commonly) LISP code which is accessed through a slot in the frame [Bobrow and Winograd, 1985]. This is an important distinction - there is knowledge being encoded in those LISP functions that is not fully accessible. The system can reason with that knowledge, but not about it; in other words we can use some attached procedure to compute (or infer) the value of one slot based on some others, but we cannot ask how that value was obtained.

2.2.2 Domain Modeling

Domain modeling is the field in which the application of KR to specific domains is studied and performed. Figure 2.7 shows a framework for discussing domain modeling that seems to map well onto most examples [Iscoe, Tam, and Liu, 1991].

The amorphous shape labeled Domain Knowledge refers to the knowledge possessed by the domain expert that must be encoded in some fashion. This knowledge is not well defined and is fairly difficult for others to access. The box labeled Meta-Model refers to the KR formalism, typically a KR language, that will be used as the symbol level [Newell, 1982] for the machine representation of this knowledge. The box labeled instantiation refers to the process of taking the domain knowledge and physically representing it using the meta-model, this process is sometimes referred to as knowledge acquisition [Schoen, 1991]. The box labeled domain model refers to the knowledge-base that results from the instantiation, and the operational goals are typically not represented formally, but refer to the reason the domain model was built and what it will be used for.

Specific examples of real-world domain modeling efforts and how they fit into this framework can be found in [Iscoe, 1991], and it has become clear that the most prevalent operational goal across modeling efforts today is understanding the domain of a large software system [Arango, 1989]. One thing that seems to be universally lacking in efforts with this operational goal is the realization that a software system operating within a domain is a part of that domain, and deserves as much attention and detail in the model as any other part. The main reason for this oversight is that there is a historical reason for distinguishing procedural from declarative knowledge [Winograd, 1975], and as a result the two are typically represented differently: domain models are represented with frame based KR languages and programs are represented with programming languages.

This traditional separation between programs and domain models causes problems during the instantiation of a domain model that includes not only knowledge of the objects and attributes, but knowledge of the procedural aspects of the processes associated with the domain as well. The problems stem from the fact that domain modeling is a discipline in which advances are made incrementally, by building upon previous systems [Simon, 1991]. Some of the most significant results are in the form of methodologies which help other domain modelers to avoid pitfalls and and use techniques that work [Gruber, 1993].

The predominant methodologies for domain modeling clearly indicate that the instantiation of the model is the most time consuming part, and that the most important part of instantiation is ontological analysis [Alexander, Freiling, and Shulman, 1986] (which is more fully described in the next section). Ontologies for general taxonomies of objects are abundant, and there seem to be clear guidelines for developing new ones.

The problem is that for the knowledge required to represent procedural knowledge and reason about it (not with it), there are few guidelines, especially when the procedures requiring representation are implemented as software. There is not much background to draw upon, other than software information systems, as far as ontologies and methodologies for modeling what software does. Ontological analysis ended up being a large part of the effort for this research, since it had never been done before.

2.2.3 Ontological Analysis

The word ontology means "the study of the state of being." An ontology describes the states of being of a particular set of things. This description is usually made up of axioms that define each thing. In knowledge representation, an ontology has become the defining term for the part of a domain model that excludes the instances, yet describes what they can be. Ontological analysis is the process of defining this part of the model.

What makes up a specific domain ontology is restricted by the representational capabilities of the meta-model - the language used to construct the model. Each knowledge representation language differs in its manner and range of expression. In general, an ontology consists of three parts: concept definitions, role definitions, and further inference definitions.

The concept definitions set up all the types of objects in the domain. In object oriented terms this is called the class definitions, and in database terms these are the entities. There can be three parts to the concept definitions:

A role is an attribute of an object. In object-oriented terms it is a slot, in database terms (and even some KR languages) it is a relation. In the simplest case, a role for an object just has a value; the object mailbox-4 might have a role number-of-messages, for example, that would have a value which is a number. Roles also express relationships between objects. The same object might have a role called owner which relates mailbox-4 to the object person-2. Roles which represent relationships are unidirectional. A role definition may have up to three parts as well:

The final part of an ontology is the specification of additional inference that the language provides. Examples of this are forward and/or backward chaining rules, path grammars, subsumption and/or classification, demons, etc. An explanation of the inference mechanisms used in this research will be given in the next section.

2.2.4 Classic

Classic is a frame-based knowledge representation language that belongs to the family of description logics [Brachman, et al., 1991]. It is descended from KL-ONE [Brachman and Schmolze, 1985], and has been specifically designed to be mindful of the tractability of its own inferences [Brachman, et al., 1989].

Classic knowledge-bases are composed of four kinds of objects: concepts, roles, rules, and individuals. An ontology in Classic consists of a concept taxonomy, role taxonomy, role inverses, role restrictions and defaults. The role restrictions and defaults are specified as part of the concept definitions. Classic rules are forward chaining rules.

2.2.4.1 The Classic Language

Throughout this document, it has been necessary to make explicit use of the notation and terminology of Classic to explain and describe some of the more detailed aspects of this research. This section contains a brief introduction to the language of Classic in order to make it clear precisely how one goes about describing objects. This introduction only presents the subset of Classic which was used in this research. The Classic language is specifically designed to make it possible to describe objects in such a way that it is possible to determine automatically whether one object is subsumed by another. The peculiarities of the language arise from the goal of making this determination not only possible, but tractable.

To begin with, a typical concept description looks like this:

 (defconcept information-filter
  (and kbeds-object
       (all information-filter-of valid-mail-recipient)
       (at-least 1 information-filter-of)
       (all has-filter-condition kbeds-mail-message)
       (at-least 1 has-filter-condition)
       (at-most 1 has-filter-condition)
       (all has-filter-action kbeds-filter-action)
       (at-least 1 has-filter-action))
  :disjoint kbeds-thing)
 

This says an information-filter is subsumed by (or is more specialized than, or is a subclass of) kbeds-object (and therefore is also described by that concept), and that all the fillers (in Classic a filler is the value of a particular role on a particular object) for its information-filter-of role must be individuals of valid-mail-recipient, and that there must be at least one filler for that role, and similarly for the role has-filter-condition except that there can also be at most one filler (in other words, there can be only one filler), and so on. The disjoint specification identifies this concept as a member of a disjoint set, which means that an individual cannot be subsumed by more than one concept in that set. The most obvious example of a disjoint set would be the gender set, containing the concepts "male person" and "female person."

In the terminology of Classic, the description above is told information. Told information is precisely the information that is explicitly typed into the knowledge base. This is opposed to derived information, which is all the information that Classic derives or infers through its various mechanisms.

This description actually shows a primitive concept - one which Classic will not automatically try to classify. Classic will also not automatically try to find which individuals are subsumed by a primitive concept, this information must be told. This subtle notion may seem irrelevant, but it is the norm in most representation languages (when a concept is created the modeler explicitly names the parent concepts, and when an individual is created, the modeler explicitly names the concept that the new individual is an instance of). It is important in Classic because there is another kind of concept, the defined concept, which Classic actually does automatically classify and find individuals of. For example, in Figure 2.8 a simple taxonomy of primitive concepts is shown. Let us suppose we create a defined concept called vegetarian-mammal as follows: (and mammal (all food plant)). Next we create another defined concept called fruit-eating-person: (and person (all food fruit)). Classic will derive that vegetarian-mammal subsumes fruit-eating-person (why? because mammal subsumes person and plant subsumes fruit). If we created two individuals, joe and apple, and tell Classic that they are instances of person and fruit, resp., and further tell Classic that apple is a filler for joe's food role, Classic will derive that joe is an instance of fruit-eating-person (and therefore also vegetarian-mammal). Again, Classic will never derive that an individual is an instance of a primitive concept, it must always be told that.

This automatic classification of individuals of defined concepts through subsumption is a simple, yet extremely powerful process. It is the key to several significant advances in software information systems described in later sections.

Another important point about Classic is the Open World Assumption. Classic does not assume that the information it knows is all the information there is. Returning to the example above, we have the following told information about two individuals, joe: (and person (fills food apple)), and apple: fruit. Classic will then add all the derived information it can to these individuals, yielding joe: (and person mammal classic-thing (fills food apple)), and apple: (and fruit plant classic-thing). Where is the information about joe being a fruit-eating-person? The truth is that Classic cannot derive this yet. The definition of fruit-eating-person specifies that all the fillers for the food role of an individual must be instances of fruit. Classic does not assume that because it knows one (or two, or zero, etc.) fillers for an individual's role, that it knows them all. In fact, Classic assumes the opposite: it assumes it does not know them all.

There are two ways for Classic to figure out that it knows all the fillers for an individual's role. The first way is for it to be told, by closing the role. When a role is closed on an individual, it tells Classic that there can be no more fillers. In the above example, the user would have to close the food role on joe in order for Classic to derive that joe is an instance of fruit-eating-person (Classic always tries to reclassify individuals when their roles are closed). The second way is for Classic to derive that a role is closed on an individual if there is an at-most restriction on the role. For example, if the concept person (or mammal) additionally had (at-most 1 food) in its description, then since joe is told to be an instance of person that restriction would apply to him, and since he already has one filler in his food role, and since he can have at most one filler in his food role, he can have no more and the role is derived to be closed.

The final part of an ontology in Classic are the rules. Classic rules come in two forms, description rules and filler rules. All classic rules have as their antecedent a named concept, and are fired on an individual when the individual is classified as an instance of the concept.

The consequent of a classic description rule is a classic description which, when the rule fires on an individual, is merged into the description of the individual. For example, if we had a description rule like: vegetarian-mammal --> (at-most 0 has-prey), the rule would fire on joe when he is classified as a fruit-eating-person and would add the at-most restriction to the description of joe. Classic would then also derive that joe's has-prey role is closed as well.

The consequent of a classic filler rule is the name of a role and a LISP function that will be invoked when the rule fires. The function is passed the individual the rule fired on and the role named in the consequent, and returns a list of new fillers for that role and individual. One useful application for filler rules is to create inter-role dependencies. For example, the concept rectangle has three roles: length, width, and area. We could define a function for calculating the area in LISP as follows:

 (defun calculate-area (rect role)
  (let ((length (car (cl-fillers rect @length)))
        (width (car (cl-fillers rect @width))))
    (* length width)))
and then define a filler rule: rectangle --> area calculate-area, the filler for the area role would automatically be generated based on the fillers in the length and width roles.

2.2.4.2 Enhancements to Classic

It was necessary to extend Classic in several ways in order to support this research. Each extension had a different motivation, which may not be entirely clear until that aspect of the research is discussed. These extensions are explained here, however, so that the sections involving the research do not need to sidetrack into explanations of the underlying support.

The first extension to Classic was a facility for supporting what some other representation languages call path grammars or role transitivity [Fox, Wright, and Adam, 1985]. A very common form of inference in frame-based representations is one in which the fillers for one role in a class of individuals can always be found by following the same role path. For example, an individual representing an article in a journal might have a role called published-in which is filled with an individual representing a journal. Journal individuals could have a role called location which is filled with some string indicating the place where the journal is physically located. It makes sense that the article individual should also have a location that is the same as the location of the journal it is published in, and this can be represented as a path rule.

A path rule, like all Classic rules, has for its antecedent a concept name, and for its consequent a role and a role path (an ordered list of roles). When a rule fires on an individual, classic follows the role path to the end, and fills the role specified in the rule with all the values it finds at the end of the path. In the journal article example, the path rule would be: article --> location (published-in location). An individual of article might be described: (and article (fills published-in journal-10)), and the journal individual journal-10: (and journal (fills location "Shelf 10")). Classic would fire the path rule on the individual of article and follow the path: the first part of the path is published-in, which gets us to article-10, and the next part of the path is location which gets us to the string "Shelf 10." This value is then derived to be the filler for the location role of the individual of article. If a particular path ends "early," that is, the path leads to an intermediate individual that has no fillers for the next role in the path, no fillers are returned for that particular branch of the path.

The path rule facility was further expanded to allow for the expression of specialization overrides. A specialization override is a type of inference in which a value that would normally be derived to fill a role is blocked if and only if there is already a value filling the role that is more specialized. The most common example of this is in object-oriented languages, a class inherits all the methods of its superclass, except the ones that are already defined by the class.

The next enhancement to Classic was a facility for dumping individuals into a file, in essence there is no way to save the current state of a classic knowledge-base. While this may not sound like a significant extension, there is one aspect of dumping (or, more accurately, of loading) a knowledge-base that is very intricate: the order in which roles are closed. When a role is told to be closed on an individual, it means there can be no more fillers for that role - told or derived. However, when a derived role filler depends on a filler or fillers in other individuals, the role cannot be closed until the fillers it depends on are closed. There is an implicit ordering of role closing based on all the ways Classic can derive information.

The most significant enhancement to Classic was the addition of a facility for representing spanning objects [Welty and Ferrucci, 1994]. In reality, this enhancement is in the process of being made to Classic by its support team, and the spanning object facility used for this research was actually applied to the "dumped" knowledge-based - that is, the spanning functions worked by generating a text file containing Classic descriptions, then the knowledge-base was cleared and the text file could be loaded in. Until support for multiple universes of discourse is added to Classic (which will happen in the next major release), this was the only way to proceed. The only limitation this presented was an inability to change the first universe from the second.


Chris Welty - Dissertation - 17 SEP 1996
[Next] [Previous] [Up] [Top] [Contents]

Generated with Harlequin WebMaker