Intelligent Assistance for Web Navigation



2 Ontology for Electronic Information

From a knowledge representation (KR) perspective, the central issue in the Untangle project is the ontology for representing information that is in electronic form.

2.1 Previous Work

This work began before the World Wide Web became popular, and the initial goal was to support intelligent email distribution. The basic elements of the ontology remain unchanged from the KBEDS system originally described at FLAIRS-94 [Welty, 1994a]; however, some minor changes have been made to bring it in line with the standard bibliographic ontology that is part of the Ontolingua Ontology Library [Gruber, 1994]. This updated ontology is described in [Welty, 1995]. The reader is referred to either of these previous papers for more in-depth details of the main ontological elements; a brief discussion follows.

The ontology of electronic information is broken into three disjoint parts: information-items, information-views, and event-or-objects. These concepts partition all the information in the knowledge base. An information-item is a piece of information that, typically, describes an event or object, and an information-view is a particular view of that piece of information.

Figure 1 shows individuals of these three concepts. In this figure, and throughout this paper, the dashed lines represent the individual-of relationship, the solid lines represent the named relationships (roles), the rounded boxes are individuals, and the sharp boxes are concepts.*1

The individual Chris, an object, represents a person. This person has a piece of information, a resume, associated with it through the has-information link. The inverse of this link, information-of (not shown), typically represents the fact that the information item is about the event or object. The information item, Chris-resume, has two views. It is important to realize that each of these views is the same information, simply in different formats, and therefore it would not make sense to represent this as Chris having two separate resumes. This is why the three kinds of individuals are distinct.
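The Chris example can be sketched as a small data structure. This is only an illustration in Python, not the Classic representation used by the project: the class names mirror the three concepts, the role names mirror has-information and information-of, and the view names and formats are hypothetical, since the text does not say which two formats the resume has.

```python
from dataclasses import dataclass, field


@dataclass
class InformationView:
    """One concrete rendering of an information item."""
    name: str
    fmt: str  # hypothetical attribute; the paper does not name the formats


@dataclass
class InformationItem:
    """A piece of information, independent of how it is rendered."""
    name: str
    views: list = field(default_factory=list)
    # Inverse of has-information; excluded from repr/eq to avoid cycles.
    information_of: object = field(default=None, repr=False, compare=False)


@dataclass
class EventOrObject:
    """The subject that information items describe."""
    name: str
    has_information: list = field(default_factory=list)

    def add_information(self, item):
        # Keep both directions of the has-information / information-of pair.
        self.has_information.append(item)
        item.information_of = self


chris = EventOrObject("Chris")
resume = InformationItem("Chris-resume")
resume.views += [
    InformationView("Chris-resume-text", "plain text"),   # hypothetical view
    InformationView("Chris-resume-ps", "PostScript"),     # hypothetical view
]
chris.add_information(resume)

# One resume with two views, rather than two separate resumes.
print(len(chris.has_information), len(resume.views))  # prints "1 2"
```

Keeping views on the item rather than on the person is what prevents a second format from looking like a second resume.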

Having the subjects of the information available in the representation is the key advantage to this approach. While most users of the system will be after particular information items, it is the objects and events that tie the various pieces of information together and facilitate the inference that makes the system intelligent.

A frequent source of confusion in the ontology is the representation of paper, a sub-concept of publication, as an object. Many see this as obviously an information item. A paper is, however, an object which has a title, an author, may reference and be referenced by other papers, may be published in a book or journal, etc. It can have several information items associated with it as well: an abstract, a summary, a review, or the document text itself [Gruber, 1994]. Each of these latter kinds of items would be individuals of information-item. Once the existence of the document-text concept, which is clearly an information item that can have multiple views, is realized, the representation becomes clearer. Figure 3 shows an example of how a paper is represented.*2

2.2 New Goals of the Research

With the explosive growth and popularity of the WWW as a medium for disseminating information, it seemed far more interesting, practical, and trendy to apply the ontology for electronic information to this huge tangled mess. The original goals were therefore subtly altered: to provide intelligent assistance for navigating the Web.

The most popular navigational tool available on the Web is Yahoo, which presents a hierarchical view of subject areas that can be browsed top-down, or searched for keywords. The incredible popularity of this service indicates that it is useful and desperately needed, and clearly this form of information organization (a hierarchy of increasingly specific subjects) lends itself quite easily to expression in a KR system.

There are numerous shortcomings to Yahoo that can easily be overcome with KR technology. There is little place in the Yahoo taxonomy for web pages that can't be explicitly classified. A person's home page, for example, will typically not be located on Yahoo, since it describes a person, not a specific topic.

The granularity of the Yahoo categories is often too large, and there is no other information about the web pages that might be used to constrain a search, other than keywords appearing in the link name. The Artificial Intelligence category, for example, has over 150 links. If you are interested only in AI conferences, however, you can either browse the 150 links, or try to find everything in the AI category that has the word "Conference" in its name. The latter search would miss such important items as FLAIRS-96 or the AAAI Spring Symposium Series, and clearly it completely lacks the much-needed ability to search for "conferences in Florida."
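The difference between the two kinds of search can be sketched as follows. This is a hedged illustration only: the `links` records, the `kind` and `state` fields, and both function names are assumptions made for the example, the two "Example" entries are invented, and the location values are placeholders rather than data from the paper.

```python
# Hypothetical category listing; fields and values are illustrative only.
links = [
    {"name": "FLAIRS-96", "kind": "conference", "state": "Florida"},
    {"name": "AAAI Spring Symposium Series", "kind": "conference",
     "state": "California"},
    {"name": "Example AI Conference", "kind": "conference", "state": "Florida"},
    {"name": "Example AI Lab", "kind": "organization", "state": "Florida"},
]


def keyword_search(entries, word):
    """Yahoo-style search: match a keyword against the link name only."""
    return [e["name"] for e in entries if word.lower() in e["name"].lower()]


def typed_search(entries, kind, **constraints):
    """Search on what a page is about, plus any attribute constraints."""
    return [
        e["name"]
        for e in entries
        if e["kind"] == kind
        and all(e.get(key) == value for key, value in constraints.items())
    ]


# The keyword search finds only the entry with "Conference" in its name.
print(keyword_search(links, "Conference"))
# The typed search can express "conferences in Florida".
print(typed_search(links, "conference", state="Florida"))
```

The typed search works only because each record says what kind of thing the page describes, which is exactly the knowledge a link name does not carry.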

In addition, without something to tie them together, related web pages will simply be listed in alphabetical order, thus separating such things as the web pages that describe the various special tracks of FLAIRS. If there were knowledge that these different pages describe aspects of the same event, it could be helpful to someone searching for this information.

Clearly KR has much to offer as a technology for assisting web navigation, and the Untangle Project began as an effort to apply KR to this domain.

2.3 Obstacles

Criticisms of Yahoo aside, it is an extremely useful service, and the initial plan for the Untangle Project was to at least duplicate the functionality of Yahoo, and build from there. This deceptively simple goal, however, presented some obstacles whose solutions were fairly good lessons for ontological development.

The initial approach used to integrate the Yahoo subject taxonomy into the ontology of electronic information was to create a concept called web-page, below which were the subjects, as shown in Figure 4. This was essentially the Yahoo hierarchy, since Yahoo is only capable of representing Web pages, and not the objects or events behind them.

This approach works for web pages, but when the individuals in the event-or-object category were considered, it quickly became clear that they, too, needed to be grouped by subject. Suddenly the ontology grew to the point where there was a subject taxonomy below the concept person (e.g. science-person, art-person, etc.), below conference (e.g. science-conference, art-conference, etc.), below publication, organization, and so on.

Aside from the obvious problem of too many redundant concepts, there was nothing to indicate that sets of concepts such as ai-person, ai-page, ai-conference, etc., were somehow related. It would seem like an obvious and desirable inference to be able to make that if a web page is an information item of a conference, and the web page is considered an ai-page, then the conference must be an ai-conference. Figure 5 shows a situation with the simple inference missing.

Clearly this would be fairly easy to remedy with a rule that says whenever an individual of ai-page is the information-item-of a conference, that conference is an individual of ai-conference. The problem is that a separate rule is needed to infer this between database-conference and database-page, art-conference and art-page, and between every pair of related subjects.

The next approach was to take out the subject sub-taxonomies, and create a concept called subject, whose individuals were all the different subjects, and these individuals could fill the role has-subject in all events, objects, web pages, etc. This worked a little better, since one rule would now cover all the cases mentioned above, but the hierarchical ordering of subjects was lost. It was possible to create a role called sub-subject between individuals of subject, but this was undesirable because Classic uses subsumption as its central form of inference [Brachman, et al., 1991], and the built-in subsumption facilities do not recognize any named roles as a specialization link.
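A minimal sketch of this intermediate approach, written in Python rather than Classic, shows both the gain and the loss. The dictionaries and function names are assumptions made for the example; only the role names has-subject, information-item-of, and sub-subject come from the text.

```python
# Subjects are plain individuals that fill a has-subject role.
has_subject = {"flairs-page": {"ai"}}            # individual -> subjects
information_item_of = {"flairs-page": "FLAIRS"}  # page -> event or object


def propagate_subjects(has_subject, information_item_of):
    """One rule for all subjects: a subject of an information item
    is also a subject of the event or object it describes."""
    inferred = {name: set(subs) for name, subs in has_subject.items()}
    for item, target in information_item_of.items():
        inferred.setdefault(target, set()).update(has_subject.get(item, set()))
    return inferred


subjects = propagate_subjects(has_subject, information_item_of)

# The hierarchy survives only as a named role between subject individuals.
sub_subject = {"ai": "computer-science"}


def find_by_subject(subjects, wanted):
    """Role-filler lookup; it does not follow sub_subject links."""
    return sorted(name for name, subs in subjects.items() if wanted in subs)


print(find_by_subject(subjects, "ai"))                # FLAIRS is now found
print(find_by_subject(subjects, "computer-science"))  # empty: hierarchy lost
```

The second query comes back empty because nothing in the lookup treats sub_subject as a specialization link, which mirrors the limitation described for Classic's built-in subsumption.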

2.4 Solution

The solution may seem obvious at this point, but it was stubbornly avoided for semantic reasons. The solution was to create a fourth kind of concept, call it subject for the moment, that was the parent concept of a taxonomy of subjects, such as computer-science, artificial-intelligence, art, drama, etc. Figure 6 shows an example of this fourth taxonomy merged with the previous example.

With this approach the taxonomy of subjects is preserved, the natural subsumption relationship between Classic concepts is used, there is no duplication in the concept hierarchies, and a single rule can be used to infer the subject of an event or object if it is known of an information item. Every individual is an individual of one of event-or-object, information-item, or information-view, and in addition is an individual of any number of concepts in the subject taxonomy.
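The combination of a subject taxonomy, subsumption, and a single rule can be sketched as below. This is a Python approximation under stated assumptions, not Classic: the `parent` map is a hypothetical fragment of the taxonomy built from the subject names mentioned above, and `ancestors`, `apply_subject_rule`, and `is_individual_of` are illustrative helper names.

```python
# Hypothetical fragment of the subject taxonomy: concept -> parent concept.
parent = {
    "computer-science": "subject",
    "artificial-intelligence": "computer-science",
    "art": "subject",
    "drama": "art",
}
SUBJECT_CONCEPTS = set(parent)


def ancestors(concept):
    """Return the concept followed by every concept that subsumes it."""
    chain = [concept]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain


# Each individual belongs to one structural concept plus any number of subjects.
individual_of = {
    "flairs-page": {"information-item", "artificial-intelligence"},
    "FLAIRS": {"event-or-object"},
}
information_item_of = {"flairs-page": "FLAIRS"}


def apply_subject_rule():
    """Single rule: subjects known of an information item carry over
    to the event or object it describes."""
    for item, target in information_item_of.items():
        individual_of[target] |= individual_of[item] & SUBJECT_CONCEPTS


def is_individual_of(name, concept):
    """Membership test that respects subsumption in the taxonomy."""
    return any(concept in ancestors(c) for c in individual_of[name])


apply_subject_rule()
print(is_individual_of("FLAIRS", "computer-science"))  # True
print(is_individual_of("FLAIRS", "art"))               # False
```

Because the subjects are concepts rather than role fillers, a query for computer-science now reaches FLAIRS through artificial-intelligence without any per-subject rules.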

Clearly this is an elegant solution to the problem in an operative sense, but semantically there is a problem with it. Every individual of a concept is also an individual of all the concept's parents; thus in Figure 6 the individual FLAIRS is an individual of cs and subject. Consider the normal meaning of being an individual: the individual "is a" whatever the concept names. For FLAIRS, it makes sense to say that it is a conference, but is it a subject, or even an AI or a CS?

This semantic difficulty may seem trivial, but the Untangle Project was founded on the very notion that a deep and more accurate representation of information on the web would facilitate access to that information. Compromising on the semantics, on the very way the information is interpreted, seemed to violate the principles of the project and to create the possibility for confusion.

Religion aside, the final (perhaps current is more appropriate) solution was actually not far off. Simply by appending -thing to the names of the concepts in the subject taxonomy, and changing the name of the top concept to represented-thing, the semantics are no longer confusing. FLAIRS is a conference, an ai-thing, a cs-thing, and a represented-thing.


Intelligent Assistance for Web Navigation - 18 OCT 95