CMPU 235 : Software Design Methodologies
Assignment 2

Due: Tuesday Feb 19, 2001

The cs235::string Class

Each group must use the string class provided to them by their designated supplier. When I grade the programs I will use the string.o file submitted by the developer of the class, not the .o file submitted by the user of the string class.

During the course of this assignment you may be asked by the group using your own string class for improvements or bug fixes. You should attend to these requests promptly as the other group can likely not procede until the problems are fixed. When requesting improvements to the string class make sure the improvements you request are necessary for the operation of your own program; the developing group does have another program to write. Try to request any changes to the string class you are using well before the due date of the assignment.

If you make changes to your string class please document them someplace in the Cocoon generated HTML files.

XML

For this assignment each group is to write a simple parser for a tiny subset of XML. The markup language consists of start tags, end tags and content. A start tag consists of a single word inside angle brackets and an end tag consists a '/' prepended to a single word inside angle brackets. For each start tag there must be a matching end tag. Content is everything in between a start tag and its matching end tag. A Tag will either contain literal content or other tags as content. All tags in the document form a tree structure and there will always be exactly one root tag in the file. For example:

<superhero>
   <name>Superman</name>
   <weakness>Kryptonite</weakness>
</superhero>

Here, the name tag has the literal content "Superman" and the weakness tag has the literal content "Kryptonite". The superhero tag has the name and weakness tags as content and is the root tag. A tag will never contain both literal content and other tags.

The parser should produce a tree that represents the XML document. Each tag in the file will become an internal node in the tree. Literal content is stored in the leaf nodes in the tree. Note, end tags are not explicitly represented in the tree as they are implied by the existence of the start tag.

 

The Program

The program will consist of three components, a scanner, a parser, and an interactive component for searching the XML tree.

Scanning

The scanner should read the XML document and identify the start and end tags. The scanner should also be able to return the literal content between tags.

suggestion: Have a method in the scanner that returns the next tag, or content, that is found in the file. The method should skip over white space, if the first non-whitespace character encountered is a left angle bracket then a start or end tag has been encountered. The type of tag encountered can be determined by examining the next character in the file, if it is a '/' then an end tag has been found otherwise it is a start tag. In either case the name of the tag can be scanned. If the first non-whitespace character encountered is anything other than a left angle bracket then content has been found and the method can proceed to scan everything up to the next left angle bracket.

Parsing

The parser will maintain a stack of tree nodes that will be used to construct the XML tree. Since tags and content will eventually be represented by nodes in the tree it makes sense to push tree nodes onto the stack (although, you are free to do otherwise). The parser will repeatedly call the scanner to get the next tag from the file. When a start tag is encountered the parser should create a tree node and label it with the name of the tag. The tree node should then be pushed onto the stack. When content is encountered the parser should create a leaf node, copy the content to the leaf node , and push the leaf node onto the stack. When an end tag is encountered the parser should pop the stack until the node containing matching start tag is found. All items popped from the stack should be made children of the node with the start tag and the node should be pushed back onto the stack.

When the end of the file has been encountered there should be exactly one item on the stack (unless it was an empty file), and that is the node corresponding to the root tag in the file.

XML Tree Queries

Once the XML tree has been constructed the user should be prompted for a query to perform on the tree. Minimally the user should be able to enter the name of a tag and receive a listing of all subtrees rooted at nodes with that tag. For example, given the sample input file, if the user entered the query book then they should receive a listing of all books, with the author's first and last names, title, price, etc. All output should be neatly formatted.

The user should also be able to enter the name of a tag and a matching condition that must hold for the tag to be selected. For example the user might enter:

select book where author/name/first = Bruce

and receive just the information for the book "Foo Fighters". The format of the queries is to be determined by each group and the above syntax is strictly for example purposes.

Note: To view the sample input file in your browser select view source (or something similar), otherwise the browser ignores what it thinks are malformed HTML tags.

Deliverables

Each group must submit the following:

Grading

The following criteria will be considered when assigning grades: