CS 102 Assignment #5

Due: March 2

Sample Solution

A large publisher of textbooks has recently been criticized for the quality of its books -- many think they're too hard to read. The company has decided to do a scientific analysis of its books to determine exactly how hard they are to read, and has hired you to help.

A group of overpaid consultants arrived at a top-secret formula for determining the difficulty of a section of text. They won't tell you how it's actually computed, but they need to know the number of nouns, verbs, adjectives, and adverbs contained in the text. Your program must open a file full of text, analyze its contents, and print a summary to the screen. Words that your program doesn't recognize must be written to an output file for further study.

For this program, you will write the following functions:

  1. A function, called OpenFiles, that takes input and output file stream variables as arguments and opens input and output files specified by the user. It must prompt the user for the names of input files until one is opened successfully. The same must then be done for the output file.
  2. A function, called Categorize, that is passed a single word from the input file and returns the corresponding part of speech. (You must create an enumerated type to represent the required parts of speech, and return a value from this enumerated type.) Make sure your function recognizes at least four of each required part of speech. If the input word isn't among those that your function recognizes, it should return a special enumerated type value reflecting this fact. As part of the documentation (i.e. comments) for this function you must include a description of how new words are added to each category.
  3. Your main function, which calls OpenFiles, reads text from the input file, and passes each word to Categorize. It must keep an array of counters, one for each part of speech, and increment them appropriately as new words are processed. Words that aren't recognized should be written to the output file. Once the input file has been completely processed, print the final word counts to the screen.

A well-designed program will have a simple and flexible mechanism for adding new words to each category. You should concentrate first on getting your program to work properly; then, if you have time, focus on making your Categorize function better. This criterion is worth only 10%.
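
One way to keep word lists easy to extend is a lookup table, sketched below. The names WordEntry, WORD_TABLE, and CountWords are hypothetical, and the word list itself is made up for illustration. Matching is exact and case-sensitive, which is an assumption; it happens to agree with the sample run, where both "The" and "the" are reported as unknown. The sketch also assumes the standard <cstring> header is available.

```cpp
#include <cstring>

// Parts of speech, plus UNKNOWN for words that are not recognized.
// NUM_CATEGORIES is not a category; it is the number of counters needed.
enum PartOfSpeech { NOUN, VERB, ADJECTIVE, ADVERB, UNKNOWN, NUM_CATEGORIES };

// One table entry: a word and the category it belongs to.
struct WordEntry {
    const char*  word;
    PartOfSpeech category;
};

// Hypothetical word list (an assumption, not a required list).
// To add a new word, add one { "word", CATEGORY } entry here.
static const WordEntry WORD_TABLE[] = {
    { "dog", NOUN },        { "cat", NOUN },
    { "tree", NOUN },       { "book", NOUN },
    { "chased", VERB },     { "ran", VERB },
    { "read", VERB },       { "jumped", VERB },
    { "black", ADJECTIVE }, { "slow", ADJECTIVE },
    { "large", ADJECTIVE }, { "small", ADJECTIVE },
    { "quickly", ADVERB },  { "slowly", ADVERB },
    { "quietly", ADVERB },  { "loudly", ADVERB }
};
static const int TABLE_SIZE = sizeof(WORD_TABLE) / sizeof(WORD_TABLE[0]);

// Returns the part of speech for word, or UNKNOWN if it is not in the table.
PartOfSpeech Categorize(const char* word)
{
    for (int i = 0; i < TABLE_SIZE; i++) {
        if (std::strcmp(word, WORD_TABLE[i].word) == 0)
            return WORD_TABLE[i].category;
    }
    return UNKNOWN;
}

// Adds one to the matching counter for each word.
// counts must have NUM_CATEGORIES elements, initialized by the caller.
void CountWords(const char* const words[], int numWords, int counts[])
{
    for (int i = 0; i < numWords; i++)
        counts[Categorize(words[i])]++;
}
```

Because the enumerated values start at zero, they can index the counter array directly, and adding a word never requires touching Categorize itself.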

When reading from a file, you should use the get function to avoid reading more text than your input string can hold. Since get reads a whole line at a time, it's likely that the string will contain more than one word. We are providing a function called Parse that takes a string and returns an array of individual words found within. (See parse.H for details.)

The Parse function is being distributed in a file named parse.o. You can't see the source code for the function, but you can use it in your program by including parse.H and putting the names of both your .C file and parse.o on the g++ command line when compiling. For example:

% g++ analyze.C parse.o

In order to include a local file (i.e. a file in the same directory as the file you are compiling) you use quotes ("") instead of angle-brackets (<>) around the file name. For example:

#include <iostream.h>
#include "parse.H"

Both parse.o and parse.H are in ~cs102/Parse. You should copy both files to your working directory before compiling. The Parse directory also contains a sample input file named story that you can use to test your program. Here is some sample output:

% cat story
 The black dog quickly
        chased the  slow white cat 
up the (very large) tree.
% a.out
Please enter the name of the input file: story
Please enter the name of the output file: unknown

Summary of input text:
  Nouns:          3
  Verbs:          1
  Adjectives:     3
  Adverbs:        1
  Unknown:        6

Unknown words were left in output file.

% cat unknown
The
the
white
up
the
very

Submit your assignment via the submit102 program.

Grading criteria: