A large publisher of textbooks has recently been criticized for the quality of its books -- many think they're too hard to read. The company has decided to do a scientific analysis of its books to determine exactly how hard they are to read, and has hired you to help.
A group of overpaid consultants arrived at a top-secret formula for determining the difficulty of a section of text. They won't tell you how it's actually computed, but they need to know the number of nouns, verbs, adjectives, and adverbs contained in the text. Your program must open a file full of text, analyze its contents, and print a summary to the screen. Words that your program doesn't recognize must be written to an output file for further study.
For this program, you will write the following functions:
OpenFiles, that takes
input and output file stream variables
as arguments and opens input and output files specified by
the user. It must prompt the user for the name of an input file
until one is opened successfully. The same must then be done for
the output file.
Categorize, that is passed a single
word from the input file and returns the corresponding part of speech.
(You must create an enumerated type to represent the required parts
of speech, and return a value from this enumerated type.) Make sure
your function recognizes at least four of each required part of
speech. If the input word isn't among those that your function
recognizes, it should return a special enumerated type value reflecting
this fact. As part of the documentation (i.e. comments) for this
function you must include a description of how new words are added to
each category.
main, which calls OpenFiles,
reads text from the input file, and passes each word to
Categorize. It must keep an array of counters, one for
each part of speech, and increment them appropriately as new words are
processed. Words that aren't recognized should be written to the output
file. Once the input file has been completely processed, print the final
word counts to the screen.
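The enumerated type, the Categorize lookup, and the array of counters described above can be sketched as follows. This is only one possible arrangement, not the required design: the word lists, the InList and CountWords helpers, and the NUM_CATEGORIES sentinel are all assumptions made for illustration, and the sketch uses the standard headers rather than the course's iostream.h.

```cpp
#include <cassert>
#include <cstring>

// The four required parts of speech plus a catch-all for unrecognized words.
// NUM_CATEGORIES is not a category; it only sizes the counter array.
enum PartOfSpeech { NOUN, VERB, ADJECTIVE, ADVERB, UNKNOWN, NUM_CATEGORIES };

// Illustrative word lists (assumed, not supplied by the assignment).
// To add a word, insert it in the matching list before the terminating 0.
static const char* const NOUNS[]      = { "dog", "cat", "tree", "book", 0 };
static const char* const VERBS[]      = { "chased", "ran", "read", "jumped", 0 };
static const char* const ADJECTIVES[] = { "black", "slow", "large", "small", 0 };
static const char* const ADVERBS[]    = { "quickly", "slowly", "quietly", "loudly", 0 };

// Hypothetical helper: true if word exactly matches an entry in a
// 0-terminated list. The comparison is case-sensitive.
static bool InList(const char* word, const char* const list[]) {
    for (int i = 0; list[i] != 0; ++i)
        if (std::strcmp(word, list[i]) == 0)
            return true;
    return false;
}

// Returns the part of speech for word, or UNKNOWN if it is in no list.
PartOfSpeech Categorize(const char* word) {
    if (InList(word, NOUNS))      return NOUN;
    if (InList(word, VERBS))      return VERB;
    if (InList(word, ADJECTIVES)) return ADJECTIVE;
    if (InList(word, ADVERBS))    return ADVERB;
    return UNKNOWN;
}

// Hypothetical helper: uses each enum value as an index into counts,
// which must have NUM_CATEGORIES elements.
void CountWords(const char* const words[], int n, int counts[]) {
    assert(words != 0 && counts != 0);
    for (int i = 0; i < n; ++i)
        ++counts[Categorize(words[i])];
}
```

Because the enum values double as array indices, main needs no switch statement to update the counters. In a full program, main would also write each word for which Categorize returns UNKNOWN to the output file.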
Categorize function better. This criterion will only be worth
10%.
When reading from a file, you should use the get function
to avoid reading more text than your input string can hold. Since
get reads a whole line at a time, it's likely that the
string will contain more than one word. We are providing a function
called Parse that takes a string and returns an array of
individual words found within. (See parse.H for
details.)
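A bounded read with get might look like the sketch below. Two details of get are easy to miss: it stops before the newline and leaves it in the stream, and it sets the stream's fail state if it stores no characters at all, which is what happens on an empty line. The names ReadLine, CountLines, and MAX_LINE are hypothetical, the buffer size is an assumption, and the sketch uses the standard headers rather than the course's iostream.h. The call to Parse is left out because its exact signature is given in parse.H.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>

const int MAX_LINE = 256;  // assumed buffer size, not specified by the assignment

// Hypothetical helper: reads at most size-1 characters of the current line
// into buffer. Returns false once the input is exhausted.
bool ReadLine(std::istream& in, char buffer[], int size) {
    assert(size > 1);
    buffer[0] = '\0';
    if (!in || in.peek() == std::istream::traits_type::eof())
        return false;               // nothing left to read
    if (in.peek() != '\n')
        in.get(buffer, size);       // stops before '\n' or when buffer is full
    if (in.peek() == '\n')
        in.ignore();                // discard the newline that get left behind
    return true;
}

// Usage sketch: counts the lines in a block of text held in memory.
int CountLines(const char* text) {
    std::istringstream in(text);
    char line[MAX_LINE];
    int n = 0;
    while (ReadLine(in, line, MAX_LINE))
        ++n;
    return n;
}
```

A line longer than the buffer comes back in several pieces, so a word that straddles the boundary would be split in two. Choosing a buffer comfortably larger than the longest expected line avoids this in practice.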
The Parse function is being distributed in a file named
parse.o. You can't see the source code for the function,
but you can use it in your program by including parse.H
and putting the names of both your .C file and
parse.o on the g++ command line when compiling.
For example:
% g++ analyze.C parse.o
In order to include a local file (i.e. a file in the same directory as the file you are compiling) you use quotes ("") instead of angle-brackets (<>) around the file name. For example:
#include <iostream.h>
#include "parse.H"
Both parse.o and parse.H are in ~cs102/Parse.
You should copy both files to your working directory before compiling.
The Parse directory also contains a sample input file named
story that you can use to test your program. Here is
some sample output:
% cat story
The black dog quickly
chased the slow white cat
up the (very large) tree.
% a.out
Please enter the name of the input file: story
Please enter the name of the output file: unknown
Summary of input text:
Nouns: 3
Verbs: 1
Adjectives: 3
Adverbs: 1
Unknown: 6
Unknown words were left in output file.
% cat unknown
The
the
white
up
the
very
Submit your assignment via the submit102 program.
Grading criteria: