by JL Popyack

PURPOSE:

After completing this assignment, you will have further experience with subprogram design, use, and implementation, and with argument passing via value and reference.


THE ASSIGNMENT:

This assignment consists of four sections:

  1. Suggested problems (not to be turned in).
  2. Written problems to be turned in.
  3. A program to be completed.
  4. Extra credit for the programming assignment.

  1. Suggested Problems:

    From the Horstmann text:
    Chapter 5 (pp. 195 - 205): R5.9, R5.10, R5.11, R5.12



  2. Written Problems:

    From the Horstmann text, Chapter 5 (pp. 195 - 205):

    1. Problem R5.8 (6 points)
    2. Problem R5.20 (5 points)



  3. A Program

    Have you noticed that a once-moderate winter turned rather chilly shortly after Punxsutawney Phil's weather prediction on Groundhog Day? Is Phil a reliable weather predictor? Time for some analysis.

    You are to write a C++ program that will extract information from a text file, compute some statistics, and produce some output files.

    Your program will first prompt the user for the name of a text file, and then verify that the file can be found. If the program cannot open the file for some reason, it will give a brief explanation to the user, then ask the user to try again. If at any time the user wishes to give up, the user can type QUIT at the prompt, causing the program to exit without doing anything.

    The text file you will use is from Groundhog.org, the official website of the Punxsutawney Groundhog Club. You should save a copy of this file on your local machine as type "Web Page, HTML Only" for your use in this assignment. (We are grateful to Alan Freed of OnTV, the website's designer, for permission to use this page as data for the assignment.) A copy of the page has been archived here in the event the original is unavailable.
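
    The prompt-and-retry behavior described above (ask again on failure, give up on QUIT) might be sketched as follows. This is only one possible arrangement: the function name promptForFile, its parameters, and the wording of the messages are assumptions made for illustration, not part of the assignment.

```cpp
#include <fstream>
#include <iostream>
#include <string>

// Repeatedly asks for a file name on `in` until a file opens or the user
// types QUIT. Returns true with `file` open, or false if the user gave up
// (or the input ran out).
// NOTE: the name promptForFile and the message text are illustrative only.
bool promptForFile(std::istream& in, std::ostream& out, std::ifstream& file)
{
    std::string name;
    while (true)
    {
        out << "Enter the name of the data file (or QUIT to exit): ";
        if (!std::getline(in, name) || name == "QUIT")
        {
            return false;
        }

        file.clear();                  // reset any failure left by the last try
        file.open(name.c_str());
        if (file.is_open())
        {
            return true;
        }
        out << "Could not open \"" << name
            << "\". Check the spelling and location, then try again.\n";
    }
}
```

    A caller would typically pass std::cin and std::cout, and exit quietly when the function returns false.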

    Your program will then read the file, one line at a time, and decide whether or not the line contains useful information. It contains useful information if it begins with a 4-digit number signifying the year of a Groundhog Day observance. If it does, it will contain not only the year, but other information including HTML formatting and either the text "Saw Shadow", "No Shadow", or something else indicating an unknown or inconclusive result. For instance, the following lines are repeated verbatim from the file:
     <tr><td CLASS=DATE>
     1899</td><td>No Record.
     </td></tr>
     <tr><td CLASS=DATE>
     1900</td><td>Saw Shadow.
     </td></tr>
     <tr><td CLASS=DATE>
     1901</td><td>Saw Shadow.
     </td></tr>
     <tr><td CLASS=DATE>
     1902</td><td>Probably No Shadow.
     </td></tr>
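
    One way to test whether a line "begins with a 4-digit number" is sketched below. The name startsWithYear is hypothetical, and skipping leading spaces and tabs is an assumption (the excerpt above is indented); adjust it if your saved copy of the file differs.

```cpp
#include <cctype>
#include <string>

// Returns true if `line` begins with exactly four digits, as in
// "1900</td><td>Saw Shadow.".
// Assumption: leading spaces or tabs are skipped before checking.
bool startsWithYear(const std::string& line)
{
    std::string::size_type i = line.find_first_not_of(" \t");
    if (i == std::string::npos || line.size() - i < 4)
    {
        return false;
    }
    for (std::string::size_type k = i; k < i + 4; ++k)
    {
        if (!std::isdigit(static_cast<unsigned char>(line[k])))
        {
            return false;
        }
    }
    // A fifth digit would make this a longer number, not a 4-digit year.
    return i + 4 == line.size()
        || !std::isdigit(static_cast<unsigned char>(line[i + 4]));
}
```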

    Unfortunately, however, not all lines are that predictable. Here are some other lines excerpted from the file:
    1940</td><td>Saw Shadow at 9 AM; 
    <br>
    Pictured with first Groundhog Day Queen, Miss Margaret Hunam.
    </td></tr>
    <tr><td CLASS=DATE>
    1941</td><td>Saw Shadow at 4:25 PM;
    <br>
    Six Groundhog Girls.
    </td></tr> <tr><td CLASS=DATE>
    1942</td><td>Partial Shadow at 7:40 AM;
    <br>
    "War clouds have blacked out parts of the shadow."
    <br>
    (<i>The Spirit,</i> Feb. 2, 1942).
    </td></tr>

    Your program should keep a count of how many "lines of interest" contained "Saw Shadow", how many contained "No Shadow", and how many total "lines of interest" were found. (NOTE: in some cases - e.g., 1887 and 1942 - a human reader may deduce that a shadow was seen, but according to the rules described here, those years may be counted as "unknown", since the line containing the year did not contain either "Saw Shadow" or "No Shadow".) After all input is read, your program should output statistics on the console reporting the number of occurrences of "Saw Shadow", "No Shadow", and "unknown", total occurrences, and the percentages of each type.
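
    The counting rule above looks only at the text of the line that holds the year. A minimal sketch of that bookkeeping follows; the struct ShadowTally and the helpers countLine and percentOf are names invented for this illustration.

```cpp
#include <string>

// Running totals for the "lines of interest". Names are illustrative.
struct ShadowTally
{
    int saw;
    int none;
    int unknown;

    ShadowTally() : saw(0), none(0), unknown(0) {}
    int total() const { return saw + none + unknown; }
};

// Classifies one line of interest using only the text on that line.
void countLine(const std::string& line, ShadowTally& tally)
{
    if (line.find("Saw Shadow") != std::string::npos)
    {
        ++tally.saw;
    }
    else if (line.find("No Shadow") != std::string::npos)
    {
        ++tally.none;
    }
    else
    {
        ++tally.unknown;
    }
}

// Percentage of `count` out of `total`; 0 when nothing has been counted,
// which avoids dividing by zero on an empty file.
double percentOf(int count, int total)
{
    return total == 0 ? 0.0 : 100.0 * count / total;
}
```

    Under this rule "Probably No Shadow" counts as "No Shadow" because it contains that text, while "Partial Shadow" falls through to unknown.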

    Your program should also produce two output files. One will be a transcript showing the result for each year on a separate line, in a format that is pleasant for a human to read. (Since this output may be too long to capture on the console window, a file is appropriate for it.) The other will be a tab-delimited text file containing only the year and result information, using the tab character "\t" to separate entries on a line. For example, the entries in your output file for the years shown earlier may look like this, with " \t " signifying the tab character (there should be no spaces between the year and the result, just the tab character):
     1899 \t ?
     1900 \t yes
     1901 \t yes
     1902 \t no

    You will be able to tell if your file is properly formatted if it can be imported into Microsoft Excel, creating two separate columns for the data values.

        A       B
        1899    ?
        1900    yes
        1901    yes
        1902    no
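
    A record in the tab-delimited file can be built as below. The helper name formatRecord is an assumption for illustration; the point is that a single '\t' character, with no surrounding spaces, sits between the year and the result.

```cpp
#include <sstream>
#include <string>

// Builds one tab-delimited record: the year, one tab character, the result.
// NOTE: formatRecord is a hypothetical helper name.
std::string formatRecord(int year, const std::string& result)
{
    std::ostringstream record;
    record << year << '\t' << result;
    return record.str();
}
```

    Each record would then be written on its own line, for example outFile << formatRecord(1900, "yes") << '\n'; where outFile is whatever output stream your program has opened.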



    DETAILS:

    You should write the following subprograms and use them in this program, as described below:

    1. void getfile(istream& in, string& theFile) is a procedure that works analogously to getline. Whereas getline reads an entire line from an istream, getfile reads the entire file into a string variable, including line breaks.
    2. string getTableEntry(const string& theFile, int& pos) is a function that examines theFile and retrieves the information between the next occurrences of <td> ... </td>, starting at position pos. If none is found, an empty string is returned. It also updates pos to the first position following the information retrieved. (Note that pos is passed by reference.)
    3. bool containsUsefulInfo(string entry) is a predicate function that returns a value of true if entry begins with a 4-digit year, as described above, false otherwise.
    4. string extractResult(string line) is a function that returns one of three values: whether or not the groundhog saw its shadow, or result unknown.
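
    The first two subprograms could be sketched roughly as follows, using the signatures given above. The bodies are assumptions, not a required design: in particular, this getTableEntry assumes lowercase tags and allows the opening tag to carry attributes (as in <td CLASS=DATE>), and this getfile ends every line with '\n' even if the last line of the file had none.

```cpp
#include <fstream>
#include <istream>
#include <string>

// Reads everything remaining on `in` into `theFile`, keeping line breaks.
// A '\n' is appended after every line, including the last one.
void getfile(std::istream& in, std::string& theFile)
{
    theFile.clear();
    std::string line;
    while (std::getline(in, line))
    {
        theFile += line;
        theFile += '\n';
    }
}

// Returns the text between the next <td ...> tag and its </td>, searching
// from `pos`, and moves `pos` to the first position after that text.
// Returns "" (leaving `pos` unchanged) if no complete entry is found.
// Assumption: tags are lowercase and the opening tag may have attributes.
std::string getTableEntry(const std::string& theFile, int& pos)
{
    if (pos < 0)
    {
        return "";
    }
    std::string::size_type open =
        theFile.find("<td", static_cast<std::string::size_type>(pos));
    if (open == std::string::npos)
    {
        return "";
    }
    std::string::size_type start = theFile.find('>', open);
    if (start == std::string::npos)
    {
        return "";
    }
    ++start;                                    // first character after the tag
    std::string::size_type close = theFile.find("</td>", start);
    if (close == std::string::npos)
    {
        return "";
    }
    pos = static_cast<int>(close);              // first position after the entry
    return theFile.substr(start, close - start);
}
```

    Because pos is passed by reference, calling getTableEntry repeatedly walks through the file one entry at a time. Note that in the excerpts shown earlier the year sits on the line after <td CLASS=DATE>, so an entry may begin with a line break that the caller needs to skip.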

    You are free (and encouraged!) to write more subprograms should you find them useful.

    Your program (and all subsequent programs) must use good style, including proper variable names, use of indentation and whitespace, and commenting. Students are advised to follow the style guidelines supplied by the Horstmann text in Appendix A. (You should also consult the on-line version found on the course web page which contains instructors' editorials).

    Students are also being graded on their User Interface. A quality user interface should be easy to understand, and supply adequate instructions for use. Spend a few minutes on each program from here on out to make your program as "user-friendly" as possible. (You may want to use various I/O manipulators to assist you.)

    Note that for this (and all) programs, you must supply appropriate documentation.



  4. Extra Credit:
    1. (5 points) In the program, before saving the output to a file, check to see if a file with that name already exists, and give the user the option of not overwriting it.
    2. (5 points) Mathematics texts routinely use the "=" to mean both assignment (as in "let x=5") and comparison (as in "if x=0 ..."). Programming language designers have found this ambiguity to be a problem. Research and report on how the languages FORTRAN and Pascal handle these concepts. What statement is equivalent to the C++ statement
      if( x == 0 )
        y = 15 ;

      in each of these languages?
    3. (5 points) The find method for string objects returns a value between 0 and the string's size (minus 1) when the value is found. When not found, it returns a value outside that range. What value? You may think it is -1. However, that is not exactly correct. To understand what is really going on will require knowledge of some other types besides ints, namely the unsigned int and size_t types.

      Because find returns the value of an index in the string, it should be a nonnegative number. So, it makes sense for find to return an unsigned int rather than an int. (Binary number representation will allow twice as many unsigned ints as ints, since negative numbers need not be represented, which means that the maximum size of a string can be twice as long.) Unfortunately, we have already seen that the language standard may permit ints to be shorter than long ints on a given system. Some systems also permit long long ints. Goodness! You can't even guarantee that an unsigned long long int would be recognized by your compiler.

      What is really needed is an unsigned type that is guaranteed to be large enough to index any string, which is the role size_t plays here. That is the return type of find. When you use the statement int pos = str.find("x") , it is actually doing a type conversion from a size_t to an int. Visual C++ 2003 will give you a warning message about this. It really doesn't matter unless the size of the string exceeds the largest value of an int, or if the string you are looking for ("x" in this example) is not found. The actual value returned by find when the string is not found is the largest possible size_t value, which typically gets interpreted as -1 when converted to an int.

      In case you are wondering why they didn't come up with a symbolic name for this number, they did. It is a static member of the string class. You can either use string::npos (yes, that is the actual class name, string, in use) or, since it is a member of every string, you can access it through any given string, e.g., str.npos in the example above.

      Whew! What does this all mean? It means the proper way to deal with strings is to use size_t and npos where appropriate. Notice that if you do this, your warning messages about type conversion will disappear, too. Use these throughout in your program for extra credit.
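
      The points made in this item can be checked with a short sketch. The function names below are invented for illustration. The result of converting the "not found" value to an int is assumed to be -1, which is what common compilers produce, although older language standards leave that conversion implementation-defined.

```cpp
#include <cstddef>
#include <limits>
#include <string>

// True when find's "not found" value is the largest value of its return type.
bool nposIsLargestValue()
{
    return std::string::npos
        == std::numeric_limits<std::string::size_type>::max();
}

// What find("x") looks like after being squeezed into an int.
int findAsInt(const std::string& str)
{
    return static_cast<int>(str.find("x"));    // narrowing: size_t -> int
}

// The recommended style: keep positions in size_type and compare with npos.
// Counts non-overlapping occurrences of `target` in `text`.
std::size_t countOccurrences(const std::string& text, const std::string& target)
{
    if (target.empty())
    {
        return 0;
    }
    std::size_t count = 0;
    std::string::size_type pos = text.find(target);
    while (pos != std::string::npos)
    {
        ++count;
        pos = text.find(target, pos + target.size());
    }
    return count;
}
```

      Written this way, countOccurrences needs no conversions between size_t and int, so there is nothing for the compiler to warn about.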

    4. (10 points) Problem P4.3 from the Horstmann text.
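
    For extra-credit item 1, one simple way to check whether an output file already exists is to try opening it for reading. This is a sketch under that assumption: fileExists is a hypothetical helper, and a file that exists but cannot be read would be reported as absent.

```cpp
#include <fstream>
#include <string>

// Returns true if a file called `name` can be opened for reading.
// Caveat: an existing but unreadable file is reported as not existing.
bool fileExists(const std::string& name)
{
    std::ifstream probe(name.c_str());
    return probe.is_open();
}
```

    If fileExists returns true, the program can ask the user whether to overwrite the file or to supply a different name before opening it for output.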



WHAT TO SUBMIT:

All homework for this course must be submitted electronically using WebCT. Do not e-mail your assignment to a TA or Instructor! If you are having difficulty with your WebCT account, you are responsible for resolving these problems with a TA, an Instructor, or someone from IRT, before the assignment is due. It is suggested you complete your work early so that a TA can help you if you have difficulty with this process.

For this assignment, you must submit:


ACADEMIC HONESTY:

You must compose all written material yourself, including answers to book questions. All material taken from outside sources must be appropriately cited. If you need assistance with this aspect of the assignment, see a consultant during consulting hours.
