Lab - Design and Implementation
Questions will be denoted with a Q.
- Create a subdirectory for this lab and change into it
- Copy (recursively) all the files from
~kschmidt/public_html/CS265/Labs/Design/ to your lab directory. Note that
there is a symlink to a directory containing the C files. You might want
to read up on the
-L option to
The Markov Chain algorithm
- We will use a sentinel string, "\n" (a single newline):
- The very first (2-word) prefix (State) to be put into the
table will be ("\n", "\n"). What follows is a suffix that would begin
- Note, there may be more than one suffix here, if we modified the
program to read several input texts.
- The last State in a text will have, as a suffix, "\n"
- A "this is a fine place to stop" kind of token.
- Again, if multiple texts are input, there might be several good
places to end a text in the table
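The sentinel scheme above can be sketched in a few lines of Python. This is only an illustration, not the lab's markov.py, which may make different choices: the names build, generate, SENTINEL and NPREF, and the use of tuples and a dictionary, are assumptions made for this example.

```python
import random
from collections import defaultdict

SENTINEL = "\n"   # marks both "start of text" and "a fine place to stop"
NPREF = 2         # number of words in a prefix (State)


def build(words, table=None):
    """Add one text (an iterable of words) to a prefix -> suffixes table."""
    if table is None:
        table = defaultdict(list)
    prefix = (SENTINEL,) * NPREF          # the very first prefix: ("\n", "\n")
    for word in words:
        table[prefix].append(word)        # word is a suffix of the current prefix
        prefix = prefix[1:] + (word,)     # slide the prefix window forward
    table[prefix].append(SENTINEL)        # the last prefix gets "\n" as a suffix
    return table


def generate(table, max_words=100, rng=random):
    """Walk the table from the sentinel prefix until a sentinel suffix is drawn."""
    prefix = (SENTINEL,) * NPREF
    out = []
    for _ in range(max_words):
        word = rng.choice(table[prefix])  # pick one of the recorded suffixes
        if word == SENTINEL:              # a recorded stopping place
            break
        out.append(word)
        prefix = prefix[1:] + (word,)
    return out
```

Calling build a second time with the same table adds a second suffix under ("\n", "\n") and a second sentinel suffix at the end of that text, which is the "several input texts" case described above.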
The C implementation
- Look at markov.c
- Look at the 2 structs, and the array of State pointers,
- The statetab *is* your hash table.
- A table of pointers to linked lists (a table of linked lists)
- The linked-lists contain entries
- Note that the table is initially all NULL
- The "buckets" are just linked-lists of States
- Each entry (prefix) is stored in a State object
- Each entry (prefix) is stored exactly once in the table
- Each State is associated with one or more suffixes
- These suffixes are stored in a linked list. This list is a State's
satellite data: the State (prefix) is the key, and the
list is the associated data
- Suffix is a node in this list; it stores a single
Q1 What is the difference between add() and
Q2 What does lookup do? Which function(s)
- Note that space is allocated for each string (single word) exactly
- Every structure that uses a word keeps a pointer to that shared string.
- Consider, as input:
It's a new dawn
- Draw the table, much as in the notes, after build is called
Q3 Include the table in your lab sheet. How many
references (pointers) are there to each string:
Q4 What are the advantages of this?
Q5 What are the drawbacks?
- Does the program explicitly give this memory back?
- Wherever we have more than one reference to heap memory we have a
Q6 Write a function to clean up statetab when
we're all done, give the memory back (don't spend a lot of time
here). In-line your function here, in the lab-sheet.
Q7 Does your function work properly? What difficulties
did you have?
Q8 Take a step back from the table. How could you make
sure that each string was freed exactly once?
The C++ Implementation
- Look at markov.cc
Q9 How are the prefixes stored? Why not use a
Q10 What serves as our dictionary (replaces our hash
Q11 How is the satellite data (list of suffixes)
Q12 Are there any advantages to this implementation
over the C implementation? What are they?
Q13 Are there any drawbacks?
The Python Implementation
- Look at markov.py
Q14 How are the prefixes stored?
Q15 What serves as our dictionary (replaces our hash
Q16 How is the satellite data (list of suffixes)
Q17 Are there any advantages to this implementation
over the C++ implementation? What are they?
Q18 Are there any drawbacks?
Strictly for Fun
You are done with your gradesheet. Nothing more will go there. You still
need to follow these instructions, but you will not submit anything as a result.
See the fortune directory (check the man page; its location varies
from system to system). It's usually somewhere around
Modify one of the above implementations so that, instead of an entire file
being a "story" (a start and stop place), each entry in the fortune files
(separated by a %) is a new "story".
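As a starting point, splitting a fortune file into separate "stories" might look like the sketch below. It assumes each entry is separated by a line containing only %, and the function name split_fortunes is hypothetical. Each resulting word list could then be fed, as its own text, to the table-building routine of whichever implementation you modify.

```python
def split_fortunes(text):
    """Split fortune-file text into entries, each returned as a list of words.

    Assumes a line containing only '%' separates one entry from the next.
    """
    entries = []
    current = []
    for line in text.splitlines():
        if line.strip() == "%":
            if current:                  # skip empty entries
                entries.append(current)
            current = []
        else:
            current.extend(line.split())
    if current:                          # last entry may lack a trailing '%'
        entries.append(current)
    return entries
```

Because every entry starts from the sentinel prefix and ends with the sentinel suffix, the table ends up with many possible starting words and many stopping places.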