CS 360
Winter 2015
Programming Language Concepts
Tuesdays, Thursdays 14:00-15:20
University Crossings 151

Instructor:
Geoffrey Mainland
mainland+cs360@cs.drexel.edu
University Crossings 106
Teaching Assistant:
Brian Lee
bl389@drexel.edu
Warning! This material is for an old version of the course.

CS 360 Homework 6: Finite Automata and Regular Expressions

In this assignment, you will implement several functions we saw in lecture using Haskell and build a compiler from regular expressions to C.

Before you attempt the homework, be sure you are using our version of GHC. See the GHC guide.

You must implement the functions as specified. You may write other helper functions and define test data in your file, but you may not change the functions’ names or the number or order of arguments.

We have provided you with a shell for your solution here. Please extract this tarball in your ~/cs360/git directory and immediately commit the resulting hw6 directory. You can do this as follows:

$ cd ~/cs360/git
$ wget 'https://www.cs.drexel.edu/~mainland/courses/cs360-201425/homework/hw6.tar.gz'
$ tar xf hw6.tar.gz
$ git add hw6
$ git commit -m "Initial check-in for homework 6."

Your changes should be made to the files we have provided as described in each problem below. We have included a non-functional version of every function you must write, so you can also grep the source we have provided to find the functions we have asked you to implement.

We have included several tests for you. You may run these tests by executing the command make test in your ~/cs360/git/hw6 directory. Code that does not compile will receive a zero.

Your code must run on tux under the version of ghc that we provide.

This assignment is worth 90 points. There are 96 possible points.

Hints

  1. Read about list comprehensions in LYAH.
  2. You can escape characters (like '"') in string constants with a backslash, just as in C.
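Both hints come up repeatedly below. As a small illustration (orderedPairs and quoted are made-up names for this example, not part of the homework shell), a list comprehension draws elements from generators and filters them with guards, and a backslash escapes a double quote inside a string literal:

```haskell
-- Hypothetical example: all pairs (x, y) with x drawn from xs and y drawn
-- from ys, keeping only those where x < y.
orderedPairs :: [Int] -> [Int] -> [(Int, Int)]
orderedPairs xs ys = [(x, y) | x <- xs, y <- ys, x < y]

-- A string constant containing escaped double quotes, just as in C.
-- Its contents are: label="a"
quoted :: String
quoted = "label=\"a\""
```

For instance, orderedPairs [1, 2] [2, 3] evaluates to [(1,2),(1,3),(2,3)].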

Working with the fsm program

When you type make, your code will be compiled to a binary named fsm. This binary will take a regular expression and perform one or more operations on it:

  1. Dump a graphical representation of the NFA to a file using the --nfa-dot option.
  2. Compile the NFA to C using the --to-c option.
  3. Match the regular expression against a string using the --match option.
  4. Run some tests using the --test option.

Problem 1: Warmup: Sets (10 points)

Implement the power set function, $\mathcal{P}(\cdot)$, from lecture and place your implementation in the file Set.hs that we have given you. Recall that the power set $\mathcal{P}(S)$ of a set $S$ is the set of all subsets of $S$, including the empty set and $S$ itself.
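For example, each element of a two-element set is either in or out of a given subset, which gives $2^2 = 4$ subsets:

$$\mathcal{P}(\{1, 2\}) = \{\emptyset, \{1\}, \{2\}, \{1, 2\}\}$$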

The type signature for powerSet should look like this:

powerSet :: ... => ... -> ...

Before you write the function, figure out what its type signature should be. If you cannot write the function, you will receive partial credit for a correct type signature. However, your code must compile for you to receive any credit, so if you only write a type signature for powerSet, leave it in a comment.

Your changes should be made to the file Set.hs.

Problem 2: Translating Regular Expressions to NFAs (25 points total)

We have partially implemented the translation from regular expressions to NFAs that we saw in lecture in the file RegExpToNfa.hs. For this problem, you must complete the implementation. All your changes should appear in the file RegExpToNfa.hs.

Refer to the lecture notes if you do not understand how to construct an NFA from a regular expression.

Hints:

  1. Be careful to renumber the states of the second NFA in the sequence. This can be done with the function numberNfaFrom.
  2. When constructing the moves of the combined NFA, you will need to add one extra move from the accepting state of the first NFA to the starting state of the second NFA. You can use the acceptState function to get the single accepting state of an NFA.

Problem 2.1: Literals (10 points)

Implement the NFA construction for literals. It is almost exactly like the $\epsilon$ construction, which we have given you, but instead of creating an NFA with a single $\epsilon$ transition (represented by the Emove data constructor), you must create a transition on a literal character (represented by the Move data constructor).
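To make the shape of this construction concrete, here is a sketch over a toy NFA representation. The Trans and Nfa types below are assumptions made for illustration: they borrow the Move and Emove constructor names from the text, but the real definitions in RegExpToNfa.hs will differ, so treat this as a picture of the idea rather than code to paste in.

```haskell
-- Toy model for illustration only; the real types in RegExpToNfa.hs differ.
type State = Int

-- A transition: Move consumes a literal character, Emove is an epsilon move.
data Trans
  = Move State Char State
  | Emove State State
  deriving (Eq, Show)

-- States, transitions, start state, and the single accepting state.
data Nfa = Nfa
  { nfaStates :: [State]
  , nfaMoves  :: [Trans]
  , nfaStart  :: State
  , nfaAccept :: State
  }
  deriving (Eq, Show)

-- The epsilon construction: two states joined by one epsilon move.
epsilonNfa :: Nfa
epsilonNfa = Nfa [0, 1] [Emove 0 1] 0 1

-- The literal construction has the same shape, but its single
-- transition consumes the character c instead.
literalNfa :: Char -> Nfa
literalNfa c = Nfa [0, 1] [Move 0 c 1] 0 1
```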

Problem 2.2: Concatenation (15 points)

Implement the NFA construction for the concatenation of two regular expressions, i.e., one regular expression followed in sequence by another.

We have given you the implementation of alternation, i.e., one regular expression or another, as well as Kleene star. Unlike alternation, concatenation does not require creating any new states.
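Using a toy representation again (an assumption, not the shell's types), concatenation can be sketched in three steps: renumber the second NFA past the states of the first, keep both sets of moves, and add one ε move from the first NFA's accepting state to the second NFA's start state. The hypothetical shiftNfa stands in for numberNfaFrom and assumes each NFA's states are numbered 0 to n-1.

```haskell
-- Toy model for illustration only; the real types in RegExpToNfa.hs differ.
type State = Int

data Trans
  = Move State Char State  -- consumes a literal character
  | Emove State State      -- epsilon move
  deriving (Eq, Show)

data Nfa = Nfa
  { nfaStates :: [State]
  , nfaMoves  :: [Trans]
  , nfaStart  :: State
  , nfaAccept :: State
  }
  deriving (Eq, Show)

literalNfa :: Char -> Nfa
literalNfa c = Nfa [0, 1] [Move 0 c 1] 0 1

-- Hypothetical stand-in for numberNfaFrom: add k to every state number.
shiftNfa :: Int -> Nfa -> Nfa
shiftNfa k (Nfa ss ms s a) = Nfa (map (+ k) ss) (map shiftMove ms) (s + k) (a + k)
  where
    shiftMove (Move p c q) = Move (p + k) c (q + k)
    shiftMove (Emove p q)  = Emove (p + k) (q + k)

-- Concatenation: no new states, just one extra epsilon move that links
-- the accepting state of n1 to the (renumbered) start state of n2.
concatNfa :: Nfa -> Nfa -> Nfa
concatNfa n1 n2 =
  Nfa (nfaStates n1 ++ nfaStates n2')
      (nfaMoves n1 ++ nfaMoves n2' ++ [Emove (nfaAccept n1) (nfaStart n2')])
      (nfaStart n1)
      (nfaAccept n2')
  where
    n2' = shiftNfa (length (nfaStates n1)) n2
```

Concatenating the literal NFAs for a and b this way yields states 0 to 3, with 0 to 1 on a, 1 to 2 on ε, and 2 to 3 on b, which is the NFA drawn in the ab.dot example in Problem 3.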

Problem 3: Translating an NFA to a dot file (20 points)

For this problem, you will implement the conversion of an NFA into a graphical form by completing the toDot function in the file NfaToDot.hs. This function takes an NFA and outputs a description of the graphical representation of the NFA in the dot language. The dot language is a plain text graph description language. A command-line program called dot converts this plain text graph representation into a picture in one of many formats (pdf, svg, png, etc.). If you are curious, you can read about the dot language here, but the example below should tell you all you need to know.

You will need to implement the stateToDot and moveToDot functions.

The stateToDot function should generate a string in the dot language that describes a single NFA state. Remember, this state could be an accepting state.

The moveToDot function should generate a string in the dot language that describes a single NFA transition. This transition could be an $\epsilon$ transition.

Remember, there are two types of NFA nodes (accepting states and non-accepting states) and two types of edges ($\epsilon$ transitions and symbol transitions). It would make sense to write two helper functions: one for generating dot syntax for nodes, and one for generating dot syntax for edges. You could parametrize these functions by, e.g., shape and/or label. We have given you a string constant epsilon that you can use to label $\epsilon$ transitions.
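One way to follow this suggestion is sketched here. nodeToDot, edgeToDot, and stateLine are hypothetical names, states are assumed to be plain Ints, and the local epsilon definition exists only to keep the snippet self-contained (the shell already provides one). The strings they build follow the line format of the ab.dot output in the example session.

```haskell
-- Hypothetical helper: one dot node, drawn with the given shape.
nodeToDot :: String -> Int -> String
nodeToDot shape s =
  show s ++ "[shape=\"" ++ shape ++ "\",label=\"q" ++ show s ++ "\"];"

-- Hypothetical helper: one dot edge carrying the given label.
edgeToDot :: String -> Int -> Int -> String
edgeToDot label from to =
  show from ++ "->" ++ show to ++ "[label=\"" ++ label ++ "\"];"

-- Local copy so the snippet is self-contained; the shell defines its own.
epsilon :: String
epsilon = "ε"

-- Accepting states are drawn as double circles, all others as circles.
stateLine :: [Int] -> Int -> String
stateLine accepting s
  | s `elem` accepting = nodeToDot "doublecircle" s
  | otherwise          = nodeToDot "circle" s
```

For example, stateLine [3] 3 produces 3[shape="doublecircle",label="q3"]; and edgeToDot epsilon 1 2 produces 1->2[label="ε"];.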

Here is an example session:

$ ./fsm --nfa-dot=ab.dot ab
$ cat ab.dot 
digraph fsm {
rankdir="LR";
start [shape="plaintext",label="start"];
start->0;
0[shape="circle",label="q0"];
1[shape="circle",label="q1"];
2[shape="circle",label="q2"];
3[shape="doublecircle",label="q3"];
1->2[label="ε"];
0->1[label="a"];
2->3[label="b"];
}
$ dot -Tpdf -o ab.pdf ab.dot
$

The following command converts the ab.dot file into a PDF named ab.pdf.

dot -Tpdf -o ab.pdf ab.dot

If you are using one of the lab workstations, you can view this PDF using evince.

If you type make figs, a number of examples will be built to test your dot translator.

Problem 4: Implement NFA Matching (20 points)

Implement the nfaMatch function in the file NfaMatch.hs.

Start by figuring out what the base case is and handling it properly. In the base case, how can you tell if a match occurred?

Be sure to use the function epsilonClosure to calculate the $\epsilon$ closure of the initial state when you calculate the initial set of states for your NFA simulator. You will want to use one of the functions we saw in lecture to handle transitions on a literal. I used a recursive helper function that takes the current set of states and a string to match. My helper function is two lines long, not counting the type signature.
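The simulation loop can be sketched over a toy representation of moves. The Trans type, epsClosure, step, and nfaAccepts are all hypothetical stand-ins (the shell's epsilonClosure and nfaMatch have their own types), and sets of states are modelled as sorted, duplicate-free lists. In this sketch the base case accepts exactly when the current set of states contains an accepting state.

```haskell
import Data.List (nub, sort)

-- Toy model for illustration only; the shell's NFA types differ.
type State = Int

data Trans
  = Move State Char State  -- consumes a character
  | Emove State State      -- epsilon move
  deriving (Eq, Show)

-- Sets of states are modelled as sorted lists without duplicates.
normalize :: [State] -> [State]
normalize = sort . nub

-- Epsilon closure: keep adding epsilon successors until nothing changes.
epsClosure :: [Trans] -> [State] -> [State]
epsClosure ms ss
  | next == cur = cur
  | otherwise   = epsClosure ms next
  where
    cur  = normalize ss
    next = normalize (cur ++ [q | Emove p q <- ms, p `elem` cur])

-- States reachable from ss by consuming exactly the character c.
step :: [Trans] -> Char -> [State] -> [State]
step ms c ss = normalize [q | Move p c' q <- ms, c' == c, p `elem` ss]

-- Start from the closure of the start state, then alternate a
-- character step with an epsilon closure until the input runs out.
nfaAccepts :: [Trans] -> State -> [State] -> String -> Bool
nfaAccepts ms start accepting = go (epsClosure ms [start])
  where
    -- Base case: input exhausted; match iff some current state accepts.
    go ss []       = any (`elem` accepting) ss
    go ss (c : cs) = go (epsClosure ms (step ms c ss)) cs
```

With the moves for ab (0 to 1 on a, 1 to 2 on ε, 2 to 3 on b), start state 0, and accepting state 3, nfaAccepts returns True for "ab" and False for "a" and "abb".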

You can test your nfaMatch function like this:

$ ./fsm "(ab)*" --match "ab"
(ab)* matched "ab" using the naive matcher
(ab)* matched "ab" using the nfa matcher
(ab)* matched "ab" using the dfa matcher
$ ./fsm "(ab)*" --match "a" 
(ab)* did not match "a" using the naive matcher
(ab)* did not match "a" using the nfa matcher
(ab)* did not match "a" using the dfa matcher
$

With the --match argument, the fsm program will attempt to match the regular expression using:

  1. The naive backtracking matcher we saw in lecture.
  2. Your nfaMatch with the NFA you built from the regular expression.
  3. Your nfaMatch with the minimized DFA we built from your NFA.

You can test your NFA simulator by running make test.

Problem 5: Compiling a DFA to C (20 points)

For this problem you will create a regular expression to C compiler by completing the implementation of the dfaToC function in the file DfaToC.hs. There are two local functions used by dfaToC that you must complete: genCharCase and genTransitionCase.

The genCharCase function generates a string containing the C code for handling a single character in the input (inside the switch (*(cs++)) block). For example, in the code below, genCharCase is called twice, once to generate the case statement that handles 'a', and again to generate the case statement that handles 'b'. The genCharCase function should call genTransitionCase for all transitions involving the character genCharCase was given as an argument. You will probably want to use map to do this.

The genTransitionCase function generates C code to handle the state transitions on a character. In the code below, it generates the inner case statements (inside the switch (state) blocks).

Both functions should generate a string. This string will contain several lines of code. You may find it useful to use the stack function to combine multiple lines of code as we have done elsewhere in this file.

The --to-c argument to fsm will use the dfaToC function to generate code to match the specified regular expression. We have included a simple driver as main.c. You can use it as follows:

$ ./fsm --to-c=match.c "(ab)*" 
$ make match
gcc -o match main.c match.c
$ ./match "" ab a             
"" matched
"ab" matched
"a" did not match
$

Here are the contents of my match.c when compiling “(ab)*”:

int match(const char* cs)
{
  int state = 0;
  int accept = 1;
  while (1) {
    switch (*(cs++)) {
      case 'a':
        switch (state) {
          case 0:
            state = 1;
            accept = 0;
            break;
          default: return 0;
        }
        break;
      case 'b':
        switch (state) {
          case 1:
            state = 0;
            accept = 1;
            break;
          default: return 0;
        }
        break;
      case '\0': return accept;
      default: return 0;
    }
  }
}
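The nested case blocks in this output can be produced by two small string-building functions, sketched below. genChar and genTransition are hypothetical counterparts of genCharCase and genTransitionCase. The list-of-triples DFA representation and the local stack definition are assumptions (the stack in DfaToC.hs may join lines differently), and show is used to render characters, which yields valid C literals for plain ASCII letters.

```haskell
import Data.List (intercalate)

type State = Int

-- Stand-in for the stack helper in DfaToC.hs: join lines with newlines.
stack :: [String] -> String
stack = intercalate "\n"

-- Indent every line of a (possibly multi-line) string by n spaces.
indent :: Int -> String -> String
indent n = stack . map (replicate n ' ' ++) . lines

-- Inner case: on this character, move from state p to state q and
-- record whether q is accepting.
genTransition :: [State] -> (State, State) -> String
genTransition accepting (p, q) = stack
  [ "case " ++ show p ++ ":"
  , "  state = " ++ show q ++ ";"
  , "  accept = " ++ (if q `elem` accepting then "1" else "0") ++ ";"
  , "  break;"
  ]

-- Outer case for character c; dfa lists transitions as (from, char, to).
-- One inner case is generated per transition on c, via map.
genChar :: [State] -> [(State, Char, State)] -> Char -> String
genChar accepting dfa c = stack $
  [ "case " ++ show c ++ ":"
  , "  switch (state) {"
  ]
  ++ map (indent 4 . genTransition accepting) [(p, q) | (p, c', q) <- dfa, c' == c]
  ++ [ "    default: return 0;"
     , "  }"
     , "  break;"
     ]
```

Called as genChar [0] [(0, 'a', 1), (1, 'b', 0)] 'a', this produces the case 'a': block of the listing above, up to indentation.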

Problem 6: Homework Statistics (1 point total)

How long did it take you to complete each problem? Please tell us in a comment in each of the files you submit. You must tell us how long each problem took you to receive the point.