CS 360

Winter 2016

Programming Language Concepts

**CS 360-001** Tuesday/Thursday 15:30-16:50 (Rush 014)

**CS 360-002** Tuesday/Thursday 14:00-15:20 (Rush 014)

**CS 360-003** Tuesday 18:30-21:20 (University Crossings 153)

Instructor: Geoffrey Mainland, mainland+cs360@drexel.edu. Office: University Crossings 106. Office hours: Mondays 3pm–5pm; Thursdays 5pm–6pm.

Teaching Assistants: Pavan Kantharaju, Matthew Roll

In this assignment, you will implement several functions we saw in lecture using Haskell and build a compiler from regular expressions to C.

You must implement the functions as specified. You may write other helper
functions and define test data in your file, but you **may not** change the
functions’ names or the number or order of arguments.

Your changes should be made to the files we have provided as described in each problem below. We have included a non-functional version of every function you must write, so you can also grep the source we have provided to find the function we have asked you to implement.

Note that source files are located in the `src` subdirectory.

This assignment is worth 70 points. There are 96 possible points.

Your code must run either on `tux` or on the course VM using the version of GHC
(7.8.3) that we provide.

A solution template is available in the DrexelCS360/homeworks GitHub repository. You should only need to modify the files provided by the template—please do not check in any files beyond those that the template provides.

You can check that your code compiles by typing `make` in the `hw7` directory.
If `make` does not complete successfully, it means your code does not
compile. **Code that does not compile will receive a zero**.

We have included several tests for your convenience. Passing all provided tests
does not guarantee full credit, but failing tests does guarantee less than full
credit. You can run the tests by typing `make run-tests` in the `hw7` directory.

- Read about list comprehensions in LYAH.
- You can escape characters (like `'"'`) in string constants with a backslash, just as in C.
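As a quick illustration of both hints, here is a minimal sketch. The names `pairs`, `quoted`, and `dquote` are made up for this example and are not part of the template.

```haskell
-- Illustrative only; these names are not part of the assignment template.

-- A list comprehension: pair each state number with each input symbol.
pairs :: [(Int, Char)]
pairs = [(q, c) | q <- [0, 1], c <- "ab"]

-- Escaping a double quote inside a string constant with a backslash, as in C.
quoted :: String -> String
quoted s = "\"" ++ s ++ "\""

-- As a character constant, a double quote needs no escape.
dquote :: Char
dquote = '"'
```

Loading this in GHCi and evaluating `putStrLn (quoted "q0")` prints the text `"q0"` including the quotes.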

`fsm` program

When you type `make`, your code will be compiled to a binary named `fsm`. This
binary will take a regular expression and perform one or more operations on it:

- Dump a graphical representation of the NFA to a file using the `--nfa-dot` option.
- Compile the NFA to C using the `--to-c` option.
- Match the regular expression against a string using the `--match` option.

Implement the power set function, $\mathcal{P}(\cdot)$, from lecture. Recall that the power set $\mathcal{P}(S)$ of a set $S$ is the set of all subsets of $S$, including the empty set and $S$ itself.
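As a small worked example of the definition, take $S = \{1, 2\}$:

$$\mathcal{P}(\{1, 2\}) = \{\emptyset, \{1\}, \{2\}, \{1, 2\}\}$$

A set with $n$ elements has $2^n$ subsets. One standard way to see this (not necessarily the formulation used in lecture) is that for any $x \notin S$,

$$\mathcal{P}(S \cup \{x\}) = \mathcal{P}(S) \cup \{\, T \cup \{x\} \mid T \in \mathcal{P}(S) \,\}$$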

The type signature for `powerSet` should look like this:

Before you write the function, figure out what its type signature should be. If
you cannot write the function, you will receive partial credit for a correct
type signature. However, your code **must compile** for you to receive any
credit, so if you only write a type signature for `powerSet`, leave it in a
comment.

Please note the following constraints:

- You **may not** change the code in `Set.hs`.
- Solutions that use `fromList` or `toList` will receive zero credit. You must use the `Set` abstraction without relying on the fact that “under the covers” it is implemented using lists.

Your changes should be made to the file `PowerSet.hs`.

We have partially implemented the translation from regular expressions to NFAs
that we saw in lecture in the file `RegExpToNfa.hs`. For this problem, you must
complete the implementation. All your changes should appear in the file
`RegExpToNfa.hs`.

Refer to the lecture notes if you do not understand how to construct an NFA from a regular expression.

Hints:

- Be careful to renumber the states of the second NFA in the sequence. This
  can be done with the function `numberNfaFrom`.
- When constructing the moves of the combined NFA, you will need to add one
  extra move from the accepting state of the first NFA to the starting state of
  the second NFA. You can use the `acceptState` function to get the single
  accepting state of an NFA.

Implement the NFA construction for literals. It is almost exactly like the
$\epsilon$ construction, which we have given you, but instead of creating an NFA
with a single $\epsilon$ transition (represented by the `Emove` data
constructor), you must create a transition on a literal character (represented
by the `Move` data constructor).

Implement the NFA construction for the concatenation of two regular expressions, i.e., one regular expression followed in sequence by another.

We have given you the implementation of alternation, i.e., one regular
expression **or** another, as well as Kleene star. Unlike alternation,
concatenation does not require you to create any new states.
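The literal and concatenation constructions can be sketched on a toy NFA type. Everything below is a simplified assumption made for illustration: the record `Nfa`, the argument order of `Move` and `Emove`, and the helpers `literal`, `shift`, and `sequenceNfa` are hypothetical and may not match the definitions in the template, where renumbering is done with `numberNfaFrom` and the accepting state is obtained with `acceptState`.

```haskell
-- Hypothetical, simplified types for illustration only; the template's
-- actual definitions may differ.
data Move = Move Int Char Int   -- transition on a literal character
          | Emove Int Int       -- epsilon transition
  deriving (Eq, Show)

data Nfa = Nfa
  { states :: [Int]
  , moves  :: [Move]
  , start  :: Int
  , accept :: Int               -- the single accepting state
  } deriving (Eq, Show)

-- NFA for a single literal: two states joined by one Move.
literal :: Char -> Nfa
literal c = Nfa [0, 1] [Move 0 c 1] 0 1

-- Shift every state number by k (a stand-in for renumbering).
shift :: Int -> Nfa -> Nfa
shift k (Nfa qs ms s a) = Nfa (map (+ k) qs) (map shiftMove ms) (s + k) (a + k)
  where
    shiftMove (Move p c q) = Move (p + k) c (q + k)
    shiftMove (Emove p q)  = Emove (p + k) (q + k)

-- Concatenation: renumber the second NFA, then add one epsilon move from
-- the first NFA's accepting state to the second NFA's start state.
-- No new states are created. Assumes the first NFA's states are 0..n-1.
sequenceNfa :: Nfa -> Nfa -> Nfa
sequenceNfa n1 n2 =
  Nfa (states n1 ++ states n2')
      (moves n1 ++ [Emove (accept n1) (start n2')] ++ moves n2')
      (start n1)
      (accept n2')
  where
    n2' = shift (length (states n1)) n2
```

In this sketch, `sequenceNfa (literal 'a') (literal 'b')` has states 0 through 3, a move on `a` from 0 to 1, an epsilon move from 1 to 2, and a move on `b` from 2 to 3.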

For this problem, you will implement the conversion of an NFA into a graphical
form by completing the `toDot` function in the file `NfaToDot.hs`. This function
takes an NFA and outputs a description of the graphical representation of the
NFA in the dot language. The dot language is a plain text graph description
language. A command-line program called `dot` converts this plain text graph
representation into a picture in one of many formats (pdf, svg, png, etc.). If
you are curious, you can read more about the dot language in its documentation,
but the example below should tell you all you need to know.

You will need to implement the `stateToDot` and `moveToDot` functions.

The `stateToDot` function should generate a string in the dot language that
describes a single NFA state. Remember, this state could be an accepting state.

The `moveToDot` function should generate a string in the dot language that
describes a single NFA transition. This transition could be an $\epsilon$
transition.

Remember, there are two types of NFA nodes (accepting states and non-accepting
states) and two types of edges ($\epsilon$ transitions and symbol
transitions). It would make sense to write two helper functions: one for
generating dot syntax for nodes, and one for generating dot syntax for
edges. You could parametrize these functions by, e.g., shape and/or label. We
have given you a string constant `epsilon` that you can use to label $\epsilon$
transitions.
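The two helper functions suggested above might look like the following sketch. The names `dotNode` and `dotEdge` and their argument order are hypothetical, chosen only for this example; the output format follows the example session in this assignment.

```haskell
-- Hypothetical helper names; not part of the provided template.

-- Render one dot node, parametrized by shape (e.g. "circle" for a
-- non-accepting state, "doublecircle" for an accepting state).
dotNode :: String -> Int -> String
dotNode shape q =
  show q ++ "[shape=" ++ show shape ++ ",label=" ++ show ("q" ++ show q) ++ "];"

-- Render one dot edge, parametrized by label. The label is quoted by hand
-- rather than with `show`, because `show` would escape non-ASCII
-- characters such as the epsilon symbol.
dotEdge :: String -> Int -> Int -> String
dotEdge label from to =
  show from ++ "->" ++ show to ++ "[label=\"" ++ label ++ "\"];"
```

For instance, `dotNode "circle" 0` yields `0[shape="circle",label="q0"];` and `dotEdge "a" 0 1` yields `0->1[label="a"];`.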

Here is an example session:

```
$ ./fsm --nfa-dot=ab.dot ab
$ cat ab.dot
digraph fsm {
rankdir="LR";
start [shape="plaintext",label="start"];
start->0;
0[shape="circle",label="q0"];
1[shape="circle",label="q1"];
2[shape="circle",label="q2"];
3[shape="doublecircle",label="q3"];
1->2[label="ε"];
0->1[label="a"];
2->3[label="b"];
}
$ dot -Tpdf -o ab.pdf ab.dot
$
```

The following command converts the `ab.dot` file into a PDF named `ab.pdf`.

```
dot -Tpdf -o ab.pdf ab.dot
```

If you are using one of the lab workstations, you can view this PDF using `evince`.

If you type `make figs`, a number of examples will be built to test your dot
translator.

Implement the `nfaMatch` function in the file `NfaMatch.hs`.

Start by figuring out what the base case is and handling it properly. In the base case, how can you tell if a match occurred?

Be sure to use the function `epsilonClosure` to calculate the $\epsilon$ closure
of the initial state when you calculate the initial **set** of states for your
NFA simulator. You will want to use one of the functions we saw in lecture to
handle transitions on a literal. I used a recursive helper function that takes
the current set of states and a string to match. My helper function is two lines
long, not counting the type signature.
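The overall shape of such a simulator can be sketched on a toy representation. This is not the template's API: the `Move` type, the list-based state sets, and the functions `closure`, `step`, and `match` are all assumptions made for illustration. In the assignment you would use the provided `Set` abstraction and `epsilonClosure` instead.

```haskell
import Data.List (nub, sort)

-- Hypothetical, simplified representation for illustration only.
data Move = Move Int Char Int   -- transition on a literal character
          | Emove Int Int       -- epsilon transition

-- All states reachable from qs via zero or more epsilon moves.
-- Iterates until the (sorted, duplicate-free) set stops growing.
closure :: [Move] -> [Int] -> [Int]
closure ms qs
  | next == cur = cur
  | otherwise   = closure ms next
  where
    cur  = sort (nub qs)
    next = sort (nub (cur ++ [q | Emove p q <- ms, p `elem` cur]))

-- States reachable from qs by one transition on c, followed by epsilon moves.
step :: [Move] -> [Int] -> Char -> [Int]
step ms qs c = closure ms [q | Move p c' q <- ms, p `elem` qs, c' == c]

-- Simulate the NFA. Base case: the input is exhausted, so the match
-- succeeds exactly when the accepting state is in the current set.
match :: [Move] -> Int -> Int -> String -> Bool
match ms start accept = go (closure ms [start])
  where
    go qs []       = accept `elem` qs
    go qs (c : cs) = go (step ms qs c) cs
```

With the moves `[Move 0 'a' 1, Emove 1 2, Move 2 'b' 3]`, start state 0, and accepting state 3, this sketch accepts `"ab"` and rejects `"a"`.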

You can test your `nfaMatch` function like this:

```
$ ./fsm "(ab)*" --match "ab"
(ab)* matched "ab" using the naive matcher
(ab)* matched "ab" using the nfa matcher
(ab)* matched "ab" using the dfa matcher
$ ./fsm "(ab)*" --match "a"
(ab)* did not match "a" using the naive matcher
(ab)* did not match "a" using the nfa matcher
(ab)* did not match "a" using the dfa matcher
$
```

With the `--match` argument, the `fsm` program will attempt to match the regular
expression using:

- The naive backtracking matcher we saw in lecture.
- Your `nfaMatch` with the NFA you built from the regular expression.
- Your `nfaMatch` with the minimized DFA we built from your NFA.

You can test your NFA simulator by running `make run-tests`.

For this problem you will create a regular expression to C compiler by
completing the implementation of the `dfaToC` function in the file
`DfaToC.hs`. There are two local functions used by `dfaToC` that you must
complete: `genCharCase` and `genTransitionCase`.

The `genCharCase` function generates a string containing the C code for handling
a single character in the input (inside the `switch (*(cs++))` block). For
example, in the code below, `genCharCase` is called twice, once to generate the
case statement that handles `'a'`, and again to generate the case statement that
handles `'b'`. The `genCharCase` function should call `genTransitionCase` for
all transitions involving the character `genCharCase` was given as an
argument. You will probably want to use `map` to do this.

The `genTransitionCase` function generates C code to handle the state
transitions on a character. In the code below, it generates the inner case
statements (inside the `switch (state)` blocks).

Both functions should generate a string. This string will contain several lines
of code. You may find it useful to use the `stack` function to combine multiple
lines of code as we have done elsewhere in this file.
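The general idea of building multi-line C code from Haskell strings can be sketched as follows. The names `stackLines`, `transitionCase`, and `charCase`, and the `(from, to, accepting)` triple, are hypothetical stand-ins: the template's `stack` function and the real argument types of `genCharCase` and `genTransitionCase` may differ. Indentation of the generated C is omitted for brevity.

```haskell
import Data.List (intercalate)

-- Hypothetical stand-in for combining lines of code; the template's
-- `stack` function may have a different type.
stackLines :: [String] -> String
stackLines = intercalate "\n"

-- One inner case: in state `from`, move to state `to` and record whether
-- the new state is accepting.
transitionCase :: (Int, Int, Bool) -> String
transitionCase (from, to, accepting) = stackLines
  [ "case " ++ show from ++ ":"
  , "state = " ++ show to ++ ";"
  , "accept = " ++ (if accepting then "1" else "0") ++ ";"
  , "break;"
  ]

-- One outer case for character c, built by mapping over its transitions.
-- Assumes c is a plain ASCII character, for which `show` happens to
-- produce a valid C character literal.
charCase :: Char -> [(Int, Int, Bool)] -> String
charCase c ts = stackLines $
  [ "case " ++ show c ++ ":"
  , "switch (state) {"
  ]
  ++ map transitionCase ts ++
  [ "default: return 0;"
  , "}"
  , "break;"
  ]
```

Evaluating `putStrLn (charCase 'a' [(0, 1, False)])` in GHCi prints a `case 'a':` block containing a single inner `case 0:`.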

The `--to-c` argument to `fsm` will use the `dfaToC` function to generate code
to match the specified regular expression. We have included a simple driver as
`main.c`. You can use it as follows:

```
$ ./fsm --to-c=match.c "(ab)*"
$ make match
gcc -o match main.c match.c
$ ./match "" ab a
"" matched
"ab" matched
"a" did not match
$
```

This is the contents of my `match.c` when compiling “(ab)*”:

```
int match(const char* cs)
{
    int state = 0;
    int accept = 1;
    while (1) {
        switch (*(cs++)) {
        case 'a':
            switch (state) {
            case 0:
                state = 1;
                accept = 0;
                break;
            default: return 0;
            }
            break;
        case 'b':
            switch (state) {
            case 1:
                state = 0;
                accept = 1;
                break;
            default: return 0;
            }
            break;
        case '\0': return accept;
        default: return 0;
        }
    }
}
```

How long did it take you to complete each problem? Please tell us in a comment in each of the files you submit. You must tell us how long each problem took you to receive the point.