- Regular expressions (CS 265, CS 270, ECE 200)
- Finite state machines (CS 270, ECE 200)
- Chapter 10 of Alfred V. Aho and Jeffrey D. Ullman, Foundations of Computer Science - C Edition, W. H. Freeman and Company, 1995 (text for CS 270).

- Sec. 4.1 of the text (tokens and scanning)

Regular expressions describe patterns that can be recognized by finite state machines (FSMs). An FSM corresponding to a given regular expression can be constructed algorithmically. An FSM can be described by a transition table (its program), which can in turn be represented as a string, and it can be simulated to recognize the patterns it accepts.

- Deterministic Finite Automata (DFA) - language recognizer.
- Definition: (A,S,s0,F,T), where A = alphabet, S = states, s0 = start state, F = accepting states, T = transition function. For s in S, a in A, T[s,a] = s' in S.
- Simulation on input string. Starting in s0, read one symbol at a time, applying T to determine the next state. The input string is accepted if the final state is in F.
- Language accepted by DFA = set of input strings that cause the DFA to end up in an accepting state.
- Textual representation of DFAs.
- Graphical representation of DFAs using Graphviz. See fsm.pdf and its input, the DOT program fsm.dot.
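
The DFA simulation described above can be sketched in a few lines of Python. This is only an illustrative sketch: the dictionary encoding of T, the function names simulate_dfa and dfa_to_dot, and the example machine ABB_T (a four-state DFA for (a|b)*abb) are assumptions made for this example and are not taken from the course materials or from fsm.dot. Undefined transitions are treated as rejection, which is also an assumption.

```python
def simulate_dfa(transitions, start, accepting, text):
    """Return True if the DFA accepts text.

    transitions maps (state, symbol) -> next state, i.e. T[s,a] = s'.
    """
    state = start
    for symbol in text:
        key = (state, symbol)
        if key not in transitions:
            return False  # assumption: a missing entry in T means reject
        state = transitions[key]
    return state in accepting


def dfa_to_dot(transitions, start, accepting, name="fsm"):
    """Render the DFA as a Graphviz DOT program (returned as a string)."""
    lines = [f"digraph {name} {{", "  rankdir=LR;"]
    for s in sorted(accepting):
        lines.append(f"  {s} [shape=doublecircle];")
    # An invisible point node marks the start state with an incoming arrow.
    lines.append(f"  init [shape=point]; init -> {start};")
    for (src, sym), dst in sorted(transitions.items()):
        lines.append(f'  {src} -> {dst} [label="{sym}"];')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical example: DFA over {a, b} for (a|b)*abb, accepting state 3.
ABB_T = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 1, (2, "b"): 3,
    (3, "a"): 1, (3, "b"): 0,
}

if __name__ == "__main__":
    print(simulate_dfa(ABB_T, 0, {3}, "aababb"))
    print(dfa_to_dot(ABB_T, 0, {3}))
```

The DOT text returned by dfa_to_dot can be saved to a file and passed to the Graphviz dot tool to draw the machine.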

- Non-deterministic Finite Automata (NDFA)
- Definition: Same as DFA (A,S,s0,F,T), except that there can be multiple transitions from a given s in S on a given a in A, i.e. T need not be a function. Epsilon transitions are also allowed, i.e. it is possible to move to a new state without reading a symbol from the input.
- Set of possible transition states and epsilon closure.
- Simulation of NDFA M: Compute S = set of states that M could be in after
reading each symbol in the input.
- Initialize S = EpsilonClosure({s0}), since M may follow epsilon transitions before reading any input.
- Let S_i be the set of states that M could be in after reading the
first i symbols in str. S_i is computed by taking the union of
all possible transitions from the states in S_{i-1}, and then computing the epsilon closure
(i.e. the states that can be reached by applying epsilon transitions).

T_i = Union_{s in S_{i-1}} M->T[s,str[i]]

S_i = EpsilonClosure(T_i)

- If after reading the entire string S contains an accepting state, report that the input string was accepted by M.
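
The set-based simulation above might be sketched as follows. The encoding is an assumption made for illustration: transitions map (state, symbol) to a set of states, and the marker EPSILON = None stands for an epsilon transition. The example machine UNION_T is a hypothetical NDFA for a|b.

```python
EPSILON = None  # assumption: None labels epsilon transitions


def epsilon_closure(transitions, states):
    """All states reachable from `states` using only epsilon transitions."""
    closure = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in transitions.get((s, EPSILON), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure


def simulate_nfa(transitions, start, accepting, text):
    """Return True if the NDFA accepts text."""
    current = epsilon_closure(transitions, {start})
    for symbol in text:
        # T_i: union of all transitions on `symbol` from the current set.
        moved = set()
        for s in current:
            moved.update(transitions.get((s, symbol), ()))
        # S_i: epsilon closure of T_i.
        current = epsilon_closure(transitions, moved)
    return bool(current & set(accepting))


# Hypothetical example: NDFA for a|b with start state 0 and accepting state 5.
UNION_T = {
    (0, EPSILON): {1, 3},
    (1, "a"): {2},
    (3, "b"): {4},
    (2, EPSILON): {5},
    (4, EPSILON): {5},
}

if __name__ == "__main__":
    for text in ["a", "b", "ab", ""]:
        print(repr(text), simulate_nfa(UNION_T, 0, {5}, text))
```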

- For any NDFA there exists an equivalent DFA which accepts the same strings, i.e. defines the same language. This means that for finite automata non-determinism does not add any more power.
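
One standard way to obtain such an equivalent DFA is the subset construction, in which each DFA state is a set of NDFA states. The notes do not say which construction is intended, so the sketch below is only one possibility. The function name nfa_to_dfa, the EPSILON marker, and the example NDFA ABB_NFA for (a|b)*abb are assumptions for illustration. Empty target sets are simply left out of the table rather than represented as a dead state.

```python
from collections import deque

EPSILON = None  # assumption: None labels epsilon transitions


def epsilon_closure(nfa_t, states):
    """All states reachable from `states` using only epsilon transitions."""
    closure = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in nfa_t.get((s, EPSILON), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure


def nfa_to_dfa(nfa_t, start, accepting, alphabet):
    """Subset construction: returns (dfa_transitions, dfa_start, dfa_accepting)."""
    accepting = set(accepting)
    start_set = frozenset(epsilon_closure(nfa_t, {start}))
    dfa_t = {}
    dfa_accepting = set()
    seen = {start_set}
    queue = deque([start_set])
    while queue:
        current = queue.popleft()
        if current & accepting:
            dfa_accepting.add(current)
        for symbol in alphabet:
            moved = set()
            for s in current:
                moved.update(nfa_t.get((s, symbol), ()))
            target = frozenset(epsilon_closure(nfa_t, moved))
            if not target:
                continue  # no move possible: leave the entry undefined
            dfa_t[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return dfa_t, start_set, dfa_accepting


# Hypothetical example: NDFA for (a|b)*abb, start 0, accepting state 3.
ABB_NFA = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
}

if __name__ == "__main__":
    table, dfa_start, dfa_acc = nfa_to_dfa(ABB_NFA, 0, {3}, "ab")
    for (src, sym), dst in table.items():
        print(sorted(src), sym, "->", sorted(dst))
```

In the worst case the resulting DFA can have exponentially more states than the NDFA, so the equivalence is about expressive power, not size.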

- Regular expressions (language generator)
- Definition:
- Base Case: a single character, the symbol epsilon (the empty string), or the empty set.
- Recursion: If R and S are regular expressions then R|S [union], RS [concatenation], and R* [closure] are regular expressions.

- Examples: a, (a|b), (aa)*, (a|b)*abb, b*(b*ab*ab*a)*
- Constructing an NDFA that accepts the language described by a
regular expression. The construction produces an NDFA with a single accepting state.
- Base case. For R = a, create a 2-state DFA with a start state s0 and an accepting state s1, and the transition T[s0,a] = s1.
- Construction for recursive part of definition.
- [R|S] Add a new start state with epsilon transitions into the start states of R and S. Add epsilon transitions from the accepting states of R and S [no longer accepting states] to a new accepting state.
- [RS] Make the start state of R the start state of RS and connect, via an epsilon transition, the accepting state of R [no longer an accepting state] to the start state of S. The accepting state of RS is the accepting state of S.
- [R*] Add new start and accepting states, with an epsilon transition from the new start state to the start state of R and an epsilon transition from the new start state to the new accepting state. Also add an epsilon transition from the accepting state of R [no longer an accepting state] to the new accepting state. Finally, add an epsilon transition from the accepting state of R back to the start state of R.
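
The base case and the three recursive constructions above can be sketched as small Python functions that build NDFA fragments. Everything here is illustrative: a fragment is assumed to be a tuple (start, accept, transitions), EPSILON = None marks epsilon transitions, and the helper names symbol, union, concat, star, and accepts are hypothetical. Each fragment should be used only once, since states are not copied.

```python
import itertools

EPSILON = None              # assumption: None labels epsilon transitions
_fresh = itertools.count()  # generator of unused state numbers


def _add(t, src, label, dst):
    t.setdefault((src, label), set()).add(dst)


def _merge(t1, t2):
    merged = {k: set(v) for k, v in t1.items()}
    for k, v in t2.items():
        merged.setdefault(k, set()).update(v)
    return merged


def symbol(a):
    """Base case R = a: two states with T[s0,a] = s1."""
    s0, s1 = next(_fresh), next(_fresh)
    t = {}
    _add(t, s0, a, s1)
    return s0, s1, t


def union(r, s):
    """[R|S]: new start and accepting states joined by epsilon transitions."""
    start, accept = next(_fresh), next(_fresh)
    t = _merge(r[2], s[2])
    for frag in (r, s):
        _add(t, start, EPSILON, frag[0])
        _add(t, frag[1], EPSILON, accept)
    return start, accept, t


def concat(r, s):
    """[RS]: epsilon transition from R's accepting state to S's start state."""
    t = _merge(r[2], s[2])
    _add(t, r[1], EPSILON, s[0])
    return r[0], s[1], t


def star(r):
    """[R*]: new start/accept, a bypass edge, and a loop back into R."""
    start, accept = next(_fresh), next(_fresh)
    t = _merge(r[2], {})
    _add(t, start, EPSILON, r[0])
    _add(t, start, EPSILON, accept)
    _add(t, r[1], EPSILON, accept)
    _add(t, r[1], EPSILON, r[0])
    return start, accept, t


def accepts(nfa, text):
    """Simulate the constructed NDFA on text using sets of states."""
    start, accept, t = nfa

    def closure(states):
        result, stack = set(states), list(states)
        while stack:
            for nxt in t.get((stack.pop(), EPSILON), ()):
                if nxt not in result:
                    result.add(nxt)
                    stack.append(nxt)
        return result

    current = closure({start})
    for ch in text:
        moved = set()
        for s in current:
            moved.update(t.get((s, ch), ()))
        current = closure(moved)
    return accept in current


# (a|b)*abb and (aa)*, two of the example expressions listed earlier.
ABB = concat(concat(concat(star(union(symbol("a"), symbol("b"))),
                           symbol("a")), symbol("b")), symbol("b"))
EVEN_AS = star(concat(symbol("a"), symbol("a")))

if __name__ == "__main__":
    print(accepts(ABB, "aababb"), accepts(EVEN_AS, "aaa"))
```

Because every construction returns a fragment with exactly one start state and one accepting state, the fragments compose freely, which is why the single-accepting-state property is worth maintaining.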

- Definition:
- The awk language

- Chapter 10 of Alfred V. Aho and Jeffrey D. Ullman, Foundations of Computer Science - C Edition, W. H. Freeman and Company, 1995 (text for CS 270).
- Wikipedia entry on Finite State Machines
- Graphviz.
- Dave Hannay's FSM simulator.
- Alfred V. Aho, Brian W. Kernighan, and Peter J. Weinberger, The AWK Programming Language, Addison-Wesley, 1988.
- Gawk - GNU Project
- Wikipedia entry on AWK

- Simulate a deterministic FSM
- Construct a FSM corresponding to regular expression
- Assignment 1