# Regular Expressions and Finite State Machines

### Background Material

• Regular expressions (CS 265, CS 270, ECE 200)
• Finite state machines (CS 270, ECE 200)
• Chapter 10 of Alfred V. Aho and Jeffrey D. Ullman, Foundations of Computer Science - C Edition, W. H. Freeman and Company, 1995 (text for CS 270).

• Sec. 4.1 of the text (tokens and scanning)

### Theme

Regular expressions describe patterns that can be recognized by finite state machines (FSMs). It is possible to algorithmically construct an FSM that corresponds to a given regular expression. An FSM can be described by a transition table (a program), which can in turn be represented as a string. An FSM can then be simulated to recognize the patterns it accepts.
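As an illustration of a transition table stored as a string, here is a minimal Python sketch. The text format (a `start` line, an `accept` line, then one `state symbol next-state` triple per line) and the functions `table_to_string` and `string_to_table` are hypothetical, chosen for this example rather than taken from the course materials. State names and symbols are assumed to contain no whitespace.

```python
# Hypothetical text format (an assumption, not the course's format):
#   start <state>
#   accept <state> <state> ...
#   <state> <symbol> <next-state>     (one line per table entry)


def table_to_string(start, accepting, table):
    """Serialize a DFA transition table {(state, symbol): next} as text."""
    lines = [f"start {start}", "accept " + " ".join(sorted(accepting))]
    for (state, symbol), nxt in sorted(table.items()):
        lines.append(f"{state} {symbol} {nxt}")
    return "\n".join(lines)


def string_to_table(text):
    """Parse the text format above back into (start, accepting, table)."""
    start, accepting, table = None, set(), {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if fields[0] == "start":
            start = fields[1]
        elif fields[0] == "accept":
            accepting = set(fields[1:])
        else:
            state, symbol, nxt = fields
            table[(state, symbol)] = nxt
    return start, accepting, table


# Example: binary strings containing an even number of 1s.
EVEN_ONES = {
    ("even", "0"): "even", ("even", "1"): "odd",
    ("odd", "0"): "odd", ("odd", "1"): "even",
}
text = table_to_string("even", {"even"}, EVEN_ONES)
print(text)
```

Running it prints `start even`, `accept even`, and then the four table entries in sorted order (`even 0 even`, `even 1 odd`, `odd 0 odd`, `odd 1 even`); parsing that text back yields the original table.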

### Topics

1. Deterministic Finite Automata (DFA) - language recognizer.
• Definition: M = (A, S, s0, F, T), where A = alphabet, S = set of states, s0 = start state, F = set of accepting states (a subset of S), and T = transition function. For s in S and a in A, T[s,a] = s' in S.
• Simulation on an input string: starting in s0, read one symbol at a time, applying T to determine the next state. The input string is accepted if the final state is in F.
• Language accepted by a DFA = the set of input strings that cause the DFA to end up in an accepting state.
• Textual representation of DFAs.
• Graphical representation of DFAs using Graphviz. See fsm.pdf and the DOT program fsm.dot from which it was generated.
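The simulation loop and a DOT rendering can be sketched in Python as follows. This is a minimal sketch, assuming the transition function T is stored as a dict from `(state, symbol)` to the next state, that a missing entry means rejection (an implicit dead state), and that state names are integers or simple identifiers that DOT accepts unquoted. The helper names `dfa_accepts` and `dfa_to_dot` and the example table `ABB` (a DFA for `(a|b)*abb`) are hypothetical; the generated DOT is not claimed to match fsm.dot.

```python
def dfa_accepts(start, accepting, table, text):
    """Run the DFA on `text`; accept iff it ends in an accepting state."""
    state = start
    for symbol in text:
        if (state, symbol) not in table:
            return False  # undefined transition: treat as an implicit dead state
        state = table[(state, symbol)]
    return state in accepting


def dfa_to_dot(start, accepting, table):
    """Emit a Graphviz DOT program; accepting states are double circles."""
    lines = ["digraph fsm {", "  rankdir=LR;", "  node [shape=circle];"]
    for state in sorted(accepting):
        lines.append(f"  {state} [shape=doublecircle];")
    # A point-shaped node whose edge marks the start state.
    lines.append("  init [shape=point];")
    lines.append(f"  init -> {start};")
    for (state, symbol), nxt in sorted(table.items()):
        lines.append(f'  {state} -> {nxt} [label="{symbol}"];')
    lines.append("}")
    return "\n".join(lines)


# DFA for (a|b)*abb: state k = length of the longest suffix of the input
# read so far that is also a prefix of "abb".
ABB = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 1, (2, "b"): 3,
    (3, "a"): 1, (3, "b"): 0,
}

print(dfa_accepts(0, {3}, ABB, "babb"))  # True
print(dfa_to_dot(0, {3}, ABB))
```

Saving the printed DOT text to a file and running `dot -Tpdf` on it should render the machine as a PDF.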
2. Non-deterministic Finite Automata (NDFA)
• Definition: Same as a DFA (A, S, s0, F, T), except that there can be multiple transitions from a given s in S on a given a in A, i.e., T need not be a function. Epsilon transitions are also allowed, i.e., it is possible to move to a new state without reading a symbol from the input.
• Sets of possible next states and the epsilon closure (the set of all states reachable from a given set of states using only epsilon transitions).
• Simulation of NDFA M: compute S = the set of states that M could be in after reading each symbol of the input.
1. Initialize S0 = EpsilonClosure({s0}), since M can follow epsilon transitions out of s0 before reading any input.
2. Let Si be the set of states that M could be in after reading the first i symbols of str. Si is computed by taking the union of all possible transitions on str[i] from the states in Si-1, and then computing the epsilon closure (i.e., adding the states that can be reached by applying epsilon transitions).

$$T_i = \bigcup_{s \in S_{i-1}} T[s, \mathit{str}[i]]$$
$$S_i = \mathrm{EpsilonClosure}(T_i)$$
3. If, after the entire string has been read, the current set S contains an accepting state, report that the input string is accepted by M.
• For any NDFA there exists an equivalent DFA that accepts the same strings, i.e., defines the same language; it can be built by the subset construction, in which each DFA state is a set of NDFA states. Thus, for finite automata, non-determinism adds no expressive power.
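The NDFA simulation steps and the subset construction can be sketched in Python. This is a minimal sketch under a hypothetical representation: the NDFA is a dict from `(state, symbol)` to a set of states, and the empty string `EPS` labels epsilon transitions. `move` computes the union of transitions (T_i), `epsilon_closure` turns that into S_i, and the simulation starts from the epsilon closure of {s0}. `subset_construction` builds an equivalent DFA by treating each reachable closure set as one DFA state. All names here are illustrative, not from the course.

```python
EPS = ""  # symbol used to label epsilon transitions (an assumption)


def epsilon_closure(nfa, states):
    """All states reachable from `states` by epsilon transitions alone."""
    closure, stack = set(states), list(states)
    while stack:
        for t in nfa.get((stack.pop(), EPS), set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)


def move(nfa, states, symbol):
    """T_i: the union of T[s, symbol] over every s in `states`."""
    result = set()
    for s in states:
        result |= nfa.get((s, symbol), set())
    return result


def nfa_accepts(nfa, start, accepting, text):
    current = epsilon_closure(nfa, {start})  # S_0
    for symbol in text:
        current = epsilon_closure(nfa, move(nfa, current, symbol))  # S_i
    return bool(current & accepting)


def subset_construction(nfa, start, accepting, alphabet):
    """Build an equivalent DFA whose states are frozensets of NDFA states."""
    dfa_start = epsilon_closure(nfa, {start})
    table, seen, todo = {}, {dfa_start}, [dfa_start]
    while todo:
        group = todo.pop()
        for a in alphabet:
            target = epsilon_closure(nfa, move(nfa, group, a))
            table[(group, a)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    dfa_accepting = {g for g in seen if g & accepting}
    return dfa_start, dfa_accepting, table


# NDFA for strings over {a, b} ending in "ab", with an epsilon move out of 0.
ENDS_IN_AB = {
    (0, EPS): {1},
    (1, "a"): {1, 2},
    (1, "b"): {1},
    (2, "b"): {3},
}
```

For `ENDS_IN_AB` with start state 0, accepting set {3}, and alphabet `"ab"`, the construction reaches four DFA states, {0,1}, {1,2}, {1}, and {1,3}, and only {1,3} is accepting. If the empty set of NDFA states were reached, it would act as a dead state.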
3. Regular expressions (language generator)
• Definition:
1. Base case: a single symbol a in A, the symbol epsilon (denoting the empty string), or the empty set (denoting the empty language).
2. Recursion: if R and S are regular expressions, then R|S [union], RS [concatenation], and R* [closure] are regular expressions.
• Examples: a, (a|b), (aa)*, (a|b)*abb, b*(b*ab*ab*a)*
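• Constructing an NDFA that accepts the language described by a regular expression (Thompson's construction). The construction always produces an NDFA with a single accepting state.
1. Base case. For R = a, create a two-state automaton with a start state s0, an accepting state s1, and the transition T[s0,a] = s1. For R = epsilon, make the transition from s0 to s1 an epsilon transition instead; for the empty set, add no transition.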
2. Construction for recursive part of definition.
• [R|S] Add a new start state with epsilon transitions to the start states of R and S. Add epsilon transitions from the accepting states of R and S [no longer accepting states] to a new accepting state.
• [RS] Make the start state of R the start state of RS, and connect the accepting state of R [no longer an accepting state] to the start state of S via an epsilon transition. The accepting state of RS is the accepting state of S.
• [R*] Add a new start state and a new accepting state, with an epsilon transition from the new start state to the start state of R and an epsilon transition from the new start state directly to the new accepting state (so that R* accepts the empty string). Also add an epsilon transition from the accepting state of R [no longer an accepting state] to the new accepting state. Finally, add an epsilon transition from the accepting state of R back to the start state of R (so that R can be repeated).
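The construction above can be sketched in Python as a small recursive-descent parser that builds one NDFA fragment per subexpression and returns its (start, accept) pair. This is a minimal sketch, assuming a hypothetical syntax of single-character symbols, `|`, `*`, and parentheses, with concatenation written as juxtaposition and an empty operand (as in `()` or `a|`) standing for epsilon; the empty-set base case has no syntax here. The class name `Thompson`, its methods, and the `EPS` label are illustrative, and `matches` reuses the NDFA simulation described earlier.

```python
from collections import defaultdict

EPS = ""  # label used for epsilon transitions (an assumption)


class Thompson:
    """Hypothetical helper: parse `pattern` and build its NDFA."""

    def __init__(self, pattern):
        self.pattern, self.pos = pattern, 0
        self.trans = defaultdict(set)  # (state, symbol) -> set of next states
        self.count = 0                 # number of states created so far
        self.start, self.accept = self.alternation()
        if self.pos != len(pattern):
            raise ValueError(f"unexpected {pattern[self.pos]!r} at {self.pos}")

    def new_state(self):
        self.count += 1
        return self.count - 1

    def peek(self):
        return self.pattern[self.pos] if self.pos < len(self.pattern) else None

    def base(self, symbol):
        """Base case: s0 --symbol--> s1 (symbol may be EPS)."""
        s0, s1 = self.new_state(), self.new_state()
        self.trans[(s0, symbol)].add(s1)
        return s0, s1

    def alternation(self):
        """[R|S]: new start -> both starts; both accepts -> new accept."""
        start, accept = self.concatenation()
        while self.peek() == "|":
            self.pos += 1
            s2, a2 = self.concatenation()
            new_s, new_a = self.new_state(), self.new_state()
            self.trans[(new_s, EPS)] |= {start, s2}
            self.trans[(accept, EPS)].add(new_a)
            self.trans[(a2, EPS)].add(new_a)
            start, accept = new_s, new_a
        return start, accept

    def concatenation(self):
        """[RS]: epsilon edge from R's accepting state to S's start state."""
        if self.peek() in (None, "|", ")"):
            return self.base(EPS)  # empty operand denotes the empty string
        start, accept = self.star()
        while self.peek() not in (None, "|", ")"):
            s2, a2 = self.star()
            self.trans[(accept, EPS)].add(s2)
            accept = a2
        return start, accept

    def star(self):
        """[R*]: new start/accept, a skip edge, and a loop back to R's start."""
        start, accept = self.atom()
        while self.peek() == "*":
            self.pos += 1
            new_s, new_a = self.new_state(), self.new_state()
            self.trans[(new_s, EPS)] |= {start, new_a}
            self.trans[(accept, EPS)] |= {new_a, start}
            start, accept = new_s, new_a
        return start, accept

    def atom(self):
        ch = self.peek()
        if ch == "(":
            self.pos += 1
            fragment = self.alternation()
            if self.peek() != ")":
                raise ValueError("missing ')'")
            self.pos += 1
            return fragment
        if ch == "*":
            raise ValueError(f"nothing to repeat at {self.pos}")
        self.pos += 1
        return self.base(ch)

    def epsilon_closure(self, states):
        closure, stack = set(states), list(states)
        while stack:
            for t in self.trans.get((stack.pop(), EPS), ()):
                if t not in closure:
                    closure.add(t)
                    stack.append(t)
        return closure

    def matches(self, text):
        """NDFA simulation: S_0 = closure({start}), S_i = closure(union of moves)."""
        current = self.epsilon_closure({self.start})
        for symbol in text:
            step = set()
            for s in current:
                step |= self.trans.get((s, symbol), set())
            current = self.epsilon_closure(step)
        return self.accept in current


regex = Thompson("(a|b)*abb")
print(regex.count, regex.matches("babb"), regex.matches("abab"))  # 14 True False
```

For `(a|b)*abb` the sketch creates 14 states: two for each of the five symbols, two for the union, and two for the closure, while concatenation adds none. In general the number of states grows linearly with the length of the expression.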

### Exercises

• Simulate a deterministic FSM
• Given a (non-trivial) NFSM, create a DFSM that recognizes the same language
• Construct an FSM corresponding to a regular expression
Created: Jan. 8, 2006 by jjohnson AT cs DOT drexel DOT edu