Regular Expressions and Finite State Machines
Background Material
- Regular expressions (CS 265, CS 270, ECE 200)
- Finite state machines (CS 270, ECE 200)
- Chapter 10 of Alfred V. Aho and Jeffrey D. Ullman,
Foundations of Computer Science - C Edition, W. H. Freeman and
Company, 1995 (text for CS 270).
Reading
- Sec. 4.1 of the text (tokens and scanning)
Theme
Regular expressions describe patterns that can be recognized by finite
state machines (FSMs). An FSM that corresponds to a given regular
expression can be constructed algorithmically. An FSM can be described by
a transition table (its program), which can in turn be represented as a
string. An FSM can be simulated to recognize the patterns it accepts.
Topics
- Deterministic Finite Automata (DFA) - language recognizer.
- Definition: (A,S,s0,F,T), where A = alphabet, S = states, s0 = start
state, F = accepting states, T = transition function.
For s in S, a in A, T[s,a] = s' in S.
- Simulation on an input string. Starting in s0, read one symbol at a
time, applying T to determine the next state. The input string is accepted
if the final state is in F.
- Language accepted by DFA = set of input strings that cause the DFA to
end up in an accepting state.
- Textual representation of DFAs.
- Graphical representation of DFAs using
Graphviz. See fsm.pdf and its input, the DOT program fsm.dot.
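The DFA simulation described above can be sketched in a few lines of Python. This is only an illustration: the example machine (even number of a's over {a, b}), the dictionary layout of T, and the function name dfa_accepts are assumptions made for this sketch and do not come from the course programs.

```python
# Hypothetical example DFA (not from the notes): accepts strings over {a, b}
# that contain an even number of a's.
START = "even"
ACCEPTING = {"even"}
# Transition table T[s, a] = s', stored as a dictionary keyed by (state, symbol).
T = {
    ("even", "a"): "odd",
    ("even", "b"): "even",
    ("odd", "a"): "even",
    ("odd", "b"): "odd",
}


def dfa_accepts(transitions, start, accepting, text):
    """Return True if the DFA is in an accepting state after reading text."""
    state = start
    for symbol in text:
        key = (state, symbol)
        if key not in transitions:
            # Symbol outside the alphabet (or missing table entry): reject.
            return False
        state = transitions[key]
    return state in accepting


if __name__ == "__main__":
    for s in ["", "ab", "abab", "bbb"]:
        print(repr(s), dfa_accepts(T, START, ACCEPTING, s))
```

Storing T as a dictionary keeps the sketch close to the T[s,a] notation used in the definition; a two-dimensional array indexed by state and symbol numbers would work equally well.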
- Non-deterministic Finite Automata (NDFA)
- Definition: Same as a DFA (A,S,s0,F,T), except that there can be
multiple transitions from a given s in S on a given a in A, i.e. T need
not be a function (T[s,a] is a set of states). Epsilon transitions are
also allowed, i.e. it is possible to transition to a new state without
reading a symbol from the input.
- Set of possible transition states and epsilon closure.
- Simulation of NDFA M: Compute S = the set of states that M could be in
after reading each symbol in the input.
- Initialize S = S0 = EpsilonClosure({s0}), so that epsilon transitions
out of the start state are taken into account.
- Let Si be the set of states that M could be in after
reading the first i symbols in str. Si is computed by
taking the union of all possible transitions from the states in S(i-1),
and then computing the epsilon closure (i.e. adding the states that can
be reached by applying epsilon transitions).
Ti = Union_{s in S(i-1)} M->T[s,str[i]]
Si = EpsilonClosure(Ti)
- If, after reading the entire string, S contains an accepting
state, report that the input string was accepted by M.
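The set-of-states simulation above can be sketched as follows. The example NDFA, the use of None as the epsilon marker, and the names epsilon_closure and nfa_accepts are assumptions of this sketch, not part of the course code. The sketch starts from the epsilon closure of {s0} so that epsilon transitions leaving the start state are followed before the first symbol is read.

```python
EPS = None  # marker for epsilon transitions (a convention chosen for this sketch)

# Hypothetical NDFA over {a, b} accepting strings that end in "ab".
# T maps (state, symbol) to the SET of possible next states.
NFA_T = {
    (0, EPS): {1},
    (1, "a"): {1, 2},
    (1, "b"): {1},
    (2, "b"): {3},
}
NFA_START = 0
NFA_ACCEPTING = {3}


def epsilon_closure(transitions, states):
    """All states reachable from `states` using only epsilon transitions."""
    closure = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in transitions.get((s, EPS), set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure


def nfa_accepts(transitions, start, accepting, text):
    """Track the set of states the NDFA could be in after each symbol."""
    current = epsilon_closure(transitions, {start})
    for symbol in text:
        moved = set()  # Ti: union of the transitions on this symbol
        for s in current:
            moved |= transitions.get((s, symbol), set())
        current = epsilon_closure(transitions, moved)  # Si
    return bool(current & accepting)


if __name__ == "__main__":
    for s in ["ab", "aab", "ba", ""]:
        print(repr(s), nfa_accepts(NFA_T, NFA_START, NFA_ACCEPTING, s))
```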
- For any NDFA there exists an equivalent DFA which accepts the same
strings, i.e. defines the same language. This means that for finite
automata non-determinism does not add any more power.
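One standard way to obtain the equivalent DFA is the subset construction, in which each DFA state is a set of NDFA states. The notes do not say which construction is intended, so the following is a sketch under that assumption; the example NDFA, the None epsilon marker, and all function names are hypothetical.

```python
EPS = None  # marker for epsilon transitions (a convention chosen for this sketch)

# Hypothetical NDFA over {a, b} accepting strings that end in "ab".
NFA_T = {(0, EPS): {1}, (1, "a"): {1, 2}, (1, "b"): {1}, (2, "b"): {3}}


def epsilon_closure(transitions, states):
    """All states reachable from `states` using only epsilon transitions."""
    closure, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for t in transitions.get((s, EPS), set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure


def nfa_to_dfa(transitions, start, accepting, alphabet):
    """Subset construction: each DFA state is a frozenset of NDFA states."""
    start_set = frozenset(epsilon_closure(transitions, {start}))
    dfa_t, dfa_accepting = {}, set()
    seen, worklist = {start_set}, [start_set]
    while worklist:
        current = worklist.pop()
        if current & accepting:
            dfa_accepting.add(current)
        for symbol in alphabet:
            moved = set()
            for s in current:
                moved |= transitions.get((s, symbol), set())
            target = frozenset(epsilon_closure(transitions, moved))
            dfa_t[(current, symbol)] = target  # the empty set acts as a dead state
            if target not in seen:
                seen.add(target)
                worklist.append(target)
    return dfa_t, start_set, dfa_accepting


def run_dfa(dfa_t, start, accepting, text):
    """Simulate the constructed DFA; text must use only alphabet symbols."""
    state = start
    for symbol in text:
        state = dfa_t[(state, symbol)]
    return state in accepting


if __name__ == "__main__":
    dfa_t, dfa_start, dfa_acc = nfa_to_dfa(NFA_T, 0, {3}, {"a", "b"})
    print(len({s for (s, _) in dfa_t}), "DFA states")
    print(run_dfa(dfa_t, dfa_start, dfa_acc, "aab"))
```

Only subsets reachable from the start set are generated, so the resulting DFA is often much smaller than the worst case of 2^|S| states.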
- Regular expressions (language generator)
- Definition:
- Base case: a single character, the symbol epsilon (the empty string),
and the empty set.
- Recursion: If R and S are regular expressions then R|S
[union], RS [concatenation], and R* [closure] are regular
expressions.
- Examples: a, (a|b), (aa)*, (a|b)*abb, b*(b*ab*ab*a)*
- Constructing an NDFA that accepts the language described by a
regular expression. The construction produces an NDFA with a single
accepting state.
- Base case. For R = a, create a 2-state DFA with a start state
s0, an accepting state s1, and the transition
T[s0,a] = s1.
- Construction for recursive part of definition.
- [R|S] Add new start state with epsilon transitions into
the start states of R and S. Add epsilon transitions from
accepting states [no longer accepting states] of R and S to a
new accepting state.
- [RS] Make the start state of R the start state of RS and
connect, via an epsilon transition, the accepting state [no
longer an accepting state] of R to the start state of S. The
accepting state of RS is the accepting state of S.
- [R*] Add a new start state and a new accepting state, with an
epsilon transition from the new start state to the start state of R
and an epsilon transition from the new start state to the new
accepting state. Also add an epsilon transition from the accepting
state [no longer an accepting state] of R to the new accepting state.
Finally, add an epsilon transition from the accepting state of R back
to the start state of R.
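The construction above can be sketched with one function per case of the definition, each returning an NDFA fragment with a single start state and a single accepting state. The Fragment class, the function names, and the small accepts helper used to check the result are all assumptions of this sketch. Each fragment should be used only once when building a larger expression, since fragments share a global state numbering.

```python
import itertools

EPS = None  # marker for epsilon transitions (a convention chosen for this sketch)
_fresh = itertools.count()  # source of new state numbers


class Fragment:
    """An NDFA with one start state and one accepting state."""
    def __init__(self, start, accept, transitions):
        self.start, self.accept = start, accept
        self.transitions = transitions  # (state, symbol) -> set of states


def _add(table, src, symbol, dst):
    table.setdefault((src, symbol), set()).add(dst)


def _merge(*tables):
    merged = {}
    for table in tables:
        for key, targets in table.items():
            merged.setdefault(key, set()).update(targets)
    return merged


def symbol(a):
    """Base case R = a: two states joined by a transition on a."""
    s0, s1 = next(_fresh), next(_fresh)
    table = {}
    _add(table, s0, a, s1)
    return Fragment(s0, s1, table)


def union(r, s):
    """[R|S]: new start and accepting states joined to both parts by epsilon."""
    start, accept = next(_fresh), next(_fresh)
    table = _merge(r.transitions, s.transitions)
    _add(table, start, EPS, r.start)
    _add(table, start, EPS, s.start)
    _add(table, r.accept, EPS, accept)
    _add(table, s.accept, EPS, accept)
    return Fragment(start, accept, table)


def concat(r, s):
    """[RS]: epsilon transition from the accepting state of R to the start of S."""
    table = _merge(r.transitions, s.transitions)
    _add(table, r.accept, EPS, s.start)
    return Fragment(r.start, s.accept, table)


def star(r):
    """[R*]: allow skipping R entirely or repeating it any number of times."""
    start, accept = next(_fresh), next(_fresh)
    table = _merge(r.transitions)
    _add(table, start, EPS, r.start)
    _add(table, start, EPS, accept)
    _add(table, r.accept, EPS, accept)
    _add(table, r.accept, EPS, r.start)
    return Fragment(start, accept, table)


def accepts(fragment, text):
    """Set-of-states simulation, used here only to check the construction."""
    def closure(states):
        result, stack = set(states), list(states)
        while stack:
            for t in fragment.transitions.get((stack.pop(), EPS), set()):
                if t not in result:
                    result.add(t)
                    stack.append(t)
        return result
    current = closure({fragment.start})
    for ch in text:
        moved = set()
        for s in current:
            moved |= fragment.transitions.get((s, ch), set())
        current = closure(moved)
    return fragment.accept in current


if __name__ == "__main__":
    # (a|b)*abb, one of the examples listed above
    m = concat(concat(concat(star(union(symbol("a"), symbol("b"))),
                             symbol("a")), symbol("b")), symbol("b"))
    print(accepts(m, "aababb"), accepts(m, "ab"))
```

The sketch covers only the single-character base case; fragments for epsilon and for the empty set would be built the same way.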
Outline
References and programs
Exercises
- Simulate a deterministic FSM
- Given a (non-trivial) NFSM, create a DFSM that recognises the same
language
- Construct an FSM corresponding to a regular expression
Created: Jan. 8, 2006 by jjohnson AT cs DOT drexel DOT edu