One well-known approach is to solve the word problem using Brzozowski's derivative operation. The derivative operation is defined as follows
d(l,r) = { w | (l,w) in r ]
We describe the algorithm in the following using Haskell.
> import qualified Data.ByteString.Char8 as S
> data RE = Phi -- empty lang
> | Empty -- empty word
> | L Char -- literal
> | Choice RE RE -- r1 + r2
> | Seq RE RE -- (r1,r2)
> | Star RE -- r*
A word is a byte string.
> type Word = S.ByteString
> type Letter = Char
We follow Brzozowski's approach to define the derivative operations.
> deriv :: RE -> Letter -> RE
> deriv Phi l = Phi
> deriv Empty l = Phi
> deriv (L l') l
> | l == l' = Empty
> | otherwise = Phi
> deriv (Choice r1 r2) l = Choice (deriv r1 l) (deriv r2 l)
> deriv (Seq r1 r2) l
> | isEmpty r1 =
> let c1 = (Seq (deriv r1 l) r2)
> c2 = deriv r2 l
> in Choice c1 c2
> | otherwise = Seq (deriv r1 l) r2
> deriv (Star r) l = Seq (deriv r l) (Star r)
To establish the base case of the algorithm,
we need a function that checks whether a regexp accepts the empty word.
-- check whether regular expressions are empty
> class IsEmpty a where
> isEmpty :: a -> Bool
> instance IsEmpty RE where
> isEmpty Phi = False
> isEmpty Empty = True
> isEmpty (Choice r1 r2) = (isEmpty r1) || (isEmpty r2)
> isEmpty (Seq r1 r2) = (isEmpty r1) && (isEmpty r2)
> isEmpty (Star r) = True
> isEmpty (L _) = False
Then the word matching algorithm can be implemented as follows
> match :: Word -> RE -> Bool
> match w r =
> case S.uncons w of
> Nothing -> isEmpty r -- empty
> Just (l,w') -> w' `match` (deriv r l)
The gist of the word matching algorithm is that a non empty word (l,w') matches a regular expr r if we take away l from the word, the remainder w' matches with the derivative (deriv r l).
A sample execution:
*Main> match (S.pack "AA") (Seq (Star (L 'A')) (Star (L 'A')))
True
No comments:
Post a Comment