One well-known approach is to solve the word problem using Brzozowski's derivative operation. The derivative operation is defined as follows
d(l,r) = { w | (l,w) in r ]
We describe the algorithm in the following using Haskell.
> import qualified Data.ByteString.Char8 as S > data RE = Phi -- empty lang > | Empty -- empty word > | L Char -- literal > | Choice RE RE -- r1 + r2 > | Seq RE RE -- (r1,r2) > | Star RE -- r* A word is a byte string. > type Word = S.ByteString > type Letter = Char
We follow Brzozowski's approach to define the derivative operations.
> deriv :: RE -> Letter -> RE > deriv Phi l = Phi > deriv Empty l = Phi > deriv (L l') l > | l == l' = Empty > | otherwise = Phi > deriv (Choice r1 r2) l = Choice (deriv r1 l) (deriv r2 l) > deriv (Seq r1 r2) l > | isEmpty r1 = > let c1 = (Seq (deriv r1 l) r2) > c2 = deriv r2 l > in Choice c1 c2 > | otherwise = Seq (deriv r1 l) r2 > deriv (Star r) l = Seq (deriv r l) (Star r)
To establish the base case of the algorithm,
we need a function that checks whether a regexp accepts the empty word.
-- check whether regular expressions are empty > class IsEmpty a where > isEmpty :: a -> Bool > instance IsEmpty RE where > isEmpty Phi = False > isEmpty Empty = True > isEmpty (Choice r1 r2) = (isEmpty r1) || (isEmpty r2) > isEmpty (Seq r1 r2) = (isEmpty r1) && (isEmpty r2) > isEmpty (Star r) = True > isEmpty (L _) = False
Then the word matching algorithm can be implemented as follows
> match :: Word -> RE -> Bool > match w r = > case S.uncons w of > Nothing -> isEmpty r -- empty > Just (l,w') -> w' `match` (deriv r l)
The gist of the word matching algorithm is that a non empty word (l,w') matches a regular expr r if we take away l from the word, the remainder w' matches with the derivative (deriv r l).
A sample execution:
*Main> match (S.pack "AA") (Seq (Star (L 'A')) (Star (L 'A'))) True
No comments:
Post a Comment