One well-known approach is to solve the word problem using Brzozowski's derivative operation. The derivative operation is defined as follows

d(l,r) = { w | (l,w) in r ]

We describe the algorithm in the following using Haskell.

> import qualified Data.ByteString.Char8 as S

> data RE = Phi -- empty lang

> | Empty -- empty word

> | L Char -- literal

> | Choice RE RE -- r1 + r2

> | Seq RE RE -- (r1,r2)

> | Star RE -- r*

A word is a byte string.

> type Word = S.ByteString

> type Letter = Char

We follow Brzozowski's approach to define the derivative operations.

> deriv :: RE -> Letter -> RE

> deriv Phi l = Phi

> deriv Empty l = Phi

> deriv (L l') l

> | l == l' = Empty

> | otherwise = Phi

> deriv (Choice r1 r2) l = Choice (deriv r1 l) (deriv r2 l)

> deriv (Seq r1 r2) l

> | isEmpty r1 =

> let c1 = (Seq (deriv r1 l) r2)

> c2 = deriv r2 l

> in Choice c1 c2

> | otherwise = Seq (deriv r1 l) r2

> deriv (Star r) l = Seq (deriv r l) (Star r)

To establish the base case of the algorithm,

we need a function that checks whether a regexp accepts the empty word.

-- check whether regular expressions are empty

> class IsEmpty a where

> isEmpty :: a -> Bool

> instance IsEmpty RE where

> isEmpty Phi = False

> isEmpty Empty = True

> isEmpty (Choice r1 r2) = (isEmpty r1) || (isEmpty r2)

> isEmpty (Seq r1 r2) = (isEmpty r1) && (isEmpty r2)

> isEmpty (Star r) = True

> isEmpty (L _) = False

Then the word matching algorithm can be implemented as follows

> match :: Word -> RE -> Bool

> match w r =

> case S.uncons w of

> Nothing -> isEmpty r -- empty

> Just (l,w') -> w' `match` (deriv r l)

The gist of the word matching algorithm is that a non empty word (l,w') matches a regular expr r if we take away l from the word, the remainder w' matches with the derivative (deriv r l).

A sample execution:

*Main> match (S.pack "AA") (Seq (Star (L 'A')) (Star (L 'A')))

True

## No comments:

Post a Comment