1 Intro
When I was playing around inside Mark Jones' gofer sources in the early 90s I saw this piece of commented C code.
The code was impenetrable… at first.
But the comment was elegant and beautiful and cleared up the code nicely.. on careful reading.
Why is it needed?
Well gofer is parsed with a conventional yacc parser.
Problem is gofer (like Haskell) allows for runtime declared binary operators with
3 fixities infixr, infixl and infix
So
infixr 8 ^
infixl 7 *
means ^ is at precedence level 8 and right-associates
whereas * is at level 7 and left associates
Plain infix means non-associative; ie its a syntax error to use the operator associatively.
So the yacc parser collects all operators with one single precedence and associativity treated as left and then regroups the tree based on the fixity declarations.
Mark Jones' original gofer Gofer with pugofer modifications
I give this here as a beautiful example of how functional programming can clear up arcane imperative programming
Why is it needed?
Well gofer is parsed with a conventional yacc parser.
Problem is gofer (like Haskell) allows for runtime declared binary operators with
3 fixities infixr, infixl and infix
So
infixr 8 ^
infixl 7 *
means ^ is at precedence level 8 and right-associates
whereas * is at level 7 and left associates
Plain infix means non-associative; ie its a syntax error to use the operator associatively.
So the yacc parser collects all operators with one single precedence and associativity treated as left and then regroups the tree based on the fixity declarations.
Mark Jones' original gofer Gofer with pugofer modifications
I give this here as a beautiful example of how functional programming can clear up arcane imperative programming
2 C Code
static Cell local TidyInfix(e) /* convert InfixExpr to Expr */ Cell e; { /* :: InfixExpr */ Cell s = NIL; /* :: TidyStack */ Syntax sye = APPLIC; /* Syntax of op in e (init unknown) */ Syntax sys = APPLIC; /* Syntax of op in s (init unknown) */ Cell temp; while (nonNull(tl(e))) { if (isNull(s)) { s = e; e = arg(fun(s)); arg(fun(s)) = NIL; sys = sye; sye = APPLIC; } else { if (sye==APPLIC) { /* calculate sye (if unknown) */ sye = syntaxOf(textOf(fun(fun(e)))); if (sye==APPLIC) sye=DEF_OPSYNTAX; } if (sys==APPLIC) { /* calculate sys (if unknown) */ sys = syntaxOf(textOf(fun(fun(s)))); if (sys==APPLIC) sys=DEF_OPSYNTAX; } if (precOf(sye)==precOf(sys) && /* amb */ (assocOf(sye)!=assocOf(sys) || assocOf(sye)==NON_ASS)) { ERROR(row) "Ambiguous use of operator \"%s\" with \"%s\"", textToStr(textOf(fun(fun(e)))), textToStr(textOf(fun(fun(s)))) EEND; } else if (precOf(sye)>precOf(sys) || /* shift */ (precOf(sye)==precOf(sys) && assocOf(sye)==LEFT_ASS)) { temp = arg(fun(e)); arg(fun(e)) = s; s = e; e = temp; sys = sye; sye = APPLIC; } else { /* reduce */ temp = arg(fun(s)); arg(fun(s)) = arg(e); arg(e) = s; s = temp; sys = APPLIC; /* sye unchanged */ } } } e = hd(e); while (nonNull(s)) { temp = arg(fun(s)); arg(fun(s)) = e; e = s; s = temp; } return e; }
3 Explanation
/* expressions involving a sequence of two or more infix operator symbols * are parsed as elements of type: * InfixExpr ::= [Expr] * | ap(ap(Operator,InfixExpr),Expr) * * thus x0 +1 x1 ... +n xn is parsed as: +n (....(+1 [x0] x1)....) xn * * Once the expression has been completely parsed, this parsed form is * `tidied' according to the precedences and associativities declared for * each operator symbol. * * The tidy process uses a `stack' of type: * TidyStack ::= ap(ap(Operator,TidyStack),Expr) * | NIL * when the ith layer of an InfixExpr has been transferred to the stack, the * stack is of the form: +i (....(+n NIL xn)....) xi * * The tidy function is based on a simple shift-reduce parser: * * tidy :: InfixExpr -> TidyStack -> Expr * tidy [m] ss = foldl (\x f-> f x) m ss * tidy (m*n) [] = tidy m [(*n)] * tidy (m*n) ((+o):ss) * | amb = error "Ambiguous" * | shift = tidy m ((*n):(+o):ss) * | reduce = tidy (m*(n+o)) ss * where sye = syntaxOf (*) * (ae,pe) = sye * sys = syntaxOf (+) * (as,ps) = sys * amb = pe==ps && (ae/=as || ae==NON_ASS) * shift = pe>ps || (ps==pe && ae==LEFT_ASS) * reduce = otherwise * * N.B. the conditions amb, shift, reduce are NOT mutually exclusive and * must be tested in that order. * * As a concession to efficiency, we lower the number of calls to syntaxOf * by keeping track of the values of sye, sys throughout the process. The * value APPLIC is used to indicate that the syntax value is unknown. */
4 Yacc usage
And this is how the yacc usage of tidyInfix occurs
exp : opExp COCO sigType {$$ = ap(ESIGN,pair($1,$3));} | opExp {$$ = $1;} | error {syntaxError("expression");} ; opExp : pfxExp {$$ = $1;} | pfxExp op pfxExp {$$ = opapd($1, $2, $3);} | opExp0 {$$ = tidyInfix($1);} ; opExp0 : opExp0 op pfxExp {$$ = opap($1, $2, $3);} | pfxExp op pfxExp op pfxExp {$$ = opap(opap(singleton($1), $2, $3),$4,$5);} etc
5 Haskell (Modified gofer)
On grappling with it I felt it would be well to goferize (haskellize) it.
Here is the literate source I wrote as an executable — not exactly same version but and analogous intention to mpj's original.
Here is the literate source I wrote as an executable — not exactly same version but and analogous intention to mpj's original.
1. Preliminaries ---------------- This script demonstrates how to write a simple shift-reduce parser. In the interests of simplicity, we handle expressions with single-digit operands and operators from the set {^, *, +, <, &} The parser will build trees of the following type: > ctype Exp where > S : Int -> Exp > Pow, Mul, Add, Less, And : Exp -> Exp -> Exp The precedences as we may expect are as follows: > opPrecAl = [('^', 5), ('*', 4), ('+', 3), ('<', 2), ('&', 1)] We must also handle associativities of operators. Operators may be left, right or non-associative. The following enumerated type captures this: > ctype Associativity where > L, R, N : Associativity The operators have the following associativities: > opAssocAl = [('^', R), ('*', L), ('+', L), ('<', N), ('&', L)] Given a character (that better be an operator), the following functions get its precedence and associativity: > precedence : Char -> Int > associativity : Char -> Associativity > precedence = assoc.opPrecAl > associativity = assoc.opAssocAl 2. A rough and simple distinction between operator and operand -------------------------------------------------------------- > isOperator.c = c `elem` "^*+<&" > isOperand.c = not.(isOperator.c) 3. Shift Reduce Parsing Basics ------------------------------ 4 basic actions may be distinguished - shifting an operator into the stack, - reducing the top two elements of the output by the top of the stack, - shifting an operand onto the output (we'll call this SOp - Shift Operand) - error - an ill-formed expression was detected Since this is central to our enterprise we capture it into a type > ctype Action where > Shift, Reduce, Error, Sop : Action 4. Relation between Parse-Actions and Associativities ----------------------------------------------------- In order to see the close connection between parsing actions and associativities for the moment, let us just consider expressions involving only 1 kind of operator eg. "1^2^3" or "1+2+3" or "1<2<3" And consider further the situation when 1 and 2 have been output and one operator is on the stack and the next one is seen. ie we have input = "^3" stk = ['^'] output = [S.1, S.2] because the '^' is right-associative, the '^' in the input takes precedence the '^' in the stack and therefore it gets pushed. ie we must shift However if we look at "1+2+3" in a similar configuration, we see that a reduction is called for. Finally, when the second '<' in "1<2<3" is seen it is a syntax-error. We can therefore capture our discoveries in the association-list: > assocActionAl = [(R, Shift), (L, Reduce), (N, Error)] 5. Comparing Operators ---------------------- Now let io be the input operator, so be the stack-top operator, and continuing with our temporary assumption that the expressions involve only one kind of operator, we see that precedences of io and so must be same. Even the associativities must be the same. The precedences in fact don't count. It is the associativities alone that (because of our temporary assumption) determine the parsing actions. We capture this information in > sameCmp : Char -> Char -> Action > sameCmp.io.so = assoc.assocActionAl.ai > where ai = associativity.io -- same as associativity.so Now we are in a position to generalise to expressions with an arbitrary mix of operators. cmp takes an input operator, and the stack-top operator and returns the appropriate action. > cmp : Char -> Char -> Action > cmp.io.so > | pi > ps = Shift > | pi < ps = Reduce > | pi == ps && ai /= as = Error > | otherwise = sameCmp.io.so > where pi = precedence.io > ps = precedence.so > ai = associativity.io > as = associativity.so 6. State Transformation ----------------------- We now come to shift-reduce parsing proper. We observe that the computation proceeds in the context of a state consisting of three components: an input, a stack, and an output. > type ParseState = ([Char], [Char], [Exp]) The four parser actions transform one such state into another > type STrans = ParseState -> ParseState -- state-transformer The four actions are > acSop, acShift, acReduce, acError : STrans When an operand like '3' is output it must be made into the integer 3 and enclosed by the leaf-constructor for expression to make S.3 . Note: op better be an operand > acSop.(op::input, stk, out) = (input, stk, f.op :: out) > where f.x = S.(ord.x - ord.'0') Shift is simple, it moves the top of the input into the stack. Note: op better be an operator > acShift.(op::input, stk, out) = (input, op::stk, out) Before we can write reduce we must observe that a '+' in the input becomes a Add in the output and so on. > opCnstrAl = [('^', Pow), ('*', Mul), ('+', Add), ('<', Less), ('&', And)] which we use in the reduce action as follows: > acReduce.(input, op::stk, r::l::out) = (input, stk, rOpl :: out) > where rOpl = (assoc.opCnstrAl.op).l.r > acError.s@(input, stk, output) = error.("acError " ++ (show.s)) 7. Finally the parser! ---------------------- > parse : String -> Exp > parse.input = p.initState > where initState = (input, [], []) > p.s = case s of > ([], [], [result]) -> result > ([], optor::stk, out) -> p.(acReduce.s) > (i::input, [], out) > | isOperand.i -> p.(acSop.s) > | otherwise -> p.(acShift.s) > (i::input, o::stk, out) > | isOperand.i -> p.(acSop.s) > | otherwise -> case cmp.i.o of > Shift -> p.(acShift.s) > Reduce -> p.(acReduce.s) > Error -> p.(acError.s) > Sop -> error."Should not happen" 8. Some Other Needed Stuff -------------------------- > assoc.[].x = error.("assoc: Not found " ++ show.x) > assoc.((x,y)::al).x' > | x == x' = y > | otherwise = assoc.al.x' > instance Text.Associativity > instance Eq.Associativity where > x == y = gord.x == gord.y > instance Text.Exp > instance (Text.a, Text.b, Text.c) => Text.(a,b,c) > primitive gord "primCharToInt" : a -> Int ctype Exp where S : Int -> Exp Pow, Mul, Add, Less, And : Exp -> Exp -> Exp > evaluate.(S.x) = x > evaluate.(Add.l.r) = evaluate.l + evaluate.r > evaluate.(Mul.l.r) = evaluate.l * evaluate.r
Missing definitions starting with ctype mean the Haskell won't compile.
ReplyDeleteThats gofer with modifications: http://blog.languager.org/2014/09/pugofer.html
ReplyDeleteBasically ctype is data with GADT syntax – this was done 10 years before GADTs ie 25 years ago.
Also another pervasive changes that will be needed for haskell:
Interchange ':' ←→ '::'
Replace 'f.x' with 'f x'
If you wish I could haskellize it but need a few days.. travelling now