moon-peg 0.1.0

PEG packrat parser with direct and indirect left recursion support

Released 2016-07-07.

To install, run:

haxelib install moon-peg 0.1.0

See using Haxelib in Haxelib documentation for more information.

Maintainerprofound7
Websitehttps://github.com/profound7/moon-peg
Current version0.1.0
StatisticsInstalled 13 times
LicenseMIT
Tags cross, parser, utility

README.md

Moon PEG

The moon-peg lib is a parser expression grammar (PEG) library. It also has parser classes that you can extend to build a parser.

Parser Expression Grammar

moon.peg.grammar.Parser

PEG packrat parser with direct and indirect left recursion support. Recursive descent can't handle left recursive grammar. This parser is like recursive descent, but uses some techniques from the paper by Warth, Douglass, Millstein, such as memoization, to support both direct and indirect left recursion.

There's only ordered choice and greedy choice. No unordered choice (context-free grammar).

There's 2 ways of using this class:

  1. Use Parser class without class parameters. This allows you to load any grammar files at runtime.

`haxe var p = new Parser(peg); var ast = p.parse(codes);`

  1. Use Parser class with String parameter. This uses haxe's genericBuild, so the grammar file is processed at compile-time. You do not need the grammar file after compilation.

`haxe var p = new Parser<"data/lisp.peg">(); var ast = p.parse(codes);`

Grammar Examples

This is an example of an expression grammar that you might find online or in textbooks.

S = AB*
AB = A B
A = 'a'+
B = 'b'+

Direct and indirect recursion example:

$a = "x" / "y";

// direct recursion
s = s a / a;

// indirect recursion
r = t;
t = r a / a;

For another example, see LispTest.hx, LispParser.hx, and its corresponding grammar file Lisp.peg.

Grammar Rules

The ParseTree Enum
ParseTreeNotes
EmptyNo value
Value(v:String)terminal value
Tree(v:ParseTree)single value (for pass-thru)
Multi(a:Array<ParseTree>)multi-values (for seq, A*, A+, etc...)
Node(id:String, v:ParseTree)with child nodes
Basic Operations
CategoryExampleNotes
RuleS = A 
Regular Expression[A-Z]+ 
Rule ReferenceA 
EmptyepsilonAlways succeeds. Matches empty string.
Back Reference2Any integer. 2 refers to 2nd matched item
End of RuleA;Semicolons are automatically inserted. In ambiguous cases, you need to manually add the semicolon.
Ordered ChoiceA / BReturns the first success
Greedy Choice*A | BBoth evaluated. Result that consumes more is used.
SequenceA B 
Grouping(A | B) CSequence of (A or B) followed by C.
Zero or MoreA*Greedy match
One or MoreA+Greedy match
OptionalA? 
Look-ahead&ADoes not consume match
Negative Look-ahead!ADoes not consume match
Special Operations
CategoryExampleNotes
Hide@AMatches and consumes. If successful, return Empty instead.
Pass$AUnwrap result of A. i.e. $(Node("X", v)) ==> v
Anon%APrevent creation of Node to current rule.
TransformA:XWrap result of A to a Node
TransformA:","Flatten result to value separated by a String
TransformA:nIf n=0, return original result. Otherwise, return nth item that's matched
TransformA:(1 0)Create a Multi where the numbers are the indexes of the result of A
TransformA:$fCalls a custom transformation function
Anon

Prevent creation of Node to current rule. $A = B ==> A = %B

Usage: `A = B%C`
  • if B matches ==> Node("A", resultOfB)
  • if C matches ==> resultOfC
Transform

You can define transformations of nodes within the grammar.

Eg. A:(1 0) You can use this to resequence result. If resultOfA is Multi(["abc", "123"]) then,

  • A:(2 1) ==> Multi(["123", "abc"])
  • A:(2 1 0) ==> Multi(["123", "abc", "abc", "123"]) // 0 is "abc", "123"
  • A:(2 1 2 1) ==> Multi(["123", "abc", "123", "abc"])
  • A:(2 "-" 1) ==> Multi(["123", "abc-123", "abc"])

Transformations can be nested. The result of one transformation, can be further transformed, like in the following rule: - A:(0:"-" 1)

Todos

  • Line/character position info.
  • Refactor the macro that generates the parser at compile-time (ugly code done long ago).
  • Unit tests
  • Optimize

Contributing

Feel free to contribute. Contributions, bug fixes, and feature requests are welcomed.

Credits

  • Parser moon.peg.grammar (reference, ideas) Warth, Douglass, Millstein: http://www.vpri.org/pdf/tr2007002_packrat.pdf Mark Engelberg: https://github.com/Engelberg/instaparse

License

MIT