Link Search Menu Expand Document

Wisteria - Parser


As stated in the intro, Wisteria’s parser generator is based on Krishnaswami and Yallop’s paper A Typed Algebraic Approach to Parsing . This approach is similar to other parser combinator libraries commonly used in functional languages like Haskell and OCaml. The library they describe and implement abstracts parsers as functions without the messy state intrinsic in the automata used in more “classic” parsing algorithms.

My aim in implementing this parser generator is to bridge the gap between the abstract functional interface provided in the paper, and the more standard Backus-Naur form used to describe parsing grammars. However, as I will describe below, this is not always straightforward.

The main innovation that Krishnaswami and Yallop present is a type system which applies to context free grammars such that any well-typed grammar is unambiguous. This provides an easy way of finding any parsing conflict, and rejects any grammars that may be ambiguous. Specifically, it catches the following cases:

  1. Disjunctive non-determinism: When a grammar has an alternation with identical prefixes:

    \[\begin{align*} S := \ & a b\\ \mid\ & ac \end{align*}\]
  2. Sequential non-determinism: When the order in which a parser should reduce a sequence of terms is ambiguous:

    \[S := (a*)(a*)\]
  3. Left-Recursion: Whenever a grammar terms is recursive immediately on the left. This will make the generated recursive descent parser loop indefinitely:

    \[\begin{align*} S := \ & Sa\\ \mid\ & \epsilon \end{align*}\]

Moreover, the properties that the type system guarantees allows for the parsers generated from a well-typed grammar to have linear-time performance, while also being non-backtracking using a single token of lookahead.

From A Typed, Algebraic Approach to Parsing

image-20230920124440867

The original library that Krishnaswami and Yallop developed is written in OCaml and requires the user to manually build in OCaml the parsers using the parser combinators described in their library. This API is on the whole very cumbersome and does not capture well the expressions’ structures behind the grammar being defined. The user needs to build awkward chains of function applications (these representing parsers) while keeping track of which expressions are recursive which is not alway immediately apparent. Overall, it makes it very hard to tell the structure of the grammar being defined, especially when scaled up to the complexity of a full programming language like C.

Wisteria, thus, not only implements this library in Rust, but also provides a more intuitive and familiar front end to define a well-typed grammar, to then generate a subsequent parser. The Wisp grammar definition language is this front end that allows users to define their language’s grammar in Backus-Naur Form (BNF), which Wisteria then converts into a grammar that is type-checked for ambiguities.

However, the typed, algebraic parsers defined by Krishnaswami and Yallop are recursive-descent parsers, which places certain restrictions on the kinds of grammars allowed by the type system. Namely, grammars are restricted to being left-factored and non-left-recursive. This clashes strongly with BNF and the grammars allowed by LALR parsers which do permit left recursion. Therefore, a current limitation of Wisteria is that the user needs to transform their grammar to fit these restrictions. (Future development of Wisteria will focus on automatically making these transformations where possible).