Link Search Menu Expand Document

Wisteria - Wisp


Wisp is Wisteria’s frontend for generating a parser. A Wisp file is comprised of grammar term definitions that closely resemble Backus-Naur From. Specifically we use the following form, noting that any whitespace is always ignored:

name -> return_type :
  pattern { Rust construction }
| pattern { Rust construction }
...

Essentially, we define a grammar term’s name and the returning type of its production. We then give a series of patterns comprised of tokens or other grammar terms such that whenever we parse such a term, we can use the contents of the tokens parsed to produce a value matching the return type. Finally, we can add a $ prefix to the grammar term to indicate that this will be the starting term to be parsed.

To see more clearly how this is done let’s use the example we used in the explanation for Wisl. In this example, we had three different tokens: LPAREN, RPAREN and NUMBER(i32). Now we want to be able to parse nested parentheses and return the number inside:

$ parens -> i32 :
  LPAREN <p: parens> RPAREN { p }
| <n: NUMBER> { n }

In Wisp, when we want to match a token, we write this all in uppercase letters such as LPAREN or NUMBER. Whenever we want to match a grammar term (including a recursive match) we write this in lowercase. Note that the name of all the grammar terms must be in lowercase. Finally, we can use angle brackets <variable: pattern>, to bind a pattern of tokens to a variable that we can use inside the Rust code production. Thus, <n: NUMBER> binds the content of the NUMBER token being passed which we return later. Similarly, in the first pattern, we bind the return value of the inner parens to p and return it.

Now say we wish to extend our grammar to include any valid set of parentheses, i.e. we wish to allow ((1)(2)) as well. We could extend the grammar above like so:

$ parens -> Vec<i32> :
  LPAREN <p: parens> RPAREN { p }
| <p1: parens> LPAREN <p2: parens> RPAREN { 
    p1.append(&mut p2); 
    p1 
  }
| <n: NUMBER> { vec![n] }

Here, we now return values of type Vec<i32> so we change the NUMBER production appropriately. Also, for our new pattern we append the results of the second set of parentheses p2 to the first set p1. This type of grammar would normally be allowed by a LALR parser such as yacc. However, as mentioned in the Parser page, the parsers produced by Wisteria are recursive-descent parsers which do not allow left recursion. If this parser were allowed (although the type checker will prevent its existence), whenever it would attempt to parse the second pattern it would get stuck trying to infinitely parse a parens :

1 (2):
  => parens LPAREN parens RPAREN
  => (parens LPAREN parens RPAREN) LPAREN parens RPAREN
  => ((parens LPAREN parens RPAREN)) LPAREN parens RPAREN) LPAREN parens RPAREN
  => ...

This presents a big problem as it prevents any left-recursive grammars which are very common in programming languages in operator expressions such as addition and comparison. Wisp does provide the Kleene Star operator from regular expressions which would fix our problem above:

$ parens -> Vec<i32> :
  <ps: (LPAREN parens RPAREN)*> { ps.into_iter().flatten().collect() }
| <n: NUMBER> { vec![n] }

When a Kleene Star pattern is bound to a variable, here ps, the resulting type of the variable is a Vec of whatever is inside, so here a Vec<Vec<i32>> which is why we flatten ps.

A more complicated example of left recursion would be parsing arithmetic operations. So we assume we have a type in our Rust file:

pub enum Operation {
  Val(i32),
  Add(Box<Addition>, Box<Addition>),
  Mul(Box<Addition>, Box<Addition>)
}

And say, our lexer also includes the ADD and MUL tokens. Ideally we would want to describe our grammar like so:

$ addition -> Box<Operation> :
  <o1: addition> ADD <o2: multiplication> { Box::new(Operation::Add(o1, o2)) }
| <o: multiplication> { o }

multiplication -> Box<Operation> :
  <o1: multiplication> MUL <o2: NUMBER> {
    Box::new(Operation::Mul(o1, Box::new(Operation::Val(o2))))
  }
| <n: NUMBER> { Box::new(Operation::Val(n)) }

However, both addition and multiplication are left-recursive. Instead, what we can do is parse the expressions as being right-recursive (i.e. with a Kleene Star), and then unfold the Vec<Operation> produced to be left recursive:

$ addition -> Box<Operation> :
  <o1: multiplication> <os: (ADD multiplication)*> { 
    let mut result = o1;
    for o in os {
      result = Box::new(Operation::Add(result, o));
    }
    result
  }

multiplication -> Box<Operation> :
    <n1: NUMBER> <ns: (MUL NUMBER)*> { 
    let mut result = Box::new(Operation::Value(n1));
    for n in ns {
      result = Box::new(Operation::Mul(result, Box::new(Operation::Value(n))));
    }
    result
  }

This transformation is quite tedious, and the original library by Krishnaswami and Yallop includes a construct that automatically generates left recursive operation expressions. In future updates to Wisteria these will also be included.

Finally, Wisteria also requires grammars to be left-factored. This means that prefixes of patterns in alternations cannot be the same. Take the following example where we want to parse a unary negative operator that works on both integers and floats (assume the appropriate lex tokens and Rust enums exist):

negative_expr -> Negative :
  MINUS <i: INTEGER> { Negative::Int(i) }
| MINUS <f: FLOAT> { Negative::Float(f) }
  

Although this example is a bit contrived, it serves to illustrate the point that such a grammar does not pass the type check and must therefore be transformed:

& negative_expr -> Negative :
  MINUS <v: value> { Negative(v) }

value -> Value :
  <i: INTEGER> { Value::Int(i) }
| <f: FLOAT> { Value::Float(f) }

Thus, when writing our grammars but also the underlying Rust datatypes we want to construct, it is important to keep in mind the restrictions that Wisteria’s grammar type system imposes.