Parsing and evaluating expressions using ANTLR in Swift

Dmytro Anokhin
10 min readJun 11, 2020

--

Parsing and evaluating code is underrated topic in modern development. We have all the programming languages ready for us. However, it is not that hard to create a custom language. Recently I found ANTLR, a powerful tool for parsing text, available in Swift. So I decided to describe how to use it to build a calculator.

Why would there be an article how to make a calculator app? Well, this is not a simple calculator. It takes input as a plain text, parses it into a set of instructions, and evaluates them. Similar to how interactive programming environments do.

This topic introduces approaches and tools used to build a programming languages: creating a grammar, parsing text, and evaluating instructions.

The goal is to process arbitrary text, identify numbers and arithmetic operations, and evaluate them (and handle potential errors along the way). We could analyze characters stream and evaluate them one by one. But the key is to replicate what a programming environment would do:

  • Parse the code and transform it into instructions;
  • Evaluate instructions in a virtual machine.

Parsing text

The first step would be parsing text input into intermediate representation. Preface to this must be defining a grammar and an intermediate representation.

Grammar

Natural languages, English, Chinese, Ukrainian, etc., are ambiguous, individual words and phrases can mean different things. Computer languages are build in a way to avoid this ambiguity.

Formal grammar of a computer language can be described using notation, a meta language. One example you can find in the Swift Language Reference.

Backus–Naur form or Backus normal form (BNF) describes language as a set of rules. For the demo will use Extended Backus–Naur form:

NUMBER
: ('0' .. '9') + ('.' ('0' .. '9') +)? ;
expression
| expression ('+' | '-') expression
| ('+' | '-')* NUMBER ;

Number is defined as one or more digits (characters in range ‘0’ .. ‘9’); optional dot followed by one or more digits.

Expression is either:

  • expression plus or minus another expression;
  • a number, optionally prefixed by ‘+’’ or ‘-’.

This way we described addition and subtraction: 38 + 5 — 1.

Intermediate representation

Programming languages use data structure called Abstract Syntax Tree (AST) where nodes represent constructions in code.

AST is a binary tree, where (for arithmetic expression):

  • nodes contain operations;
  • leafs contain numbers (arguments).

Note: the moment we define what out tree is it’s no longer abstract, but I will still use term AST.

"38 + 5 - 1"      -
/ \
+ 1
/ \
38 5

AST is used for evaluation by traversing it in certain order. This is why it is important to construct the tree with correct order of operations. What would be the result of 38 + 2 * 2?

Depends on order of evaluation. Left to right (addition first, multiplication last) the result will be 80. However, algebraic expressions define order where multiplication operation precedes addition, so the result must be 42. AST for the expression:

"38 + 2 * 2"

+
/ \
38 *
/ \
2 2

Parser

Parser consumes input and produces AST. Before parsing, a sequence of characters must be transformed into a sequence of tokens. This is done by using lexer to perform lexical analysis:

"38 + 5–1"(number 38)
(symbol +)
(number 5)
(symbol -)
(number 1)

Lexer uses a set of lexer rules. This are the rules we defined in our grammar.

Parser also uses rules. There are two strategies to build a parser:

  • Left to right scan, using Left most derivation — LL;
  • Left to right scan, using Right most derivation — LR.

LL and LR scan a sequence from left to right. The difference is how they apply rules.

LL is also called top-down. For our grammar:

expression
| expression ('+' | '-') expression
| ('+' | '-')* NUMBER ;

and 38 + 5 — 1 string, it will first match expression as expression (‘+’ | ‘-’) expression. Than recursively match expression as NUMBER for 38. Repeat.

      - (1)
/ \
+ (2) 1 (5)
/ \
38 (3) 5 (4)

RR parser matches in bottom-up order, that is NUMBER first, than expression (‘+’ | ‘-’) expression, and expression last:

      - (5)
/ \
+ (2) 1 (4)
/ \
38 (1) 5 (3)

The benefit of LL parser is ease of implementation. Recursion is a disadvantage, that leads to growing time complexity and performance. On the other hand, LR parsers can be implemented very efficiently.

For our example we choose convenience over performance because we don’t want to code parser (and lexer) ourselves. ANTLR can generate LL parser for us.

ANTLR

ANTLR stands for ANother Tool for Language Recognition. This is a powerful parser generator, initially released back in 1992.

https://www.antlr.org

ANTLR provides ready grammars for a lot of languages: https://github.com/antlr/grammars-v4/

The Calculator app includes ANTLR and the grammar code ready to run, so no setup should be needed. Later, if you decide to explore the tool, run existing grammars, or make your grammar, you need to setup ANTLR on your machine.

The grammar in Extended Backus–Naur form:

grammar Arithmetic;root : expression EOF ;expression
: expression (MULT | DIV) expression
| expression (PLUS | MINUS) expression
| (PLUS | MINUS)* number ;
number : NUMBER ;NUMBER : ('0' .. '9') + ('.' ('0' .. '9') +)? ;PLUS : '+' ;
MINUS : '-' ;
MULT : '×' ;
DIV : '÷' ;
WS : [ \r\n\t] + -> skip ;

The grammar is in Arithmetic.g4 file. I included the file in the project for convenience, but it is not necessary to ship it with the app. Using ANTLR we can generate the parser code:

$ antlr4 -Dlanguage=Swift -message-format gnu -o Autogen Arithmetic.g4

This generates files and puts them into Autogen folder. We only need to include .swift files in our target.

We can also test grammar with ANTLR:

$ antlr4 Arithmetic.g4 && javac Arithmetic*.java && echo "1 + 2 + 3" | grun Arithmetic expression -tree(expression (expression (expression (number 1)) + (expression (number 2))) + (expression (number 3)))

And even render the AST to get better overview:

$ antlr4 Arithmetic.g4 && javac Arithmetic*.java && echo "1 + 2 + 3" | grun Arithmetic expression -gui
AST for 1+2+3 expression

How cool is that! Now we can use the parser in code:

let input = "1+2+3"let inputStream = ANTLRInputStream(input)
let lexer = ArithmeticLexer(inputStream)
let tokenStream = CommonTokenStream(lexer)
let parser = try ArithmeticParser(tokenStream)
let expressionContext = try parser.expression()

Here expressionContext object is the AST for expression token.

Let’s summarize the first step:

  • The input is text, the output is Abstract Syntax Tree;
  • We define the grammar in Extended Backus–Naur form;
  • AST contains operations and arguments in correct order;
  • Lexer transforms character input into tokens, Parser consumes tokens and produces AST;
  • The code for Lexer and Parser can be generated.

Evaluating instructions

Now the question is how to evaluate the AST from the previous step.

First thought that can come in mind is to create a routine per node and call it recursively. For "1 + 2" the routine can look like this:

func evaluate() -> Double {
left.evaluate() + right.evaluate()
}

This is easy to implement and should work for a simple gramma like ours. But for more complex grammars there is a fundamental flaw: the call stack will grow exponentially with the depth of recursion. It will prove impossible to implement recursive algorithms, such as merge sort, if we would to build a scripting language for instance.

Instead, for our example, we will generate code (instructions) from the AST and execute it on a stack machine.

The stack machine will evaluate instructions one by one, by taking arguments from the top of the stack, and pushing the result onto the stack.

For instructions I choose to use enum because there’s not many. For more complex set it would make sense to use a struct for each instruction.

enum Instruction {
/// Load (push) numeric value onto the stack
case load(_ number: Double)
/// Perform arithmetic operation
case arithmetic(_ operator: (Double, Double) -> Double)
/// Negate the sign of a number
case negate
}
extension Instruction { func execute(_ stack: inout [Double]) {
switch self {
case .load(let number):
stack.append(number)
case .arithmetic(let `operator`):
let arg1 = stack.removeLast()
let arg2 = stack.removeLast()
stack.append(`operator`(arg2, arg1))

case .negate:
let arg = stack.removeLast()
stack.append(-arg)
}
}
}

The virtual machine is a single routine that iterates over instructions:

struct VirtualMachine {    func execute(_ instructions: [Instruction]) -> Double {
var stack: [Double] = []
for instruction in instructions {
instruction.execute(&stack)
}
return stack.removeLast()
}
}

The last item in the stack is the result of expression. And real implementation also needs error handling.

Instructions for "1 + 2" will look like this:

[.load(1.0), .load(2.0), .arithmetic(+)]

To generate instructions we need to traverse AST following this rules:

  • For a node with a number generate load instruction;
  • For a node with an expression: generate arithmetic instruction with respective operator;
  • or negate instruction if this is a - node that also contains a number.

We can traverse AST in depth-first order using ArithmeticBaseListener. This is a generated class for our grammar. Because this is an abstract class we need to write implementation for it:

final class Listener : ArithmeticBaseListener {    private(set) var instructions: [Instruction] = []    override func exitExpression(_ ctx: ExpressionContext) {
// Generate instruction for expression
}
override func enterNumber(_ ctx: NumberContext) {
// Generate load instruction
}
}

Now taking our AST in a form of expressionContext object from previous step:

let listener = Listener()
try ParseTreeWalker().walk(listener, expressionContext)

Listener traverses AST in depth-first order. We use exit method for expressions and enter for numbers to reverse the order of instructions and have load preceeding artithmetic operations:

Let’s take a look at some examples. I will provide an expression, AST, and traversal order for it.

1+2

  +
/ \
1 2

Traversal:

  1. Enter + expression;
  2. Enter 1, generate .load(1.0) instruction;
  3. Enter 2, generate .load(2.0);
  4. Exit + expression, generate .arithmetic(+).

Result:

[.load(1.0), .load(2.0), .arithmetic(+)]

1+2×3

   +
/ \
1 ×
/ \
2 3

Traversal:

  1. Enter +;
  2. Enter 1, generate .load(1.0);
  3. Enter ×;
  4. Enter 2, generate .load(2.0);
  5. Enter 3, generate .load(3.0);
  6. Exit ×, generate .arithmetic(*);
  7. Exit +, generate .arithmetic(+);

Result:

[.load(1.0), .load(2.0), .load(3.0), .arithmetic(*), .arithmetic(+)]

Can you see the order of arithmetic operations? Let’s take a look how the virtual machine will execute this sequence:

  • Virtual machine executes instruction one by one. After executing first three instructions [.load(1.0), .load(2.0), .load(3.0)] the stack looks like this: [1.0, 2.0, 3.0].
  • .arithmetic(*) instruction takes top two arguments from the stack (last elements in the array): 2.0, 3.0. Multiplies and pushes back to the stack: [1.0, 6.0].
  • .arithmetic(+) takes next two arguments from the stack. Final stack looks like this: [7.0]. This is the result.

Conclusion

To revisit what we talked about, to build expression evaluation we need:

  • Lexer to convert a text to a sequence of tokens;
  • Parser to convert a sequence of tokens to Abstract Syntax Tree (AST);
  • Instruction set and a virtual machine to execute;
  • Traverse AST and generate instructions.

For a real programming language we need many more things, such as conditions, loops, variables, type system, etc. This is out of scope for this article. I hope the articles gives a simple introduction to a complex and interesting topic.

Take a look at the demo app and I also included some steps to setup development environment for ANTLR.

The demo app is created in Xcode 11.5 so you should probably have this or later version. The app uses SwiftUI, Swift Package Manager, and ANTLR v4.

Thank you for reading. If you like the article: clap 👏 , follow, and share. Hope to see you next time 😊

Setup ANTLR to use with Swift and Xcode

ANTLR consists of a generator and a runtime. Generator converts lexer or parser grammar in .g4 file into Swift code. To run this code we need the runtime.

The most convenient way to use the runtime is via Swift Package Manager. ANTLR provides a script to generate local Swift package. You need to have ANTLR v4 installed and than you can generate Swift Package to use it.

Please refer to ANTLR v4 github page for official documentation and updates.

ANTLR v4 Installation

Source: https://github.com/antlr/antlr4/blob/master/doc/getting-started.md

0. Install Java (version 1.6 or higher)
1. Download

$ cd /usr/local/lib
$ curl -O https://www.antlr.org/download/antlr-4.7.1-complete.jar

Or just download in browser from website:
[https://www.antlr.org/download.html](https://www.antlr.org/download.html)
and put it somewhere rational like `/usr/local/lib`.
2. Add `antlr-4.7.1-complete.jar` to your `CLASSPATH`:

$ export CLASSPATH=".:/usr/local/lib/antlr-4.7.1-complete.jar:$CLASSPATH"

It's also a good idea to put this in your `.bash_profile` or whatever your startup script is.
3. Create aliases for the ANTLR Tool, and `TestRig`.
$ alias antlr4='java -Xmx500M -cp "/usr/local/lib/antlr-4.7.1-complete.jar:$CLASSPATH" org.antlr.v4.Tool'
$ alias grun='java -Xmx500M -cp "/usr/local/lib/antlr-4.7.1-complete.jar:$CLASSPATH" org.antlr.v4.gui.TestRig'

Build ANTLR v4 Runtime Swift Package

Source: https://github.com/antlr/antlr4/blob/master/doc/swift-target.md

1. Download source code for ANTLR

git clone https://github.com/antlr/antlr4
2. Use boot.py script to generate Swift packagecd antlr4/runtime/Swift
python boot.py --gen-spm-module
The output must be[ANTLR] Created local repository.
[ANTLR] (swift-tools-version:3.0) Put .Package(url: "/private/tmp/Antlr4-tmp-1590271493", majorVersion: 4) in Package.swift.
[ANTLR] (swift-tools-wersion:4.0) Put .package(url: "/private/tmp/Antlr4-tmp-1590271493", from: "4.0.0") in Package.swift and add "Antlr4" to target dependencies.

One my machine the script created Swift package in the temporary directory at /private/tmp/Antlr4-tmp-1590271493. Now you can open Xcode and add Swift package. Make sure to include file:// prefix, like so:

file:///private/tmp/Antlr4-tmp-1590271493

You can also move the package folder to a persistent location.

Testing your grammar

You can test your grammar directly in the command line (Terminal):

$ antlr4 Arithmetic.g4 && javac Arithmetic*.java && echo “1 + 2 × 3” | grun Arithmetic root -tree(root (expression (expression (number 1)) + (expression (expression (number 2)) × (expression (number 3)))) <EOF>)

or using GUI:

$ antlr4 Arithmetic.g4 && javac Arithmetic*.java && echo “1 + 2 × 3” | grun Arithmetic root -gui

GUI is especially useful for instructions (code) generation, to see if AST matches your rules.

--

--

Dmytro Anokhin
Dmytro Anokhin

Written by Dmytro Anokhin

iOS Developer, here to share best practices learned through my experience. You can find me on Twitter: https://twitter.com/dmytroanokhin

Responses (1)