Supporting code for my talk entitled "Parsing Formal Languages with Swift"
This repository is a companion to my 2019 talk entitled Parsing Formal Languages with Swift (a copy of which is included in the repository).
The included Xcode project contains Swift implementations of a lexer, parser, formatter, interpreter and transpiler for a very simple language.
Neither the language nor the parser are particularly useful in their own right, but the techniques demonstrated in the code can be applied to more complex languages and applications.
The toy language implemented in this project has the following syntax:
Numbers
Both integers and floating point numbers are supported, up to the precision of a Swift Double
. Negative numbers are not supported, nor exponentials, hexadecimal, etc.
5
0.5
57000
.1
String literals
String literals must be enclosed in doubles quotes ("
). String literals can contain any character including a carriage return or linefeed. Double quotes must be escaped with a backslash (\
) as must the backslash character itself.
"Hello World"
"This string has a
linebreak in it"
"This string has \"escaped quotes\""
"The string has an escaped \\ (backslash)"
Variables
Variables names must begin with a letter, optionally followed by one or more letters or numbers. Other symbols such as _
or emoji are not supported.
Any valid name can be used as a variable apart from the keywords let
and print
which are reserved.
a
foo
Bar56
Expressions
An expression can contain one or more numbers, string literals or variables. Variables must be declared before they are used in an expression.
Expressions also support the infix +
operator, which performs an addition if both operands are numeric, or a concatenation if either operand is a string. Expressions may contain multiple +
operations.
Parentheses are not supported, nor are any other operators (unless you are using the operator-precedence branch, which supports the *
operator for multiplication of numbers).
foo
5
hello + "world"
1 + 2 + 3
"Jonny" + 5
Declarations
A declaration creates a variable and assigns a value to it. The value can be any valid expression, and can reference variables declared previously.
Variables have no intrinsic type beyond their currently stored value. You may redeclare the same variable multiple times with different values and types.
let bar = "cat"
let foo = 5 + bar
The print statement outputs a value to the console. As with declarations, the value can be any valid expression.
print "Hello World"
print 5 + 6
Whitespace
The language is whitespace-agnostic. Any number of linebreaks or spaces can appears between any pair of tokens. Multiple statements can be placed on the same line, and space is only required between tokens in cases where its omission would result in ambiguity (e.g. between a keyword and a variable).
Comments
There is no support for comments.
The Parsing.xcodeproj project requires Xcode 10.1 or higher. It builds a Swift framework with the following features:
Lexer
This is a lexer for the simple language described in the talk. The lexer is invoked via the tokenize()
function, which takes a String
as input and returns an array of Token
s. All code for the lexer is included in Lexer.swift and it has no other dependencies besides Foundation.
// usage
import Parsing
let input = "let foo = 5 \n print foo"
let tokens = try! tokenize(input)
print(tokens)
Parser
This is a parser for the language. The parser is invoked via the parse()
function, which takes a String
as input and returns an array of Statement
s. All code for the parser is included in Parser.swift, which depends only on Lexer.swift and has no other dependencies besides Foundation.
// usage
import Parsing
let input = "let foo = 5 \n print foo"
let statements = try! parse(input)
print(statements)
Formatter
This is a formatter or "pretty-printer" for the language, as mentioned in the talk. The formatter is invoked via the format()
function, which takes an array of Statement
s as input and returns a String
containing the formatted source code. All code for the formatter is included in Formatter.swift, which depends on Parser.swift, Lexer.swift and Foundation.
// usage
import Parsing
let input = "let foo = 5 \n print foo"
let statements = parse(input)
let output = try! format(statements)
print(output)
Interpreter
This is an interpreter for the language, as mentioned in the talk. The interpreter is invoked via the evaluate()
function, which takes an array of Statement
s as input and returns a String
containing the output of the program. All code for the interpreter is included in Interpreter.swift, which depends on Parser.swift, Lexer.swift and Foundation.
// usage
import Parsing
let input = "let foo = 5 \n print foo"
let statements = parse(input)
let output = try! evaluate(statements)
print(output)
Transpiler
This is a Swift transpiler for the language, as mentioned in the talk. The transpiler is invoked via the transpile()
function, which takes an array of Statement
s as input and returns a String
containing the equivalent Swift source code to run the program. All code for the transpiler is included in Transpiler.swift, which depends on Parser.swift, Lexer.swift and Foundation.
// usage
import Parsing
let input = "let foo = 5 \n print foo"
let statements = parse(input)
let output = try! transpile(statements)
try! output.write(toFile: "program.swift", atomically: true, encoding: .utf8)
There are several branches included in the git repository:
This contains the basic version of the parsing project, as described in the talk.
This branch includes an alternative lexer implementation that uses NSRegularExpression
.
This branch adds support for the multiplication operator, as detailed in the enhancements section of the talk.
This branch improves the error handling mechanism, as suggested in the enhancements section of the talk:
Tokens and AST nodes now include source range information.
Multiple lexing errors can now be detected and handled at a later stage of the parser.
Error types are now more specific and include sufficient information to derive the exact source location of the error, even at the interpreter stage.
As a bonus, the lexer also includes a method for converting a source string index to a line/column offset.