Parsing Save

Supporting code for my talk entitled "Parsing Formal Languages with Swift"

Project README

Travis Coveralls Swift 4.1 License Twitter

Table of Contents

Introduction

This repository is a companion to my 2019 talk entitled Parsing Formal Languages with Swift (a copy of which is included in the repository).

The included Xcode project contains Swift implementations of a lexer, parser, formatter, interpreter and transpiler for a very simple language.

Neither the language nor the parser are particularly useful in their own right, but the techniques demonstrated in the code can be applied to more complex languages and applications.

Language

The toy language implemented in this project has the following syntax:

  • Numbers

    Both integers and floating point numbers are supported, up to the precision of a Swift Double. Negative numbers are not supported, nor exponentials, hexadecimal, etc.

    5
    0.5
    57000
    .1
    
  • String literals

    String literals must be enclosed in doubles quotes ("). String literals can contain any character including a carriage return or linefeed. Double quotes must be escaped with a backslash (\) as must the backslash character itself.

    "Hello World"
    "This string has a
    linebreak in it"
    "This string has \"escaped quotes\""
    "The string has an escaped \\ (backslash)"
    
  • Variables

    Variables names must begin with a letter, optionally followed by one or more letters or numbers. Other symbols such as _ or emoji are not supported.

    Any valid name can be used as a variable apart from the keywords let and print which are reserved.

    a
    foo
    Bar56
    
  • Expressions

    An expression can contain one or more numbers, string literals or variables. Variables must be declared before they are used in an expression.

    Expressions also support the infix + operator, which performs an addition if both operands are numeric, or a concatenation if either operand is a string. Expressions may contain multiple + operations.

    Parentheses are not supported, nor are any other operators (unless you are using the operator-precedence branch, which supports the * operator for multiplication of numbers).

    foo
    5
    hello + "world"
    1 + 2 + 3
    "Jonny" + 5
    
  • Declarations

    A declaration creates a variable and assigns a value to it. The value can be any valid expression, and can reference variables declared previously.

    Variables have no intrinsic type beyond their currently stored value. You may redeclare the same variable multiple times with different values and types.

    let bar = "cat"
    let foo = 5 + bar
    
  • Print

    The print statement outputs a value to the console. As with declarations, the value can be any valid expression.

    print "Hello World"
    print 5 + 6
    
  • Whitespace

    The language is whitespace-agnostic. Any number of linebreaks or spaces can appears between any pair of tokens. Multiple statements can be placed on the same line, and space is only required between tokens in cases where its omission would result in ambiguity (e.g. between a keyword and a variable).

  • Comments

    There is no support for comments.

Project

The Parsing.xcodeproj project requires Xcode 10.1 or higher. It builds a Swift framework with the following features:

  • Lexer

    This is a lexer for the simple language described in the talk. The lexer is invoked via the tokenize() function, which takes a String as input and returns an array of Tokens. All code for the lexer is included in Lexer.swift and it has no other dependencies besides Foundation.

    // usage
    import Parsing
    let input = "let foo = 5 \n print foo"
    let tokens = try! tokenize(input)
    print(tokens)
    
  • Parser

    This is a parser for the language. The parser is invoked via the parse() function, which takes a String as input and returns an array of Statements. All code for the parser is included in Parser.swift, which depends only on Lexer.swift and has no other dependencies besides Foundation.

    // usage
    import Parsing
    let input = "let foo = 5 \n print foo"
    let statements = try! parse(input)
    print(statements)
    
  • Formatter

    This is a formatter or "pretty-printer" for the language, as mentioned in the talk. The formatter is invoked via the format() function, which takes an array of Statements as input and returns a String containing the formatted source code. All code for the formatter is included in Formatter.swift, which depends on Parser.swift, Lexer.swift and Foundation.

    // usage
    import Parsing
    let input = "let foo = 5 \n print foo"
    let statements = parse(input)
    let output = try! format(statements)
    print(output)
    
  • Interpreter

    This is an interpreter for the language, as mentioned in the talk. The interpreter is invoked via the evaluate() function, which takes an array of Statements as input and returns a String containing the output of the program. All code for the interpreter is included in Interpreter.swift, which depends on Parser.swift, Lexer.swift and Foundation.

    // usage
    import Parsing
    let input = "let foo = 5 \n print foo"
    let statements = parse(input)
    let output = try! evaluate(statements)
    print(output)
    
  • Transpiler

    This is a Swift transpiler for the language, as mentioned in the talk. The transpiler is invoked via the transpile() function, which takes an array of Statements as input and returns a String containing the equivalent Swift source code to run the program. All code for the transpiler is included in Transpiler.swift, which depends on Parser.swift, Lexer.swift and Foundation.

    // usage
    import Parsing
    let input = "let foo = 5 \n print foo"
    let statements = parse(input)
    let output = try! transpile(statements)
    try! output.write(toFile: "program.swift", atomically: true, encoding: .utf8)
    

Branches

There are several branches included in the git repository:

  • master

    This contains the basic version of the parsing project, as described in the talk.

  • regex-based-lexer

    This branch includes an alternative lexer implementation that uses NSRegularExpression.

  • operator-precedence

    This branch adds support for the multiplication operator, as detailed in the enhancements section of the talk.

  • better-errors

    This branch improves the error handling mechanism, as suggested in the enhancements section of the talk:

    • Tokens and AST nodes now include source range information.

    • Multiple lexing errors can now be detected and handled at a later stage of the parser.

    • Error types are now more specific and include sufficient information to derive the exact source location of the error, even at the interpreter stage.

    • As a bonus, the lexer also includes a method for converting a source string index to a line/column offset.

Open Source Agenda is not affiliated with "Parsing" Project. README Source: nicklockwood/Parsing
Stars
42
Open Issues
0
Last Commit
5 years ago
License
MIT

Open Source Agenda Badge

Open Source Agenda Rating