# Pyrocat101 Opal

Self-contained monadic parser combinators for OCaml

# Opal: Monadic Parser Combinators for OCaml

Opal is a minimum collection of useful parsers and combinators (~150 loc of OCaml) that makes writing parsers easier. It is designed to be small, self-contained, pure-functional, and only includes most essential parsers, so that one could include single file in the project or just embed it in other OCaml source code files.

I find myself writing lots of recursive-descent parsers from scratch in OCaml when I was solving Hackerrank FP challenges. That's why I wrote opal: to include it in the source code and build parsers on top of parser combinators easily.

## Example

Trivial arithmetic calculator:

``````open Opal

let parens = between (exactly '(') (exactly ')')
let integer = many1 digit => implode % int_of_string
let add = exactly '+' >> return ( + )
let sub = exactly '-' >> return ( - )
let mul = exactly '*' >> return ( * )
let div = exactly '/' >> return ( / )

let rec expr input = chainl1 term (add <|> sub) input
and term input = chainl1 factor (mul <|> div) input
and factor input = (parens expr <|> integer) input

let () =
let input = LazyStream.of_channel stdin in
match parse expr input with
| Some ans -> Printf.printf "%d\n" ans
| None -> print_endline "ERROR!"
``````

For non-trivial examples, see Hackerrank challenge solutions using opal in `examples/`.

## Documentation

The expressiveness of parser combinators are attributed to higher-order functions and the extensive use of currying. However, due to lack of `do` syntax, the bind operation of monad would not be as succinct as that in Haskell.

A parser monad is either `None` (indicates failure), or `Some` pair of result and unconsumed input, where the result is a user-defined value. The input is a lazy stream of arbitrary token type. A parser is a function that accepts an input and returns a parser monad. Although most parsers in opal is polymorphic over token type and result type, some useful parsers only accepts `char` as input token type.

Since combinators in opal are roughly based on Haskell's Parsec. The following documentation is somehow a rip-off of Parsec's doc.

### Lazy Stream

`type 'a LazyStream t`

Polymorphic lazy stream type.

`val LazyStream.of_stream : 'a Stream.t -> 'a LazyStream.t`

Build a lazy stream from stream.

`val LazyStream.of_function : (unit -> 'a) -> 'a LazyStream.t`

Build a lazy stream from a function `f`. The elements in the stream is populated by calling `f ()`.

`val LazyStream.of_string : string -> char LazyStream.t`

Build a char lazy stream from string.

`val LazyStream.of_channel : in_channel -> char LazyStream.t`

Build a char lazy stream from a input channel.

### Utilities

`val implode : char list -> bytes`

Implode character list into a string. Useful when used with `many`.

`val explode : bytes -> char list`

Explode a string into a character list.

`val ( % ) : ('a -> 'b) -> ('b -> 'c) -> 'a -> 'c`

Infix operator for left-to-right function composition. `(f % g % h) x` is equivalent to `h (g (f x))`.

`val parse : ('token, 'result) parser -> 'token LazyStream -> 'result option`

`parse parser input` parses `input` with `parser`, and returns `Some result` if succeed, or `None` on failure.

### Primitives

`type 'token input = 'token LazyStream.t`

`type ('token, 'result) monad = ('result * 'token input) option`

`type ('token, 'result) parser = 'token input -> ('token, 'result) monad`

A parser is a function that accepts an input and returns either `None` on failure, or `Some (result, input')`, where `result` is user-defined value and `input'` is unconsumed input after parsing.

`val return : 'result -> 'token input -> ('token, 'result) monad`

Accepts a value and an input, and returns a monad.

`val ( >>= ) : ('t, 'r) parser -> ('r -> ('t, 'r) monad) -> ('t, 'r) parser`

`x >>= f` returns a new parser that if parser `x` succeeds, applies function `f` on monad produced by `x`, and produces a new monad (a.k.a. `bind`).

`val ( let* ) : ('t, 'r) parser -> ('r -> ('t, 'r) monad) -> ('t, 'r) parser`

This operator is the same as `>>=` but using the `let` notation. It is usefull to avoid ugly sequences of bindings. For exemple, `p >>= fun x -> f x` can be rewritten `let* x = p in f x`. Combined with the `return` function, we can define complex parsers :

``````let tuple_parser =
let* x = digit in
let* _ = exactly ',' in
let* y = digit in
return (x, y)
``````

`val ( <|> ) : ('t, 'r) parser -> ('t, 'r) parser -> ('t, 'r) parser`

Choice combinator. The parser `p <|> q` first applies `p`. If it succeeds, the value of `p` is returned. If `p` fails, parser `q` is tried.

`val mzero : 'a -> ('t, 'r) monad`

A parser that always fails.

`val any : ('t, 'r) parser`

The parser succeeds for any token in the input. Consumes a token and returns it.

`val satisfy : ('t -> bool) -> ('t, 'r) parser`

The parser `satisfy test` succeeds for any token for which the supplied function `test` returns `true`. Returns the token that is actually parsed.

`val eof : 'a -> ('t, 'a) parser`

The parser `eof x` succeeds if the input is exhausted. Returns value `x`.

### Derived

`val ( => ) : ('t, 'r) parser -> ('r -> 'a) -> ('t, 'a) parser`

Map combinator. `x => f` parses `x`. If it succeeds, returns the value of `x` applied with `f`.

`val ( >> ) : ('t, 'a) parser -> ('t, 'b) parser -> ('t, 'b) parser`

Ignore-left combinator. `x >> y` parses `x` and then `y`. Returns the value returned by `y`.

`val ( << ) : ('t, 'a) parser -> ('t, 'b) parser -> ('t, 'a) parser`

Ignore-right combinator. `x >> y` parses `x` and then `y`. Returns the value returned by `x`.

`val ( <~> ) : ('t, 'r) parser -> ('t, 'r list) parser -> 't input -> ('t, 'r list) monad`

Cons combinator. `x <~> y` parses `x` and then `y`. Returns the value of `x` prepended to the value of `y` (a list).

``````let ident = letter <~> many alpha_num
``````

`val choice : ('t, 'r) parser list -> ('t, 'r) parser`

`choice ps` tries to apply the parsers in the list `ps` in order, until one of them succeeds. Returns the value of the succeeding parser.

`val count : int -> ('t, 'r) parser -> 't input -> ('t, 'r list) monad`

`count n` parses `n` occurrences of `p`. If `n` is smaller or equal to zero, the parser equals to `return []`. Returns a list of `n` values returned by `p`.

`between : ('t, 'a) parser -> ('t, 'b) parser -> ('t, 'r) parser -> ('t, 'r) parser`

`between open close p` parses `open`, followed by `p` and `close`. Returns the value returned by `p`.

``````let braces = between (exactly '{') (exactly '}')
``````

`val option : 'r -> ('t, 'r) parser -> ('t, 'r) parser`

`option default p` tries to apply parser `p`. If `p` fails, it returns the value `default`, otherwise the value returned by `p`.

``````let priority = option 0 (digit => String.make 1 % int_of_string)
``````

`val optional : 'r -> ('t, 'r) parser -> ('t, unit) parser`

`optional p` tries to apply parser `p`. It will parse `p` or nothing. It only fails if `p` fails. Discard the result of `p`.

`val skip_many : ('t, 'r) parser -> ('t, unit) parser`

`skip_many p` applies `p` zero or more times, skipping its result.

``````let spaces = skip_many space
``````

`val skip_many1 : ('t, 'r) parser -> ('t, unit) parser`

`skip_many1 p` applies `p` one or more times, skipping its result.

`val many : ('t, 'r) parser -> 't input -> ('t, 'r list) monad`

`many p` applies the parser `p` zero or more times. Returns a list of returned values of `p`.

`val many1 : ('t, 'r) parser -> 't input -> ('t, 'r list) monad`

`many1 p` applies the parser `p` one or more times. Returns a list of returned values of `p`.

`val sep_by : ('t, 'r) parser -> ('t, 'a) parser -> 't input -> ('t, 'r list) monad`

`sep_by p sep` parses zero or more occurrences of `p`, separated by `sep`. Returns a list of values returned by `p`.

``````let comma_sep p = sep_by p (token ",")
``````

`val sep_by1 : ('t, 'r) parser -> ('t, 'a) parser -> 't input -> ('t, 'r list) monad`

`sep_by1 p sep` parses one or more occurrences of `p`, separated by `sep`. Returns a list of values returned by `p`.

`val end_by: ('t, 'r) parser -> ('t, 'a) parser -> ('t, 'r) parser`

`end_by p sep` parses zero or more ocurrences of `p`, separated and ended by `sep`. Returns a list of values returned by `p`.

``````let statements = end_by statement (token ";")
``````

`val end_by1: ('t, 'r) parser -> ('t, 'a) parser -> ('t, 'r) parser`

`end_by1 p sep` parses one or more ocurrences of `p`, separated and ended by `sep`. Returns a list of values returned by `p`.

`val chainl : ('t, 'r) parser -> ('t, 'r -> 'r -> 'r) parser -> 'r -> ('t, 'r) parser`

`chainl p op default` parses zero or more occurrences of `p`, separated by `op`. Returns a value obtained by a left associative application of all functions by `op` to the values returned by `p`. If there are zero occurences of `p`, the value `default` is returned.

`val chainl1 : ('t, 'r) parser -> ('t, 'r -> 'r -> 'r) parser -> ('t, 'r) parser`

`chainl1 p op` parses one or more occurrences of `p`, separate by `op`. Returns a value obtained by a left associative application of all functions returned by `op` to the values returned by `p`. This parser can be used to eliminate left recursion which typically occurs in expression grammars. See the arithmetic caculator example above.

`val chainr : ('t, 'r) parser -> ('t, 'r -> 'r -> 'r) parser -> 'r -> ('t, 'r) parser`

`chainr p op default` parses zero or more occurrences of `p`, separated by `op`. Returns a value obtained by right associative application of all functions returned by `op` to the values returned by `p`. If there are no occurrences of `p`, the value `x` is returned.

`val chainr1 : ('t, 'r) parser -> ('t, 'r -> 'r -> 'r) parser -> ('t, 'r) parser`

`chainr p op` parses one or more occurrences of `p`, separated by `op`. Returns a value obtained by right associative application of all functions returned by `op` to the values returned by `p`.

### Singletons

`val exactly : 'r -> ('t, 'r) parser`

`exactly x` parses a single token `x`. Returns the parsed token (i.e. `x`).

``````let semi_colon = exactly ';'
``````

`val one_of : 'r list -> ('t, 'r) parser`

`one_of xs` succeeds if the current token is in the supplied list of tokens `xs`. Returns the parsed token.

``````let vowel = one_of ['a'; 'e'; 'i'; 'o'; 'u']
``````

`val none_of : 'r list -> ('t, 'r) parser`

As the dual of `one_of`, `none_of xs` succeeds if the current token not in the supplied list of tokens `xs`. Returns the parsed token.

``````let consonant = none_of ['a'; 'e'; 'i'; 'o'; 'u']
``````

`val range : 'r -> 'r -> ('t, 'r) parser`

`range low high` succeeds if the current token is in the range between `low` and `high` (inclusive). Returns the parsed token.

### Char Parsers

`val space = (char, char) parser`

Parses a white space character (`'\s\t\r\n'`). Returns the parsed character.

`val spaces = (char, unit) parser`

Skip zero or more white spaces characters.

`val newline = (char, char) parser`

Parses a newline character (`'\n'`). Returns a newline character.

`val tab = (char, char) parser`

Parses a tab character (`'\t'`). Returns a tab character.

`val upper = (char, char) parser`

Parses an upper case letter (a character between 'A' and 'Z'). Returns the parsed character.

`val lower = (char, char) parser`

Parses a lower case letter (a character between 'a' and 'z'). Returns the parsed character.

`val digit : (char, char) parser`

Parses a digit. Returns the parsed character.

`val letter = (char, char) parser`

Parses a letter (an upper case or lower case letter). Returns the parsed character.

`val alpha_num = (char, char) parser`

Parses a letter or digit. Returns the parser character.

`val hex_digit = (char, char) parser`

Parses a hexadecimal digit (a digit or a letter between 'a' and 'f' or 'A' and 'F'). Returns the parsed character.

`val oct_digit = (char, char) parser`

Parses an octal digit (a character between '0' and '7'). Returns the parsed character.

### Lex Helper

`val lexeme : (char, 'r) parser -> (char, 'r) parser`

`lexeme p` first applies `skip_many space` and then parser `p`. Returns the value returned by `p`.

`val token : string -> char input -> (char, char list) monad`

`token s` skips leading white spaces and parses a sequence of characters given by string `s`. Returns the parsed character sequence as a list.

``````div_or_mod = token "div" <|> token "mod"
``````
Open Source Agenda is not affiliated with "Pyrocat101 Opal" Project. README Source: pyrocat101/opal
Stars
122
Open Issues
2
Last Commit
6 months ago
Repository 