Chumsky Versions

Write expressive, high-performance parsers with ease.

1.0.0-alpha.0

1 year ago

This is the first released version of chumsky's 'zero-copy' rewrite.

This release has no precise changelog, although one will be added when 1.0.0 is eventually released in full form. For now, things are still fluctuating enough that a full changelog would inevitably be out of date in a few weeks.

Thanks

This release has been over a year in development and represents the work of a lot of people. In particular:

@CraftSpider, who effectively co-developed the rewrite with me and came up with large chunks of the core API
@wackbyte, who ported many combinators over to the new codebase as well as adding no_std support
@bew, who reworked many combinators around changes to the core API
@Zij-IT, who ported all of the text combinators across, as well as the (yet to be merged) Pratt parser combinator created by @alvra
Many other contributors who worked on smaller items

How you can help

This is the first alpha release. Do not expect a finished product: many minor API features are still incomplete, missing, or subject to change. Some documentation is incomplete, or still refers to concepts from past versions of the crate. In particular, the tutorial has not yet been updated. You may experience bugs, API footguns, and more issues besides. That said, we're releasing this version because we believe the core of the rewrite is ready to be exposed to users and we want to find out what problems there are and catch them before a full release.

We'd like folks to open issues if they find:

Bugs
API oddities (things that don't look/feel right, or could be expressed more neatly)
Things that feel like they should work, but don't (lifetime issues, unnecessary cloning, etc.)
Missing features

If you're an even awesome-er sort of person and you feel like contributing to the crate, there's still a lot of work that needs doing in the following areas:

Documentation
Writing/updating examples
Filling API 'holes'
Porting old APIs over
Small improvements to existing combinators
Writing tests
API design: there's still work to be done on the context-sensitivity, recovery, and iterable parser APIs

All this aside, you'd be helping us out a bunch just by using this alpha release (especially porting existing chumsky parsers over to it) and telling us how you got on: what worked, what didn't work, what things you got stuck on or confused by, etc. If you'd like to give a more casual report like this, feel free to start a discussion.

What's new?

Needless to say, the crate has received a substantial upgrade, overhauling virtually every aspect of its API. It's far more capable than it has ever been, and now supports the following:

Zero-copy parsing: parser outputs can hold references to the input
Nested parsing: parsers can handle nested data structures like token trees
Stateful parsing: parsers can be parameterised by state, allowing for the natural integration of arena allocators, string interners, etc.
Memoisation: parsers can opt into memoisation, allowing you to quickly parse awkward grammars that would normally produce exponential behaviour in a traditional recursive descent parser
Left recursion: the aforementioned memoisation feature also handles left-recursive grammars elegantly
Context-sensitive parsing: parsers can use built-in context sensitivity to carefully parameterise future parsers, allowing you to parse things like Rust-style raw strings, Pythonic indentation, and other context-sensitive syntax that context-free parsers traditionally struggle with
Iterable parsers: parsers that produce multiple outputs can now be turned into iterators, similar to logos lexers
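To make the zero-copy and context-sensitivity bullets concrete, here is a hand-written, std-only Rust sketch. It does not use chumsky's combinators, and raw_string is a hypothetical name. It parses a Rust-style raw string, where the number of opening # characters (the context) decides what the closing delimiter must be, and it returns slices borrowed from the input rather than freshly allocated strings.

```rust
// Hypothetical std-only sketch, not chumsky's API.

/// Parses a Rust-style raw string such as r#"a "quoted" word"# from the
/// start of `input`, returning the body and the remaining input. Both are
/// slices borrowed from `input`: nothing is copied.
fn raw_string(input: &str) -> Option<(&str, &str)> {
    let rest = input.strip_prefix('r')?;
    // Context: the number of opening hashes decides the closing delimiter.
    let hashes = rest.bytes().take_while(|&b| b == b'#').count();
    let rest = rest[hashes..].strip_prefix('"')?;
    // The body ends at the first `"` followed by the same number of hashes.
    let closing = format!("\"{}", "#".repeat(hashes));
    let end = rest.find(closing.as_str())?;
    Some((&rest[..end], &rest[end + closing.len()..]))
}

fn main() {
    let src = r###"r#"say "hi""# tail"###;
    let (body, rest) = raw_string(src).unwrap();
    assert_eq!(body, r#"say "hi""#);
    assert_eq!(rest, " tail");
}
```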

Performance

On top of all of that, we've worked really hard to push performance as far as we can. Internally, chumsky makes innovative use of Generic Associated Types (GATs) to automatically detect when an output is never used (such as with .then_ignore(...)) and avoid generating it at all. You can find some technical details about this approach in Niko Matsakis' blog, where they discuss chumsky.
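The idea of skipping unused outputs can be sketched in plain Rust with a GAT. This is a simplified illustration, not chumsky's internal code: Mode, Emit, Check, bind, digits, and digits_then_ignore are all hypothetical names. A parser is generic over a mode whose Output<T> is either T or (), and the ignored half of a then_ignore-style sequence always runs in the checking mode, so its output is never constructed.

```rust
// Hypothetical std-only sketch of the emit/check idea, not chumsky's internals.

/// Decides whether a parser builds its output or only checks the input.
trait Mode {
    /// What a parser yields in this mode for a logical output of type `T`.
    type Output<T>;
    /// Build the output lazily: `f` only runs if the mode needs the value.
    fn bind<T, F: FnOnce() -> T>(f: F) -> Self::Output<T>;
}

/// Build outputs as normal.
struct Emit;
impl Mode for Emit {
    type Output<T> = T;
    fn bind<T, F: FnOnce() -> T>(f: F) -> Self::Output<T> {
        f()
    }
}

/// Validate the input only; outputs are never constructed.
struct Check;
impl Mode for Check {
    type Output<T> = ();
    fn bind<T, F: FnOnce() -> T>(_f: F) -> Self::Output<T> {}
}

/// Parses ASCII digits. The `String` is only allocated in `Emit` mode.
fn digits<M: Mode>(input: &str) -> Option<(M::Output<String>, &str)> {
    let len = input.bytes().take_while(u8::is_ascii_digit).count();
    if len == 0 {
        return None;
    }
    let (num, rest) = input.split_at(len);
    Some((M::bind(|| num.to_string()), rest))
}

/// Like `a.then_ignore(b)`: both parsers run, but the ignored one always runs
/// in `Check` mode regardless of `M`, so its output is never built.
fn digits_then_ignore<M: Mode>(input: &str) -> Option<(M::Output<String>, &str)> {
    let (kept, rest) = digits::<M>(input)?;
    let rest = rest.strip_prefix('-')?;
    let ((), rest) = digits::<Check>(rest)?;
    Some((kept, rest))
}

fn main() {
    let (num, rest) = digits_then_ignore::<Emit>("12-34!").unwrap();
    assert_eq!(num, "12");
    assert_eq!(rest, "!");
    // Same grammar, but no String is allocated at all.
    let ((), rest) = digits_then_ignore::<Check>("12-34!").unwrap();
    assert_eq!(rest, "!");
}
```

Because the mode is a type parameter, each mode gets its own monomorphised code; in Check mode the closure passed to bind is dropped without running, so to_string is never called.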

Our work on performance has paid off: chumsky's JSON benchmark is now extremely competitive, beating out nom and others, and even banging on the door of more traditional hand-written JSON parsers.

In general, you can probably expect this new release to be several times faster than older releases for similar parsers. The JSON benchmark is about 12x faster.

Conclusion

Pushing zero-copy to the point of a release was always going to be a very long road to walk, but we're finally approaching the end. Thanks for using chumsky, and - if you're fortunate enough to have the resources and kind enough to consider donating them - please support the other contributors I listed at the top of this release!

0.9

1 year ago

Added

A spill-stack feature that uses stacker to avoid stack overflow errors for deeply recursive parsers
The ability to access the token span when using select! like select! { |span| Token::Num(x) => (x, span) }
A skip_parser recovery strategy that allows you to implement your own recovery strategies in terms of other parsers. For example, .recover_with(skip_parser(take_until(just(';')))) skips tokens until after the next semicolon
A not combinator that consumes a single token if it is not the start of a given pattern. For example, just("\\n").or(just("\"")).not() matches any char that is neither the closing quote of a string nor the start of a newline escape sequence
A semantic_indentation parser for parsing indentation-sensitive languages. Note that this is likely to be deprecated/removed in the future in favour of a more powerful solution
#[must_use] attribute for parsers to ensure that they're not accidentally created without being used
Option<Vec<T>> and Vec<Option<T>> now implement Chain<T> and Option<String> implements Chain<char>
choice now supports both arrays and vectors of parsers in addition to tuples
The Simple error type now implements Eq
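The behaviour described for not can be approximated by a small std-only function; not_any is a hypothetical helper, not chumsky's implementation. It consumes exactly one char, but only when the input does not start with any of the given patterns, which is what makes it useful for collecting the body of a string literal.

```rust
// Hypothetical std-only sketch of the described behaviour, not chumsky's code.

/// Consumes exactly one char, but only if `input` does not start with any
/// of `patterns`. Returns the char and the remaining input.
fn not_any<'a>(input: &'a str, patterns: &[&str]) -> Option<(char, &'a str)> {
    if patterns.iter().any(|p| input.starts_with(*p)) {
        return None;
    }
    let c = input.chars().next()?;
    Some((c, &input[c.len_utf8()..]))
}

fn main() {
    // Collect a string body up to the closing quote or a `\n` escape.
    let mut input = "ab\"c";
    let mut body = String::new();
    while let Some((c, rest)) = not_any(input, &["\\n", "\""]) {
        body.push(c);
        input = rest;
    }
    assert_eq!(body, "ab");
    assert_eq!(input, "\"c");
}
```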

Changed

text::whitespace now returns a Repeated instead of an impl Parser, allowing you to call methods like at_least and exactly on it.
Improved no_std support
Improved examples and documentation
Use zero-width spans for EoI by default
Don't allow defining a recursive parser more than once
Various minor bug fixes
Improved Display implementations for various built-in error types and SimpleReason
Use an OrderedContainer trait to avoid unexpected behaviour for unordered containers in combination with just
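As a rough illustration of why an ordered-container bound matters for just, the std-only sketch below only accepts inputs whose items have a well-defined order. OrderedSeq and just_seq are hypothetical names, and the exact set of types chumsky accepts may differ. A HashSet has an unspecified iteration order, so matching its elements "in sequence" would give unpredictable results; leaving it without an impl turns that mistake into a compile error.

```rust
// Hypothetical std-only sketch, not chumsky's OrderedContainer trait.

/// Only types whose iteration order is meaningful implement this.
trait OrderedSeq {
    fn items(&self) -> Vec<char>;
}

impl OrderedSeq for str {
    fn items(&self) -> Vec<char> {
        self.chars().collect()
    }
}

impl OrderedSeq for [char] {
    fn items(&self) -> Vec<char> {
        self.to_vec()
    }
}

// Deliberately no impl for HashSet<char>: its iteration order is
// unspecified, so "match these tokens in sequence" would be ill-defined.

/// Sketch of `just` with a sequence: match every item, in order.
fn just_seq<'a, S: OrderedSeq + ?Sized>(seq: &S, input: &'a str) -> Option<&'a str> {
    let mut rest = input;
    for expected in seq.items() {
        rest = rest.strip_prefix(expected)?;
    }
    Some(rest)
}

fn main() {
    assert_eq!(just_seq("let", "let x"), Some(" x"));
    assert_eq!(just_seq(&['l', 'e'][..], "let"), Some("t"));
    assert_eq!(just_seq("let", "lex"), None);
}
```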

Fixed

Made several parsers (todo, unwrapped, etc.) more useful by reporting the parser's location on panic
Boxing a parser that is already boxed just gives you the original parser to avoid double indirection
Improved compilation speeds
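The boxing change can be sketched with a default trait method that an already-boxed type overrides. This is a std-only illustration under assumed names (Parser, BoxedParser, Char), not chumsky's actual definitions: calling boxed on a BoxedParser hands back the same shared pointer instead of wrapping it a second time.

```rust
// Hypothetical std-only sketch, not chumsky's actual types.
use std::rc::Rc;

trait Parser {
    /// Returns the remaining input on success.
    fn parse<'a>(&self, input: &'a str) -> Option<&'a str>;

    /// By default, boxing wraps `self` in a type-erased, shared pointer.
    fn boxed(self) -> BoxedParser
    where
        Self: Sized + 'static,
    {
        BoxedParser(Rc::new(self))
    }
}

struct BoxedParser(Rc<dyn Parser>);

impl Parser for BoxedParser {
    fn parse<'a>(&self, input: &'a str) -> Option<&'a str> {
        self.0.parse(input)
    }

    /// Already boxed: hand back the same parser instead of wrapping it again.
    fn boxed(self) -> BoxedParser {
        self
    }
}

/// A trivial parser that matches one specific char.
struct Char(char);

impl Parser for Char {
    fn parse<'a>(&self, input: &'a str) -> Option<&'a str> {
        input.strip_prefix(self.0)
    }
}

fn main() {
    let once = Char('a').boxed();
    let addr = Rc::as_ptr(&once.0) as *const ();
    let twice = once.boxed();
    // Same allocation: no extra layer of indirection was added.
    assert_eq!(addr, Rc::as_ptr(&twice.0) as *const ());
    assert_eq!(twice.parse("abc"), Some("bc"));
}
```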

0.8

2 years ago

Added

then_with combinator to allow limited support for parsing nested patterns
impl From<&[T; N]> for Stream
SkipUntil/SkipThenRetryUntil::skip_start/consume_end for more precise control over skip-based recovery
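Assuming then_with works by building the next parser from the first parser's output, a std-only sketch of the idea is a length-prefixed field such as 3:abc, where the parsed count constructs the parser for the payload. The helpers decimal, TakeBytes, and length_prefixed are hypothetical and do not reflect chumsky's API.

```rust
// Hypothetical std-only sketch of then_with-style sequencing, not chumsky's API.

/// Parses a decimal count, such as the `3` in `3:abcdef`.
fn decimal(input: &str) -> Option<(usize, &str)> {
    let len = input.bytes().take_while(u8::is_ascii_digit).count();
    let n = input[..len].parse::<usize>().ok()?;
    Some((n, &input[len..]))
}

/// A parser that can only be built once an earlier output is known:
/// it takes exactly `self.0` bytes.
struct TakeBytes(usize);

impl TakeBytes {
    fn parse<'a>(&self, input: &'a str) -> Option<(&'a str, &'a str)> {
        // `get` returns None if there are too few bytes (or not a char boundary).
        Some((input.get(..self.0)?, input.get(self.0..)?))
    }
}

/// The first parser's output (the count) builds the second parser.
fn length_prefixed(input: &str) -> Option<(&str, &str)> {
    let (n, rest) = decimal(input)?;
    let rest = rest.strip_prefix(':')?;
    TakeBytes(n).parse(rest)
}

fn main() {
    assert_eq!(length_prefixed("3:abcdef"), Some(("abc", "def")));
    assert_eq!(length_prefixed("9:abc"), None);
}
```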

Changed

Allowed Validate to map the output type
Switched to zero-size End Of Input spans for default implementations of Stream
Made delimited_by take combinators instead of specific tokens
Minor optimisations
Documentation improvements
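With delimited_by taking parsers rather than specific tokens, delimiters are no longer limited to single tokens. The std-only sketch below mirrors that shape with plain functions; delimited_by and ident here are hypothetical stand-ins, not chumsky's signatures.

```rust
// Hypothetical std-only sketch, not chumsky's delimited_by signature.

/// Runs `open`, then `inner`, then `close`, keeping only `inner`'s output.
fn delimited_by<'a, T, O, P, C>(input: &'a str, open: O, inner: P, close: C) -> Option<(T, &'a str)>
where
    O: Fn(&'a str) -> Option<&'a str>,
    P: Fn(&'a str) -> Option<(T, &'a str)>,
    C: Fn(&'a str) -> Option<&'a str>,
{
    let rest = open(input)?;
    let (value, rest) = inner(rest)?;
    Some((value, close(rest)?))
}

/// Parses a non-empty run of ASCII letters.
fn ident(input: &str) -> Option<(&str, &str)> {
    let len = input.bytes().take_while(u8::is_ascii_alphabetic).count();
    if len == 0 {
        return None;
    }
    Some(input.split_at(len))
}

fn main() {
    // Single-char delimiters...
    let parens = delimited_by("(abc) tail", |s| s.strip_prefix('('), ident, |s| s.strip_prefix(')'));
    assert_eq!(parens, Some(("abc", " tail")));
    // ...or multi-char ones, since the delimiters are just parsers.
    let comment = delimited_by("/*x*/", |s| s.strip_prefix("/*"), ident, |s| s.strip_prefix("*/"));
    assert_eq!(comment, Some(("x", "")));
}
```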

Fixed

Compilation error with --no-default-features
Made default behaviour of skip_until more sensible

0.7

2 years ago

Added

A new tutorial to help new users
select macro, a wrapper over filter_map that makes extracting data from specific tokens easy
choice parser, a better alternative to long or chains (which sometimes have poor compilation performance)
todo parser, which panics when used (but not when created), akin to Rust's todo! macro but for parsers
keyword parser, which parses exact identifiers
from_str combinator to allow converting a pattern to a value inline, using std::str::FromStr
unwrapped combinator, to automatically unwrap an output value inline
rewind combinator, which allows reverting the input stream on success. It's most useful when requiring that a pattern is followed by some terminating pattern without the first parser greedily consuming it
map_err_with_span combinator, to allow fetching the span of the input that was parsed by a parser before an error was encountered
or_else combinator, to allow processing and potentially recovering from a parser error
SeparatedBy::at_most to require that a separated pattern appear at most a specific number of times
SeparatedBy::exactly to require that a separated pattern be repeated exactly a specific number of times
Repeated::exactly to require that a pattern be repeated exactly a specific number of times
More trait implementations for various things, making the crate more useful
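The rewind entry describes lookahead: check that a terminator follows without consuming it. A std-only sketch, with hypothetical rewind, ident, and call_name functions rather than chumsky's combinators, parses a function name only when it is followed by (, leaving the ( in place for a later argument-list parser.

```rust
// Hypothetical std-only sketch, not chumsky's rewind combinator.

/// Parses a non-empty run of ASCII letters.
fn ident(input: &str) -> Option<(&str, &str)> {
    let len = input.bytes().take_while(u8::is_ascii_alphabetic).count();
    if len == 0 {
        return None;
    }
    Some(input.split_at(len))
}

/// Succeeds or fails exactly like `p`, but hands back the original input
/// so that nothing is consumed.
fn rewind<'a>(input: &'a str, p: impl Fn(&'a str) -> Option<&'a str>) -> Option<&'a str> {
    p(input).map(|_| input)
}

/// An identifier that must be followed by `(`; the `(` itself is left in
/// place for a later argument-list parser.
fn call_name(input: &str) -> Option<(&str, &str)> {
    let (name, rest) = ident(input)?;
    let rest = rewind(rest, |s| s.strip_prefix('('))?;
    Some((name, rest))
}

fn main() {
    assert_eq!(call_name("foo(x)"), Some(("foo", "(x)")));
    assert_eq!(call_name("foo x"), None);
}
```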

Changed

Made just, one_of, and none_of significantly more useful. They can now accept strings, arrays, slices, vectors, sets, or just single tokens as before
Added the return type of each parser to its documentation
More explicit documentation of parser behaviour
More doc examples
Deprecated seq (just has been generalised and can now be used to parse specific input sequences)
Sealed the Character trait so that future changes are not breaking
Sealed the Chain trait and made it more powerful
Moved trait constraints on Parser to where clauses for improved readability
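Accepting strings, arrays, vectors, sets, or single tokens is naturally expressed with a trait bound over "things that can contain a token". The std-only sketch below shows that shape for one_of and none_of; the Contains trait and both functions are hypothetical and do not reproduce chumsky's own container traits.

```rust
// Hypothetical std-only sketch, not chumsky's container traits.
use std::collections::HashSet;

/// Anything that can say whether it contains a given char.
trait Contains {
    fn contains_char(&self, c: char) -> bool;
}

impl Contains for char {
    fn contains_char(&self, c: char) -> bool {
        *self == c
    }
}

impl Contains for &str {
    fn contains_char(&self, c: char) -> bool {
        self.contains(c)
    }
}

impl<const N: usize> Contains for [char; N] {
    fn contains_char(&self, c: char) -> bool {
        self.contains(&c)
    }
}

impl Contains for Vec<char> {
    fn contains_char(&self, c: char) -> bool {
        self.contains(&c)
    }
}

impl Contains for HashSet<char> {
    fn contains_char(&self, c: char) -> bool {
        self.contains(&c)
    }
}

/// Consumes one char if it is in `set`.
fn one_of<'a>(set: &impl Contains, input: &'a str) -> Option<(char, &'a str)> {
    let c = input.chars().next()?;
    set.contains_char(c).then(|| (c, &input[c.len_utf8()..]))
}

/// Consumes one char if it is not in `set`.
fn none_of<'a>(set: &impl Contains, input: &'a str) -> Option<(char, &'a str)> {
    let c = input.chars().next()?;
    (!set.contains_char(c)).then(|| (c, &input[c.len_utf8()..]))
}

fn main() {
    assert_eq!(one_of(&'a', "abc"), Some(('a', "bc")));
    assert_eq!(one_of(&"+-", "-1"), Some(('-', "1")));
    assert_eq!(one_of(&['x', 'y'], "z"), None);
    let digits: HashSet<char> = "01".chars().collect();
    assert_eq!(one_of(&digits, "1!"), Some(('1', "!")));
    assert_eq!(none_of(&vec!['"'], "\"x"), None);
}
```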

Fixed

Fixed a subtle bug that allowed separated_by to parse an extra trailing separator when it shouldn't
Filled a 'hole' in the Error trait's API that conflated a lack of expected tokens with expectation of end of input
Made recursive parsers use weak reference-counting to avoid memory leaks
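The weak reference-counting fix addresses a classic leak: a recursive parser's definition must refer to itself, and if it held a strong Rc to its own slot, the two would form a cycle that is never freed. The std-only sketch below (Parser, Slot, and nested_parens are hypothetical, and chumsky's recursive parser type is more involved) stores the self-reference as a Weak pointer, upgrading it only for the duration of each call.

```rust
// Hypothetical std-only sketch, not chumsky's recursive parser implementation.
use std::cell::RefCell;
use std::rc::{Rc, Weak};

/// A type-erased parser that returns how many bytes it consumed.
type Parser = Box<dyn Fn(&str) -> Option<usize>>;

/// A slot that is filled with the parser's definition after creation.
type Slot = RefCell<Option<Parser>>;

/// Builds `parens := '(' parens ')' | (empty)`, a self-referential parser.
fn nested_parens() -> Rc<Slot> {
    let slot: Rc<Slot> = Rc::new(RefCell::new(None));
    // The definition refers back to itself through a Weak pointer, so the
    // slot does not keep itself alive: no Rc cycle, no leak.
    let this: Weak<Slot> = Rc::downgrade(&slot);
    let def: Parser = Box::new(move |input: &str| {
        let rest = match input.strip_prefix('(') {
            Some(rest) => rest,
            None => return Some(0), // the empty alternative
        };
        // Upgrade only for the duration of this call.
        let strong = this.upgrade()?;
        let guard = strong.borrow();
        let parse_inner = guard.as_ref()?;
        let inner = parse_inner(rest)?;
        rest[inner..].strip_prefix(')')?;
        Some(inner + 2)
    });
    *slot.borrow_mut() = Some(def);
    slot
}

fn main() {
    let parser = nested_parens();
    let run = |s: &str| parser.borrow().as_ref().and_then(|p| p(s));
    assert_eq!(run("(())x"), Some(4));
    assert_eq!(run("(()"), None);
    // Only one strong reference exists, so dropping `parser` frees it.
    assert_eq!(Rc::strong_count(&parser), 1);
}
```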