Write expressive, high-performance parsers with ease.
This is the first released version of chumsky's 'zero-copy' rewrite.
This release has no precise changelog, although one will be added when 1.0.0 is eventually released in full form. For now, things are still fluctuating enough that a full changelog would inevitably be out of date in a few weeks.
This release has been over a year in development and represents the work of a lot of people. In particular:
This is the first alpha release. Do not expect a finished product: many minor API features are still incomplete, missing, or subject to change. Some documentation is incomplete, or still refers to concepts from past versions of the crate. In particular, the tutorial has not yet been updated. You may experience bugs, API footguns, and more issues besides. That said, we're releasing this version because we believe the core of the rewrite is ready to be exposed to users and we want to find out what problems there are and catch them before a full release.
We'd like folks to open issues if they find:
If you're an even awesome-er sort of person and you feel like contributing to the crate, there's still a lot of work that needs doing in the following areas:
All this aside, you'd be helping us out a bunch just by using this alpha release (especially porting existing chumsky parsers over to it) and telling us how you got on: what worked, what didn't work, what things you got stuck on or confused by, etc. If you'd like to give a more casual report like this, feel free to start a discussion.
Needless to say, the crate has received a substantial upgrade, overhauling virtually every aspect of its API. It's substantially more capable than it ever was, and now supports the following:
logos lexers
On top of all of that, we've worked really hard to push performance as far as we can using an innovative use of Generic Associated Types (GATs) internally that allows chumsky to automatically detect when an output is never used (such as with .then_ignore(...)) and avoid generating it in the first place. You can find some technical details about this approach in Niko Matsakis' blog where they discuss chumsky.
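To make the idea of skipping unused outputs more concrete, here is a minimal, standard-library-only sketch of how a GAT can let one parser body run in either an "emit" or a "check" mode. It is an illustration under assumptions, not chumsky's actual internals: the names `Mode`, `Emit`, `Check`, `digits`, and `first_of_pair` are all hypothetical.

```rust
// Illustrative sketch only: `Mode`, `Emit`, `Check`, `digits`, and
// `first_of_pair` are hypothetical names, not chumsky's real internals.

/// A parsing mode decides, at compile time, whether outputs are built.
trait Mode {
    /// How an output of type `T` is represented in this mode (a GAT).
    type Output<T>;

    /// Build an output lazily. `Check` never calls the closure.
    fn bind<T>(f: impl FnOnce() -> T) -> Self::Output<T>;
}

/// Emit mode: outputs are really produced.
struct Emit;
/// Check mode: every output is `()`, so no work is spent building it.
struct Check;

impl Mode for Emit {
    type Output<T> = T;
    fn bind<T>(f: impl FnOnce() -> T) -> Self::Output<T> {
        f()
    }
}

impl Mode for Check {
    type Output<T> = ();
    fn bind<T>(_f: impl FnOnce() -> T) -> Self::Output<T> {}
}

/// Parse a run of ASCII digits from the start of `input`.
/// Returns the mode-dependent output and the remaining input.
fn digits<M: Mode>(input: &str) -> Option<(M::Output<String>, &str)> {
    let end = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    if end == 0 {
        return None;
    }
    // In `Check` mode this closure never runs, so no `String` is allocated.
    let out = M::bind(|| input[..end].to_string());
    Some((out, &input[end..]))
}

/// Parse `digits ',' digits` and keep only the first number.
/// The ignored half always runs in `Check` mode, whatever `M` is.
fn first_of_pair<M: Mode>(input: &str) -> Option<(M::Output<String>, &str)> {
    let (first, rest) = digits::<M>(input)?;
    let rest = rest.strip_prefix(',')?;
    let (_, rest) = digits::<Check>(rest)?; // output is `()`, never built
    Some((first, rest))
}
```

Calling `first_of_pair::<Emit>("12,34;")` yields the first number as a `String`, while `first_of_pair::<Check>("12,34;")` only validates the input. The design choice sketched here is that the mode is a type parameter, so the compiler can remove the unused output construction entirely rather than testing a flag at run time.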
Our work on performance has paid off: chumsky's JSON benchmark is now extremely competitive, beating out nom and others, and even banging on the door of more traditional hand-written JSON parsers.
In general, you can probably expect this new release to be several times faster than older releases for similar parsers. The JSON benchmark is about 12x faster.
Pushing zero-copy to the point of a release was always going to be a very long road to walk, but we're finally approaching the end. Thanks for using chumsky, and - if you're fortunate enough to have the resources and kind enough to consider donating them - please support the other contributors I listed at the top of this release!
spill-stack feature that uses stacker to avoid stack overflow errors for deeply recursive parsers
select! like select! { |span| Token::Num(x) => (x, span) }
skip_parser recovery strategy that allows you to implement your own recovery strategies in terms of other parsers. For example, .recover_with(skip_parser(take_until(just(';')))) skips tokens until after the next semicolon
not combinator that consumes a single token if it is not the start of a given pattern. For example, just("\\n").or(just('"')).not() matches any char that is neither the final quote of a string nor the start of a newline escape sequence
semantic_indentation parser for parsing indentation-sensitive languages. Note that this is likely to be deprecated/removed in the future in favour of a more powerful solution
#[must_use] attribute for parsers to ensure that they're not accidentally created without being used
Option<Vec<T>> and Vec<Option<T>> now implement Chain<T> and Option<String> implements Chain<char>
choice now supports both arrays and vectors of parsers in addition to tuples
Simple error type now implements Eq
text::whitespace returns a Repeated instead of an impl Parser, allowing you to call methods like at_least and exactly on it.
no_std support
Display implementations for various built-in error types and SimpleReason
OrderedContainer trait to avoid unexpected behaviour for unordered containers in combination with just
todo, unwrapped, etc.) more useful by reporting the parser's location on panic
then_with combinator to allow limited support for parsing nested patterns
SkipUntil/SkipThenRetryUntil::skip_start/consume_end for more precise control over skip-based recovery
Validate to map the output type
Stream
delimited_by take combinators instead of specific tokens
--no-default-features
skip_until more sensible
A new tutorial to help new users
select macro, a wrapper over filter_map that makes extracting data from specific tokens easy
choice parser, a better alternative to long or chains (which sometimes have poor compilation performance)
todo parser, that panics when used (but not when created) (akin to Rust's todo! macro, but for parsers)
keyword parser, that parses exact identifiers
from_str combinator to allow converting a pattern to a value inline, using std::str::FromStr
unwrapped combinator, to automatically unwrap an output value inline
rewind combinator, that allows reverting the input stream on success. It's most useful when requiring that a pattern is followed by some terminating pattern without the first parser greedily consuming it
map_err_with_span combinator, to allow fetching the span of the input that was parsed by a parser before an error was encountered
or_else combinator, to allow processing and potentially recovering from a parser error
SeparatedBy::at_most to require that a separated pattern appear at most a specific number of times
SeparatedBy::exactly to require that a separated pattern be repeated exactly a specific number of times
Repeated::exactly to require that a pattern be repeated exactly a specific number of times
More trait implementations for various things, making the crate more useful
just, one_of, and none_of significantly more useful. They can now accept strings, arrays, slices, vectors, sets, or just single tokens as before
seq (just has been generalised and can now be used to parse specific input sequences)
Character trait so that future changes are not breaking
Chain trait and made it more powerful
Parser to where clauses for improved readability
separated_by to parse an extra trailing separator when it shouldn't
Error trait's API that conflated a lack of expected tokens with expectation of end of input
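The rewind behaviour described in the list above (succeed, but leave the input where it was) can be sketched without the crate. The following is a minimal, standard-library-only illustration under assumptions: the function-based parser shape and the names `rewind`, `semicolon`, `ident`, and `ident_then_peek_semicolon` are hypothetical and are not chumsky's API.

```rust
// Hypothetical hand-rolled parsers for illustration; not chumsky's API.
// A "parser" here is any function from input to `Some((output, remaining))`.

/// Run `p` on `input`. On success, keep its output but return the *original*
/// input as the remainder, so nothing is consumed.
fn rewind<'a, T>(
    p: impl Fn(&'a str) -> Option<(T, &'a str)>,
    input: &'a str,
) -> Option<(T, &'a str)> {
    p(input).map(|(out, _after)| (out, input))
}

/// Match a single ';'.
fn semicolon(input: &str) -> Option<(char, &str)> {
    input.strip_prefix(';').map(|rest| (';', rest))
}

/// Match a non-empty run of ASCII alphanumeric characters.
fn ident(input: &str) -> Option<(&str, &str)> {
    let end = input
        .find(|c: char| !c.is_ascii_alphanumeric())
        .unwrap_or(input.len());
    if end == 0 {
        return None;
    }
    Some(input.split_at(end))
}

/// Require that an identifier is followed by ';' without consuming the ';'.
fn ident_then_peek_semicolon(input: &str) -> Option<(&str, &str)> {
    let (name, rest) = ident(input)?;
    let (_semi, rest) = rewind(semicolon, rest)?; // `rest` still starts with ';'
    Some((name, rest))
}
```

With this sketch, `ident_then_peek_semicolon("foo;bar")` accepts `foo` and leaves `;bar` for whatever parser runs next, while `ident_then_peek_semicolon("foo bar")` fails because the terminator is missing.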