Statically-checked inline matching on regular expressions in Scala
Statically-typed inline pattern matching on regular expressions
Kaleidoscope is a small library to make pattern matching against strings more pleasant. Regular expressions can be written directly in patterns, and capturing groups bound directly to variables, typed according to the group's repetition. Here is an example:
case class Email(user: Text, domain: Text)
email match
  case r"$user([^@]+)@$domain(.*)" => Email(user, domain)
Strings are widely used to carry complex data, when it's wiser to use structured objects. Kaleidoscope makes it easier to move away from strings.
Capturing groups of variable length are extracted into Lists or Vacuous Optionals.
Kaleidoscope is available as a binary for Scala 3.4.0 and later, from Maven Central. To include it in an sbt build, use the coordinates:
libraryDependencies += "dev.soundness" % "kaleidoscope-core" % "0.1.0"
Kaleidoscope is included in the kaleidoscope package, and exported to the soundness package.
To use Kaleidoscope alone, you can include the import,
import kaleidoscope.*
or to use it with other Soundness libraries, include:
import soundness.*
Note that Kaleidoscope uses the Text type from Anticipation and the Optional type from Vacuous. These offer some advantages over String and Option, and they can be easily converted: Text#s converts a Text to a String and Optional#option converts an Optional value to its equivalent Option. The necessary imports are shown in the examples.
You can then use a Kaleidoscope regular expression (a string prefixed with the letter r) anywhere you can pattern match against a string in Scala. For example,
import anticipation.Text
def describe(path: Text): Unit =
  path match
    case r"/images/.*" => println("image")
    case r"/styles/.*" => println("stylesheet")
    case _ => println("something else")
or,
import vacuous.{Optional, Unset}
def validate(email: Text): Optional[Text] = email match
  case r"^[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,6}$$" => email
  case _ => Unset
Such patterns will either match or not. When they do match, parts of the matched string can be extracted using capturing groups. The pattern syntax is exactly as described in the Java Standard Library, with the exception that a capturing group (enclosed within ( and )) may be bound to an identifier by placing it, like an interpolated string substitution, immediately prior to the capturing group, as $identifier or ${identifier}.
Here is an example of using a pattern match against filenames:
enum FileType:
  case Image(text: Text)
  case Stylesheet(text: Text)
def identify(path: Text): FileType = path match
  case r"/images/${img}(.*)" => FileType.Image(img)
  case r"/styles/$styles(.*)" => FileType.Stylesheet(styles)
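For comparison, here is a minimal sketch of roughly the same logic written by hand against java.util.regex, using only the Java standard library. It is not Kaleidoscope's implementation: it uses String rather than Text, and it returns an Option (an assumption made for this sketch) instead of failing on unmatched input. The group numbers have to be kept in step with the pattern manually, which is the bookkeeping that named bindings avoid.

```scala
import java.util.regex.Pattern

// Hand-written counterpart of the example above, for illustration only.
enum FileType:
  case Image(text: String)
  case Stylesheet(text: String)

val imagePattern: Pattern = Pattern.compile("/images/(.*)")
val stylePattern: Pattern = Pattern.compile("/styles/(.*)")

def identify(path: String): Option[FileType] =
  val image = imagePattern.matcher(path)
  val style = stylePattern.matcher(path)
  // group(1) refers to the first capturing group by position, not by name
  if image.matches() then Some(FileType.Image(image.group(1)))
  else if style.matches() then Some(FileType.Stylesheet(style.group(1)))
  else None
```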
Alternatively, as with patterns in general, this can be extracted directly in a val definition.
Here is an example of matching an email address:
val r"^[a-z0-9._%+-]+@$domain([a-z0-9.-]+\.$tld([a-z]{2,6}))$$" =
  "test@example.com": @unchecked
The @unchecked annotation ascribed to the result is standard Scala, and acknowledges to the compiler that the match is partial and may fail at runtime.
If you try this example in the Scala REPL, it binds the following values:
> domain: Text = t"example.com"
> tld: Text = t"com"
In addition, the syntax of the regular expression will be checked at compile-time, and any issues will be reported then.
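To see what the compile-time check saves you from, the sketch below uses plain java.util.regex, where a malformed pattern is only discovered when Pattern.compile runs. The helper name syntaxError is hypothetical and is not part of Kaleidoscope.

```scala
import java.util.regex.{Pattern, PatternSyntaxException}

// Returns a description of the syntax error in `regex`, or None if it compiles.
// With the standard library alone, this can only be found out at runtime.
def syntaxError(regex: String): Option[String] =
  try
    Pattern.compile(regex)
    None
  catch
    case error: PatternSyntaxException => Some(error.getDescription)
```

Calling syntaxError on a pattern with an unclosed group, such as "([a-z]+", yields a description of the problem, while a well-formed pattern yields None.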
A normal, unitary capturing group, like domain and tld above, will extract into Text values. But if a capturing group has a repetition suffix, such as * or +, then the extracted type will be a List[Text]. This also applies to repetition ranges, such as {3}, {2,} or {1,9}. Note that {1} will still extract a Text value. The type is determined statically from the pattern, and not dynamically from the runtime scrutinee.
A capture group may be marked as optional, meaning it can appear either zero or one times. This will extract a value with the type Optional[Text]; that is, if it is present it will be a Text value, and if not, it will be Unset.
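As background, the following sketch shows how the underlying Java API reports an optional group that takes no part in a match: group(n) returns null, which the caller must remember to check. The function portOf and its pattern are hypothetical, chosen only for illustration, and the sketch uses the standard library alone rather than Kaleidoscope.

```scala
import java.util.regex.Pattern

// Plain java.util.regex: an optional group that did not participate is null,
// so it is wrapped in Option here to make the absence explicit.
def portOf(hostAndPort: String): Option[String] =
  val matcher = Pattern.compile("([a-z.]+)(:[0-9]+)?").matcher(hostAndPort)
  if matcher.matches() then Option(matcher.group(2)) else None
```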
For example, see how init is extracted as a List[Text], below:
import gossamer.{drop, Rtl}
def parseList(): List[Text] = "parsley, sage, rosemary, and thyme" match
  case r"$only([a-z]+)" => List(only)
  case r"$first([a-z]+) and $second([a-z]+)" => List(first, second)
  case r"$init([a-z]+, )*and $last([a-z]+)" => init.map(_.drop(2, Rtl)) :+ last
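It may help to compare this with the standard library's behaviour. With java.util.regex, a repeated capturing group retains only its final iteration, so the earlier items are lost unless the input is scanned again. The sketch below demonstrates that limitation using the same regular expression (without bindings); lastIteration is a hypothetical helper, not a Kaleidoscope function.

```scala
import java.util.regex.Pattern

// Plain java.util.regex: group(1) holds only the last repetition of the group.
def lastIteration(input: String): Option[String] =
  val matcher = Pattern.compile("([a-z]+, )*and ([a-z]+)").matcher(input)
  if matcher.matches() then Option(matcher.group(1)) else None
```

For the input "parsley, sage, rosemary, and thyme", only "rosemary, " is available from group 1, whereas the Kaleidoscope pattern above binds every repetition to init.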
Note that inside an extractor pattern string, whether it is single- (r"...") or triple-quoted (r"""..."""), special characters, notably \, do not need to be escaped, with the exception of $ which should be written as $$. It is still necessary, however, to follow the regular expression escaping rules, for example, an extractor matching a single opening parenthesis would be written as r"\(" or r"""\(""".
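The two layers of escaping can be seen with ordinary Scala string literals, which is what you would write without the r interpolator. This is a small sketch using only the standard library; the names escaped, raw and matchesOpenParen are illustrative.

```scala
import java.util.regex.Pattern

// In an ordinary string literal, the backslash itself must be escaped...
val escaped: String = "\\("
// ...whereas a triple-quoted literal passes the backslash through unchanged.
val raw: String = """\("""

// Both spell the same two-character regular expression, which matches "(".
def matchesOpenParen(regex: String): Boolean =
  Pattern.compile(regex).matcher("(").matches()
```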
Globs offer a simplified and limited form of regular expression. You can use these in exactly the same way as a standard regular expression, using the g"..." interpolator instead.
Kaleidoscope is classified as maturescent. Soundness projects are each categorized into one of five stability levels.
Projects at any stability level, even embryonic projects, can still be used, as long as caution is taken to avoid a mismatch between the project's stability level and the required stability and maintainability of your own project.
Kaleidoscope is designed to be small. Its entire source code currently consists of 530 lines of code.
Kaleidoscope will ultimately be built by Fury, when it is published. In the meantime, two possibilities are offered; however, they are acknowledged to be fragile, inadequately tested, and unsuitable for anything more than experimentation. They are provided only to give some answer to the question, "how can I try Kaleidoscope?".
Copy the sources into your own project
Read the fury file in the repository root to understand Kaleidoscope's build structure, dependencies and source location; the file format should be short and quite intuitive. Copy the sources into a source directory in your own project, then repeat (recursively) for each of the dependencies.
The sources are compiled against the latest nightly release of Scala 3. There should be no problem compiling the project together with all of its dependencies in a single compilation.
Build with Wrath
Wrath is a bootstrapping script for building Kaleidoscope and other projects in
the absence of a fully-featured build tool. It is designed to read the fury
file in the project directory, and produce a collection of JAR files which can
be added to a classpath, by compiling the project and all of its dependencies,
including the Scala compiler itself.
Download the latest version of wrath, make it executable, and add it to your path, for example by copying it to /usr/local/bin/.
Clone this repository inside an empty directory, so that the build can safely make clones of repositories it depends on as peers of kaleidoscope.
Run wrath -F in the repository root. This will download and compile the latest version of Scala, as well as all of Kaleidoscope's dependencies.
If the build was successful, the compiled JAR files can be found in the .wrath/dist directory.
Contributors to Kaleidoscope are welcome and encouraged. New contributors may like to look for issues marked beginner.
We suggest that all contributors read the Contributing Guide to make the process of contributing to Kaleidoscope easier.
Please do not contact project maintainers privately with questions unless there is a good reason to keep them private. While it can be tempting to respond to such questions, private answers cannot be shared with a wider audience, and it can result in duplication of effort.
Kaleidoscope was designed and developed by Jon Pretty, and commercial support and training on all aspects of Scala 3 is available from Propensive OÜ.
Kaleidoscope is named after the optical instrument which shows pretty patterns to its user, while the library also works closely with patterns.
In general, Soundness project names are always chosen with some rationale; however, it is usually frivolous. Each name is chosen more for its uniqueness and intrigue than its concision or catchiness, and there is no bias towards names with positive or "nice" meanings, since many of the libraries perform some quite unpleasant tasks.
Names should be English words, though many are obscure or archaic, and it should be noted how willingly English adopts foreign words. Names are generally of Greek or Latin origin, and have often arrived in English via a romance language.
The logo is a loose allusion to a hexagonal pattern, which could appear in a kaleidoscope.
Kaleidoscope is copyright © 2024 Jon Pretty & Propensive OÜ, and is made available under the Apache 2.0 License.