Cats Effect Versions

The pure asynchronous runtime for Scala

v3.4.1

1 year ago

This is the thirty-first release in the Cats Effect 3.x lineage. It is fully binary compatible with every 3.x release, and fully source-compatible with every 3.4.x release. The primary purpose of this release is to address a minor link-time regression which manifested when extending IOApp with a class (not a trait) that was in turn extended by another class. In this scenario, the resulting main class would hang on exit if the intervening extension class had not been recompiled against Cats Effect 3.4.0. Note that this issue with separate compilation and IOApp does remain in a limited form: the MainThread executor is inaccessible when linked in this fashion. The solution is to ensure that all compilation units which extend IOApp (directly or indirectly) are compiled against Cats Effect 3.4.0 or later.
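
To illustrate the shape of the issue, here is a minimal sketch (the class and object names are purely hypothetical):

import cats.effect.{IO, IOApp}

// an intervening class (not a trait) which extends IOApp
class BaseApp extends IOApp.Simple {
  val run: IO[Unit] = IO.println("hello")
}

// the actual main class extends that class; prior to 3.4.1, if BaseApp had
// not been recompiled against Cats Effect 3.4.0+, Main would hang on exit
object Main extends BaseApp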

User-Facing Pull Requests

  • #3254 – Workaround for IOApp deadlock (@armanbilge)
  • #3255, #3253 – Documentation fixes and improvements (@iRevive)

Thank you, everyone!

v3.4.0

1 year ago

This is the thirtieth release in the Cats Effect 3.x lineage. It is fully binary compatible with every 3.x release, and fully source-compatible with every 3.4.x release. Note that source compatibility has been broken with 3.3.x in some minor areas. Since those changes require active choice on the part of users to decide the best adjusted usage for their specific scenario, we have chosen to not provide scalafixes which automatically patch the affected call sites.

A Note on Release Cadence

While Cats Effect minor releases are always guaranteed to be fully backwards compatible with prior releases, they are not forwards compatible with prior releases, and partially as a consequence of this, can (and often do) break source compatibility. In other words, sources which compiled and linked successfully against prior Cats Effect releases will continue to do so, but recompiling those same sources may fail against a subsequent minor release.

For this reason, we seek to balance the inconvenience this imposes on downstream users against the need to continually improve and advance the ecosystem. Our target cadence for minor releases is somewhere between once every three months and once every six months, with frequent patch releases shipping forwards compatible improvements and fixes in the interim.

Unfortunately, Cats Effect 3.3.0 was released over ten months ago, meaning that the 3.4.0 cycle has required considerably more time than usual to come to fruition. There are several reasons for this, but the long and short of it is that this is expected to be an unusual occurrence. We currently expect to release Cats Effect 3.5.0 sometime in Spring 2023, in line with our target cadence.

Major Changes

As this has been a longer than usual development stretch (between 3.3.0 and 3.4.0), this release contains a large number of significant changes and improvements. Additionally, several improvements that we're very excited about didn't quite make the cutoff and have been pushed to 3.5.0. This section details some of the more impactful changes in this release.

High Performance Queue

One of the core concurrency utilities in Cats Effect is Queue. Despite its ubiquity in modern applications, the implementation of Queue has always been relatively naive, based entirely on immutable data structures, Ref, and Deferred. In particular, the core of the bounded Queue implementation since 3.0 looks like the following:

final class BoundedQueue[F[_]: Concurrent, A](capacity: Int, state: Ref[F, State[F, A]])

final case class State[F[_], A](
    queue: ScalaQueue[A],
    size: Int,
    takers: ScalaQueue[Deferred[F, Unit]],
    offerers: ScalaQueue[Deferred[F, Unit]])

The ScalaQueue type refers to scala.collection.immutable.Queue, which is a relatively simple banker's queue implementation within the Scala standard library. All end-user operations (e.g. take) within this implementation rely on Ref#modify to update internal state, with Deferred functioning as a signalling mechanism when take or offer need to semantically block (because the queue is empty or full, respectively).
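
To illustrate, here is a drastically simplified sketch of how take can be written against the State shown above, purely in terms of Ref#modify and Deferred. This is not the actual implementation: cancelation handling and the waking of blocked offerers are omitted for brevity.

import cats.effect.kernel.{Concurrent, Ref}
import cats.syntax.all._

def take[F[_], A](state: Ref[F, State[F, A]])(implicit F: Concurrent[F]): F[A] =
  F.deferred[Unit].flatMap { signal =>
    state.modify { st =>
      st.queue.dequeueOption match {
        case Some((a, rest)) =>
          // fast path: an element is available; shrink the queue and return it
          (st.copy(queue = rest, size = st.size - 1), F.pure(a))
        case None =>
          // slow path: register as a taker, semantically block on the signal,
          // then retry once an offerer completes it
          (st.copy(takers = st.takers.enqueue(signal)), signal.get *> take(state))
      }
    }.flatten
  }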

This implementation has several advantages. Notably, it is quite simple and easy to reason about. This is actually an important property since lock-free queues, particularly multi-producer multi-consumer queues, are extremely complex to implement correctly. Additionally, as it is built entirely in terms of Ref and Deferred, it is usable in any context which has a Concurrent constraint on F[_], allowing for a significant amount of generality and abstraction within downstream frameworks.

Despite its simplicity, this implementation also does surprisingly well on performance metrics. Anecdotal use of Queue within extremely hot I/O processing loops shows that it is rarely, if ever, the bottleneck on performance. This is somewhat surprising precisely because it's implemented in terms of these purely functional abstractions, meaning that it is relatively representative of the kind of performance you can expect out of Cats Effect as an end user when writing complex concurrent logic in terms of the Concurrent abstraction.

Despite all this though, we always knew we could do better. Persistent, immutable data structures are not known for getting the absolute top end of performance out of the underlying hardware. Lock-free queues in particular have a very rich legacy of study and optimization, due to their central position in most practical applications, and it would be unquestionably beneficial to take advantage of this mountain of knowledge within Cats Effect. The problem has always been twofold: first, the monumental effort of implementing an optimized lock-free async queue essentially from scratch, and second, the question of how to achieve this kind of implementation without leaking into the abstraction and forcing an Async constraint in place of the Concurrent one.

The constraint problem is particularly thorny, since numerous downstream frameworks have built around the fact that the naive Queue implementation only requires Concurrent, and it would not make much sense to force an Async constraint when no surface functionality is being changed or added (only performance improvements). However, any high-performance implementation would require access to Async, both to directly implement asynchronous suspension (rather than redirecting through Deferred) and to safely suspend the side-effects required to manipulate mutable data structures.

This problem has been solved by using runtime casing on the Concurrent instance behind the scenes. In particular, whenever you construct a Queue.bounded, the runtime type of that instance is checked to see if it is secretly an Async. If it is, the higher performance implementation is transparently used instead of the naive one. In practice, this should apply at almost all possible call sites, meaning that the new implementation represents an entirely automatic and behind the scenes performance improvement.
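
The dispatch logic amounts to a runtime type test, roughly like the following sketch (the trait and the two constructor helpers here are hypothetical stand-ins for the internal implementations, not the real API):

import cats.effect.kernel.{Async, GenConcurrent}

trait SketchQueue[F[_], A] // stand-in for the real Queue interface

// hypothetical stand-ins for the two internal constructors
def forConcurrent[F[_], A](capacity: Int)(implicit F: GenConcurrent[F, _]): F[SketchQueue[F, A]] = ???
def forAsync[F[_], A](capacity: Int)(implicit F: Async[F]): F[SketchQueue[F, A]] = ???

// if the Concurrent instance is secretly an Async, transparently select the
// optimized implementation; otherwise fall back to the Ref + Deferred version
def bounded[F[_], A](capacity: Int)(implicit F: GenConcurrent[F, _]): F[SketchQueue[F, A]] =
  F match {
    case async: Async[F] => forAsync[F, A](capacity)(async)
    case _               => forConcurrent[F, A](capacity)
  }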

As for the implementation, we chose to start from the foundation of the industry-standard JCTools Project. In particular, we ported the MpmcArrayQueue implementation from Java to Scala, making slight adjustments along the way. In particular:

  • The pure Scala implementation can be cross-compiled to Scala.js (and Scala Native), avoiding the need for extra special casing
  • Several minor optimizations have been elided, most notably those which rely on sun.misc.Unsafe for manipulation of directional memory fences
  • Through the use of a statically allocated exception as a signalling mechanism, we were able to add support for null values without introducing extra boxing
  • Sizes are not quantized to powers of 2. This imposes a small but measurable cost on all operations, which must use modular arithmetic rather than bit masking to map around the ring buffer (see the sketch below)
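
To illustrate the cost described in the final bullet: mapping a monotonically increasing sequence number onto a ring buffer slot requires a modulus, which degenerates to a single bit mask only when the capacity is a power of two. A minimal sketch (not code from the port itself):

// mapping a monotonically increasing sequence number onto a ring buffer slot
def slotByMod(sequence: Long, capacity: Int): Int =
  (sequence % capacity).toInt // works for any capacity, but costs a division

def slotByMask(sequence: Long, capacity: Int): Int =
  (sequence & (capacity - 1)).toInt // single AND, valid only for powers of 2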

All credit goes to Nitsan Wakart (and other JCTools contributors) for this data structure.

This implementation is used to contain the fundamental data within the queue, and it handles an enormous number of very subtle corner cases involving numerous producers and consumers all racing against each other to read from and write to the same underlying data, but it is insufficient on its own to implement the Cats Effect Queue. In particular, when offer fails on MpmcArrayQueue (because the queue is full), it simply rejects the value. When offer fails on Cats Effect's Queue, the calling fiber is blocked until space is available, encoding a form of backpressure that sits at the heart of many systems.

In order to achieve this semantic, we had to not only implement a fast bounded queue for the data, but also a fast unbounded queue to contain any suspended fibers which are waiting on a condition of the queue. We could have used ConcurrentLinkedQueue (from the Java standard library) for this, but we can do even better on performance with a bit of specialization. Additionally, due to cancelation, each listener needs to be able to efficiently remove itself from the queue, regardless of how far along it is in line. To resolve these issues, Viktor Klang and I built a more optimized implementation based on atomic pointer chaining. It's actually possible to improve on this implementation even further (among other things, by removing branching), which should arrive in a future release.

Congratulations on traversing this entire wall of text! Have a pretty performance chart as a reward:

This has been projected onto a linear relative scale. You can find the raw numbers here. In summary, the new queues are between 2x and 4x faster than the old ones.

The bottom line on all of this is that any application which relies on queues (which is to say, most applications) should see an automatic improvement in performance of some magnitude. As mentioned at the top, the queue data structure itself does not appear to be the performance bottleneck in any practical application, but every bit helps, and free performance is still free performance!

Hardened Queue Semantics

As a part of the rework of the core data structures, it was decided to make a very subtle change to the semantics of the Queue data structure while under heavy load, particularly in true multi-producer, multi-consumer (MPMC) scenarios. Under certain circumstances, the previous implementation of Queue could actually lose data. This manifested when one fiber enqueued a value, while another fiber dequeued that value and was canceled during the dequeue. When this happened, it was possible for the value to have been removed from the underlying data structure but not fully returned from the poll effect, meaning that it could be lost without user-land code having any chance to access it within a finalizer.

This sounds like a relatively serious issue, though it's important to understand that the race condition which gives rise to this was vanishingly rare (to the point where no one has ever, to our knowledge, encountered this in the wild). However, fixing this semantic required reworking a lot of the core guarantees offered by the data structure. In particular, it is now no longer strictly guaranteed in all cases while under contention that elements read from a queue by multiple concurrent consumers will be read in exactly insertion order.

More specifically, imagine a situation where you have two consumers and two producers on an empty queue. Consumer A attaches first (using poll), followed by consumer B. Immediately after this, the first producer writes value 1, followed by the second producer writing value 2. Critically, both the first and second producer need to write to the queue at nearly exactly the same moment.

With the previous implementation of Queue, users could rely on an ironclad guarantee that consumer A would get value 1, while consumer B would get value 2. Now, this is no longer strictly guaranteed. It is possible for B to get 1 while A gets 2. In fact, there is an even stranger version of this race condition which only involves a single producer but still generates a similar outcome: consumer A calls poll, and sometime later consumer B calls poll at the same moment that the single producer offers item 1. When this scenario arises, it is possible for B to get item 1 and A to get nothing at all, despite the fact that A has been waiting patiently for some significant length of time.

More precisely, the new Queue no longer strictly guarantees fairness across multiple consumers when under concurrent contention. This loss of fairness can, under certain circumstances, manifest as a corruption of ordering, though one which is unobservable except if the user were to somehow coordinate precise timestamps across multiple consuming fibers. And, as it turns out, the weakening of these guarantees are directly connected to the fix for the (rare) loss of data during fiber cancelation.

To be clear, multi-consumer scenarios are rather rare to begin with, and I cannot think of a single circumstance under which someone would have a multi-consumer Queue and have any expectation of strong ordering or fairness between their consumers. As an appeal to authority, this kind of loss of fairness is extremely standard across all MPMC queue implementations in other languages and runtimes, specifically because data loss is a much more dangerous and impactful outcome and must be avoided at all costs.

To that end, it is considered very unlikely that users will even notice this change, but it is still a significant and subtle adjustment in the core semantics of Queue. The upside of all of this is users can now rely on the guarantee that, if an effect offer(a) completes successfully, then the value a will be "in the queue" and will be later readable by a poll effect. Additionally, if and only if poll removes the element, a, from the queue, it will complete successfully even if externally canceled; conversely, if poll is canceled before it removes a from the queue, then a will remain available for subsequent polls. Thus, data loss is avoided in all cases.
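
As a concrete illustration of the new guarantee, consider the following sketch (note that the dequeue effect on cats.effect.std.Queue is named take; the object name is illustrative):

import cats.effect.{IO, IOApp}
import cats.effect.std.Queue

object HardenedSemantics extends IOApp.Simple {
  val run: IO[Unit] =
    for {
      q     <- Queue.bounded[IO, Int](1)
      taker <- q.take.start // a consumer parks on the empty queue...
      _     <- taker.cancel // ...and is canceled before any value arrives
      _     <- q.offer(1)   // completes successfully: 1 is now "in the queue"
      a     <- q.take       // the canceled take did not consume it; no data loss
      _     <- IO.println(a)
    } yield ()
}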

More Robust Dispatcher (and Supervisor!)

Dispatcher was one of the most significant changes from Cats Effect 2 to 3. In particular, it addresses a long-standing annoyance when working with effect types: the tongue-in-cheek-termed "Soviet interop" case, where unsafe code calls you. In previous versions of Cats Effect, this scenario was handled by the ConcurrentEffect typeclass and the universally confusing runAsync method.

The way in which Dispatcher works is effectively as a fiber-level event dispatch pattern: a single fiber (the dispatcher) polls an asynchronous queue which contains IO[Any] values (the units of work), and when a new work unit is acquired, the dispatcher spawns a fiber for that unit and continues polling. This type of pattern is extremely general: it doesn't matter how long the work units need to complete, they cannot interfere with each other because each is proactively relocated to its own fiber.

Additionally, when CE3 was released, we weren't entirely certain how users wanted to use Dispatcher in practical applications. It was believed likely that most users would create a single top-level Dispatcher for their entire application, and thus the implementation of the event dispatch fibers was optimized with the assumption that a single Dispatcher instance would be under heavy concurrent load. These optimizations are fairly robust, but they do come with a pair of costs: there is no guarantee of ordering between two sequentially-submitted work units (IO[Any] values), and every unit of work must pay the price of spawning a new fiber regardless of how long that work unit needs to execute. The former issue is well-exemplified by the following:

Dispatcher[IO] use { disp =>
  for {
    _ <- IO(disp.unsafeRunAndForget(ioa))
    _ <- IO(disp.unsafeRunAndForget(iob))
    // wait around for stuff…
  } yield ()
}

In the above, we submit ioa strictly before we submit iob, but iob may actually execute first! This creates a whole series of strange issues that users must account for in common Dispatcher scenarios, particularly when using it as a mechanism for inserting ordered items into Queue from impure event handlers. Accounting for this ordering issue often imposes significant overhead on user code, more than undoing the benefits of Dispatcher's own optimizations. Additionally, if ioa and iob are extremely cheap (e.g. q.offer(a)), the overhead of calling .start to create a wrapping fiber for each will exceed the total runtime of the operation itself. Fiber spawning is extremely cheap, but it's not as cheap as inserting into a queue!

For all of these reasons, Dispatcher has been adjusted to have two major modes: parallel and sequential. The previous default mode of operation corresponds to the parallel mode. When you aren't sure which to pick, select this one. The sequential mode adjusts Dispatcher's optimizations for more localized usage (e.g. one per request, which is a common paradigm in practice), offers strong ordering guarantees (in the above example, ioa will run before iob, guaranteed), and much more efficient work unit execution (by removing the fiber wrapping). The danger is that units of work can interfere with each other, and thus sequential is not an appropriate mode for Dispatchers which are shared across an entire application.

If that weren't enough, Dispatcher has also received a new configuration option that applies to both parallel and sequential modes: await = true. In the above example, there is a deceptively annoying comment: // wait around for stuff…. Most people who have used Dispatcher in anger have received the dreaded dispatcher already shutdown error message. This happens when the use scope for the Dispatcher resource is closed before the work unit finishes. When this happens, Dispatcher invalidates its internal state, cancels all current work fibers, and shuts down. This is a very safe default, but as it turns out, this is often not what people want.

The general expectation is often that Dispatcher will simply wait for all outstanding work to finish before allowing the use block to terminate, rather than aggressively canceling all outstanding tasks. With the addition of the new await = true parameter, this is now possible. In 3.4.0, we can rewrite the above example in a more natural fashion, such that it has the guarantees we expect:

Dispatcher.sequential[IO](await = true) use { disp =>
  for {
    _ <- IO(disp.unsafeRunAndForget(ioa))
    _ <- IO(disp.unsafeRunAndForget(iob))
  } yield ()
}

There is no need to explicitly wait at the end of use: Dispatcher will handle the waiting on our behalf. Meanwhile, ioa and iob will be run in a strictly sequential fashion, with strong ordering between the two. If these actions are, for example, inserting elements into a Queue, those elements will arrive in the target queue in exactly the order issued above.

In order to accommodate these changes, the old Dispatcher.apply constructor method has been deprecated. The simplest way to migrate old usage to the new API is to simply replace all Dispatcher[IO] call sites with Dispatcher.parallel[IO]. However, users are invited to carefully consider what semantics they need, since it is likely that the newly added configurations will be more optimal for their specific use-case.
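
For most code, the migration is mechanical, roughly as sketched below:

import cats.effect.IO
import cats.effect.std.Dispatcher

// before (now deprecated): Dispatcher[IO].use { disp => ... }
// after, preserving the previous semantics:
val migrated: IO[Unit] =
  Dispatcher.parallel[IO].use { disp =>
    IO(disp.unsafeRunAndForget(IO.println("hello")))
  }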

Experimental(!!) Scala Native Support

Cats Effect has supported both the JVM and Scala.js as first class citizens of the ecosystem ever since its inception. This has brought with it several challenges, owing to the fact that JavaScript runtimes operate at a very different level of abstraction from the JVM, and notably only support a single thread of execution. However, support for JavaScript based environments has undeniably improved the robustness and generality of the framework, as well as opened up significant opportunities within the Typelevel ecosystem as a whole (for example).

With the release of Cats Effect 3.4.0, we are officially adding experimental support for a third execution environment: Scala Native. In some sense, Scala Native is similar to JavaScript in that it runs in a single thread of execution with very limited support for the Java Standard Library. At the same time, Scala Native is also considerably lower-level than either JavaScript or the JVM. The lack of multi-threading together with the lack of a high-level asynchronous runtime means that it was, until Cats Effect 3.4, actually quite challenging to even formulate an application which could benefit from asynchronous support, since it was not particularly meaningful to discuss asynchronous I/O itself.

Addressing this issue directly has resulted in a large series of discussions and re-thinking around the core of Cats Effect's fiber-aware runtime. Big plans are in the works which should result in massive performance and stability benefits for JVM users of the Typelevel ecosystem. In the meantime, Cats Effect represents the very first full-fledged green threads runtime for Scala Native, and thanks to the hard work of Arman Bilge, the majority of the Typelevel ecosystem comes along with it. It is now possible to write an HTTP microservice in Scala and compile it to a native executable which runs entirely without the JVM, with startup times and memory overhead dramatically lower than those required by Graal Native Image.

With all of that said, this work is still in its early stages. We invite users to try it out and give us feedback! It is important to understand though that Scala Native itself is still relatively experimental, and users should not expect a guarantee that future semantics and usage-patterns will be consistent with those offered today. We will continue to iterate extensively on this support, as well as its implications for the existing JVM and JavaScript environments (which remain fully supported).

Disabled Tracing (by default) in Browser Environments

One of the major additions in Cats Effect 3.3.0 was support for fiber tracing when running under Scala.js. This is an incredibly useful feature which provides enhanced exceptions and even fiber dumps on JavaScript runtimes. Unfortunately, it also comes with a much higher cost than on the JVM.

To provide some context, fiber tracing on the JVM imposes a runtime penalty on most operations of around 25%. This is a lot less than it seems, since these operations are themselves extremely fast. For example, IO#flatMap executes in around 7 nanoseconds on my (x86) laptop, and thus a 25% overhead means that the tracing is increasing the runtime of this operation by less than 2 nanoseconds. In practice, this overhead is simply unnoticeable in application level metrics, which is why tracing is enabled by default.

Unfortunately, on V8, JavaScript tracing overhead is much closer to 90-100% of the operation's runtime. The reasons for this seem to be somewhat complex and are likely related to optimization assumptions in V8 itself around exceptions, but this investigation is still ongoing. Now, despite this significantly higher overhead, tracing is still well below the performance threshold at which it would appear outside of microbenchmarks in a V8-based runtime… at least, when run as a server-side process.

Browsers are considerably more performance-sensitive, and as it turns out, the added overhead of tracing produces very user-visible artifacts and delays in the UI rendering process for most browser-based applications. For that reason, tracing is now disabled by default in all browser-based environments, while remaining enabled when running on the JVM or under Node.js. In the future, if we are able to find a way to reduce tracing overhead down to the point where it is imperceptible in browser environments, this default will be reversed.

Improved Runtime Performance on Foreign Runnables

When the high-performance fiber runtime was first designed for Cats Effect 3, it was optimized for fibers specifically. However, because of the Cats Effect API, it was necessary to still provide support for executing arbitrary Runnables. Users can easily access this functionality by sequencing the executionContext effect, either from IO or Async:

IO.executionContext flatMap { ec =>
  IO.delay(ec.execute(new Runnable { def run() = println("sup?") }))
}

The way this was implemented by the runtime meant that all Runnables submitted in this fashion would get wrapped in a new dedicated fiber, and that fiber would then be submitted to the runtime. Literally, the following:

def handleRunnable(r: Runnable): IO[Unit] = IO.delay(r.run())

While this is relatively fast, it isn't as fast as just… running the Runnable directly. The trick is in achieving this simplification without inserting extra complexity (and cost!) into the core fiber runtime.

In 3.4.0, this optimization has been achieved. In fact, the ExecutionContext you can obtain using IO.executionContext is now the fastest and most optimized general purpose thread pool on the JVM, and can be safely used in all contexts which do not require thread blocking.
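
As a sketch of what this enables (appropriate for non-thread-blocking work only), Future-based code can now be run directly on the compute pool:

import cats.effect.IO
import scala.concurrent.Future

// obtain the compute pool as an ExecutionContext and run a Future on it
val summed: IO[Int] =
  IO.executionContext.flatMap { implicit ec =>
    IO.fromFuture(IO(Future((1 to 100).sum)))
  }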

Serializable Support

A long-standing annoyance in Cats Effect, dating back all the way to 1.0, is the fact that neither IO nor anything related to it is Serializable. This limitation is generally invisible in normal usage of the framework, but it becomes crippling when used with frameworks like Spark which rely heavily on serialization under the surface. This limitation has now been resolved, and any programs written with Cats Effect, whether in a concrete or a polymorphic style, should behave gracefully when serialization is required.

Fixed Console on Scala.js

Console has always been in a rather weird place, particularly on Scala.js. While the println and errorln effects are reasonably straightforward when envisioned as purely asynchronous mechanisms, the readLine effect is just not something that can exist on JavaScript, and even on the JVM it has some notable issues. Amusingly, readLine never actually worked on Scala.js, and it only compiled because Scala.js itself implements a stub version of System.in which does nothing. Thus, we are very confident that there are no users impacted by this change in their production code, since that code would not have worked in the first place!

To that end, readLine on Scala.js has been entirely deprecated. While it is possible to implement something a bit like readLine for Node.js, it would require global state to match the current type signature, and even then there would be significant caveats. Users are advised to consider the fs2.io.stdin stream within the Fs2 framework for a Cats Effect powered version of the standard input stream which works correctly on Node.js. There is no readLine equivalent for browser environments.

In order to fix some of the other unfortunate consequences of the println and errorln implementations on JavaScript runtimes, we have taken the additional step of reimplementing these primitives to correctly manage buffering and backpressure. Unfortunately, this does mean that the materialization constraint on Console has been narrowed from Sync to Async. Binary compatibility is maintained through the use of private stubs, and these stubs will continue to function as before for previously-linked code, but materialization sites for Console[F] which only have a Sync[F] constraint will now fail to compile on Scala.js. Narrowing the constraint to Async[F] will resolve this issue and results in a runtime semantic which is more in line with expectations.
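
In practice, the migration is a constraint change at the affected materialization sites, roughly as sketched here:

import cats.effect.kernel.{Async, Sync}
import cats.effect.std.Console

// previously compiled on all platforms; now fails to compile on Scala.js
def greet[F[_]: Sync]: F[Unit] = Console.make[F].println("hello")

// compiles on all platforms in 3.4.0
def greetAsync[F[_]: Async]: F[Unit] = Console.make[F].println("hello")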

Longer term, we are considering making more significant changes to the readLine effect, since it generally does not work the way that anyone expects it to on any platform (including the JVM), and this behavior is simply fundamental in the way these types of streams operate.

SecureRandom

Certain applications require stronger (usually cryptographic) entropic properties from random number generators. Unfortunately, Cats Effect has only provided a single "one size fits all" Random capability, which can be, and often has been, implemented in ways which are not cryptographically strong and thus may represent a security vulnerability in the right scenario.

To resolve this limitation, 3.4.0 introduces the SecureRandom capability. SecureRandom inherits from Random and thus may be used in all cases where Random was previously expected. However, the only constructors provided are based on the platform-specific secure RNGs of the underlying runtime, rather than other potentially faster or more convenient sources.

Unfortunately, there is no way we can guarantee, statically, that there is no possible insecure implementation of the SecureRandom trait. It is hoped, however, that the name alone, along with the continued existence of Random, will discourage such abuse. At the very least, this allows API authors the ability to encode the desire for secure randomness in their type signatures, whereas previously the best that could be done was Random alone.
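
A small sketch of the new capability (the constructor shown is the java.security-backed one; the sessionToken helper is illustrative):

import cats.effect.IO
import cats.effect.std.SecureRandom

// an API can now demand cryptographically strong randomness in its signature
def sessionToken(rnd: SecureRandom[IO]): IO[String] =
  rnd.nextString(32)

val token: IO[String] =
  SecureRandom.javaSecuritySecureRandom[IO].flatMap(sessionToken)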

Incorporate MapRef

Chris Davenport's MapRef has long been a major component of the broader Cats Effect ecosystem under the banner of the Davenverse. Its relatively simple functionality basically reduces to a function K => Ref[F, V], making it easy to get a set of keyed refs, or even to view an existing Map (or Map-like data structure) as if it were a set of keyed refs.

Due to the convenience and universality of Ref, this kind of use-case shows up a lot in practice. For this reason, Chris has kindly agreed to donate his implementation to the Cats Effect std module, removing the need for users to pull in a separate library to obtain this functionality.
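
A brief sketch of the donated API, assuming the ConcurrentHashMap-backed constructor:

import cats.effect.IO
import cats.effect.std.MapRef

// each key is addressable as an independent Ref[IO, Option[Int]]
val hits: IO[Option[Int]] =
  for {
    mr <- MapRef.ofConcurrentHashMap[IO, String, Int]()
    _  <- mr("requests").update(n => Some(n.fold(1)(_ + 1)))
    n  <- mr("requests").get
  } yield n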

CPU Starvation Checking

As with all schedulers, Cats Effect must take great care to ensure that fibers are not starved out of access to underlying carrier threads. This kind of starvation condition most frequently happens when blocking tasks are inadvertently scheduled on the compute pool (rather than safely shunted using blocking or interruptible), but it can also happen if the unit of granularity of CPU-bound tasks is poorly chosen (see the IO.cede scaladoc for a longer discussion on this possibility and how it can be remedied).

Because of several core elements of Cats Effect's design, starvation problems are not particularly common in applications written within the ecosystem, but they are not unheard of. Unfortunately, due to the nature of starvation, it can be exceptionally difficult to detect that the problem is even happening in practice. Up until now, the primary signal of some sort of starvation scenario has been end-to-end metrics showing a high degree of responsiveness jitter (in microservice terms, this would translate into oddly high tail latencies relative to base load).

In this release, we have taken direct inspiration from Akka and added a starvation checker with a tunable expectation which prints a warning to standard error if there is reason to believe that the Cats Effect scheduler is starving, regardless of the cause:

[WARNING] Your CPU is probably starving. Consider increasing the granularity
of your delays or adding more cedes. This may also be a sign that you are
unintentionally running blocking I/O operations (such as File or InetAddress)
without the blocking combinator.

By default, the starvation checker will run once per second, starting ten seconds after the application launches. Every time it iterates, it measures the amount of time that was required for the underlying runtime to actually schedule the check and compares that to the intended interval (defaulting to one second). If the real time exceeds 110% of the intended time, that's a good indication that the scheduler is unable to keep up with new incoming work and the warning is generated. Put another way, the checker is measuring the delay that would also apply to any new incoming connection, any response to an outgoing connection, and so on. When that delay exceeds the threshold, the scheduler is considered to be "starving".
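
Conceptually, each iteration reduces to something like the following sketch (illustrative only; this is not the actual implementation):

import cats.effect.IO
import cats.syntax.all._
import scala.concurrent.duration._

def starvationCheck(interval: FiniteDuration, threshold: Double): IO[Nothing] = {
  val once =
    for {
      start <- IO.monotonic
      _     <- IO.sleep(interval)       // ask the scheduler to run us after `interval`
      end   <- IO.monotonic
      drift  = (end - start) - interval // how late were we actually scheduled?
      _     <- IO.println("[WARNING] Your CPU is probably starving. ...")
                 .whenA(drift > interval * threshold)
    } yield ()
  once.foreverM
}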

All of these intervals and thresholds are tunable, and the ideal values will vary somewhat from case to case. We chose a relatively conservative threshold (100 milliseconds) as the default. For extremely well-tuned applications, it may be possible to decrease this threshold meaningfully, producing a more sensitive measurement. Additionally, it is possible to disable the check entirely by configuring the cpuStarvationCheckInitialDelay value within IORuntimeConfig to Duration.Inf. The starvation check imposes a microscopic amount of additional load on the system; this is unlikely to be measurable even in extremely sensitive environments, though the configuration is still provided in the event this proves false.
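
For example, disabling the checker from within an IOApp might look like the following sketch (this assumes IORuntimeConfig#copy exposes the new field as a named parameter):

import cats.effect.{IO, IOApp}
import cats.effect.unsafe.IORuntimeConfig
import scala.concurrent.duration.Duration

object Main extends IOApp.Simple {
  // an infinite initial delay means the checker never starts
  override protected def runtimeConfig: IORuntimeConfig =
    super.runtimeConfig.copy(cpuStarvationCheckInitialDelay = Duration.Inf)

  val run: IO[Unit] = IO.unit
}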

All credit here is due to the Akka team, which developed the original version of this concept. We simply ported it to Cats Effect.

User-Facing Pull Requests

  • #3236 – Record CPU starvation metrics (@janstenpickle)
  • #3230 – Add missing poll to bothOutcome (@armanbilge)
  • #3237 – Added MPSC Queue benchmarks (@djspiewak)
  • #3223 – Optimize NonFatal(...) checks (@armanbilge)
  • #3228 – Optimized ByteStack for JS (@armanbilge)
  • #3222 – Optimized ArrayStack for JS (@armanbilge)
  • #3224 – Relax constraints for Monad[Resource] and Monoid[Resource] (@armanbilge)
  • #3218 – Implement immutable-Map backed MapRefs with Ref.lens (@armanbilge)
  • #3215 – Fixed race condition in which canceled offers could result in lost takes (@djspiewak)
  • #3211 – Fallback to concurrent queue for large capacities (@armanbilge)
  • #3196 – Improve trace dumps of short-lived fibers (@RafalSumislawski)
  • #3204 – Fix IOApp bincompat (@armanbilge)
  • #3090, #3202 – CPU starvation checker (@TimWSpence)
  • #3197 – Avoid spurious wake-up in Concurrent queues at capacity (@djspiewak)
  • #3193 – Fixed a few issues in the async queue (@djspiewak)
  • #3194 – Reimplemented Queue.synchronous to resolve fifo issues (@djspiewak)
  • #3186 – Fix NullPointerException in RingBuffer#toList (@RafalSumislawski)
  • #3183 – Fix deregistration of fiber from monitoring when IOCont.Get gets cancelled (@RafalSumislawski)
  • #3182 – Fix issue with fibers not getting deregistered from monitoring... (@RafalSumislawski)
  • #3157 – Suppress supervisor checkRestart (@djspiewak)
  • #3156 – Fixed Dispatcher cancelation interactions with new modes (@djspiewak)
  • #3175 – Fix NullPointerException during fiber dump (@RafalSumislawski)
  • #3174 – Update to Scala 3.2.0 (@armanbilge)
  • #3165 – Enabled configurability of unhandled fiber exceptions (@djspiewak)
  • #3143 – Add Mutex (@BalmungSan)
  • #3057 – Scala Native (@armanbilge)
  • #3098 – Accept Duration for all time-based combinators (@armanbilge)
  • #3154 – Install custom IOApp#runtime as IORuntime.global (@armanbilge)
  • #3127 – resume interrupt of calling thread during shutdown (@blondacz)
  • #3110 – Handle ClosedByInterruptException properly in interruptible (@durban)
  • #3080 – Override memoize for Resource (@armanbilge)
  • #3102 – Remove the IO.defer from IOFiber#cancel (@vasilmkd)
  • #3062 – MapRef: small improvements (@durban)
  • #2464 – Incorporate MapRef into Std (@ChristopherDavenport)
  • #2917 – High(er) performance tryTakeN for Queue.bounded (@djspiewak)
  • #2965 – Fixed TestControl#nextInterval and added tickFor (@djspiewak)
  • #2954 – Generalize timeout and friends to take a Duration (@b3nk3i)
  • #2976 – add useEval method to Resource (@mberndt123)
  • #2914 – High(er) performance unbounded queue (@djspiewak)
  • #2885 – High(er) performance async Queue (@djspiewak)
  • #2807 – Enable configuration of non-daemon thread logging within IOApp (@djspiewak)
  • #3000 – Harden Queue cancelation semantics (@djspiewak)
  • #3034 – Update cats to 2.8.0 (@typelevel-steward)
  • #2905 – Add a SecureRandom algebra (@rossabaker)
  • #2901 – Relax Ref#access semantics if set more than once (@armanbilge)
  • #3008 – Print fatal errors (@sbly)
  • #3007 – Add combinators for choosing random elements of collections (@cb372)
  • #2869 – Added support for configuring compute pool error reporter (@djspiewak)
  • #3011 – Disable tracing for browsers by default on JS (@armanbilge)
  • #3003 – PQueue: add tryTakeN and tryOfferN (@caiormt)
  • #3001 – Add Async#executor (@armanbilge)
  • #2967 – Add Ref.empty Monoid-based shortcut (@ivan)
  • #2966 – Relax LensRef constraints (@armanbilge)
  • #2959 – Add configurable Dispatcher (@iRevive)
  • #2941 – Add GenTemporal#cachedRealTime (@brendanmaguire)
  • #2951 – Implemented configurability for Supervisor scope termination (@djspiewak)
  • #2895 – Use Performance API for high-precision time on JS (@armanbilge)
  • #2940 – Added orElse combinator to IO (@walesho)
  • #2918 – Make Queue#tryTakeN, tryTakeFrontN and tryTakeBackN return F[List[A]] (@leusgalvan)
  • #2416 – Support microsecond precision in realTime and realTimeInstant on the JVM (@brendanmaguire)
  • #2749 – Adds CommutativeApplicative instance for Resource.Par (@skennedy)
  • #2881 – Handle exceptions of foreign Runnables (@vasilmkd)
  • #2873 – The WSTP can run Futures just as fast as ExecutionContext.global (@vasilmkd)
  • #2846 – Add unsafeRunSync* syntax for JS Dispatcher (@armanbilge)
  • #2360 – Add serialization support (@gagandeepkalra)
  • #2359 – Add embedError convenience function (@ChristopherDavenport)
  • #2864 – Exclude completed fibers from the live runtime snapshot (@iRevive)
  • #2732 – Expose an IOApp#MainThread executor (@djspiewak)
  • #2829 – Indicate a completed state in IOFiber#toString (@iRevive)
  • #2852 – Add IO.randomUUID convenience method (@Daenyth)
  • #2843 – Add Semaphore#tryPermit (@TimWSpence)
  • #2835 – Add Async#syncStep (@armanbilge)
  • #2853 – Configurable caching of blocking threads (@vasilmkd)
  • #2811 – Support shutting down multiple runtimes (@vasilmkd)
  • #2809 – Refine derived type classes (@ybasket)
  • #2820 – Move the class lookup in the correct branch (@vasilmkd)
  • #2819 – Put instances for IO.Par into implicit scope (@armanbilge)
  • #2657 – Introduce IORuntimeBuilder (@majk-p)
  • #2787 – Lock-free test context (@vasilmkd)
  • #2798 – Be more paranoid when making JS Console (@armanbilge)
  • #2604 – Fix Console on JS (@armanbilge)
  • #2785 – Simplify randomness in TestContext (@vasilmkd)
  • #2780 – Add recover and recoverWith to IO (@kamilkloch)
  • #2777 – Add andWait method to IO (@iRevive)
  • #2650 – Implement std.Env[F] for cross-platform environment variables (@armanbilge)
  • #2757 – Add IO.rethrow and IO.debug (@BalmungSan)
  • #2670 – Add batch offer/take methods to Queue and Dequeue (@etspaceman)
  • #2688 – Make Resource#allocatedCase public (@alexandrustana)
  • #2613 – Sync run convenience function (@djspiewak)
  • #2643 – Adds joinWithUnit for Fiber (@etspaceman)
  • #2659 – Add fromCompletionStage to Async / IO (@etspaceman)
  • #3166, #3163, #3160, #3153, #3084, #3139, #3113, #3122, #3071, #3134, #3141, #3120, #3112, #3115, #3114, #3106, #3107, #3082, #3046, #3009, #2994, #2960, #2883, #2870, #2834, #2806, #2781, #2786, #2758 – Documentation fixes and improvements (@floating-cat, @daniel-ciocirlan, @samspills, @barryoneill, @yoshinorin, @bplommer, @armanbilge, @s5bug, @benhutchison, @martinprobson, @rossabaker, @b-vennes, @armanbilge, @iRevive, @ivanlp10n2, @jisantuc, @arturaz, @djspiewak)

Very special thanks to all of you!

v3.4.0-RC2

1 year ago

This is the thirtieth release in the Cats Effect 3.x lineage. It is fully binary compatible with every 3.x release, and fully source-compatible with every 3.4.x release. Note that source compatibility has been broken with 3.3.x in some minor areas. Since those changes require active choice on the part of users to decide the best adjusted usage for their specific scenario, we have chosen to not provide scalafixes which automatically patch the affected call sites.

For a more comprehensive treatment of all changes between 3.3.x and 3.4.0, please see the RC1 release notes. The following notes only cover the changes between RC1 and RC2.

User-Facing Pull Requests

  • #3197 – Avoid spurious wake-up in Concurrent queues at capacity (@djspiewak)
  • #3193 – Fixed a few issues in the async queue (@djspiewak)
  • #3194 – Reimplemented Queue.synchronous to resolve fifo issues (@djspiewak)
  • #3186 – Fix NullPointerException in RingBuffer#toList (@RafalSumislawski)
  • #3183 – Fix deregistration of fiber from monitoring when IOCont.Get gets cancelled (@RafalSumislawski)
  • #3182 – Fix issue with fibers not getting deregistered from monitoring... (@RafalSumislawski)

A very special thanks to all!

v3.4.0-RC1

1 year ago

This is the thirtieth release in the Cats Effect 3.x lineage. It is fully binary compatible with every 3.x release, and fully source-compatible with every 3.4.x release. Note that source compatibility has been broken with 3.3.x in some minor areas. Since those changes require active choice on the part of users to decide the best adjusted usage for their specific scenario, we have chosen to not provide scalafixes which automatically patch the affected call sites.

With this release, we're taking the unusual step of going through a release candidate cycle prior to 3.4.0 final. This process is designed to make it easier for the downstream ecosystem to try the new release and identify subtle incompatibilities or real world issues that are hard for us to entirely eliminate in-house. Binary- and source-compatibility is not guaranteed between release candidates, or between RCs and the final release, though major changes are very unlikely. If you represent a downstream framework or application, please do take the time to try out this release candidate and report any issues! We're particularly interested in feedback from applications which make heavy use of Queue.

A Note on Release Cadence

While Cats Effect minor releases are always guaranteed to be fully backwards compatible with prior releases, they are not forwards compatible with prior releases, and partially as a consequence of this, can (and often do) break source compatibility. In other words, sources which compiled and linked successfully against prior Cats Effect releases will continue to do so, but recompiling those same sources may fail against a subsequent minor release.

For this reason, we seek to balance the inconvenience this imposes on downstream users against the need to continually improve and advance the ecosystem. Our target cadence for minor releases is somewhere between once every three months and once every six months, with frequent patch releases shipping forwards compatible improvements and fixes in the interim.

Unfortunately, Cats Effect 3.3.0 was released over ten months ago, meaning that the 3.4.0 cycle has required considerably more time than usual to come to fruition. There are several reasons for this, but the long and short of it is that this is expected to be an unusual occurrence. We currently expect to release Cats Effect 3.5.0 sometime in Spring 2023, in line with our target cadence.

Major Changes

As this has been a longer than usual development stretch (between 3.3.0 and 3.4.0), this release contains a large number of significant changes and improvements. Additionally, several improvements that we're very excited about didn't quite make the cutoff and have been pushed to 3.5.0. This section details some of the more impactful changes in this release.

High Performance Queue

One of the core concurrency utilities in Cats Effect is Queue. Despite its ubiquity in modern applications, the implementation of Queue has always been relatively naive, based entirely on immutable data structures, Ref, and Deferred. In particular, the core of the bounded Queue implementation since 3.0 looks like the following:

final class BoundedQueue[F[_]: Concurrent, A](capacity: Int, state: Ref[F, State[F, A]])

final case class State[F[_], A](
    queue: ScalaQueue[A],
    size: Int,
    takers: ScalaQueue[Deferred[F, Unit]],
    offerers: ScalaQueue[Deferred[F, Unit]])

The ScalaQueue type refers to scala.collection.immutable.Queue, which is a relatively simple banker's queue implementation within the Scala standard library. All end-user operations (e.g. take) within this implementation rely on Ref#modify to update internal state, with Deferred functioning as a signalling mechanism when take or offer need to semantically block (because the queue is empty or full, respectively).

This implementation has several advantages. Notably, it is quite simple and easy to reason about. This is actually an important property since lock-free queues, particularly multi-producer multi-consumer queues, are extremely complex to implement correctly. Additionally, as it is built entirely in terms of Ref and Deferred, it is usable in any context which has a Concurrent constraint on F[_], allowing for a significant amount of generality and abstraction within downstream frameworks.

Despite its simplicity, this implementation also does surprisingly well on performance metrics. Anecdotal use of Queue within extremely hot I/O processing loops shows that it is rarely, if ever, the bottleneck on performance. This is somewhat surprising precisely because it's implemented in terms of these purely functional abstractions, meaning that it is relatively representative of the kind of performance you can expect out of Cats Effect as an end user when writing complex concurrent logic in terms of the Concurrent abstraction.

Despite all this though, we always knew we could do better. Persistent, immutable data structures are not known for getting the absolute top end of performance out of the underlying hardware. Lock-free queues in particular have a very rich legacy of study and optimization, due to their central position in most practical applications, and it would be unquestionably beneficial to take advantage of this mountain of knowledge within Cats Effect. The problem has always been twofold: first, the monumental effort of implementing an optimized lock-free async queue essentially from scratch, and second, the question of how to achieve this kind of implementation without leaking into the abstraction and forcing an Async constraint in place of the Concurrent one.

The constraint problem is particularly thorny, since numerous downstream frameworks have built around the fact that the naive Queue implementation only requires Concurrent, and it would not make much sense to force an Async constraint when no surface functionality is being changed or added (only performance improvements). However, any high-performance implementation would require access to Async, both to directly implement asynchronous suspension (rather than redirecting through Deferred) and to safely suspend the side-effects required to manipulate mutable data structures.

This problem has been solved by using runtime casing on the Concurrent instance behind the scenes. In particular, whenever you construct a Queue.bounded, the runtime type of that instance is checked to see if it is secretly an Async. If it is, the higher performance implementation is transparently used instead of the naive one. In practice, this should apply at almost all possible call sites, meaning that the new implementation represents an entirely automatic and behind the scenes performance improvement.

As for the implementation, we chose to start from the foundation of the industry-standard JCTools Project. In particular, we ported the MpmcArrayQueue implementation from Java to Scala, making slight adjustments along the way. In particular:

  • The pure Scala implementation can be cross-compiled to Scala.js (and Scala Native), avoiding the need for extra special casing
  • Several minor optimizations have been elided, most notably those which rely on sun.misc.Unsafe for manipulation of directional memory fences
  • Through the use of a statically allocated exception as a signalling mechanism, we were able to add support for null values without introducing extra boxing
  • Sizes are not quantized to powers of 2. This imposes a small but measurable cost on all operations, which must use modular arithmetic rather than bit masking to map around the ring buffer

All credit goes to Nitsan Wakart (and other JCTools contributors) for this data structure.

This implementation is used to contain the fundamental data within the queue, and it handles an enormous number of very subtle corner cases involving numerous producers and consumers all racing against each other to read from and write to the same underlying data, but it is insufficient on its own to implement the Cats Effect Queue. In particular, when offer fails on MpmcArrayQueue (because the queue is full), it simply rejects the value. When offer fails on Cats Effect's Queue, the calling fiber is blocked until space is available, encoding a form of backpressure that sits at the heart of many systems.

In order to achieve this semantic, we had to not only implement a fast bounded queue for the data, but also a fast unbounded queue to contain any suspended fibers which are waiting on a condition of the queue. We could have used ConcurrentLinkedQueue (from the Java standard library) for this, but we can do even better on performance with a bit of specialization. Additionally, due to cancelation, each listener needs to be able to efficiently remove itself from the queue, regardless of how far along it is in line. To resolve these issues, Viktor Klang and I built a more optimized implementation based on atomic pointer chaining. It's actually possible to improve on this implementation even further (among other things, by removing branching), which should arrive in a future release.

Congratulations on traversing this entire wall of text! Have a pretty performance chart as a reward:

This has been projected onto a linear relative scale. You can find the raw numbers here. In summary, the new queues are between 2x and 4x faster than the old ones.

The bottom line on all of this is that any application which relies on queues (which is to say, most applications) should see an automatic improvement in performance of some magnitude. As mentioned at the top, the queue data structure itself does not appear to be the performance bottleneck in any practical application, but every bit helps, and free performance is still free performance!

Hardened Queue Semantics

As a part of the rework of the core data structures, it was decided to make a very subtle change to the semantics of the Queue data structure while under heavy load, particularly in true multi-producer, multi-consumer (MPMC) scenarios. Under certain circumstances, the previous implementation of Queue could actually lose data. This manifested when one fiber enqueued a value, while another fiber dequeued that value and was canceled during the dequeue. When this happened, it was possible for the value to have been removed from the underlying data structure but not fully returned from the poll effect, meaning that it could be lost without user-land code having any chance to access it within a finalizer.

This sounds like a relatively serious issue, though it's important to understand that the race condition which gives rise to this was vanishingly rare (to the point where no one has ever, to our knowledge, encountered this in the wild). However, fixing this semantic required reworking a lot of the core guarantees offered by the data structure. In particular, it is now no longer strictly guaranteed in all cases while under contention that elements read from a queue by multiple concurrent consumers will be read in exactly insertion order.

More specifically, imagine a situation where you have two consumers and two producers on an empty queue. Consumer A attaches first (using poll), followed by consumer B. Immediately after this, the first producer writes value 1, followed by the second producer writing value 2. Critically, both the first and second producer need to write to the queue at nearly exactly the same moment.

With the previous implementation of Queue, users could rely on an ironclad guarantee that consumer A would get value 1, while consumer B would get value 2. Now, this is no longer strictly guaranteed. It is possible for B to get 1 while A gets 2. In fact, there is an even stranger version of this race condition which only involves a single producer but still generates a similar outcome: consumer A calls poll, and sometime later consumer B calls poll at the same moment that the single producer offers item 1. When this scenario arises, it is possible for B to get item 1 and A to get nothing at all, despite the fact that A has been waiting patiently for some significant length of time.

More precisely, the new Queue no longer strictly guarantees fairness across multiple consumers when under concurrent contention. This loss of fairness can, under certain circumstances, manifest as a corruption of ordering, though one which is unobservable except if the user were to somehow coordinate precise timestamps across multiple consuming fibers. And, as it turns out, the weakening of these guarantees are directly connected to the fix for the (rare) loss of data during fiber cancelation.

To be clear, multi-consumer scenarios are rather rare to begin with, and I cannot think of a single circumstance under which someone would have a multi-consumer Queue and have any expectation of strong ordering or fairness between their consumers. As an appeal to authority, this kind of loss of fairness is extremely standard across all MPMC queue implementations in other languages and runtimes, specifically because data loss is a much more dangerous and impactful outcome and must be avoided at all costs.

To that end, it is considered very unlikely that users will even notice this change, but it is still a significant and subtle adjustment in the core semantics of Queue. The upside of all of this is users can now rely on the guarantee that, if an effect offer(a) completes successfully, then the value a will be "in the queue" and will be later readable by a poll effect. Additionally, if and only if poll removes the element, a, from the queue, it will complete successfully even if externally canceled; conversely, if poll is canceled before it removes a from the queue, then a will remain available for subsequent polls. Thus, data loss is avoided in all cases.

More Robust Dispatcher (and Supervisor!)

Dispatcher was one of the most significant changes from Cats Effect 2 to 3. In particular, it addresses a long-standing annoyance when working with effect types: the tongue-in-cheek-termed "Soviet interop" case, where unsafe code calls you. In previous versions of Cats Effect, this scenario was handled by the ConcurrentEffect typeclass and the universally confusing runAsync method.

The way in which Dispatcher works is effectively as a fiber-level event dispatch pattern: a single fiber (the dispatcher) polls an asynchronous queue which contains IO[Any] values (the units of work), and when a new work unit is acquired, the dispatcher spawns a fiber for that unit and continues polling. This type of pattern is extremely general: it doesn't matter how long the work units need to complete, they cannot interfere with each other because each is proactively relocated to its own fiber.

Additionally, when CE3 was released, we weren't entirely certain how users wanted to use Dispatcher in practical applications. It was believed likely that most users would create a single top-level Dispatcher for their entire application, and thus the implementation of the event dispatch fibers was optimized with the assumption that a single Dispatcher instance would be under heavy concurrent load. These optimizations are fairly robust, but they do come with a pair of costs: there is no guarantee of ordering between two sequentially-submitted work units (IO[Any] values), and every unit of work must pay the price of spawning a new fiber regardless of how long that work unit needs to execute. The former issue is well-exemplified by the following:

Dispatcher[IO] use { disp =>
  for {
    _ <- IO(disp.unsafeRunAndForget(ioa))
    _ <- IO(disp.unsafeRunAndForget(iob))
    // wait around for stuff…
  } yield ()
}

In the above, we submit ioa strictly before we submit iob, but iob may actually execute first! This creates a whole series of strange issues that users must account for in common Dispatcher scenarios, particularly when using it as a mechanism for inserting ordered items into Queue from impure event handlers. Accounting for this ordering issue often imposes significant overhead on user code, more than undoing the benefits of Dispatcher's own optimizations. Additionally, if ioa and iob are extremely cheap (e.g. q.offer(a)), the overhead of calling .start to create a wrapping fiber for each will exceed the total runtime of the operation itself. Fiber spawning is extremely cheap, but it's not as cheap as inserting into a queue!

For all of these reasons, Dispatcher has been adjusted to have two major modes: parallel and sequential. The previous default mode of operation corresponds to the parallel mode; when you aren't sure which to pick, select this one. The sequential mode tunes Dispatcher for more localized usage (e.g. one per request, which is a common paradigm in practice), offers strong ordering guarantees (in the above example, ioa is guaranteed to run before iob), and executes work units much more efficiently (by removing the fiber wrapping). The danger is that units of work can interfere with each other, and thus sequential is not an appropriate mode for Dispatchers which are shared across an entire application.

If that weren't enough, Dispatcher has also received a new configuration option that applies to both parallel and sequential modes: await = true. In the above example, there is a deceptively annoying comment: // wait around for stuff…. Most people who have used Dispatcher in anger have encountered the dreaded "dispatcher already shutdown" error message. This happens when the use scope for the Dispatcher resource is closed before the work unit finishes. When this happens, Dispatcher invalidates its internal state, cancels all current work fibers, and shuts down. This is a very safe default, but as it turns out, this is often not what people want.

The general expectation is often that Dispatcher will simply wait for all outstanding work to finish before allowing the use block to terminate, rather than aggressively canceling all outstanding tasks. With the addition of the new await = true parameter, this is now possible. In 3.4.0, we can rewrite the above example in a more natural fashion, such that it has the guarantees we expect:

Dispatcher.sequential[IO](await = true) use { disp =>
  for {
    _ <- IO(disp.unsafeRunAndForget(ioa))
    _ <- IO(disp.unsafeRunAndForget(iob))
  } yield ()
}

There is no need to explicitly wait at the end of use: Dispatcher will handle the waiting on our behalf. Meanwhile, ioa and iob will be run in a strictly sequential fashion, with strong ordering between the two. If these actions are, for example, inserting elements into a Queue, those elements will arrive in the target queue in exactly the order issued above.

In order to accommodate these changes, the old Dispatcher.apply constructor method has been deprecated. The simplest way to migrate old usage to the new API is to simply replace all Dispatcher[IO] call sites with Dispatcher.parallel[IO]. However, users are invited to carefully consider what semantics they need, since it is likely that the newly added configurations will be more optimal for their specific use-case.
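
For instance, a migration might look like the following sketch (the submitted work here is just IO.unit as a placeholder, and the value names are ours):

import cats.effect.IO
import cats.effect.std.Dispatcher

// before (now deprecated):
val before = Dispatcher[IO].use(disp => IO(disp.unsafeRunAndForget(IO.unit)))

// after: parallel is the drop-in replacement for the old semantics...
val parallel = Dispatcher.parallel[IO].use(disp => IO(disp.unsafeRunAndForget(IO.unit)))

// ...while sequential may be a better fit for localized, ordered usage
val sequential = Dispatcher.sequential[IO].use(disp => IO(disp.unsafeRunAndForget(IO.unit)))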

Experimental(!!) Scala Native Support

Cats Effect has supported both the JVM and Scala.js as first class citizens of the ecosystem ever since its inception. This has brought with it several challenges, owing to the fact that JavaScript runtimes operate at a very different level of abstraction from the JVM, and notably only support a single thread of execution. However, support for JavaScript based environments has undeniably improved the robustness and generality of the framework, as well as opened up significant opportunities within the Typelevel ecosystem as a whole.

With the release of Cats Effect 3.4.0, we are officially adding experimental support for a third execution environment: Scala Native. In some sense, Scala Native is similar to JavaScript in that it runs in a single thread of execution with very limited support for the Java Standard Library. At the same time, Scala Native is also considerably lower-level than either JavaScript or the JVM. The lack of multi-threading together with the lack of a high-level asynchronous runtime means that it was, until Cats Effect 3.4, actually quite challenging to even formulate an application which could benefit from asynchronous support, since it was not particularly meaningful to discuss asynchronous I/O itself.

Addressing this issue directly has resulted in a large series of discussions and re-thinking around the core of Cats Effect's fiber-aware runtime. Big plans are in the works which should result in massive performance and stability benefits for JVM users of the Typelevel ecosystem. In the meantime, Cats Effect represents the very first full-fledged green threads runtime for Scala Native, and thanks to the hard work of Arman Bilge, the majority of the Typelevel ecosystem comes along with it. It is now possible to write an HTTP microservice in Scala and compile it to a native executable which runs entirely without the JVM, with startup times and memory overhead dramatically lower than those required by Graal Native Image.

With all of that said, this work is still in its early stages. We invite users to try it out and give us feedback! It is important to understand though that Scala Native itself is still relatively experimental, and users should not expect a guarantee that future semantics and usage-patterns will be consistent with those offered today. We will continue to iterate extensively on this support, as well as its implications for the existing JVM and JavaScript environments (which remain fully supported).

Disabled Tracing (by default) in Browser Environments

One of the major additions in Cats Effect 3.3.0 was support for fiber tracing when running under Scala.js. This is an incredibly useful feature which provides enhanced exceptions and even fiber dumps on JavaScript runtimes. Unfortunately, it also comes with a much higher cost than on the JVM.

To provide some context, fiber tracing on the JVM imposes a runtime penalty on most operations of around 25%. This is a lot less than it seems, since these operations are themselves extremely fast. For example, IO#flatMap executes in around 7 nanoseconds on my (x86) laptop, and thus a 25% overhead means that tracing increases the runtime of this operation by less than 2 nanoseconds. In practice, this overhead is simply unnoticeable in application-level metrics, which is why tracing is enabled by default.

Unfortunately, on V8, JavaScript tracing overhead is much closer to 90-100% of the operation's runtime. The reasons for this seem to be somewhat complex and are likely related to optimization assumptions in V8 itself around exceptions, but this investigation is still ongoing. Now, despite this significantly higher overhead, tracing is still well below the performance threshold at which it would appear outside of microbenchmarks in a V8-based runtime… at least, when run as a server-side process.

Browsers are considerably more performance-sensitive, and as it turns out, the added overhead of tracing produces very user-visible artifacts and delays in the UI rendering process for most browser-based applications. For that reason, tracing is now disabled by default in all browser-based environments, while remaining enabled when running on the JVM or under Node.js. In the future, if we are able to find a way to reduce tracing overhead down to the point where it is imperceptible in browser environments, this default will be reversed.

Improved Runtime Performance on Foreign Runnables

When the high-performance fiber runtime was first designed for Cats Effect 3, it was optimized for fibers specifically. However, because of the Cats Effect API, it was necessary to still provide support for executing arbitrary Runnables. Users can easily access this functionality by sequencing the executionContext effect, either from IO or Async:

IO.executionContext flatMap { ec =>
  IO.delay(ec.execute(new Runnable { def run() = println("sup?") }))
}

The way this was implemented by the runtime, all Runnables submitted in this fashion would get wrapped in a new dedicated fiber, and that fiber would then be submitted to the runtime. Literally, the following:

// wrap the foreign Runnable in IO, producing a dedicated fiber for it
def handleRunnable(r: Runnable): IO[Unit] = IO.delay(r.run())

While this is relatively fast, it isn't as fast as just… running the Runnable directly. The trick is in achieving this simplification without inserting extra complexity (and cost!) into the core fiber runtime.

In 3.4.0, this optimization has been achieved. In fact, the ExecutionContext you can obtain using IO.executionContext is now the fastest and most optimized general purpose thread pool on the JVM, and can be safely used in all contexts which do not require thread blocking.
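
As a rough illustration, the sketch below evaluates a Future directly on the compute pool (the object name is ours; the usual caveat applies: do not submit thread-blocking work this way):

import scala.concurrent.Future
import cats.effect.{IO, IOApp}

object ComputePoolAsEC extends IOApp.Simple {
  val run: IO[Unit] =
    IO.executionContext flatMap { implicit ec =>
      // the Future runs on the fiber runtime's work-stealing pool
      IO.fromFuture(IO(Future(21 * 2))) flatMap { n =>
        IO.println(s"computed $n on the compute pool")
      }
    }
}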

Serializable Support

A long-standing annoyance in Cats Effect, dating back all the way to 1.0, is the fact that neither IO nor anything related to it is Serializable. This limitation is generally invisible in normal usage of the framework, but it becomes crippling when used with frameworks like Spark which rely heavily on serialization under the surface. This limitation has now been resolved, and any programs written with Cats Effect, whether in a concrete or a polymorphic style, should behave gracefully when serialization is required.
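
As a sketch of what this enables, the following round-trips an IO value through plain Java serialization, which is what such frameworks do under the surface (the value names are ours, and this assumes the closure itself is serializable, as Scala lambdas generally are):

import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import cats.effect.IO

val program: IO[Int] = IO.pure(21).map(_ * 2)

val bytes: Array[Byte] = {
  val bos = new ByteArrayOutputStream()
  val oos = new ObjectOutputStream(bos)
  oos.writeObject(program)  // would have thrown NotSerializableException before
  oos.close()
  bos.toByteArray
}

val restored: IO[Int] =
  new ObjectInputStream(new ByteArrayInputStream(bytes))
    .readObject()
    .asInstanceOf[IO[Int]]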

Fixed Console on Scala.js

Console has always been in a rather weird place, particularly on Scala.js. While the println and errorln effects are reasonably straightforward when envisioned as purely asynchronous mechanisms, the readLine effect is just not something that can exist on JavaScript, and even on the JVM it has some notable issues. Amusingly, readLine never actually worked on Scala.js, and it only compiled because Scala.js itself implements a stub version of System.in which does nothing. Thus, we are very confident that there are no users impacted by this change in their production code, since that code would not have worked in the first place!

To that end, readLine on Scala.js has been entirely deprecated. While it is possible to implement something a bit like readLine for Node.js, it would require global state to match the current type signature, and even then there would be significant caveats. Users are advised to consider the fs2.io.stdin stream within the Fs2 framework for a Cats Effect powered version of the standard input stream which works correctly on Node.js. There is no readLine equivalent for browser environments.
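
For reference, a sketch of the fs2 approach (assuming fs2's io.stdin and text pipes; the buffer size of 4096 is arbitrary):

import cats.effect.IO
import fs2.{io, text, Stream}

// read standard input as bytes, decode as UTF-8, and split into lines
val stdinLines: Stream[IO, String] =
  io.stdin[IO](4096).through(text.utf8.decode).through(text.lines)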

In order to fix some of the other unfortunate consequences of the println and errorln implementations on JavaScript runtimes, we have taken the additional step of reimplementing these primitives to correctly manage buffering and backpressure. Unfortunately, this does mean that the materialization constraint on Console has been narrowed from Sync to Async. Binary compatibility is maintained through the use of private stubs, and these stubs will continue to function as before for previously-linked code, but materialization sites for Console[F] which only have a Sync[F] constraint will now fail to compile on Scala.js. Narrowing the constraint to Async[F] will resolve this issue and results in a runtime semantic which is more in line with expectations.
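
The migration looks something like the following sketch (greet is a hypothetical method of ours; Console.make is, to our understanding, the standard cross-platform constructor):

import cats.effect.IO
import cats.effect.kernel.Async
import cats.effect.std.Console

// a Sync[F] constraint here would no longer compile on Scala.js;
// requiring Async[F] works on all platforms
def greet[F[_]: Async]: F[Unit] =
  Console.make[F].println("hello")

val io: IO[Unit] = greet[IO]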

Longer term, we are considering making more significant changes to the readLine effect, since it generally does not work the way that anyone expects it to on any platform (including the JVM), and this behavior is simply fundamental in the way these types of streams operate.

SecureRandom

Certain applications require stronger (usually cryptographic) entropic properties from their random number generators. Unfortunately, Cats Effect has only provided a single "one size fits all" Random capability, which can be, and often has been, implemented in ways that are not cryptographically strong and thus may represent a security vulnerability in the right scenario.

To resolve this limitation, 3.4.0 introduces the SecureRandom capability. SecureRandom inherits from Random and thus may be used in all cases where Random was previously expected. However, the only constructors provided are based on the platform-specific secure RNGs of the underlying runtime, rather than other potentially faster or more convenient sources.

Unfortunately, there is no way we can guarantee, statically, that there is no possible insecure implementation of the SecureRandom trait. It is hoped, however, that the name alone, along with the existence of Random, will discourage such abuse. At the very least, this gives API authors the ability to encode the desire for secure randomness in their type signatures, whereas previously the best that could be done was Random alone.
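
A quick sketch of that encoding (token is a hypothetical helper of ours, and javaSecuritySecureRandom is, to our understanding, the platform-backed constructor):

import cats.effect.{IO, IOApp}
import cats.effect.std.SecureRandom

object TokenSketch extends IOApp.Simple {
  // the signature demands cryptographic strength, not merely Random
  def token[F[_]](random: SecureRandom[F]): F[String] =
    random.nextString(32)

  val run: IO[Unit] =
    SecureRandom.javaSecuritySecureRandom[IO]
      .flatMap(token[IO])
      .flatMap(t => IO.println(t))
}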

Incorporate MapRef

Chris Davenport's MapRef has long been a major component of the broader Cats Effect ecosystem under the banner of the Davenverse. Its relatively simple functionality basically reduces to a function K => Ref[F, V], making it easy to get a set of keyed refs, or even to view an existing Map (or Map-like data structure) as if it were a set of keyed refs.

Due to the convenience and universality of Ref, this kind of use-case shows up a lot in practice. For this reason, Chris has kindly agreed to donate his implementation to the Cats Effect std module, removing the need for users to pull in a separate library to obtain this functionality.
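
A small sketch of the keyed-ref pattern (assuming the ofConcurrentHashMap constructor, which backs the refs with a ConcurrentHashMap; the names are ours):

import cats.effect.{IO, IOApp}
import cats.effect.std.MapRef

object KeyedCounters extends IOApp.Simple {
  val run: IO[Unit] =
    for {
      counters <- MapRef.ofConcurrentHashMap[IO, String, Int]()
      ref = counters("requests")   // MapRef is literally K => Ref[F, Option[V]]
      _ <- ref.update(n => Some(n.getOrElse(0) + 1))
      v <- ref.get
      _ <- IO.println(v)           // Some(1)
    } yield ()
}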

User-Facing Pull Requests

  • #3157 – Suppress supervisor checkRestart (@djspiewak)
  • #3156 – Fixed Dispatcher cancelation interactions with new modes (@djspiewak)
  • #3175 – Fix NullPointerException during fiber dump (@RafalSumislawski)
  • #3174 – Update to Scala 3.2.0 (@armanbilge)
  • #3165 – Enabled configurability of unhandled fiber exceptions (@djspiewak)
  • #3143 – Add Mutex (@BalmungSan)
  • #3057 – Scala Native (@armanbilge)
  • #3098 – Accept Duration for all time-based combinators (@armanbilge)
  • #3154 – Install custom IOApp#runtime as IORuntime.global (@armanbilge)
  • #3127 – resume interrupt of calling thread during shutdown (@blondacz)
  • #3110 – Handle ClosedByInterruptException properly in interruptible (@durban)
  • #3080 – Override memoize for Resource (@armanbilge)
  • #3102 – Remove the IO.defer from IOFiber#cancel (@vasilmkd)
  • #3062 – MapRef: small improvements (@durban)
  • #2464 – Incorporate MapRef into Std (@ChristopherDavenport)
  • #2917 – High(er) performance tryTakeN for Queue.bounded (@djspiewak)
  • #2965 – Fixed TestControl#nextInterval and added tickFor (@djspiewak)
  • #2954 – Generalize timeout and friends to take a Duration (@b3nk3i)
  • #2976 – add useEval method to Resource (@mberndt123)
  • #2914 – High(er) performance unbounded queue (@djspiewak)
  • #2885 – High(er) performance async Queue (@djspiewak)
  • #2807 – Enable configuration of non-daemon thread logging within IOApp (@djspiewak)
  • #3000 – Harden Queue cancelation semantics (@djspiewak)
  • #3034 – Update cats to 2.8.0 (@typelevel-steward)
  • #2905 – Add a SecureRandom algebra (@rossabaker)
  • #2901 – Relax Ref#access semantics if set more than once (@armanbilge)
  • #3008 – Print fatal errors (@sbly)
  • #3007 – Add combinators for choosing random elements of collections (@cb372)
  • #2869 – Added support for configuring compute pool error reporter (@djspiewak)
  • #3011 – Disable tracing for browsers by default on JS (@armanbilge)
  • #3003 – PQueue: add tryTakeN and tryOfferN (@caiormt)
  • #3001 – Add Async#executor (@armanbilge)
  • #2967 – Add Ref.empty Monoid-based shortcut (@ivan)
  • #2966 – Relax LensRef constraints (@armanbilge)
  • #2959 – Add configurable Dispatcher (@iRevive)
  • #2941 – Add GenTemporal#cachedRealTime (@brendanmaguire)
  • #2951 – Implemented configurability for Supervisor scope termination (@djspiewak)
  • #2895 – Use Performance API for high-precision time on JS (@armanbilge)
  • #2940 – Added orElse combinator to IO (@walesho)
  • #2918 – Make Queue#tryTakeN, tryTakeFrontN and tryTakeBackN return F[List[A]] (@leusgalvan)
  • #2416 – Support microsecond precision in realTime and realTimeInstant on the JVM (@brendanmaguire)
  • #2749 – Adds CommutativeApplicative instance for Resource.Par (@skennedy)
  • #2881 – Handle exceptions of foreign Runnables (@vasilmkd)
  • #2873 – The WSTP can run Futures just as fast as ExecutionContext.global (@vasilmkd)
  • #2846 – Add unsafeRunSync* syntax for JS Dispatcher (@armanbilge)
  • #2360 – Add serialization support (@gagandeepkalra)
  • #2359 – Add embedError convenience function (@ChristopherDavenport)
  • #2864 – Exclude completed fibers from the live runtime snapshot (@iRevive)
  • #2732 – Expose an IOApp#MainThread executor (@djspiewak)
  • #2829 – Indicate a completed state in IOFiber#toString (@iRevive)
  • #2852 – Add IO.randomUUID convenience method (@Daenyth)
  • #2843 – Add Semaphore#tryPermit (@TimWSpence)
  • #2835 – Add Async#syncStep (@armanbilge)
  • #2853 – Configurable caching of blocking threads (@vasilmkd)
  • #2811 – Support shutting down multiple runtimes (@vasilmkd)
  • #2809 – Refine derived type classes (@ybasket)
  • #2820 – Move the class lookup in the correct branch (@vasilmkd)
  • #2819 – Put instances for IO.Par into implicit scope (@armanbilge)
  • #2657 – Introduce IORuntimeBuilder (@majk-p)
  • #2787 – Lock-free test context (@vasilmkd)
  • #2798 – Be more paranoid when making JS Console (@armanbilge)
  • #2604 – Fix Console on JS (@armanbilge)
  • #2785 – Simplify randomness in TestContext (@vasilmkd)
  • #2780 – Add recover and recoverWith to IO (@kamilkloch)
  • #2777 – Add andWait method to IO (@iRevive)
  • #2650 – Implement std.Env[F] for cross-platform environment variables (@armanbilge)
  • #2757 – Add IO.rethrow and IO.debug (@BalmungSan)
  • #2670 – Add batch offer/take methods to Queue and Dequeue (@etspaceman)
  • #2688 – Make Resource#allocatedCase public (@alexandrustana)
  • #2613 – Sync run convenience function (@djspiewak)
  • #2643 – Adds joinWithUnit for Fiber (@etspaceman)
  • #2659 – Add fromCompletionStage to Async / IO (@etspaceman)
  • #3166, #3163, #3160, #3153, #3084, #3139, #3113, #3122, #3071, #3134, #3141, #3120, #3112, #3115, #3114, #3106, #3107, #3082, #3046, #3009, #2994, #2960, #2883, #2870, #2834, #2806, #2781, #2786, #2758 – Documentation fixes and improvements (@floating-cat, @daniel-ciocirlan, @samspills, @barryoneill, @yoshinorin, @bplommer, @armanbilge, @s5bug, @benhutchison, @martinprobson, @rossabaker, @b-vennes, @armanbilge, @iRevive, @ivanlp10n2, @jisantuc, @arturaz, @djspiewak)

Very special thanks to all of you!

v3.3.14

1 year ago

This is the twenty-ninth release in the Cats Effect 3.x lineage. It is fully binary compatible with every 3.x release, and fully source-compatible with every 3.3.x release. Note that source compatibility has been broken with 3.2.x in some minor areas. Scalafixes are available and should be automatically applied by Scala Steward if relevant.

This release contains significant fixes for the interruptibleMany function, which could (under certain circumstances) result in a full runtime deadlock.

User-Facing Pull Requests

  • #3081 – Improved granularity of interruptible loops (@durban)
  • #3074 – Resolve race condition in interruptibleMany after interruption (@djspiewak)
  • #3064 – Handle Uncancelable and OnCancel in syncStep interpreter (@armanbilge)
  • #3069 – Documentation fixes and improvements (@TonioGela)

Special thanks to all of you!

v3.3.13

1 year ago

This is the twenty-eighth release in the Cats Effect 3.x lineage. It is fully binary compatible with every 3.x release, and fully source-compatible with every 3.3.x release. Note that source compatibility has been broken with 3.2.x in some minor areas. Scalafixes are available and should be automatically applied by Scala Steward if relevant.

User-Facing Pull Requests

  • #3053 – Updated native image config for GraalVM 21.0 (@djspiewak)
  • #3054 – Fix new blocking worker thread naming change for bincompat (@djspiewak)
  • #3012 – Rename worker threads in blocking regions (@aeons)
  • #3036 – Properly declare constants on Scala.js (@armanbilge)
  • #3056, #2897, #3047 – Documentation fixes and improvements (@djspiewak, @Daenyth, @TimWSpence)

Thank you very much!

v3.3.12

1 year ago

This is the twenty-seventh release in the Cats Effect 3.x lineage. It is fully binary compatible with every 3.x release, and fully source-compatible with every 3.3.x release. Note that source compatibility has been broken with 3.2.x in some minor areas. Scalafixes are available and should be automatically applied by Scala Steward if relevant.

User-Facing Pull Requests

  • #2991 – Resource#evalOn should use provided EC for both acquire and release (@armanbilge)
  • #2972 – Fix leaking array ref in Random (@catostrophe)
  • #2963 – Override racePair in _asyncForIO (@durban)
  • #2993, #2990, #2955 – Documentation fixes and improvements (@TimWSpence, @bplommer)

Thank you, all of you!

v2.5.5

1 year ago

This is the eighteenth release in the Cats Effect 2.x lineage. It is fully binary compatible with all 2.x.y releases.

No further maintenance is planned in this series, though we will consider exceptions for security patches or other tales of woe.

User-Facing Pull Requests

  • #2395 – Update sbt-scalajs to version 1.7.1 (@vasilmkd)
  • #2420 – Introduce parReplicateAN (@RafalSumislawski)
  • #2457 – Fix of 2.x series tutorial paragraph about error promotion of join call (@lrodero)
  • #2593 – Swap System.exit with Runtime#halt (@alexandrustana)
  • #2594 – Update cats-core, cats-laws to 2.7.0 (@scala-steward)
  • #2775 – Cats effect testing does not support munit (@Guisanpea)

Very special thanks to all of you!

Full Changelog: https://github.com/typelevel/cats-effect/compare/v2.5.4...v2.5.5

v3.3.11

2 years ago

This is the twenty-sixth release in the Cats Effect 3.x lineage. It is fully binary compatible with every 3.x release, and fully source-compatible with every 3.3.x release. Note that source compatibility has been broken with 3.2.x in some minor areas. Scalafixes are available and should be automatically applied by Scala Steward if relevant.

User-Facing Pull Requests

  • #2945 – Securely implement UUIDGen for Scala.js (@armanbilge)

Thank you so much!

v3.3.10

2 years ago

This is the twenty-fifth release in the Cats Effect 3.x lineage. It is fully binary compatible with every 3.x release, and fully source-compatible with every 3.3.x release. Note that source compatibility has been broken with 3.2.x in some minor areas. Scalafixes are available and should be automatically applied by Scala Steward if relevant.

This release resolves a rare issue in which IO could continue executing for a short time following a fatal error (such as OutOfMemoryError), taking null as the result value. This was more relevant on Scala.js than on the JVM, but it was at least theoretically observable on both platforms.

User-Facing Pull Requests

  • #2935 – Resolved issue with fatal errors being eaten (@djspiewak)