Cats Effect Versions Save

The pure asynchronous runtime for Scala

v3.3.9

2 years ago

This is the twenty-fourth major release in the Cats Effect 3.x lineage. It is fully binary compatible with every 3.x release, and fully source-compatible with every 3.3.x release. Note that source compatibility has been broken with 3.2.x in some minor areas. Scalafixes are available and should be automatically applied by Scala Steward if relevant.

This release relieves a contention point in the default Supervisor implementation, which is itself used in the implementation of Dispatcher, resulting in better throughput in parallel workloads. The release also fixes a cancelation backpressure flaw in java.util.concurrent.CompletableFuture interop, as well as some Scala JS linking issues, which we were unaware of, prior to today.

User-Facing Pull Requests

  • #2876 – Use a ConcurrentHashMap for Supervisor state if F is Async (@wjoel)
  • #2906 – Implement Random#javaSecuritySecureRandom on Scala.js (@armanbilge)
  • #2907 – Fix Async#fromCompletableFuture cancelation backpressure (@bplommer and @armanbilge)

Special thanks to each and every one of you!

v3.3.8

2 years ago

This is the twenty-third major release in the Cats Effect 3.x lineage. It is fully binary compatible with every 3.x release, and fully source-compatible with every 3.3.x release. Note that source compatibility has been broken with 3.2.x in some minor areas. Scalafixes are available and should be automatically applied by Scala Steward if relevant.

User-Facing Pull Requests

  • #2868 – Implemented unhandled error reporting for orphan fibers (@djspiewak)
  • #2880 – Use Sync[F].blocking instead of Sync[F].delay for randomUUID (@BalmungSan)
  • #2872 – Count new blocking threads (@vasilmkd)
  • #2884, #2805, #2371 – Documentation fixes and improvements (@armanbilge, @kamilkloch, @bastewart)

Special thanks to each and every one of you!

v3.3.7

2 years ago

This is the twenty-second major release in the Cats Effect 3.x lineage. It is fully binary compatible with every 3.x release, and fully source-compatible with every 3.3.x release. Note that source compatibility has been broken with 3.2.x in some minor areas. Scalafixes are available and should be automatically applied by Scala Steward if relevant.

User-Facing Pull Requests

  • #2859 – Fix IO#syncStep double-execution (@armanbilge)

Thank you!

v3.3.6

2 years ago

This is the twenty-first major release in the Cats Effect 3.x lineage. It is fully binary compatible with every 3.x release, and fully source-compatible with every 3.3.x release. Note that source compatibility has been broken with 3.2.x in some minor areas. Scalafixes are available and should be automatically applied by Scala Steward if relevant.

User-Facing Pull Requests

  • #2844 – Thread local fallback weak bag (@vasilmkd)
  • #2838 – Close cancelation leak in fromCompletableFuture (@TimWSpence)
  • #2784 – Create and use java.util.ConcurrentThreadLocalRandom on the same flatMap call (@PiotrBosak)
  • #2845, #2826, #2796, #2790, #2504 – Documentation fixes and improvements (@armanbilge, @vasilmkd, @Koroeskohr, @djspiewak, @armanbilge)

Thank you so very much!

v3.3.5

2 years ago

This is the twentieth major release in the Cats Effect 3.x lineage. It is fully binary compatible with every 3.x release, and fully source-compatible with every 3.3.x release. Note that source compatibility has been broken with 3.2.x in some minor areas. Scalafixes are available and should be automatically applied by Scala Steward if relevant.

User-Facing Pull Requests

  • #2776 – Faster runnables on the WSTP (@vasilmkd)
  • #2769 – Simplify the transfer of WorkerThread data structures when blocking (@vasilmkd)
  • #2743 – Workaround for lampepfl/dotty#14240 (@armanbilge)
  • #2746 – Only run the thread logger if forked (@vasilmkd)
  • #2733, #2752 – Documentation fixes and improvements (@TimWSpence, @jmgimeno)

Special thanks to all of you!

v3.3.4

2 years ago

This is the nineteenth major release in the Cats Effect 3.x lineage. It is fully binary compatible with every 3.x release, and fully source-compatible with every 3.3.x release. Note that source compatibility has been broken with 3.2.x in some minor areas. Scalafixes are available and should be automatically applied by Scala Steward if relevant.

User-Facing Pull Requests

  • #2735 – Run blocking code on the compute pool more carefully (@vasilmkd)
  • #2724 – Handle interruption around blocking waits (@vasilmkd)
  • #2729 – Concurrent.memoize: be more careful with the result (@durban)
  • #2731 – Invoke process.exit within a Try in IOApp.js (@armanbilge)
  • #2719 – Attempt to detect host environments (like unforked sbt) (@djspiewak)
  • #2721 – Leaked fiber test for JS IOApp (@armanbilge)
  • #2714 – Set the callback unsafely without memory barriers and allocations (@vasilmkd)

Special thanks and appreciation to all of you!

v3.3.3

2 years ago

This is the eighteenth major release in the Cats Effect 3.x lineage. It is fully binary compatible with every 3.x release, and fully source-compatible with every 3.3.x release. Note that source compatibility has been broken with 3.2.x in some minor areas. Scalafixes are available and should be automatically applied by Scala Steward if relevant.

This release contains a fix for a regression introduced in 3.3.0 related to IOApps which exit with non-fatal errors when run from within Sbt with fork set to false. In that scenario, the runtime worker threads would end up hung in a busy-wait loop and eat up all available CPU despite returning control to the Sbt shell. Despite this fix, it is still recommended that you set run / fork := true in Sbt to work around other bugs in Sbt itself (specifically related to both Ctrl-C and System.exit suppression).

User-Facing Pull Requests

  • #2705 – Detect sbt thread cleaner and terminate workers (@djspiewak)
  • #2707 – Hard exit in Node.js IOApp (@armanbilge)

Thank you so much!

v3.3.2

2 years ago

This is the seventeenth major release in the Cats Effect 3.x lineage. It is fully binary compatible with every 3.x release, and fully source-compatible with every 3.3.x release. Note that source compatibility has been broken with 3.2.x in some minor areas. Scalafixes are available and should be automatically applied by Scala Steward if relevant.

This patch release focuses primarily on performance improvements in two major areas: blocking/interruptible and suspended fiber tracking.

In the former area, the Cats Effect fiber runtime has long had support for the scala.concurrent.blocking construct within any code which is scheduled on its worker threads. When such a block is hit, the runtime takes it as a signal that it is about to lose a functioning worker thread and thus spawns a new one, seamlessly putting it into rotation to ensure the pool is not starved by the current worker thread being blocked. This trick works very well, but wasn't particularly recommended in user code because the performance was worse than the native IO.blocking operation.

In this release, Vasil has changed the behavior of the pool to seamlessly shift worker state when a blocking section is hit, effectively morphing another thread into the exact state as the now-blocked thread. Additionally, spare threads constructed when blocking operations are hit are now cached for one minute before being cleaned up if still idle, ensuring that they're still around if a subsequent blocking operation is hit in short order.

These improvements, taken together, mean that scala.concurrent.blocking inside of delay is actually faster than the IO.blocking operation by a significant margin, meaning that we can reap immediate performance benefits by converting IO.blocking and IO.interruptible to use this native mechanism rather than an ancillary thread pool.

Screen Shot 2021-12-30 at 11 53 45 AM

Please note that the above is plotted on a log scale to make it easier to see the relative differences in each scenario. For reference, the improvements in the "fine grained" benchmark represent the test running 141x faster! (not a percent sign) Blocking is still bad for throughput, but it's a lot less bad now. You can find all of these benchmarks in the repository.

As if that weren't enough, we've reimplemented the tracking mechanism for suspended fibers which underlies the new fiber dump feature introduced in 3.3.0. This feature was and is implemented using a thread local set-like data structure which maintains weak references to any suspended fiber. The weak references are necessary for two reasons. First, it ensures that any fiber which is suspended and then the callback is "lost" can still be garbage collected normally. Second, it allows us to avoid the extra memory barriers associated with backtracking to the suspending thread when the fiber is resumed, making the whole mechanism significantly faster.

Unfortunately, this comes with a cost: these weak references must be examined and ultimately cleaned by the garbage collector, which means that we're effectively taking synchronous work out of the main code path and moving it asynchronously into the garbage collector. This in turn can mean that certain types of workflows which already put significant pressure on the GC may have seen diminished performance with the update to 3.3.0.

This release significantly reduces the GC overhead by simplifying and specializing the data structure to reduce the number of weak references and allocations involved in the tracking itself. The results should be unnoticeable in most optimized workloads, but for applications which are creating a significant amount of short-lived objects within their hot path, these changes should produce a substantial speed-up relative to 3.3.1.

User-Facing Pull Requests

  • #2699 – Removed interruptible dependency on an explicit ExecutionContext (@djspiewak)
  • #2687 – Blocking mechanism with cached threads (@vasilmkd)
  • #2673 – Cross platform weak bag implementation (@vasilmkd)

Thank you so much!

v3.3.1

2 years ago

This is the sixteenth major release in the Cats Effect 3.x lineage. It is fully binary compatible with every 3.x release, and fully source-compatible with every 3.3.x release. Note that source compatibility has been broken with 3.2.x in some minor areas. Scalafixes are available and should be automatically applied by Scala Steward if relevant.

This release contains bug fixes and performance enhancements to tracing and some other areas. Both enhanced exceptions and fiber dumps should now behave better on the latest versions of the JVM (which use a different top-level package identifier). Tracing on Scala.js can now be fully disabled if its overhead is too high, whereas previously some bookkeeping was retained even when tracing was configured to off.

Most significantly, Resource.uncancelable previously contained a significant bug in which error states were rewritten within the block. This ultimately stemmed from a limitation in the original Resource API with respect to full representation of outcomes, and it indirectly impacted all use of Async[Resource]. Thanks to the efforts of @TimWSpence, these bugs have now been completely squashed with the addition of a new Resource interpreter: allocatedCase (originally proposed by @kubukoz). In the interest of maintaining forward-compatibility, this function is currently marked as package-private, but will be marked as public in Cats Effect 3.4.0.

User-Facing Pull Requests

  • #2617 – Add Resource#allocatedCase (@TimWSpence)
  • #2665 – Fixes cancelling CompletableFuture on fiber cancellation (@LightSystem)
  • #2662 – micro-optimization: use Map.empty instead of Map.apply (@yanns)
  • #2642 – Only use FiberAwareExecutionContext if tracing is enabled (@armanbilge)
  • #2623 – Add jdk. to trace filter (@armanbilge)
  • #2609 – Fix trace filtering (@alexandrustana)
  • #2607 – Duplicated fibers in fiber dump (@TimWSpence)
  • #2600 – Handle interruption gracefully and halt the runtime on fatal errors (@vasilmkd)
  • #2654, #2635, #2647, #2629, #2591 – Documentation fixes and improvements (@danicheg, @zarthross, @danicheg, @xuwei-k, @djspiewak)

Heartfelt thanks; you are all amazing!

v3.3.0

2 years ago

This is the fifteenth major release in the Cats Effect 3.x lineage. It is fully binary compatible with every 3.x release, and fully source-compatible with every 3.3.x release. Note that source compatibility has been broken with 3.2.x in some minor areas (detailed below). Scalafixes are available and should be automatically applied by Scala Steward if relevant.

The theme of this release has been improving observability, testability, and debuggability of all applications using Cats Effect 3. This has resulted in a massive set of new functionality, tweaks, and improvements which make 3.3.0 the most significant Cats Effect release ever apart from 3.0.0 itself. The developer experience has been significantly improved, particularly in tricky areas such as diagnosing deadlocks and deterministically testing functionality involving timers and clocks. Additionally, new functionality has been brought to Scala.js, including full support for tracing!

Finally, we took the opportunity to continue to build on IO's best-in-class performance with significant improvements to the fiber runtime (including dynamic workload fairness self-tuning) and continued dramatic slimming of the fiber memory footprint. Embarrassingly, we even discovered that the build was using an unnecessary Scala compiler flag which inhibited performance in some scenarios (particularly involving large numbers of fibers) by up to 15%! All of these improvements should add up into a very noticeable leap forward for application metrics in nearly all real-world scenarios.

Notable Changes

Thread Fiber Dumps

One of the most annoying and difficult problems to resolve in any asynchronous application is the asynchronous deadlock. This scenario happens when you have a classic deadlock of some variety, where one fiber is waiting for a second fiber which is in turn waiting for the first (in the simplest case). Due to the asynchronous nature of the runtime, whenever a fiber blocks, all references to it are removed from the internal runtime, meaning that a deadlock generally leaves absolutely no residue whatsoever, and the only recourse as a developer is to just start sprinkling IO.println expressions around the code to see if you can figure out where it's getting stuck.

This is very much in contrast to a conventional deadlock in a synchronous runtime, where we have JVM-level tools such as thread dumps to suss out where things are stuck. In particular, thread dumps are a commonly-applied low level tool offered by the JVM which can serve to inform users of what threads are active at that point in time and what each of their call stacks are. This tool is generally quite useful, but it becomes even more useful when the threads are deadlocked: the call stacks show exactly where each thread is blocked, making it relatively simple to reconstruct what they are blocked on and thus how to untie the knot.

Fiber dumps are a similar construct for Cats Effect applications. Even better, you don't need to change anything in order to take advantage of this functionality. As a simple example, here is an application which trivially deadlocks:

import cats.effect.{IO, IOApp}

object Deadlock extends IOApp.Simple {
  val run =
    for {
      latch <- IO.deferred[Unit]

      body = latch.get
      fiber <- body.start
      _ <- fiber.join

      _ <- latch.complete(())
    } yield ()
}

The main fiber is waiting on fiber.join, which will only be completed once latch is released, which in turn will only happen on the main fiber after the child fiber completes. Thus, both fibers are deadlocked on each other. Prior to fiber dumps, this situation would be entirely invisible. Manually traversing the IO internals via a heap dump would be the only mechanism for gathering clues as to the problem, which is far from user-friendly and also generally fruitless.

As of Cats Effect 3.3.0, users can now simply trigger a fiber dump to get the following diagnostic output printed to standard error:

cats.effect.IOFiber@56824a14 WAITING
 ├ flatMap @ Deadlock$.$anonfun$run$2(Deadlock.scala:26)
 ├ flatMap @ Deadlock$.$anonfun$run$1(Deadlock.scala:25)
 ├ deferred @ Deadlock$.<clinit>(Deadlock.scala:22)
 ├ flatMap @ Deadlock$.<clinit>(Deadlock.scala:22)
 ╰ run$ @ Deadlock$.run(Deadlock.scala:19)
 
cats.effect.IOFiber@6194c61c WAITING
 ├ get @ Deadlock$.$anonfun$run$1(Deadlock.scala:24)
 ╰ get @ Deadlock$.$anonfun$run$1(Deadlock.scala:24)
 
Thread[io-compute-14,5,run-main-group-6] (#14): 0 enqueued
Thread[io-compute-12,5,run-main-group-6] (#12): 0 enqueued
Thread[io-compute-6,5,run-main-group-6] (#6): 0 enqueued
Thread[io-compute-5,5,run-main-group-6] (#5): 0 enqueued
Thread[io-compute-8,5,run-main-group-6] (#8): 0 enqueued
Thread[io-compute-9,5,run-main-group-6] (#9): 0 enqueued
Thread[io-compute-11,5,run-main-group-6] (#11): 0 enqueued
Thread[io-compute-7,5,run-main-group-6] (#7): 0 enqueued
Thread[io-compute-10,5,run-main-group-6] (#10): 0 enqueued
Thread[io-compute-4,5,run-main-group-6] (#4): 0 enqueued
Thread[io-compute-13,5,run-main-group-6] (#13): 0 enqueued
Thread[io-compute-0,5,run-main-group-6] (#0): 0 enqueued
Thread[io-compute-2,5,run-main-group-6] (#2): 0 enqueued
Thread[io-compute-3,5,run-main-group-6] (#3): 0 enqueued
Thread[io-compute-1,5,run-main-group-6] (#1): 0 enqueued
Thread[io-compute-15,5,run-main-group-6] (#15): 0 enqueued
 
Global: enqueued 0, foreign 0, waiting 2

A fiber dump prints every fiber known to the runtime, regardless of whether they are suspended, blocked, yielding, active on some foreign runtime (via evalOn), or actively running on a worker thread. You can see an example of a larger dump in this gist. Each fiber given a stable unique hexadecimal ID and paired with its status as well as its current trace, making it extremely easy to identify problems such as our earlier deadlock: the first fiber is suspended at line 26 (fiber.join) while the second fiber is suspended at line 24 (latch.get). This gives us a very good idea of what's happening and how to fix it.

Note that most production applications have a lot of fibers at any point in time (millions and even tens of millions are possible even on consumer hardware), so the dump may be quite large. It's also worth noting that this is a statistical snapshot mechanism. The data it is aggregating is spread across multiple threads which may or may not have all published into main memory at a given point in time. Thus, it isn't necessarily an instantaneously consistent view of the runtime. Under some circumstances, trace information for a given fiber may be behind its actual position, or a fiber may be reported as being in one state (e.g. YIELDING) when in fact it is in a different one (e.g. WAITING). Under rare circumstances, newly-spawned fibers may be missed. These circumstances are considerably more common on ARM architectures than they are under x86 due to store order semantics.

Summary statistics for the global fiber runtime are printed following the fiber traces. In the above example, these statistics are relatively trivial, but in a real-world application this can give you an idea of where your fibers are being scheduled.

Triggering the above fiber dump is a matter of sending a POSIX signal to the process using the kill command. The exact signal is dependent on the JVM (and version thereof) and operating system under which your application is running. Rather than attempting to hard-code all possible compatible signal configurations, Cats Effect simply attempts to register both INFO and USR1 (for JVM applications) or USR2 (for Node.js applications). In practice, INFO will most commonly be used on macOS and BSD, while USR1 is more common on Linux. Thus, kill -INFO <pid> on macOS and kill -USR1 <pid> on Linux (or USR2 for Node.js applications). POSIX signals do not exist on Windows (except under WSL, which behaves exactly like a normal Linux), and thus the mechanism is disabled.

Since INFO is the signal used on macOS and BSD, this combined with a quirk of Apple's TTY implementation means that anyone running a Cats Effect application on macOS can simply hit Ctrl-T within the active application to trigger a fiber dump, similar to how you can use Ctrl-\ to trigger a thread dump. Note that this trick only works on macOS, since that is the only platform which maps a particular keybind to either the INFO or USR1 signals.

In the event that you're either running on a platform which doesn't support POSIX signals, or the signal registration failed for whatever reason, Cats Effect on the JVM will also automatically register an MBean under cats.effect.unsafe.metrics.LiveFiberSnapshotTriggerMBean which can produce a string representation of the fiber dump when its only method is invoked.

This entire mechanism has no performance impact (well, it probably would if you kept printing the dump in a loop, but don't do that). It is controlled by the same configuration as tracing.

And in case you were wondering, yes, it does work on Node.js applications!

cats.effect.IOFiber@d WAITING
 
cats.effect.IOFiber@9 WAITING
 ╰ deferred @ <jscode>.null.$c_LDeadlock$(/workspace/cats-effect/example/js/src/main/scala/cats/effect/example/Example.scala:22)
 
cats.effect.IOFiber@a WAITING
 ├ flatMap @ <jscode>.null.<anonymous>(/workspace/cats-effect/example/js/src/main/scala/cats/effect/example/Example.scala:26)
 ├ flatMap @ <jscode>.null.<anonymous>(/workspace/cats-effect/example/js/src/main/scala/cats/effect/example/Example.scala:25)
 ├ deferred @ <jscode>.null.$c_LDeadlock$(/workspace/cats-effect/example/js/src/main/scala/cats/effect/example/Example.scala:22)
 ╰ flatMap @ <jscode>.null.$c_LDeadlock$(/workspace/cats-effect/example/js/src/main/scala/cats/effect/example/Example.scala:22)
 
cats.effect.IOFiber@b WAITING
 
Global: enqueued 0, waiting 4

You'll notice that there are two extra fibers in this example. These are actually internal to Cats Effect itself, since IOApp on JavaScript maintains a keep-alive fiber and races it against the main fiber. The two fibers in the middle line up pretty closely with those from the JVM dump of the same program, though some of the tracing information is a little different. This is happening because Scala.js itself is very aggressive about inlining across call-sites, and Cats Effect is written in such a way that much of it does get inlined away. This in turn confuses the tracing slightly, since there's simply less information to work with. In practice, this isn't too much of an issue, since most practical fibers will have more than enough tracing information to pin down what's going on, but it's something to keep in mind.

Note that this functionality is currently only available on applications running under Node.js, since it isn't entirely clear how to trigger a fiber dump within a browser runtime. If you have thoughts or inspirations on this point, feel free to open an issue or discuss in Discord!

Better Fiber toString

println-based debugging is the staple of a lot of tricky program diagnosis, particularly in the realm of highly-concurrent programs. Unfortunately, prior to this release, printing out a Fiber would more often than not simply print false (and more rarely, true) due to some subtle implementation details. This was actually worse than the default inherited from AnyRef which would at least produce an instance-identifying numeric value which could be used to correlate fibers in different places.

This property has been restored, along with several other pieces of useful debugging information, on the new toString for the default Fiber implementation in IO. For example:

import cats.effect.{IO, IOApp}
import scala.concurrent.duration._

object ToString extends IOApp.Simple {
  val run = {
    lazy val loop: IO[Unit] = IO.cede >> loop
    (loop.start <* IO.sleep(100.millis)).flatMap(IO.println(_))
  }
}

In this example, we're starting a fiber which loops infinitely, sleeping for 100 milliseconds, and then simply printing the toString of that fiber and exiting. The results will be something like the following:

cats.effect.IOFiber@217df74f RUNNING: >> @ ToString$.loop$lzycompute$1(ToString.scala:22)

The hexadecimal value following the @ is stable and unique to the instance (though it will not be consistent between JVM executions), and will in fact be the same value printed during fiber dumps. The status will either be RUNNING or SUSPENDED, since fibers themselves have access to less status context than the fiber dump mechanism does.

The final component of the toString is the top-most frame from the fiber trace, if available. Note that, much as with fiber dumps, we're leveraging some tricky memory barriers to retrieve this information without performance cost. This in turn means that it may be slightly out of date, depending on the thread which is calling toString, and depending on the underlying CPU architecture (again, x86 will produce more accurate results here than ARM).

Since this value is the toString for the fiber and not an additional debug function, it should be of significant aid to IDE debuggers (such as IntelliJ or BSP), which generally rely on the toString of an object when rendering. If tracing is disabled, only the first half of the string will be produced (everything prior to the :).

Fiber Runtime Observability

In keeping with this release's focus on developer usability and productionalization, we've significantly improved the observability of all Cats Effect applications out of the box by adding a number of JMX MBeans which provide metrics on the underlying runtime. These metrics are implemented without any impact to performance and thus can be safely incorporated into automated reporting, alerting, and monitoring systems. They can also be a useful local diagnostic tool when viewed with Mission Control:

All MBeans are automatically registered and enabled by default. This functionality can be disabled using the same configuration that disables tracing support. In the event that JMX is unsupported by the underlying runtime (e.g. Graal native-image), Cats Effect will silently continue without active monitoring. Also please note that MBean observability is only supported for the global IORuntime, as well as any runtime managed by IOApp.

It's worth keeping in mind that nearly all metrics are only periodically published to main memory, meaning that samples are not guaranteed to be internally self-consistent nor are they guaranteed to represent an instantaneously-accurate snapshot of the runtime. All metrics will generally exhibit higher accuracy on x86 than they will on ARM for architectural reasons.

A number of metrics are available right now, but just to provide a brief taste:

  • Total number of blocked workers
  • Number of fibers currently suspended
  • Total number of workers
  • Total spilled fibers for each worker
  • Total number of stolen fibers for each worker
  • Current fiber dump represented as an Array[String]
  • ...and much more!

Please note that these metrics should be considered a first draft and we will likely continue to iterate and change them over time. We may remove or rename some existing metrics within the 3.x release lineage, though if this happens, it will be reserved for a minor version bump (rather than a patch version).

Be sure to let us know if the new observability proves useful, and also if you have any additional ideas for metrics which can be safely exposed by the runtime.

Mock Time and Improved Support for Testing IO

One of the many changes brought about by Cats Effect 3.0.0 has been the consolidation of numerous formerly-ancillary functions into the core typeclass hierarchy. Deferred and Ref are two obvious examples of this, but the now-deleted ContextShift and the heavily reworked Clock typeclasses are also quite significant. The former was superseded by enriched functionality within the Async typeclass – in particular, evalOn and stronger laws around async – while the latter still exists but has become explicitly part of the hierarchy, rather than serving as an additional element on the side.

By and large, this change simplifies usage of Cats Effect in that it is no longer necessary to manually thread a Clock instance through layers and layers of call sites: you can simply invoke the realTime or monotonic functions on either IO or whatever typeclass constraint you happen to have (e.g. Temporal, Sync, or Async). Unfortunately, this integration and canonicalization also brought about an unintended consequence: mocking time for the purposes of testing became effectively impossible.

To an extent, some of this regression was to be expected. After all, one of Cats Effect 3's major focus areas was in improving the safety and uniformity of the runtime semantics. Mocking time in Cats Effect 2 by creating alternative Clock instances was always fraught with caveats even when you could be sure that the underlying code didn't just do something hacky like Sync[F].delay(System.currentTimeMillis()). Among other things, timeouts and races behaved inconsistently with respect to this source of time, meaning that code might have wildly different semantics under test.

However, none of this changes the fact that these kinds of use-cases are unavoidable. Testing code involving timeouts in particular is a tricky scenario which arises often. Prior to Cats Effect 3.3.0, this could only really be done by using wall clock time, which usually results in long running and unreliable unit tests. The Ember project (a purely functional HTTP network layer built on Fs2 and Cats Effect) has suffered particularly acutely from this limitation.

The solution to all of these problems is to provide a new mechanism which overrides the runtime evaluation semantics of IO such that all time-related functionality (including realTime, monotonic, and sleep) behave consistently, deterministically, and in a fashion which is ultimately under external user control for the purposes of testing. This mechanism should be otherwise as consistent as possible with the production IO runtime, ensuring that code under test can be relied upon to behave as close as possible to how it would in a production environment, except that its view of time as a whole would be redefined.

In Cats Effect 3.3.0, this mechanism is now available in the form of TestControl, a new type in the cats-effect-testkit module. This wraps and revises the functionality previously available in a less ergonomic form within TestContext, a utility that has been a part of Cats Effect since before 1.0.0. TestControl is much more opinionated about how you interact with the runtime, and this in turn allows it to provide a more deterministic and granular set of controls to users.

Note that this tool is only intended for use in test suites! It is not a production runtime.

You can find an intricate example of applying TestControl to a complex scenario on the project website. As a very quick example though using MUnit syntax, consider testing the timeout function:

import cats.effect.testkit.TestControl

test("timeout should take the winner") {
  val program = IO.sleep(5.seconds).timeout(10.seconds)
  TestControl.executeEmbed(program).assertEquals(())
}

test("timeout should halt the loser") {
  val program = IO.sleep(10.seconds).timeout(5.seconds)
  TestControl.executeEmbed(program).intercept[TimeoutException]
}

Both of these tests will deterministically pass within milliseconds, even on CI. You can get even finer grain control over time passage by using the TestControl.execute function, though this is beyond the scope of these notes.

This functionality fully subsumes the old Clock mocking technique in Cats Effect 2, as well as all prior use of TestContext (which has now been changed subtly to support TestControl and is not recommended for direct use).

Improved interruptible

Judging by questions in Discord, most people seem to still be unaware of the interruptible function and its implications. Considering that this function was also, until this release, almost entirely undocumented, this is perhaps not a surprising turn of events!

In 3.3.0, we've slightly reworked the IO.interruptible and Sync#interruptible APIs (jointly). In particular, interruptible previously took a Boolean parameter which governed whether or not the state machine would repeatedly attempt to call Thread#interrupt or if it would just send a single signal and await termination. Far and away, the most common use-case here would be to simply pass false, but this led to a rather clunky and unintuitive syntax: IO.interruptible(false) { ... }.

In an attempt to fix this, interruptible has been split into two methods: interruptible and interruptibleMany, with the former behaving exactly like interruptible(false) previously behaved. This change was carefully encoded to ensure binary compatibility with previous Cats Effect 3.x versions, which unfortunately means that the old interruptible signature remains available on the IO companion, albeit marked with @deprecated.

As an aside, interruptible is very similar to blocking in that each can be used to wrap a side-effect which hard-blocks a Thread (think: file I/O, use of java.util.concurrent primitives, etc). However, unlike blocking and delay, interruptible side-effects can be canceled just like any other fiber stage! This mechanism works by invoking the Thread#interrupt method on the underlying carrier thread which evaluates the side-effect. Note that not all JVM code respects thread interruption (notably, most file I/O libraries do not), and only thread suspension points are generally interruptible. Thus, the effectiveness of this mechanism is somewhat limited, but for cases where it is helpful it can be employed:

import cats.effect.{IO, IOApp}
import scala.concurrent.duration._

object InterruptExample extends IOApp.Simple {
  val run = {
    val sleepy = IO interruptible {
      Thread.sleep(10000)
      println("slept!")
    }

    sleepy.timeoutTo(100.millis, IO.println("abort abort abort"))
  }
}

The above program will terminate within about 100 milliseconds printing abort abort abort. Replacing interruptible with blocking would be semantically correct, but in that case the program would run for the full 10 seconds before printing slept! followed by abort abort abort. The latter semantic happens because blocking (like delay) is effectively uncancelable, meaning that the timeoutTo cancelation is unable to terminate the action early.

interruptible is not a safe drop-in replacement for blocking in all cases, since it is extremely common for JVM code, whether written in Scala or in Java, to be not interruption-safe. You should be careful to make sure that the side-effect in question will respond properly to interruption without leaking resources before taking advantage of this functionality. There are also some minor performance implications (blocking is slightly faster), though they are not expected to have a meaningful impact in most applications.

Tracing on Scala.js

Asynchronous fiber traces have been available on Cats Effect on the JVM since 2.3.0 (with a brief hiatus during the 3.0 era), but it's always been known that it should be possible (at least in theory) to bring the same technology to Scala.js. Tracing works by constructing a Throwable at IO call sites and walking the stack trace to determine the most semantically reasonable caller for a given combinator. That stack frame is then pushed into a ring buffer and carried along with the fiber. Of course, constructing Throwables in this fashion is quite slow, so we recover performance by caching this process based on the java.lang.Class instance of whatever thunk or Function value was passed to the combinator. This cache converges quite quickly on most applications (since it is only proportional to the size of the calling code), which in turn means that tracing in practice has a very minimal performance cost. (roughly 30% on worst-case synthetic benchmarks, closer to 5-10% in more realistic compute-bound tests, and entirely unmeasurable in I/O-bound applications)

These implementation details are relevant because, at least in theory, all of these same mechanisms exist on Scala.js! We can construct exceptions and walk their call stacks on that platform just as well as we can on the JVM, and while it isn't meaningful to ask for the Class of an AnonFunction, we can get access to its source code, which gives us at least something we can hang our hats on.

Thanks to the hard work of Arman Bilge, we have now done exactly this! Cats Effect applications on Scala.js are tracing-enabled and support all of the same features as the JVM tracing, including enhanced exceptions and fiber dumps. Unfortunately, JS tracing is a little less "set and forget" than JVM tracing. In practice, enabling tracing in a Node.js application imposes a nearly 100% performance penalty. The exact reasons for this are less than clear, but it seems to have something to do with the way that most JavaScript runtimes (but particularly V8) are optimized to assume that exceptions are very rare. Constructing an exception seems to cause the runtime to actively deoptimize itself, which has follow-on ripple effects across the application.

Additionally, while traces are very cool, they're not particularly helpful if they cannot be linked back to the original source code. This is relatively easy on the JVM since debug symbols are embedded in class files and parsed into Throwable stack traces by the runtime, but it's less straightforward on Scala.js where the actual exception source references would normally be to the generated JavaScript and not to the original Scala.

This problem is resolved through the use of source maps, a bit of metadata which allows the generated JavaScript to maintain a map which links back into the original source files which produced it. Cats Effect tracing is fully aware of this mechanism, and if it is enabled, will automatically use it as part of its trace output.

All of this comes together to give us the following!

java.lang.Throwable: Boom!
  at null.<anonymous>(/workspace/cats-effect/example/js/src/main/scala/cats/effect/example/Example.scala:35:55)
  at apply @ <jscode>.null.<anonymous>(/workspace/cats-effect/example/js/src/main/scala/cats/effect/example/Example.scala:35)
  at ifM @ <jscode>.null.<anonymous>(/workspace/cats-effect/example/js/src/main/scala/cats/effect/example/Example.scala:35)
  at apply @ <jscode>.null.<anonymous>(/workspace/cats-effect/example/js/src/main/scala/cats/effect/example/Example.scala:34)
  at flatMap @ <jscode>.null.<anonymous>(/workspace/cats-effect/example/js/src/main/scala/cats/effect/example/Example.scala:34)
  at apply @ Example$.fib(/workspace/cats-effect/example/js/src/main/scala/cats/effect/example/Example.scala:24)
  at flatMap @ Example$.fib(/workspace/cats-effect/example/js/src/main/scala/cats/effect/example/Example.scala:24)
  at apply @ Example$.fib(/workspace/cats-effect/example/js/src/main/scala/cats/effect/example/Example.scala:24)
  at flatMap @ Example$.fib(/workspace/cats-effect/example/js/src/main/scala/cats/effect/example/Example.scala:24)
  at apply @ Example$.fib(/workspace/cats-effect/example/js/src/main/scala/cats/effect/example/Example.scala:24)
  at flatMap @ Example$.fib(/workspace/cats-effect/example/js/src/main/scala/cats/effect/example/Example.scala:24)
  at apply @ Example$.fib(/workspace/cats-effect/example/js/src/main/scala/cats/effect/example/Example.scala:24)
  at flatMap @ Example$.fib(/workspace/cats-effect/example/js/src/main/scala/cats/effect/example/Example.scala:24)
  at apply @ Example$.fib(/workspace/cats-effect/example/js/src/main/scala/cats/effect/example/Example.scala:24)
  at flatMap @ Example$.fib(/workspace/cats-effect/example/js/src/main/scala/cats/effect/example/Example.scala:24)
  at apply @ Example$.fib(/workspace/cats-effect/example/js/src/main/scala/cats/effect/example/Example.scala:24)
  at flatMap @ Example$.fib(/workspace/cats-effect/example/js/src/main/scala/cats/effect/example/Example.scala:24)
  at apply @ Example$.fib(/workspace/cats-effect/example/js/src/main/scala/cats/effect/example/Example.scala:24)
  at flatMap @ Example$.fib(/workspace/cats-effect/example/js/src/main/scala/cats/effect/example/Example.scala:24)

This aligns very nicely with the output produced by the same program on the JVM:

java.lang.Throwable: Boom!
        at Example$.$anonfun$program$5(Example.scala:35)
        at apply @ Example$.$anonfun$program$3(Example.scala:35)
        at ifM @ Example$.$anonfun$program$3(Example.scala:35)
        at apply @ Example$.$anonfun$program$1(Example.scala:34)
        at flatMap @ Example$.$anonfun$program$1(Example.scala:34)
        at apply @ Example$.fib(Example.scala:24)
        at flatMap @ Example$.fib(Example.scala:24)
        at apply @ Example$.fib(Example.scala:24)
        at flatMap @ Example$.fib(Example.scala:24)
        at apply @ Example$.fib(Example.scala:24)
        at flatMap @ Example$.fib(Example.scala:24)
        at apply @ Example$.fib(Example.scala:24)
        at flatMap @ Example$.fib(Example.scala:24)
        at apply @ Example$.fib(Example.scala:24)
        at flatMap @ Example$.fib(Example.scala:24)
        at apply @ Example$.fib(Example.scala:24)
        at flatMap @ Example$.fib(Example.scala:24)
        at apply @ Example$.fib(Example.scala:24)
        at flatMap @ Example$.fib(Example.scala:24)
        at apply @ Example$.fib(Example.scala:24)
        at flatMap @ Example$.fib(Example.scala:24)
        at apply @ Example$.fib(Example.scala:24)
        at flatMap @ Example$.fib(Example.scala:24)

Due to the performance costs associated with tracing on JavaScript, it is only enabled under fastOptJS. It is assumed that any production deployment of an application will use fullOptJS, which automatically disables tracing and restores full performance (all tracing call sites will be inlined away by the JavaScript runtime).

Dynamically Self-Tuning Fairness

While the Cats Effect fiber scheduler was originally derived from algorithms and data structures pioneered by the Tokio project, it has since evolved in several significant areas which result in better performance and tighter optimization with respect to both the JVM in general and conventional Cats Effect usage patterns specifically. One of these areas is the batched queue.

In any work-stealing scheduler, there are at least n + 1 work queues: one shared "external" queue, and one unshared local queue for each worker. The whole system is optimized around using the local queues for as much as possible, and keeping work in the local queues whenever there is an option to do so. Workers steal from each other's local queues when they run out of work, and periodically pull from the shared external queue, which in turn is only used for new incoming work which came from entirely outside of the thread pool (such as async continuations).

One of the key insights coming from Tokio is the encoding of a local queue which is efficient to steal from, efficient to locally poll/push, and removes contention points between all three operations. In order to do this, the local queues are all fixed to a total capacity of 256 fibers, and they can never contain any more than this. Of course, at some point, a local queue may overflow, meaning that a worker has more than 256 fibers which it needs to manage. This can happen either due to internal forking (e.g. use of the start operation) or polling from the external queue (which can bring a fiber into an already-full local queue). When this happens, we need a place to send those fibers.

Tokio solves this by taking 128 fibers (half of the local queue) and pushing each one individually onto the external queue. The halving property means that this kind of overflow happens relatively infrequently, and while it does sacrifice fairness for that back half of the queue, it at least avoids starving other sources which are attempting to enqueue work. Cats Effect improves upon this process to increase its efficiency and reduce the impact to fairness.

In particular, when this overflow happens, Cats Effect's fiber runtime places the overflowed fibers onto a batched queue, which maintains the fibers as a large batch in a shared space. Just like the external queue, all fibers can pull from this batched queue, and when they do they will receive a large set of fibers rather than pulling just one at a time from the external queue. In order to maintain fairness with the external queue, workers alternate between pulling from the batched queue or the external queue once every 64 steps.

This optimization works quite well on average, but it risks over-fitting to artificial benchmarks. In particular, it makes a number of assumptions. For one thing, it assumes that internal start operations (which also occur as part of par operators and in other circumstances) are dramatically more common than external async continuations. This is often true in Cats Effect applications, but it is not a guarantee, and in any application with a lot of async and very little start, the async continuations will take longer to complete than they could since workers will be wasting time polling the batched queue unnecessarily often. Conversely, if an application has even more starting than expected, but very little async, the external queue would be the one taking unnecessary attention away from the batched queue.

The goal should be to allow the pull to dynamically self-tune between these extremes depending on workload. An application with a lot of starting should see more time devoted to batch processing, while the inverse should be true for an application with a lot of async. The solution here is to build a virtuous cycle into the data structures by collapsing the external and batched queues into a single polymorphic queue which can contain either Fiber or Array[Fiber] elements, with the latter representing batches that had previously been spilled. This queue is polled once every 64 steps. When workers spill to it, they take 128 of their local fibers and split them into four batches of 32, enqueuing each of the four batches onto the polymorphic external queue.

This approach maintains the performance benefits of batching (optimizing for start-heavy workloads, which seem more common in Cats Effect than in Tokio applications), while maintaining the flexibility needed to dynamically optimize for async-heavy workloads when needed. Additionally, monotonicity, fairness, and amortization arguments are all preserved by the careful choice of coefficients (4 batches, 32 fibers, 64 steps, 128 fibers spilled, 256 fibers in local queue). This in turn ensures that while a fiber may have its execution slot delayed considerably in pathological cases, it is always guaranteed to come to the front of a worker eventually, preserving progress and performance simultaneously.

IO Run Loop No Longer Nests During Finalization

This is a rather strange bug that has been around since the earliest builds of Cats Effect 3 that included IO itself. Specifically, it has to do with the way in which IO rebuilds the run loop for a fiber when that fiber is being canceled. What used to happen is a brand new run loop would be created, clearing out and reusing the data structures which managed the old one, and then this process would repeat for each finalizer (evaluated inside-out). This meant that any fiber which was canceled within an arbitrarily long stack of finalizers (a situation which is relatively easy to construct, albeit with some contrivance) would in turn result in arbitrarily many copies of the run loop, each nested within each other.

As if this weren't bad enough, this process of recreating the run loop also would reset the counters which are used to insert fairness boundaries, meaning that a fiber could be constructed such that its cancelation could hog a worker thread for some arbitrary length of time, starving the pool.

We're very confident that no one has observed this in production, but it's still certainly a severe enough bug that it merits special attention. The fix should also have the pleasing side-effect of improving both throughput and fairness in cancelation-heavy workloads.

User-Facing Pull Requests

  • #2543 – Inline the head and the tail of the local queue while avoiding false sharing (@vasilmkd)
  • #2545 – Implement signalled fiber dumps for JS (@armanbilge)
  • #2578 – Do not update LocalQueue counters if tracing is disabled (@vasilmkd)
  • #2577 – Ignore MBean deregistration exceptions (@vasilmkd)
  • #2413 – Augment IOFiber#toString w/ trace from IO (@armanbilge)
  • #2556 – MBeans (@vasilmkd)
  • #2557 – JS trace filtering fixes (@armanbilge)
  • #2555 – Remove evalOnR as it is a duplicate of autoCedeR (@vasilmkd)
  • #2548 – Reflectively register signal handling if available (@vasilmkd)
  • #2549 – Optional active fiber (@vasilmkd)
  • #2550 – Print fiber dump incrementally rather than in one large string (@djspiewak)
  • #2547 – Restore join and cancel behavior to -Xcheckinit days (@vasilmkd)
  • #2546 – Define and implement live fiber snapshot MBean (@vasilmkd)
  • #2542 – Reformatted fiber dump (@djspiewak)
  • #2531 – Handle live fiber snapshot signals on the JVM (@vasilmkd)
  • #2541 – Undo workarounds for bitmap$init (@armanbilge)
  • #2534 – Compute pool metrics (@vasilmkd)
  • #2540 – Remove -Xcheckinit compiler flag (@armanbilge)
  • #2533 – Fix totalFiberCount on enqueueBatch (@vasilmkd)
  • #2528 – Use the tracing constants for speed (@vasilmkd)
  • #2525 – Order snapshot data by trustworthiness (@vasilmkd)
  • #2523 – Live fiber snapshot on the JVM (@vasilmkd, @djspiewak)
  • #2246 – Implemented timeoutAndForget for leaky timeout scenarios (@djspiewak)
  • #2520 – Set of foreign fibers (@vasilmkd)
  • #2518 – Suspended fibers snapshot (@vasilmkd)
  • #2517 – A snapshot of live fibers on the work stealing thread pool (@vasilmkd)
  • #2471 – Allocate callback wrapper once on push (@vasilmkd)
  • #2513 – Implement the active fiber tracking mechanism (@vasilmkd)
  • #2516 – Snapshots of queues (@vasilmkd)
  • #2510 – Fix weak bag for the blocking mechanism (@vasilmkd)
  • #2509 – Only monitor suspended fibers if tracing is enabled (@vasilmkd)
  • #2507 – Track fibers across IO.blocking and IO#evalOn (@vasilmkd)
  • #2505 – Rename SuspendedFiberBag to FiberMonitor (@vasilmkd)
  • #2506 – Create fiber-aware EC for JS (@armanbilge)
  • #2492 – Worker-thread-local weak bag for tracking suspended fibers (@vasilmkd)
  • #2488 – Disable JS SuspendedFiberBag in production (@armanbilge)
  • #2487 – Monitor suspended fibers on all platforms (@vasilmkd)
  • #2433 – Implement SuspendedFiberBag for JS (@armanbilge)
  • #2465 – Reorder branches in run loop (@vasilmkd)
  • #2463 – Add configuration forwarders to IORuntime and remove prefetched values in IOFiber (@vasilmkd)
  • #2429 – Introduce parReplicateAN (@RafalSumislawski)
  • #2453 – Masking without an initMask field (@vasilmkd)
  • #2459 – Remove the InterruptibleMany prefetch (@vasilmkd)
  • #2462 – Remove the bespoke execution contexts stack (@vasilmkd)
  • #2452 – Remove trace case and restore tableswitch in immediate interpreters (@vasilmkd)
  • #2443 – Inline continuations which call succeded/failed (@vasilmkd)
  • #2438 – Stop nesting run loops on cancelation (@vasilmkd)
  • #2434 – Initialize tracing buffer if needed (@vasilmkd)
  • #2411 – Async cancelation should first re-enqueue the fiber (@vasilmkd)
  • #2408 – Global GC fiber bag implementation (@vasilmkd)
  • #2422 – Remove duplicate resumption states (@vasilmkd)
  • #2394 – Add a configurable timeout for the JVM shutdown hook (@vasilmkd)
  • #2375 – Discipline ParallelF (@armanbilge)
  • #2410 – Override IOFiber#toString (@djspiewak)
  • #2414 – Streamline EC#execute usage in IOFiber (@vasilmkd)
  • #2409 – Use explicit isInstanceOf followed by asInstanceOf (@vasilmkd)
  • #2396 – Eliminate branching and unreachable JS code when tracing (@vasilmkd)
  • #2361 – Bye bye helper thread (@vasilmkd)
  • #2403 – Fixed PureConc finalizers and further optimized par (@djspiewak)
  • #2383 – Revised LocalQueue metrics (@vasilmkd)
  • #2367 – Make JS IORuntime externally configurable (@armanbilge)
  • #2390 – Add support for long sleeps in JS scheduler (@armanbilge)
  • #2366 – Disable JS tracing in production (@armanbilge)
  • #2343 – Split interruptible (@alexandrustana)
  • #2276 – Added TestControl, with first-class support for time mocking (@djspiewak)
  • #2345 – Overflow and batched queue unification (@vasilmkd)
  • #2348 – Clear the tracingEvents ring buffer when a fiber ends (@milanvdm)
  • #2341 – Test laws for StateT[IO] (@vasilmkd)
  • #2338 – Replacing IOFiber#execute with IOFiber#scheduleFiber (@ixthe0)
  • #2273 – Fix #2236 implement Async.fromThenable (take 2) (@armanbilge)
  • #2322 – Do not alter the resume tag in execR (@vasilmkd)
  • #2294 – Prevent eager creation of TimeoutException in IO.timeout (@ixthe0)
  • #2316 – Replaced PolyfillExecutionContext with macrotask-executor (@djspiewak)
  • #2250 – Added a measurement for timer precision (@djspiewak)
  • #2302 – Remove unnecessary trait extensions (@vasilmkd)
  • #2207 – Implement tracing for JS (@armanbilge)
  • #2576, #2568, #2365, #2522, #2535, #2515, #2501, #2480, #2426, #2421, #2398, #2415, #2399, #2392, #2382, #2373, #2356, #2335, #2307, #2289 – Documentation fixes and improvements (@armanbilge, @alexandrustana, @kamilkloch, @djspiewak, @LLCampos, @lrodero, @i10416, @dtuttleo, @amohrland, @Baccata, @nikiforo, @vasilmkd, @gruggiero, @sh0hei)

Special thanks to each and every one of you! You are all awesome.