Cadence Versions

Cadence is a distributed, scalable, durable, and highly available orchestration engine to execute asynchronous long-running business logic in a scalable and resilient way.

v1.2.8

1 month ago

What's Changed

Added

Changed

Fixed

  • Set proper max reset points by @neil-xie in #5623
  • Put a timeout for timer task deletion loop during shutdown by @taylanisikdemir in #5626
  • Catch unit test failures in make test by @Groxx in #5635
  • fix: get messages between query over message_id typo by @zedongh in #5607
  • Fix context leak in tests by @munahaf in #5377
  • Make sure task processing rate limiter is only done in the active side by @sankari165 in https://github.com/uber/cadence/pull/5654
  • Fix Pinot query validator bug when user passes in a not-equal query with a missing value by @neil-xie in #5662
  • Update Pinot query validator failed log, minor refactor of Pinot visibility store to remove panics by @neil-xie in https://github.com/uber/cadence/pull/5664
  • Fix context leak in pinot integration test by @neil-xie in #5682
  • Fix SignalWithStartWorkflow API by @Shaddoll in #5671
  • Fix wrong migration paths in example by @kotcrab in #5668
  • Fix comment in workflow id cache config by @sankari165 in #5661
  • Fix the local integration test docker-compose file by @jakobht in https://github.com/uber/cadence/pull/5695
  • Do not get workflow execution from database when shard is closed by @Shaddoll in https://github.com/uber/cadence/pull/5697

Removed

New Contributors

Full Changelog: https://github.com/uber/cadence/compare/v1.2.7...v1.2.8

v1.2.7

3 months ago

What's Changed

Added

Fixed

Changed

Removed

New Contributors

Full Changelog: https://github.com/uber/cadence/compare/v1.2.6...v1.2.7

v1.2.6

5 months ago

What's Changed

Added

  • Added range query support for Pinot json index by @bowenxia (#5426)
  • Implemented GetTaskListSize method at persistence layer by @Shaddoll (#5442, #5447)
  • Added a framework for the Task validator service by @agautam478 (#5446)
  • Added nit comments describing the Update workflow cycle by @agautam478 (#5432)
  • Added log user query param by @bowenxia (#5437)
  • Added CODEOWNERS file by @taylanisikdemir (#5453)
  • Added a function to evict all elements older than the cache TTL by @jakobht (#5464)

Fixed

  • Fixed workflow replication for reset workflow by @Shaddoll (#5412)
  • Fixed visibility mode for admin when using Pinot visibility by @neil-xie (#5441)
  • Fixed workflow started metric by @ketsiambaku (#5443)
  • Fixed timer-fixer, unfortunately broken in 1.2.5 by @Groxx (#5433)
  • Fixed confusing comment in matching handler by @jakobht (#5450)

Changed

  • Cassandra version is changed from 3.11 to 4.1.3 by @taylanisikdemir (#5461)
    • If your machine already has the ubercadence/server:master-auto-setup image, you need to re-pull it so that it works with the latest docker-compose*.yml files
  • Move dynamic ratelimiter to its own file by @jakobht (#5451)
  • Create and use a limiter struct instead of just passing a function by @jakobht (#5454)
  • Dynamic ratelimiter factories by @jakobht (#5455)
  • Update github action for image publishing to released by @3vilhamster (#5460)
  • Update matching to emit metric for tasklist backlog size by @Shaddoll (#5448)
  • Change variable name from SecondsSinceEpoch into EventTimeMs by @bowenxia (#5463)

Removed

  • Get rid of noisy task adding failure log in matching service by @taylanisikdemir (#5445)

New Contributors

Full Changelog: https://github.com/uber/cadence/compare/v1.2.5...v1.2.6

v1.2.5

6 months ago

What's Changed

Added

  • Scanner / Fixer changes by @Groxx in https://github.com/uber/cadence/pull/5361
    • Stale-workflow detection and cleanup added to shardscanner, disabled by default.
    • New dynamic config to better control scanner and fixer, particularly for concrete executions.
    • Documentation about how scanner/fixer work and how to control them, see the scanner readme.md
    • This also includes example config to enable the new fixer.
  • MigrationChecker interface to expose migration CLI by @abhishekj720 in https://github.com/uber/cadence/pull/5424
  • Added Pinot as new visibility store option by @neil-xie in https://github.com/uber/cadence/pull/5201
    • Added pinot visibility triple manager to provide options to write to both ES and Pinot.
    • Added pinotVisibilityStore and pinotClient to support CRUD operations for Pinot.
    • Added pinot integration test to set up Pinot test cluster and test Pinot functionality.

Fixed

Full Changelog: https://github.com/uber/cadence/compare/v1.2.4...v1.2.5-prerelease3

v1.2.4

7 months ago

What's Changed

Full Changelog: https://github.com/uber/cadence/compare/v1.2.3...v1.2.4

v1.2.2

7 months ago

What's Changed

Full Changelog: https://github.com/uber/cadence/compare/v1.2.1...v1.2.2

v1.2.1

7 months ago

v1.2.3

7 months ago

Added

  • Expose workflow history size and count to client by @timl3136 (#5392)

Fixed

  • [cadence-cli] fix typo in input flag for parallelism by @sankari165 (#5397)

Changed

  • Update config store client to support SQL database by @Shaddoll (#5395)
  • Scaffold config store for sql plugins by @Shaddoll (#5396)
  • Improve poller detection for isolation by @Shaddoll (#5399)

v1.0.0

1 year ago

We are v1.0! (with a schema upgrade)

What does this mean?!

Not much. Primarily that we are declaring "it's stable and in use" more visibly, because we continually get questions about this :) A larger public announcement / state-of-the-project is in the works.

Importantly, v1.0 does not imply any change to backwards compatibility (the minimum supported client version has not changed), RPC compatibility (ditto, all changes are backwards compatible), or Go API compatibility (this is not truly a library, so Go compatibility is not a goal).

Going by previous version patterns, this would have been labeled v0.26.0 as it is a relatively incremental change (plus schema changes) from v0.25.0. As such, some strings still reference "0.26", because this older SHA is the one we have been using the most internally.
These strings will be updated and validated soon, and will likely be released as v1.0.1. This should have no behavioral impact at all, but will be visible in metrics, logs, and display strings.

What do I need to do to upgrade?

Schema upgrades needed

There have been schema changes to both normal and visibility datastores, primarily to provide better data for cleanup and hot-shard detection.

These were intentionally kept out of v0.25.0 to keep that upgrade simple, as they were not fully utilized yet.

Replication cache recommendation

We have internally disabled the replication cache (history.replicatorCacheCapacity dynamic config set to 0), due to unexpectedly large memory use under abnormal load, and you may wish to do so as well.

We did not encounter any misbehavior, and it did reduce database load as intended, but we intend to make some changes to it to estimate and constrain memory use before re-enabling.

What has changed?

At a very high level, we've been focused on:

  • Internal scaling challenges, both improving bottlenecks and improving our ability to accurately identify bottlenecks
    • Many metrics, logs, and refactors are at least somewhat related to this
    • Our multi-cluster support is improved in particular, as we have been connecting clusters and moving many domains to spread load more evenly
  • Database corruptions, as our Cassandra clusters have had some problems that caused issues for months
    • Many logs, scanner, and stale-task changes are related to this, e.g. to detect and remove invalid data
  • Scaling up the team
    • More changes to come!

Some loosely categorized PRs that were included follow:

Critical bugfixes (resolving issues in v0.25.0)

Parent-close-policies apply to child workflows even after they reset/continue-as-new/etc

  • Update parent close policy to terminate/cancel child workflows even after continue as new by @Shaddoll in https://github.com/uber/cadence/pull/5032
    • This requires new stored data, so it does not apply to child workflows started before this version.

Better config introspection

Schemas are now available via the go module, as go:embed files

Enhancing existing metrics and logging (and more included in other PRs)

Misc

New Contributors

Full Changelog: https://github.com/uber/cadence/compare/v0.25.0...v1.0.0

v0.25.0

1 year ago

Important Notice: If you're experiencing OOM after deploying this version, please set the following dynamic config property to disable the replication cache.

history.replicatorCacheCapacity:
- value: 0

Per-domain metrics

  • 483a1492d Introduce per domain metrics (#5012)
  • e87bd74da Added logs for domainName empty situation (#4987)
  • c8783f0b3 Addition of domainName tag to Replication task (#4975)
  • 88991f2ff Addition of domain tag for Replication task metric (#4974)
  • e69dbd6a6 Added changes to readHistoryBranchRequest (#4972)
  • 76a025a7a Added domainName change to remaining functions of appendHistoryNodeRequest and RecordWorkflowExecutionUninitializedRequest (#4968)
  • 0f590423c Added changes to archival client (#4958)
  • d1965b1ab Added domain Tag to UpdateTaskList,DeleteTaskList,LeaseTaskList,CompleteTask and CompleteTaskLessThan (#4950)
  • 4c8013d76 Added changes to GetTask and CreateTask (#4947)
  • e88a9c7ad Added changes to PutReplicationTaskToDLQ and IsWorkflowExecutionExists (#4946)
  • b9b8b42b9 Added changes to DeleteCurrentWorkflowExecution and GetCurrentExecution (#4944)
  • 8c5f2ffb4 Added changes to ConflictWorkflowExecution and DeleteWorkflowExecution (#4943)
  • 13a130be7 Added changes to GetWorkflowExecution and UpdateWorkflowExecution (#4938)
  • 2bb13a17d Added DomainTag changes to ReadHistory branch for readHistoryRequest, CreateWorkflowRequest + added DomainCacheNoOp file (#4930)
  • c091a4960 Changed DeleteHistoryBranch and GetHistoryTree by adding Domain Tag with mocks (#4928)
  • b34f4e4b9 Adding DomainTag to the ForkHistoryBranch, ReadRawHistoryBranch and ReadHistoryBranchByBatch (#4926)
  • 6cf4252d4 Adding DomainTag to the Persistence metrics client (#4922)
  • c3f7bd347 Addition of DomainTag to required functions for the creation of metrics required for Domain Cost Attribution (#4908)

Replication improvement

  • 62428546f Immediate replication task hydration after successful transaction (#4980)
  • beaf67011 Return early when there are not replication tasks (#4982)
  • d38b08e45 Add Metric Emitter, which emits a metric once a minute for true replication lag in nanoseconds. (#4979)
  • 1a2804dc7 Reduce metrics cardinality for replication.TaskStore (#4981)
  • 93a6f2348 Return persisted history events blob (#4953)
  • 1be9b6d6a Replication cache for sharing hydrated messages (#4952)
  • 457c35e4f Partial response of GetReplicationMessages on history service (#4935)
  • d739bf5f6 Helpers for getting enabled and remote cluster info (#4951)
  • 385c1c368 Adds more pertinent information about replication (#4931)
  • fe3bf0b6b Refactor task ack manager (#4894)
  • 83aa1938a Removed TaskID from types.HistoryTaskV2Attributes (#4876)

Observability improvement

  • 1e788db68 Add domain_type and cluster_groups tags (#4990)
  • ff113929f Improve logs for task executor (#4989)
  • e597b8724 Add logs to debug transfer task (#4970)
  • 177f08713 Improve log for transfer task validator (#4961)
  • b0d1f06e8 Capture CassandraLWT error and log/bump metrics for it. (#4888)
  • 50d331a4b add activity info logging (#4867)
  • 93bda8f59 cadence-history does not emit continue-as-new metrics (#4866)
  • 7854f812d Add empty response metrics for read operations (#4855)
  • 471e6d164 Log replication messages that did not fit (#4844)
  • b03d03e77 add metric tags for activity task dispatch (#4821)
  • d21162d22 Add logs for domain failover (#4810)
  • 400bbe46b Improve failover coordinator error logging (#4811)
  • a51b61349 Log error fields as tags (#4801)
  • c59865478 Improve task re-dispatch error logging (#4809)
  • 22f97c80b Log error when fetchHistoryFromRemote fails (#4807)
  • 33edece45 Add source_cluster tag when emitting DLQ size (#4782)

Activity dispatch optimization

  • 52203abc1 count local and server optimized activity dispatches as started (#4901)
  • bafdf15b1 do not wait for activity task channel if sync match from history (#4860)
  • 361edb68d add activity dispatch configs to matching (#4818)
  • e77b43dd0 add activity dispatch configs (#4816)
  • 2b0b03f69 updated idl for activity task dispatch (#4815)
  • 2890600be add data contract for activity task dispatch (#4813)
  • cda6c5324 set EnableActivityLocalDispatchByDomain default value to true (#4788)

Restart workflow

  • e5036ed7c CDNC-1781 Add restart command/api (#4900)

Cross Cluster operations

  • e5ed7f726 Feature/adding canary for cross cluster -> readme patch (#4870)
  • 68fb2e60d Adds cross-cluster canary (#4868)

Corrupted workflows

  • 79437b3d3 Introduce a dynamic config for cassandra all consistency level delete (#5000)
  • 052d77c59 Update Cassandra deletes to use ALL consistency level (#4984)

Cancel workflow

  • add4b390a Standardizing cancellation behavior: a canceled workflow never starts a new run (#4898)
  • f1c557870 adding reason to cancel workflow. (#4934)

Failover lockdown

  • 147172c1d Feature/cdnc 2263 Add toggle which can block domain failovers (#4786)

Bug fixes

  • c2ffb71dd Adds fix for domain ack level issue (#5001)
  • 3985fec96 Fix history corruption check for workflow signaling (#4998)
  • 1375e49ca Revert "Fix error conversion for WorkflowExecutionAlreadyStartedError (#4838)" (#4999)
  • 494f202d6 Fix status check for visibility and archival (#4864)
  • a7270495a Bugfix/correct failover issue target domain not active ii (#4840)

Misc improvements & updates

  • 78a755c7a Add new unit test (#5008)
  • 278a3b8a0 Re-enable workflow test (#5007)
  • 43c9ebc5f Fix Cadence CLI (#5005)
  • 146bc31b3 Update idls (#4997)
  • 6da9676b5 Convert client peer resolving errors to service transient errors (#4993)
  • a91a250ef Adding first scheduled time metadata field for cron workflows. (#4969)
  • 5eb67d147 Make test now passes locally (#4915)
  • 3aaa1e8e8 Allow docker compose to work with docker-compose-mysql.yml on M1 (#4983)
  • 854fc59f4 Run docker build on commits, to prevent docker build from breaking in the future (#4978)
  • 172abd6f4 Fix docker build. (#4977)
  • 701fb7061 Adding limit for amount of pending activities in mutable state. (#4959)
  • 6ecd1e4e7 Fixing test. (#4941)
  • d8cb61eb8 Upgrade Golang base images to remediate CVEs (#4957)
  • f2b210821 Simplify shard write operations (#4955)
  • 9949a22c1 Simplify history engine task read ID logic (#4949)
  • 756601890 fix funcorder linter (#4942)
  • b21f34f8a add funcorder linter (#4939)
  • e3496a308 Add List*Execution (ElasticSearch) API ratelimiters (#4925)
  • 85e0fee1f Fix flaky QueryWorkflow tests (#4932)
  • 341d9f081 Improve decode_thrift output (#4929)
  • a4d77f547 Fix query workflow high latency after a long inactive time (#4871)
  • 43a17d2f6 downgrade testify to fix monorepo (#4918)
  • ef8d11e33 Update revive to catch more defer/recover badness (#4917)
  • 82544de0c Replace unsafe usage of recover() in helper functions (#4913)
  • c06649e60 Fix remaining server lint warnings and make lint error by default. (#4911)
  • 8b42a6dcc Start fixing server lint warnings (#4909)
  • d2f72d88d Fix flaky retrypolicy tests. (#4905)
  • 25e221bcf Add new CI step for lint validation (#4903)
  • 64cb46fb9 Add new es record for uninitialized workflow execution (#4899)
  • 8c449b316 Add JitterDelay option when creating workflows. (#4886)
  • 1f8c93a91 reduce MatchingActivityTaskSyncMatchWaitTime default value (#4897)
  • 7da6bc024 [codegen] introduce gowrap for generating retryableClient (#4879)
  • ed2beb20f Separating tools dependencies from main dependencies (#4895)
  • de0992686 Minor makefile cleanup, verbose CI, fmt with a recent Go version (#4896)
  • cfd637e26 add mockery to go generate (#4887)
  • 6f9e2d9c3 upgrade go version to 1.17 in go mod and Buildkite dockerfile (#4889)
  • 663a041c9 Added support for network topology strategy (#4875)
  • ac107606e Move visibility operation from search attributes to indexer message (#4881)
  • 691bf3f82 Magically speed up integration tests by nearly 10x (#4892)
  • e9915ae66 Rename dockers default cluster name to match the other config files. (#4885)
  • aff5ecf6a Simplified FindFirstVersionHistoryByItem (#4882)
  • 4cfb74142 fix flaky TestDelayStartWorkflow (#4884)
  • 9f2190050 update generated code (#4880)
  • 600904405 Support allowed authenticators in tool (#4873)
  • f133d3c58 Add support for changing the gocql connect timeout (#4874)
  • dc5230f44 Update idl for StickyWorkerUnavailableError (#4869)
  • 9e6d122a7 Used exposed admin proto IDLs (#4865)
  • 093030526 Add visibility operation types to Kafka message (#4828)
  • ae1441294 Move some proto definitions to admin package (#4861)
  • af932bd81 Fix CLI rendering long workflow types (#4853)
  • b457b553e Make cluster.Metadata a struct and stop using mocks for it (#4851)
  • 12d8c5412 Add UpdateFromConfig function to schema tool library (#4848)
  • d6ae27853 Decouple domain cache entry from cluster metadata (#4847)
  • 15267b96f Separate buildkite pipeline for PRs (#4850)
  • 0582a58a8 Update SQL implementation of UpdateExecution to support async transaction (#4792)
  • 535cda845 Remove unused loggers from history (#4822)
  • 915a777c9 Simplify history builder (#4837)
  • beab75c6f Removing target-domain-not-active special-case handling (#4835)
  • a57590894 Extract Engine from matching handler (#4833)
  • 20329a2b7 Forward activity responses and heartbeats on failover as well (#4823)
  • fbfafb9f5 Update PROPOSALS.md (#4831)
  • 94fd0a65d Update roadmap.md (#4829)
  • 0a37a8b47 remove redundant type conversions for activity task dispatch (#4820)
  • ee5461b7c Check for resurrected activities during RecordActivityTaskStarted (#4806)
  • 4194b291d Remove unused PayloadSerializer param (#4827)
  • 45770c2e3 Add CustomDomain and Operator as default indexed keys (#4825)
  • eede46696 Fill domainID for backwards compatibility (#4819)
  • 8b100632b Fix error type returned from GetWorkflowExecution and DeleteWorkflowExecution (#4817)
  • fc9d5faec Change access denied error type (#4808)
  • e91a5a7e4 Allow decoding thrift from base64 string via CLI (#4805)
  • 5be511b8d Update base image to Alpine 3.15 (#4804)
  • e1aaeb76e fix WriteFile fail err hidden by Close invalid argument (#4744)
  • 21537697c Minor makefile cleanup, gofmt (#4802)
  • 6ea8658c3 Only update maxReadLevel after successful re-acquire of shard (#4799)
  • b49002da7 Add jittered workflow deletion configuration (#4789)
  • 91579a15d Fix docker prometheus config for linux docker (#4793)
  • 7a1fe537d Wrap underlying cause for conditional update error (#4797)
  • 480c733d3 Double inline archival time limit defaults (#4796)
  • 8c6164192 Use errgroup.Group for fanout style workloads (#4784)
  • 8845d979c Update EnableRecordWorkflowExecutionUninitialized flag to filter by domain name (#4904)
  • 807a0e289 Added API for retrieving DLQ message count (#4787)
  • ba4a5d951 Support refreshing long running workflows based on user config (#4770)
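
Two items in the list above (#4913, #4917) concern unsafe use of recover() in helper functions. One well-known Go pitfall in this area is that recover() only stops a panic when it is called directly by a deferred function; when called one frame deeper, inside a helper, it returns nil and the panic keeps unwinding. Whether this is exactly the problem those PRs addressed is an assumption. The sketch below only demonstrates the language rule, and `safeCall`, `unsafeCall`, and `helperRecover` are hypothetical names.

```go
package main

import "fmt"

// helperRecover calls recover() one frame too deep: it is not itself the
// deferred function, so recover() returns nil and the panic keeps unwinding.
func helperRecover() interface{} {
	return recover()
}

// safeCall runs fn and converts a panic into an error. recover() is called
// directly inside the deferred function, which is where it takes effect.
func safeCall(fn func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	fn()
	return nil
}

// unsafeCall tries to recover through a nested helper; the panic escapes.
func unsafeCall(fn func()) (err error) {
	defer func() {
		if r := helperRecover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	fn()
	return nil
}

func main() {
	fmt.Println(safeCall(func() { panic("boom") })) // prints "recovered: boom"

	// Wrap unsafeCall in safeCall to observe that its panic escaped.
	outer := safeCall(func() { _ = unsafeCall(func() { panic("boom") }) })
	fmt.Println(outer != nil) // prints "true"
}
```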

Cleanup & Refactoring

  • cb7987640 Drop dynamic config for gRPC message size (#5002)
  • 354e6b07c Removed replication mocks (#4883)
  • 608bcb5d6 Remove unused functions from TaskAckManager (#4872)
  • 9d4524183 Add helper function to list all dynamic config keys used in production (#4891)
  • 650cf8aff Refactor dynamic config (#4863)
  • e8a06cc3d Update the default values of dynamic config to not depend on static config (#4858)
  • 2408f9dd0 Removed unused internal type getters (#4852)
  • c6ce73249 Removed global domain enabled config (#4845)
  • 3a813e850 Remove domain cache from history/workflow (#4846)
  • 3cfcaeab5 Remove no-longer used dynamic configs (#4843)
  • 856d33fb2 Shard tag not needed in shard.Context (#4842)