An EDN-like CRDT (Causal Tree) for Clojure & ClojureScript that automatically tracks history and resolves conflicts.
An EDN-like CRDT (Causal Tree) for Clojure(Script) that automatically tracks history and resolves conflicts.
Cause is like git for collaborative applications. It is designed to look and feel like regular EDN data (maps
, lists
), while also exposing the richness of the append only CvRDTs. Cause handles common collaboration problems like deterministically synchronizing complex data structures across devices, and tracking history. Rather than rely on a central authority, conflicts can be resolved on every client. This makes Cause well suited for decentralized p2p applications.
The implementation takes a lot of cues from Data Laced with History and prior art related to CRDTs.
Why create Cause? Why another CRDT?
Before starting Cause I was working on a rich text editor in Slate. I wanted online/offline sync, collaboration, granular version control, zero dependency on a centralized server, and all of this in one persistent data structure that's easy to render, edit and save to a database. Projects like Automerge and Schism offered compelling solutions, but fell short on one or more of the properties I was looking for.
This is what Cause strives to do:
One data structure for everything. In Cause this is the node
. Nodes are just triples of [id cause value]
. They store identity and causality, and can easily be sent between clients, written to a database and rendered. Multiple in-memory caches make the read and write performance of common node operations fast. And those caches can be reconstituted from a bag of related nodes at runtime, so at rest storage is reduced.
Persist all the data. Clojure's immutable data structures and immutable data in general is great! It's hard to call Cause immutable since it is designed to be used in distributed eventually consistent systems. Nodes will inevitably arrive in different orders and clients will often have different sets of nodes. What can be guaranteed is that no node is deleted. Cause is append only, with deletions being represented by adding tombstones that hide nodes instead of actually excising the original data. This is similar to Datomic and makes infinite undo and change tracking possible.
Simple conflict resolution. The functions that determine the current state of a causal collection should be easy to reason about. This makes them easier to develop correctly and easier to work with intuitively since they have very few corner cases. Most of this work happens in about 50 LOC with the causal.shared/weave-node
function and the two predicates that power it.
Idiomatic EDN. The higher level causal data structures that are built from nodes
implement many of the same protocols as their EDN counterparts. Most Clojure functions should just work on CausalLists and CausalMaps the same way they'd work on lists and maps.
Extended EDN. Causal collections have some properties that don't fit into existing Clojure collection protocols, specifically identity and history that can span multiple interrelated collections. These database-like properties can efficiently and easily be managed with a CausalBase.
node
- a triplet made of [id cause value]
id
- an easily sortable triplet of [lamport-ts site-id tx-index]
lamport-ts
- a logical clock incremented on every transaction. A natural int counter starting at 0
.site-id
- a random uid that represents a location (your computer, my computer)tx-index
- the order within a transaction. A natural int counter that is reset to 0
at the start of every transaction.
lamport-ts
and site-id
are the same for multiple nodes. Then tx-index
serves as the tie breakerlamport-ts
site-id
tx-index
is used keep them unique and orderedcause
- the identifier that "caused" a node
cause
is the id
of the preceding node, creating a linked listcause
can be a key
or an id
(it's normally a key
, but tombstone operations like undo might be caused by a more granular id
)value
is whatever you set it to, a string, a keyword, another causal collection
causal.core/hide
value. This is non destructive and enables synchronization and time travel.Nodes are all you need. From a bag of nodes we can consistently weave (build an ordered cache of) them every time. Nodes are also unique so we can deduplicate them across a chatty network. And they include complete history information: time = lamport-ts
, who = site-id
, transaction = lamport-ts
& site-id
, tx order = tx-index
. They do not include wall clock time, but they do have everything needed for infinite undo / redo as well as version control and blame tracking.
Cause trades a linear increase in spacial complexity (where n is the set of all additions and subtractions) for all the properties above. Lists are the most complicated to keep sorted and suffer from a linearly increasing time complexity on both reads and writes. Fortunately transacting contiguous sequences is O(n + m) and not O(n * m), so operations like pasting a large sequence stay linear.
This is the highest level abstraction and what most people will want.
(ns example
(:require [causal.core :as cause]))
(def cb (atom (cause/base))) ; like a database, but for nested causal collections
(def starting-data {:a 1 :b [2 3]})
(swap! cb cause/transact [[nil nil starting-data]]) ; inserts a root map collection with a nested list under the :b key
(def root-collection-uuid (cause/get-uuid (cause/get-collection @cb))) ; get the root collection with the single arity form of cause/get-collection
(swap! cb cause/transact [[root-collection-uuid :c 4] ; "assoc" :c 4
[root-collection-uuid :a cause/hide]]) ; "dissoc" :a
; Transactions are atomic so the addition of :c 4 and the removal of :a will happen at the same time
(cause/causal->edn @cb) ; {:b (2 3) :c 4}
(seq (cause/get-collection @cb)) ; ([[2 "BIW6uN8ONfCyf" 0] :c 4] -- the nodes that make up the root map
; [[1 "BIW6uN8ONfCyf" 3] :b :causal.base.ref/eA0tTyXym77oPL0Lf4X6P])
(seq (cause/get-collection @cb (:b (cause/get-collection @cb))))) ; ([[1 "BIW6uN8ONfCyf" 1] [0 "0" 0] 2] -- the nodes that make up the list under the :b key
; [[1 "BIW6uN8ONfCyf" 2] [1 "BIW6uN8ONfCyf" 1] 3])
; TODO: add more examples
; undo
; redo
; reset
; transact into specific points in a list
(ns example
(:require [causal.core :as cause]))
(def cl (atom (cause/list :a :b :c)))
(first @cl) ; [[1 "a-site-id" 0] [0 "0" 0] :a] -- [0 "0" 0] is the id of the root-node that every causal-list starts with
(second @cl) ; [[2 "a-site-id" 0] [1 "a-site-id" 0] :b] -- :b is "caused" by :a
(last @cl) ; [[3 "a-site-id" 0] [2 "a-site-id" 0] :c] -- :c is "caused" by :b
(swap! cl conj :d) ; :d will be inserted at the end, after :c
(cause/causal->edn @cl) ; (:a :b :c :d)
(ffirst @cl) ; [1 "a-site-id" 0] -- the ID of the first active element, in this case :a
(swap! cl cause/append (ffirst @cl) :ab) ; :ab will be inserted between :a and :b since (ffirst @cl) is the ID of :a
(cause/causal->edn @cl) ; (:a :ab :b :c :d)
(swap! cl cause/append (ffirst @cl) cause/hide) ; this will hide :a, cause is append only so nothing gets deleted
(cause/causal->edn @cl) ; (:ab :b :c :d)
(swap! cl cause/append (ffirst @cl) cause/hide) ; this will hide :ab
(cause/causal->edn @cl) ; (:b :c :d)
; seq and any function derived from Seqable (aka most functions implemented by CausalLists and CausalMaps) return nodes, not just values.
; This conveniently exposes id and cause metadata so you always have access to it.
(seq @cl) ; ([[2 "a-site-id" 0] [1 "a-site-id" 0] :b]
; [[3 "a-site-id" 0] [2 "a-site-id" 0] :c]
; [[4 "a-site-id" 0] [3 "a-site-id" 0] :d])
; Despite being append only and nodes only ever being hidden you should
; expect functions to behave like they're only working with the active nodes.
(count @cl) ; 3
(ns example
(:require [causal.core :as cause]))
(def cm (atom (cause/map :a 1 :b 2)))
; causal-maps are even simpler to work with.
; For the most part you can just assoc and dissoc and ignore ids.
(:a @cm) ; 1
(swap! cm assoc :c 3)
(get @cm :c) ; 3
(swap! cm dissoc :b)
(cause/causal->edn @cm) ; {:a 1 :c 3}
(seq @cm) ; ([[0 "a-site-id" 0] :a 1] -- exposes the ID of each node
; [[2 "a-site-id" 0] :c 3])
The API and data structures have no guarantee of stability until 0.1.0 is published (See Roadmap).
If you want to try this pre-release code that will likely change you can use git deps.
lein test
or with a watcher
lein bat-test auto
npm install
lein cljs-test
lein repl
user=> (cljs) ; to get a cljs repl
cljs.user=> :cljs/quit ; to go back to clj repl
Most Clojure collection functions should just work with causal collections. E.g. count
returns the number of active values, automatically ignoring hidden and special values. Or for a causal map (get (cause/new-causal-map :foo "bar") :foo)
will return "bar"
.
On the other hand, seq
and functions related to seq like first, last, next, rest, map
return active nodes, not just values. This preserves the helpful causal metadata like ids
and causes
so you have access to it while iterating over a collection.
Gotchas:
dissoc
'd from a causal-map you will not be able to get
it even though it does still exist in the underlying causal data structure and is only hidden with a tombstone.Causal collections will automatically track the order values are inserted into them via lamport timestamps. This includes deletions and potentially undo, redo and rewind operations. These values are all immutable, so the entire history of a causal collection exists within it. Blame information is also available via site-ids so you can see which site made each change.
Gotchas:
Data Laced with History ๐ blog, causal trees, swift
Causal trees: towards real-time read-write hypertext ๐ paper, causal trees, philosophy
Real-time Collaborative Editing with CRDTs ๐บ video, javascript, rust, ropes
aredington/schism
ย cljc
grizko/ron
ย compression, go
CRDT - An approach to async plugins and undo ๐ blog, go, xi, ropes
A Conflict-Free Replicated JSON Datatype ๐ paper, json
automerge/automerge
(the js implementation). This is the dream, but I want it in EDN instead of JSON and with more idiomatic ways of operating on the data (Clojure's core collection functions).Out of the Tar Pit ๐ paper, philosophy
CausalList
weave
function is idempotent and turn breaking edge cases into unit tests. Fix the edge cases.CausalMap
tx-index
transact
fn might automatically increment the tx-index when inserting a sequence of values
undo
, redo
, get history
for use in a timeline / changelog, reset
to a point in the history. There is a logical order to all nodes, for performance this will probably want to be stored as an additional vector inside the causal base data type.
hide
tombstones{:id :cause :value}
CausalRope
. This is not core to Clojure, but would be convenient for text editing.CausalVector
... I don't know how feasible this is, but it would be nice to haveCausalCounter
CausalSet
. Not a high priority, but would be nice to round out the collection types