A lightweight matching engine written in Kotlin using Event Sourcing, CQRS, FP and RP
This is a lightweight matching engine written in Kotlin.
If you are not familiar with matching engine, read this example. You are an apple farmers and you have 100 apples ripe to be sold. You can sell them in three ways. You can either let the customers knock your door, or you travel around in the town and knock each potential customer's door, or you could bring them to the farmer's market. You are not the only apple farmers in the town. Waiting to have customers turning up to your door and asking to buy your apples, you either are an apple farmer celebrity, or you are insane, or just plain lazy. You may be super hardworking that you would knock everyone's door to sell your apples, but I presume you would be arrested for the crime of disturbing the peace very soon. Statistically you have a better chance to sell your apples in the farmer's market, where the farmers and customers both have strong intentions to trade, and they are more committed to settle trades (exchange of goods and services). So assume you go to the farmer's market. There are tens of apple farmers, and thousands of customers. You could set up your booth and wait for the customers, which is better than waiting at home, but you are still passively selling. Or you could active go around and ask every stranger to buy your apples, probably with desperation on your face.
The farmer's market owner does not like farmers and customers running in the market like headless chickens, because it is simply dangerous and chaotic. The market owner announces that farmers and customers can now put their intention to either buying or selling in a centralised book anonymously, and the market owner will match the buying and selling in a fair and orderly manner. However, the market owner requires both buyers and sellers to commit to honouring the trades, that is, if you said you wanted to sell 50 apples for 1 pound each, you need to be ready to hand over 50 apples and get 50 pounds back in your pocket, and pay transaction fees to the market owner for helping you out. So selling and buying are more than intentions, they are obligations. So we call them Orders. The centralised book is called the Order Book.
Then some genius businessmen come and say, "I can buy the apples for my customers on their behalf, and I could deliver the apples to their doors, and of course, for a fee". So customers no longer need to even spend time out in the market. Brokers are the names of these businessmen. Fair enough, so customers can wait just sit home and enjoy the apples, knowing they have paid a fee to the Brokers for each apple they purchased.
There are also opportunists that watch the market and are willing to buy and sell at the same time. They aim to gain profit by arbitraging (Buy low, sell high). The market owner is very happy with these activities because it makes the market looks very busy and active, even when there are no real customers and farmers around, so the market owner called these opportunists "Market Makers". Market Makers provide Quotes that are usually in pairs (Buy and Sell). Some Market Makers prefer quoting in multiple levels (multiple pairs) at the same time, and this aggregation is called a Mass Quote.
The operation of the Order Book is the Matching Engine. And the farmer's market is the Exchange.
The Orders and Quotes sitting in the Order Book are passively waiting to be traded, so they are passive entries. While the new coming Orders and Quotes are called the aggressors.
The domain borrows a lot of languages and terms from the FIX Protocol, meanwhile abstract away from the verbose and vintage style of naming.
Determines when the Order is in effect.
The Book is the Aggregate Root that guarantees that each update is the result of one transaction.
The Book contains
The project is built with the following option:
gradle build test
Benchmarking is run under the following option:
gradle --no-daemon clean jmh
Somehow changing the annotation @Benchmark did not affect the generated JMS source. The issue was reported here.
And the report can be found under /build/reports/jmh/results.json
There are two basic constructs:
toString
The entire library focuses on the domain logic of a matching engine, and therefore, will not depend on any particular technology like Kafka or REST.
The simplest way to interpret Event-sourcing is as below:
"New State" = func (Event, State)
So each state transition is the consequence of an event. Events are played sequentially and therefore single-threaded. Each event has a monotonic increasing sequence number as the ID per aggregate.
Command Query Responsibility Segregation (CQRS), literally re-categorises the traditional CRUD into the followings
Combined with Event-sourcing, the application semantics can be as the followings:
Please note the Either.Left()
,which is the exception with any event, is only used when
The execute
function validates the command and decides which Events to be produced. The play
function only updates the state and absolutely nothing else.
There are at least four reasons for this:
play
function implies that it address two concerns in the evolution of the application - changes in business logic and changes in data structures. It violates the Single Responsibility Principle. The business logic should be moved to the execute
function so all business logic are cohesively there and all state changes are cohesively in the play
function.play
function is invoked during recovery, having business logic there risks inconsistent state if events generated by old business logic were re-played using new business logic in play
function.play
function.play
to execute
function means the business logic is only ever used once for an event. It would be (n + 1) times if the state were recovered n times. Why do it more than once while you could do it once only?The transaction is completed when all events are played and the final new state is computed.
Each aggregate can be recovered in isolation, given the fact that each aggregate has its own sequence of events. Events can be compacted as a snapshot to reduce the ever-growing number of events to be re-played and hence the recovery time. However, the snapshot is not planned to be implemented in this project, at least currently. This is namely the Memento pattern.
Aggregates are computed incrementally while the events are played. The transaction will collect the generated events and merge the aggregates. At the end of the transaction, there should be one final aggregate and a list of events in chronological order.
If the events were to be transported externally, it is recommended to group all events in one transaction into one transport message. In this way, the transaction either has happened or not at all from the external's perspectives, and therefore it ensures state integrity and consistency. It would be extremely difficult to reason or to recover the state if only half of the events of a transaction is played.
Machine-time and randomisation are strictly prohibited in the domain, because functions involving them are no longer deterministic, and therefore states recovered from re-playing events will be different from the previous.
Machine-time is stateful and randomisation is indeterministic. They are supplied outside of the domain.
As all instruction manuals warn their readers in the very last pages, I strongly warn anyone NOT TO USE CQRS AND EVENT-SOURCING AT THE SYSTEM LEVEL. They are meant to be used within the bounded-context and the applications in your system need to be well divided by domains. Check out Domain-driven development if you want to explore more.
Boundary needs to be set up around the domain in order to ensure the domain integrity and the success application of CQRS and Event-sourcing. Here are the rules:
In order to serialise the processing of the requests, each request for a book is partitioned to a designated queue to be processing. Also the events from each book are partitioned to preserve the sequence. Then a multiplexer will distribute the events to the corresponding firm partition in order to notify the interested firm when happened in the book.
I purposefully do not use floating-point numbers or BigDecimal
as prices. Long is used to represent a price throughout the project and it is not going to change. The reason is that floating-point numbers are not accurate and BigDecimal
is very slow. Moreover, there is no real need for matching engine to work with either of them. As long as the incoming orders or quotes have their prices normalised to a fixed decimal place and keep the precision somewhere else, the actual price can always be translated from the price as long in this domain. There is an alternative of BigDecimal
(The Money class) which is so much faster. But still Long will be used as the data type of price values in this project.
I benchmark the command validation and event playing to potentially resolve any performance deficiency. My choice of tool is JMH as JetBrains also used it in Kotlin Benchmarking.
I aim to achieve 100% code coverage because this is pure domain code and there should be no excuse of not testing all the code and branches. Also if I cannot achieve 100% I would consider the implementation or API not fluent enough in this project.
I intend to use as few dependencies as possible. However, I need support for Functional Programming and a fluent assertion framework as a minimum.
Production
Testing
This project was started as a pet project for myself to learn Kotlin only.
I have been working on Exchange systems for more or less 10 years. It was natural for me to pick the problem I was most familiar with. It is a summary or brain-dump of what I have learnt about the matching engine. I wrote new code in this project from scratch and purposefully did not even look at my old works, in order to ensure I have the knowledge in my head.
I am perfectly aware that this project may become something of commercial value. While I am sharing the source code of the matching algorithm here, I am also very happy to be consulted how to realise it as an enterprise system. Contact me if you are interested.
Strictly Programming - My technical blog