Dapper Style Distributed Tracing Instrumentation Libraries
Released 0.9.0! A lot of updates here. Many things are not backward compatible with 0.8.x, in particular the configuration. See the sample for information on usage and configuration changes.
In progress: we are currently updating the documentation. A lot of the material in the wiki is out of date.
samples directory
In addition to what you find in this repository, there are also other repositories:
Distributed Trace For Video Systems written by Michael Bevilacqua-Linn discusses our experiences at Comcast implementing distributed traces.
Money is a modular distributed tracing platform that can be seamlessly incorporated into modern applications. Its purpose is to provide a foundation for operational analytics through distributed tracing.
Money is built on Scala and Akka, to be non-blocking from the core. It is purposefully un-opinionated, keeping undesired pull-through dependencies to a minimum.
Money modules build on the core, so implementing tracing is a snap. From Apache Http Components to Spring 4, from thread pools to Scala Futures, Money has modules to support a wide range of architectures.
Money was inspired by Google Dapper and Twitter Zipkin; however, there are some subtle yet fundamental differences between those systems and Money, the biggest one being...
In Dapper, a Span can encompass the communication between a client and a server. Let's use an example of an Order System calling an Inventory Management System. With Dapper, you could have the following:
The idea being that everything can be calculated when the data is at rest.
In Money, it is theoretically possible to do the same, but by default we always extend a span when the server starts processing. We do this because we like to record important notes by default, namely the span-duration and the span-success. Using the example above, with Money we would get:
... and on the server we would get...
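To make the difference concrete, here is a minimal sketch of the model described above. It is not Money's API or output format: the class `SpanSketch`, its fields, and the span names are assumptions made for illustration. It assumes the server-side span is a separate span linked to the caller's span, and that each side records its own span-duration and span-success.

```java
// Hypothetical sketch: SpanSketch is NOT a Money class; all names are assumptions.
final class SpanSketch {
    final String traceId;      // shared by every span in the same trace
    final long spanId;         // unique within the trace
    final Long parentSpanId;   // null for the root span
    final String name;
    final long durationMicros; // the "span-duration" note
    final boolean success;     // the "span-success" note

    private SpanSketch(String traceId, long spanId, Long parentSpanId,
                       String name, long durationMicros, boolean success) {
        this.traceId = traceId;
        this.spanId = spanId;
        this.parentSpanId = parentSpanId;
        this.name = name;
        this.durationMicros = durationMicros;
        this.success = success;
    }

    // Root span, for example the one opened by the client (the Order System).
    static SpanSketch root(String traceId, long spanId, String name,
                           long durationMicros, boolean success) {
        return new SpanSketch(traceId, spanId, null, name, durationMicros, success);
    }

    // Span opened when the server starts processing; it keeps the trace ID
    // and points back at the caller's span.
    SpanSketch child(long spanId, String name, long durationMicros, boolean success) {
        return new SpanSketch(this.traceId, spanId, this.spanId, name, durationMicros, success);
    }

    @Override
    public String toString() {
        return String.format("%s span=%d parent=%s name=%s duration=%dus success=%b",
                traceId, spanId, parentSpanId, name, durationMicros, success);
    }

    public static void main(String[] args) {
        SpanSketch client = SpanSketch.root("trace-1", 1L, "OrderSystem.checkInventory", 5200, true);
        SpanSketch server = client.child(2L, "InventoryManagement.lookup", 4100, true);
        System.out.println(client);
        System.out.println(server);
    }
}
```

Because each side emits a complete span with its own duration and success, a consumer can use either record on its own without joining client and server data first.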
There are tradeoffs with any decision.
Here are some disadvantages:
Here are some advantages:
I can see rocks starting to fly here, and I understand. Money was not built in its present incarnation to support systems which generate 10s of millions of events per second. Money was built to use distributed tracing as a foundation for operational analytics.
We were much more interested in creating a standard set of metrics that we could use as a basis to perform operational analytics. As such, for us, every event does matter as we can build aggregates very easily (even success).
We have been able to instrument systems that do generate many millions of events per hour with success, but Money did not have the same considerations that went into the Dapper design. Being able to process our base metrics gets us closer to real-time understanding of processing; distributing the calculations to the origin systems gave us a lot of flexibility and speed in processing the data at rest.
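As a rough illustration of why recording duration and success on every span makes aggregation easy, the sketch below rolls span events up by name. None of these types come from Money: `SpanAggregates`, `Event`, and `Stats` are hypothetical names, and the event shape (name, duration, success) is an assumption based on the notes described above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: these types are NOT part of Money.
final class SpanAggregates {

    // One finished span, reduced to the notes we aggregate on.
    static final class Event {
        final String name;
        final long durationMicros;
        final boolean success;

        Event(String name, long durationMicros, boolean success) {
            this.name = name;
            this.durationMicros = durationMicros;
            this.success = success;
        }
    }

    // Running totals for one span name.
    static final class Stats {
        long count;
        long successes;
        long totalDurationMicros;

        double successRate() {
            return count == 0 ? 0.0 : (double) successes / count;
        }

        double meanDurationMicros() {
            return count == 0 ? 0.0 : (double) totalDurationMicros / count;
        }
    }

    // Groups events by span name and accumulates count, successes and duration.
    static Map<String, Stats> bySpanName(List<Event> events) {
        Map<String, Stats> result = new TreeMap<>();
        for (Event e : events) {
            Stats s = result.computeIfAbsent(e.name, k -> new Stats());
            s.count++;
            if (e.success) {
                s.successes++;
            }
            s.totalDurationMicros += e.durationMicros;
        }
        return result;
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(
                new Event("inventory.lookup", 100, true),
                new Event("inventory.lookup", 300, false),
                new Event("order.submit", 50, true));
        for (Map.Entry<String, Stats> entry : bySpanName(events).entrySet()) {
            Stats s = entry.getValue();
            System.out.println(entry.getKey() + " count=" + s.count
                    + " successRate=" + s.successRate()
                    + " meanMicros=" + s.meanDurationMicros());
        }
    }
}
```

Since no event is dropped, the success rate and mean here are exact counts rather than estimates extrapolated from a sample.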
We are committed to supporting sampling and are evaluating designs; ideas are welcome. Look for basic sampling to be added shortly.
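One possible shape for basic sampling, offered only as an idea and not as Money's design, is a deterministic decision keyed on the trace ID, so that every span in a trace is kept or dropped together. The class `TraceSampler` and its methods are hypothetical and do not exist in Money.

```java
// Hypothetical sketch of trace-ID based sampling; NOT part of Money.
final class TraceSampler {
    private static final int BUCKETS = 10_000;

    // Number of buckets (out of BUCKETS) whose traces are kept.
    private final int threshold;

    // rate is the fraction of traces to keep, from 0.0 (none) to 1.0 (all).
    TraceSampler(double rate) {
        if (!(rate >= 0.0 && rate <= 1.0)) {
            throw new IllegalArgumentException("rate must be between 0.0 and 1.0: " + rate);
        }
        this.threshold = (int) Math.round(rate * BUCKETS);
    }

    // The same trace ID always maps to the same bucket, so the client and
    // the server reach the same decision without coordinating.
    boolean shouldSample(String traceId) {
        return Math.floorMod(traceId.hashCode(), BUCKETS) < threshold;
    }

    public static void main(String[] args) {
        TraceSampler sampler = new TraceSampler(0.10);
        int kept = 0;
        int total = 100_000;
        for (int i = 0; i < total; i++) {
            if (sampler.shouldSample("trace-" + i)) {
                kept++;
            }
        }
        System.out.println("kept " + kept + " of " + total + " traces");
    }
}
```

The tradeoff is the one discussed above: once traces are dropped, aggregates such as success rate become estimates rather than exact counts.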
Zipkin comes with an entire infrastructure built around Scribe and Cassandra that actually allows you to see data. This is super cool, and something we aspire to complete. We have looked at Spark Streaming and Akka Cluster Sharding as implementation mechanisms (and have some prototype / experimental code to that end), but we have not yet gotten our act together.
One of the main advantages of Money is that it provides usable operational analytics out-of-the-box; whether reporting to Graphite, exposing data via JMX, and/or aggregating logs in Logstash. As such, we have been able to gain key insight into traces and performance using standard open source tools; here are some examples:
This depends on the scale of your implementation. Money tries to serve a wide range of implementations.
Certainly, if you want to implement an application that is serving 1000s or 10000s of requests per second per JVM, Money should work for you. You can easily funnel data into your log aggregator or other reporting system and start getting the benefits immediately.
If your implementation is on the order of 50000+ RPS with lots of spans, then things will get difficult as you will have to manage a lot of data. Spooling span events to disk and sending them as you can is one approach. You can use FluentD, Heka, PipeD or something else to eventually get the data off of disk. Theoretically it is possible, but without sampling, Money is generating a ton of data. If you are not using that data for analytics, you can filter it out (or contribute back a sampling feature); either way, it becomes a challenge.
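The spooling approach mentioned above could look roughly like the following sketch: each span event is appended as one line to a local file that a log shipper can later pick up. `SpanSpool` is a hypothetical class, not a Money module, and the tab-separated line format is an assumption chosen for brevity.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of spooling span events to disk; NOT part of Money.
final class SpanSpool {
    private final Path file;

    SpanSpool(Path file) {
        this.file = file;
    }

    // Appends one event as a single tab-separated line. The file is reopened
    // on every call to keep the sketch short; a real spool would keep a
    // buffered writer open and rotate files.
    synchronized void append(String spanName, long durationMicros, boolean success) {
        String line = spanName + "\t" + durationMicros + "\t" + success;
        try {
            Files.write(file, Collections.singletonList(line), StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads back everything spooled so far, one line per event.
    List<String> readAll() {
        try {
            return Files.readAllLines(file, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("span-spool", ".log");
        SpanSpool spool = new SpanSpool(file);
        spool.append("inventory.lookup", 1200, true);
        spool.append("order.submit", 5400, false);
        for (String line : spool.readAll()) {
            System.out.println(line);
        }
    }
}
```

Writing one event per line keeps the file easy for line-oriented shippers to tail, and lets the application keep serving requests when the downstream system is slow or unavailable.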
Add a dependency as follows for Maven:
<dependency>
    <groupId>com.comcast.money</groupId>
    <artifactId>money-core_2.12</artifactId>
    <version>0.9.0</version>
</dependency>