Brave Versions

Java distributed tracing implementation compatible with Zipkin backend services.

5.14.1

1 year ago

What's Changed

New Contributors

Full Changelog: https://github.com/openzipkin/brave/compare/5.14.0...5.14.1

5.13.11

1 year ago

What's Changed

New Contributors

Full Changelog: https://github.com/openzipkin/brave/compare/5.13.10...5.13.11

5.13.10

1 year ago

What's Changed

Full Changelog: https://github.com/openzipkin/brave/compare/5.13.9...5.13.10

5.13.9

2 years ago

What's Changed

New Contributors

Full Changelog: https://github.com/openzipkin/brave/compare/5.13.8...5.13.9

5.13.8

2 years ago

What's Changed

New Contributors

Full Changelog: https://github.com/openzipkin/brave/compare/5.13.7...5.13.8

5.13.7

2 years ago

What's Changed

Full Changelog: https://github.com/openzipkin/brave/compare/5.13.6...5.13.7

5.13.6

2 years ago

What's Changed

New Contributors

Full Changelog: https://github.com/openzipkin/brave/compare/5.13.5...5.13.6

5.13.5

2 years ago

What's Changed

New Contributors

Full Changelog: https://github.com/openzipkin/brave/compare/5.13.4...5.13.5

5.13.2

3 years ago

Brave 5.13 makes it safer to try emerging trace header formats.

(Http|Messaging|Rpc)Tracing.propagation()

This is an advanced topic about how propagation (e.g. which headers are sent or received) works. In summary, it is now easier to have one library, e.g. gRPC, accept a different format than another.

Brave 4 was released almost 4 years ago. Not only did Brave 4 support multiple instances of differently configured tracers, it allowed each to configure propagation differently. For example, we built in support for different B3 formats, including the more efficient single header variant. For years, sites could use alternate formats such as AWS and Stackdriver, and emerging formats like W3C. Typically the bespoke formats are attempted first, and if there is any problem we fall back to B3.

Recently, we've learned some sites are being pushed into the less efficient and more complex W3C trace context format. Reasons include affinity for something called a standard, and defaults in some libraries. For example, OpenTelemetry includes B3 propagation, but they chose to disable it by default. This choice isn't uniform in OpenTelemetry: other distributions, such as Amazon's, default to interop with B3.

Before, most would respond by changing the tracer-scoped propagation format, or by mechanically converting "traceparent" to "b3". However, there's a problem with assigning Tracing.propagation(): it carries any penalty of performance and instability to all communication. This is too broad, as B3 is a de-facto standard and only certain upstream and downstream services would disable it entirely. Before, we didn't have a way for users to choose what to do except on a per-tracer basis.
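To make the "mechanical conversion" concrete, here is a minimal, dependency-free sketch of rewriting a W3C "traceparent" value as a B3 single-header value. The class TraceparentToB3 is hypothetical and not part of Brave. It assumes version "00" of traceparent (version, 32 hex trace ID, 16 hex parent ID, 2 hex flags) and emits the "{traceId}-{spanId}-{sampled}" form of b3 single. It ignores "tracestate" entirely.

```java
/** Hypothetical helper (not part of Brave): rewrites a "traceparent" value as a "b3" single value. */
final class TraceparentToB3 {
  private TraceparentToB3() {}

  /** Returns "{traceId}-{spanId}-{0|1}", or null when the input is not a valid version 00 value. */
  static String convert(String traceparent) {
    if (traceparent == null) return null;
    String[] parts = traceparent.split("-", -1);
    // This sketch only understands version 00, which has exactly four fields.
    if (parts.length != 4 || !parts[0].equals("00")) return null;
    String traceId = parts[1], spanId = parts[2], flags = parts[3];
    if (!isLowerHex(traceId, 32) || !isLowerHex(spanId, 16) || !isLowerHex(flags, 2)) return null;
    // All-zero IDs are invalid in traceparent.
    if (isAllZeros(traceId) || isAllZeros(spanId)) return null;
    // The lowest bit of the flags field is the sampled flag.
    boolean sampled = (Integer.parseInt(flags, 16) & 1) == 1;
    return traceId + "-" + spanId + "-" + (sampled ? "1" : "0");
  }

  private static boolean isLowerHex(String s, int length) {
    if (s.length() != length) return false;
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      boolean hex = (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f');
      if (!hex) return false;
    }
    return true;
  }

  private static boolean isAllZeros(String s) {
    for (int i = 0; i < s.length(); i++) {
      if (s.charAt(i) != '0') return false;
    }
    return true;
  }
}
```

Even this small sketch has to reject several malformed shapes, which hints at why applying such a conversion to all communication is a broad change.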

Now, you can isolate unstable or inefficient formats to only libraries that need them.

Ex. You can choose to use W3C trace-context, but only for a specific gRPC client:

grpcTracing = GrpcTracing.create(rpcTracing.toBuilder().propagation(traceContextPropagation).build());
channel = ManagedChannelBuilder.forAddress("something_that_only_talks_w3c", serverPort)
    .intercept(grpcTracing.newClientInterceptor())
    ...

While this example is about gRPC, it hints that you can change any library or the entire RPC subsystem while leaving everything else alone. You can also use this approach to disable baggage. As these concerns are uniform, we added them to all our major abstractions: HttpTracing, MessagingTracing and RpcTracing.

Thanks very much to @dimi-nk, who helped us identify problems they face with header diversity. While imperfect, we hope this helps, and we will continue working to reduce pain in tracing.

Brave no longer imports Maven Bill of Materials (BOM)

End users can opt-in to io.zipkin.brave:brave-bom to pin our versions, but we will no longer use tools like BOMs for internal convenience.

Our parent project formerly imported netty-bom for our convenience. This allowed our tests to not download several similar versions of netty. However, this leaked a detail to those using brave's core library. Simply depending on io.zipkin.brave:brave would download that BOM. Even if it didn't impact anything, it causes confusion as to why an unrelated library's file is being downloaded. In summary, our internal convenience should not cause confusion for others. Hence, all core libraries no longer import BOMs, and neither do their parents, transitively.

The build is more resilient and faster

We had numerous problems due to rate limiting and, in some cases, the CI service shutting off completely. A few top level changes led by @adriancole allowed the project to resume functioning from a test and deployment perspective.

  • The build now uses GitHub Action workflows
  • The build now publishes to Sonatype directly instead of intermediating through a service.
  • The build no longer depends on Docker Hub (docker.io) images as that can trigger rate limits for us or forks

Smaller updates

  • @adriancole fixed "grpc-trace-bin" aka census propagation
  • @anuraaga added "leaked all the way until GC" to StrictScopeDecorator
  • @rgamez fixed a problem in p6spy that constrained zipkinServiceName to a smaller character set than it should allow.
  • @m50d fixed a problem where setting Kafka headers marked read-only could crash a request (raise an exception)

Despite all our work, life in tracing is becoming more difficult now. For example, the main distribution of OpenTelemetry chose to only propagate their W3C trace-context format. In other words, they disable B3 by default. Not only is this format less efficient than b3 single, it is more complicated. Most tools don't implement the tracestate part: the most common practice is to blindly copy an unvalidated string into it. Lack of validation in a primary trace context field means easy bugs that can propagate across the network. Receivers have to expect and handle more malformed use cases due to the flexibility allowed in the spec and practices such as these.

Supporting it usually means depending on a fast moving library, almost always <1.0. Use of an unstable format and an unstable library are two problems, not one, and with different implications. For example, if anything <1.0 is used in tracing, it should be re-packaged with tools like "maven-shade-plugin" in order to eliminate compatibility and upgrade problems. Isolating entry-points into these unstable areas of code and communication is the safest way out. This is why we broke our already flexible propagation system into parts, so that users can isolate unstable headers to only where they are used.

5.12.3

4 years ago

Brave 5.12 introduces a powerful new way to handle data, completes our RPC abstraction, drops our Zipkin dependency and pours our thinking into RATIONALE docs.

There's a lot in this release for those doing advanced things like managing configuration tools or implementing custom tracing backends. Most users will do nothing except upgrade.

If you are using Brave directly, you should take note of the deprecations mentioned. We do a major release every couple of years to remove deprecated code, and Brave 6 will also do that. By paying attention, not only will your code work faster, but you'll have fewer surprises later.

Like all releases, volunteers bore a huge responsibility on this release. As so much happened here, it was quite a load. Please reach out and thank those who contributed, star our repo or say hi on gitter. If you have ideas, we'd love to hear about them, too.

On to the main show!

Introducing SpanHandler

Brave 5.12 has a cleaner integration for data than ever before. SpanHandler replaces FinishedSpanHandler. SpanHandler can do everything FinishedSpanHandler did: redacting, adding tags based on baggage, remapping trace IDs, sending to multiple systems etc.

The more advanced begin hook adds much more power. You can set up default baggage only on local roots, add correlated mapped data extensions, or perform aggregations such as child counts.
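As a rough illustration of why a begin hook enables aggregations like child counts, here is a dependency-free sketch. SketchSpan, SketchHandler and ChildCountHandler are hypothetical stand-ins written for this example, not Brave's types, and the "childCount" tag name is an assumption. The idea is that begin sees every child as it starts, so by the time the parent ends the count is known.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for recorded span data (Brave's own type is MutableSpan). */
final class SketchSpan {
  final String id;
  final String parentId; // null for a local root
  final Map<String, String> tags = new HashMap<>();

  SketchSpan(String id, String parentId) {
    this.id = id;
    this.parentId = parentId;
  }
}

/** Hypothetical stand-in for a handler with begin and end hooks. Returning false would drop the span. */
interface SketchHandler {
  default boolean begin(SketchSpan span) { return true; }
  default boolean end(SketchSpan span) { return true; }
}

/** Counts children as they begin, then tags the parent when it ends. Not thread-safe. */
final class ChildCountHandler implements SketchHandler {
  private final Map<String, Integer> childCounts = new HashMap<>();

  @Override public boolean begin(SketchSpan span) {
    if (span.parentId != null) childCounts.merge(span.parentId, 1, Integer::sum);
    return true;
  }

  @Override public boolean end(SketchSpan span) {
    // Remove the entry so the map does not grow without bound.
    Integer count = childCounts.remove(span.id);
    if (count != null) span.tags.put("childCount", count.toString());
    return true;
  }
}
```

A handler with only an end hook could not do this without buffering finished children, which is the extra power the begin hook provides.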

This is our most powerful API, co-designed by @anuraaga and with lots of good feedback from our usual suspects @jeqo and @jorgheymans. For now, you can just replace FinishedSpanHandler with SpanHandler, but if you are curious, here are a few links of interest:

  • See https://github.com/openzipkin/brave/blob/master/brave/src/main/java/brave/handler/SpanHandler.java
  • See https://github.com/openzipkin/brave/tree/master/brave/src/test/java/brave/features/handler
  • See https://github.com/openzipkin/zipkin-reporter-java/tree/master/brave

MutableSpan can do everything now

MutableSpan was initially a response to complaints that immutable conversions added GC pressure and generally weren't a good choice for telemetry.

Before, we paired TraceContext with MutableSpan, splitting responsibilities. However, this would make things like natively writing JSON from Zipkin types difficult. Hence, we fully fleshed out MutableSpan so that it accompanies, but is decoupled from TraceContext.

Here are some features newly available with much thanks to @anuraaga for a month of help on them!

  • MutableSpanBytesEncoder - allows you to write MutableSpan directly to JSON without any dependencies or intermediating through another type such as zipkin2.Span.
  • MutableSpan.xxxId() - allows you to read or remap all IDs, including trace IDs, depending on your output
  • MutableSpan.annotations(), tags() - read-only immutable collection views for convenience of those not concerned with performance (internally implemented as arrays)
  • MutableSpan.annotationCount(), tagCount(), xxxValueAt(index) - allocation free tools to write data conversions as for loops.
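The "conversions as for loops" idea in the last bullet can be sketched without any dependencies. SketchTags and SketchTagWriter below are hypothetical stand-ins, loosely modeled on the count and index accessors named above, not Brave's MutableSpan or MutableSpanBytesEncoder. The loop reads keys and values by index, so it creates no iterator or map entry objects. The JSON escaping is deliberately simplified.

```java
import java.util.Arrays;

/** Hypothetical array-backed tag holder. Keys sit at even indexes, values at odd indexes. */
final class SketchTags {
  private String[] keysAndValues = new String[8];
  private int count;

  /** Appends a tag. This sketch does not de-duplicate keys. */
  void put(String key, String value) {
    if (count * 2 == keysAndValues.length) {
      keysAndValues = Arrays.copyOf(keysAndValues, keysAndValues.length * 2);
    }
    keysAndValues[count * 2] = key;
    keysAndValues[count * 2 + 1] = value;
    count++;
  }

  int tagCount() { return count; }
  String tagKeyAt(int i) { return keysAndValues[i * 2]; }
  String tagValueAt(int i) { return keysAndValues[i * 2 + 1]; }
}

/** Hypothetical writer that turns tags into a JSON object using an indexed for loop. */
final class SketchTagWriter {
  private SketchTagWriter() {}

  static void writeTags(SketchTags tags, StringBuilder out) {
    out.append('{');
    for (int i = 0, length = tags.tagCount(); i < length; i++) {
      if (i > 0) out.append(',');
      appendQuoted(tags.tagKeyAt(i), out);
      out.append(':');
      appendQuoted(tags.tagValueAt(i), out);
    }
    out.append('}');
  }

  /** Simplified escaping: handles only quotes and backslashes, not control characters. */
  private static void appendQuoted(String s, StringBuilder out) {
    out.append('"');
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (c == '"' || c == '\\') out.append('\\');
      out.append(c);
    }
    out.append('"');
  }
}
```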

RPC abstraction is now complete!

We started an RPC abstraction about 9 months ago. Last October, we added RPC sampling support in Brave 5.8.

With a lot of thanks to our contributors @devinsba @jeqo @jcchavezs and especially weeks of effort by volunteer @anuraaga, we have a complete product. Those using gRPC or Dubbo can now uniformly sample and parse based on RPC metadata.

By default, the following are added to both RPC client and server spans:

  • Span.name is the RPC service/method. Ex. "zipkin.proto3.SpanService/Report"
    • If the service is absent, the method is the name and vice versa.
  • Tags:
    • "rpc.method", eg "Report"
    • "rpc.service", eg "zipkin.proto3.SpanService"
    • "rpc.error_code", eg "CANCELLED"
    • "error" the RPC error code if there is no exception
  • Remote IP and port information
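The naming rule in the list above can be written out as a small function. This is a sketch of the rule as described here, not Brave's parser code. RpcSpanNameSketch is a hypothetical name, and returning null when neither part is known is an assumption of this example.

```java
/** Hypothetical helper illustrating the default RPC span name rule described above. */
final class RpcSpanNameSketch {
  private RpcSpanNameSketch() {}

  /**
   * Returns "service/method" when both are known, whichever one is present when the
   * other is absent, or null when neither is known.
   */
  static String spanName(String service, String method) {
    boolean hasService = service != null && !service.isEmpty();
    boolean hasMethod = method != null && !method.isEmpty();
    if (hasService && hasMethod) return service + "/" + method;
    if (hasService) return service;
    if (hasMethod) return method;
    return null; // assumption: the caller keeps whatever name it already has
  }
}
```

For example, spanName("zipkin.proto3.SpanService", "Report") yields the same name shown in the first bullet.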

Users familiar with how HTTP works will love the familiarity. The APIs are similar and exactly the same features are supported, whether that's sampling, baggage, you name it. Those curious about our decision making process can have a look at the RATIONALE, as we tried our best to make sound decisions and be transparent about them. Enjoy!

Zipkin dependency is dropped!

With the SpanHandler type finalized, we have deprecated support for zipkin2.Reporter<zipkin2.Span> in Brave and removed dependencies on Zipkin libraries.

This isn't to deprecate Zipkin support, of course, just to move the responsibility to the zipkin-reporter-brave project (even [XML beans](https://github.com/openzipkin/zipkin-reporter-java/tree/master/spring-beans) for those who need it!)

The end result is cleaner integrations for the various SaaS offerings who use Brave, but don't use Zipkin. Such use cases should be directly implemented as SpanHandler now, with no need to route through zipkin format.

Zipkin users should simply replace AsyncReporter with AsyncZipkinSpanHandler to adjust, similar to what's in our README:

// Configure a reporter, which controls how often spans are sent
//   (this dependency is io.zipkin.reporter2:zipkin-sender-okhttp3)
sender = OkHttpSender.create("http://127.0.0.1:9411/api/v2/spans");
//   (this dependency is io.zipkin.reporter2:zipkin-reporter-brave)
zipkinSpanHandler = AsyncZipkinSpanHandler.create(sender);

tracing = Tracing.newBuilder()
                 .addSpanHandler(zipkinSpanHandler)
                 ...

Test infrastructure overhaul

As we no longer have a Zipkin dependency, we decided to make tools to help common unit and integration tests. For example, vendors integrating with Brave should be able to assert on the data produced. Third party libraries should be able to avoid common bugs. Beyond our normal ITHttpServer and similar tests, we've extracted the following in the brave-tests package:

Rationale

We have updated and added many RATIONALE files, including the below, to better help people understand our thinking, and to help us remember it!

Thanks to @jorgheymans @jeqo @jcchavezs @anuraaga and @NersesAM for the help adding content and reviewing.

  • brave
  • brave-instrumentation
  • brave-instrumentation-dubbo
  • brave-instrumentation-http
  • brave-instrumentation-grpc
  • brave-instrumentation-kafka-streams
  • brave-instrumentation-rpc

Other Notable Changes

Updates

  • Kafka 2.5 is now supported, thanks to @jeqo

Behavior

  • One-way RPC span modeling should no longer use span.start().flush() on one host and span.finish() (without start) on the other. This was implemented inconsistently and was not very compatible with most clones.

Additions

  • Tracing.Builder.clearSpanHandlers(), spanHandlers() - allows TracingCustomizer instances to re-order or prune span handlers. For example, to ensure Zipkin is last, or theirs is first.
  • Tracing.Builder.alwaysSampleLocal() - special hook for metrics aggregation and secondary-sampling that says the backend should always see recorded spans even if they weren't sampled in headers
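The "ensure Zipkin is last" re-ordering mentioned above boils down to a stable partition of the handler list. The sketch below shows only that list logic on plain Java collections. HandlerOrderSketch and moveToEnd are hypothetical names, and nothing here calls Brave's builder.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical helper: moves matching handlers to the end while keeping relative order. */
final class HandlerOrderSketch {
  private HandlerOrderSketch() {}

  static <T> List<T> moveToEnd(List<T> handlers, Predicate<? super T> belongsLast) {
    List<T> head = new ArrayList<>();
    List<T> tail = new ArrayList<>();
    for (T handler : handlers) {
      if (belongsLast.test(handler)) {
        tail.add(handler);
      } else {
        head.add(handler);
      }
    }
    head.addAll(tail);
    return head;
  }
}
```

A customizer could, under these assumptions, read the current handlers, clear them, and add them back in the order this function returns.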

Deprecations

  • Tracing.propagationFactory() is deprecated in favor of the existing Tracing.propagation(), as we no longer rely on non-string keys (these were only used by gRPC and we changed to hide this conversion).
  • brave.ErrorParser is deprecated as it was only used for Zipkin conversion. You can optionally specify Tag<Throwable> to affect the default "error" tag in zipkin-reporter-brave.