Ebpf Exporter Versions

Prometheus exporter for custom eBPF metrics

v2.4.1

3 weeks ago
  • Disabled cache for tracing labels, which was causing a memory leak (#363)
  • Enabled pprof support (#364)
  • Added errno decoder (#378)
  • Added padding to label decoder to skip struct holes (#376)
  • Added ext4dist example (#365)
  • Added xfsdist example (#368)
  • Fixed biolatency example (#373, #374, #375)
  • Rewrote sock-trace and simplified example with socket cookies (#381)
  • Split cachestat into pre and post kernel 5.16 (#372)
  • Fixed tracing screenshot path in the README (#379)
  • Added a helper to extract tracing propagation args (#358)
  • Added probing for /usr/share/hwdata/pci.ids for RedHat/Fedora/CentOS (#380)
  • Added sd_notify support when running under systemd (#382)
  • Bumped dependencies to latest (#359, #360, #361, #366, #367)
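
The struct holes that the label decoder now skips (#376) come from ordinary C alignment rules. Below is a minimal sketch, assuming a hypothetical map key with a 32-bit field followed by a 64-bit field on a typical 64-bit target (x86_64 or aarch64). The struct and function names are illustrative and not taken from ebpf_exporter:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical map key: on typical 64-bit targets the compiler inserts
 * 4 bytes of padding (a "hole") after pid so that cookie is 8-byte aligned. */
struct example_key {
    uint32_t pid;
    uint64_t cookie;
};

/* Number of padding bytes between the end of pid and the start of cookie. */
static size_t hole_after_pid(void)
{
    return offsetof(struct example_key, cookie) -
           (offsetof(struct example_key, pid) + sizeof(uint32_t));
}
```

A decoder that reads labels back to back from the raw key bytes would need to skip that hole before decoding cookie.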

v2.4.0

2 months ago

This is a big release that comes with a major new feature: Distributed Tracing via OpenTelemetry (#297).

You can find the full documentation in ./tracing.

For a quick demo, you can run one locally with the provided Docker image:

  1. Run Jaeger all-in-one to provide an OpenTelemetry sink and UI:

docker run --rm -it --net host jaegertracing/all-in-one:1.54.0

  2. Open Jaeger UI: http://localhost:16686/.

  3. Build tracing demos from the root of the repo:

make tracing-demos

  4. Run ebpf_exporter with a sock-trace example from the root of the repo:

docker run --rm -it --privileged --net host -e OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 -v $(pwd)/tracing:/tracing ghcr.io/cloudflare/ebpf_exporter:v2.4.0 --config.dir=examples --config.names=sock-trace

  5. Run the demo:

./tracing/demos/sock/demo

  6. Refresh the Jaeger UI, select demo as the service, then click "Find Traces".

  7. Observe a trace that includes both the spans produced by the userspace demo component and the kernel spans produced by ebpf_exporter.

[images: screenshots of the trace]

We have more examples bundled; please see the docs.

Tracing support required us to take on a few dependencies that needed a newer Go version, so we bumped the build requirement from go1.18 to go1.20.

Other changes:

  • Bumped dependencies to latest (#347, #348, #353, #354)
  • Updated softirq-latency-net-rx example (#349)
  • Modernized eBPF description (#357)

v2.3.3

3 months ago
  • Partition decoder cache by name (#346)

v2.3.2

3 months ago
  • Protect decoder cache with a mutex (#345)
  • Upgrade GitHub actions as recommended by GitHub (#344)

v2.3.1

3 months ago
  • Added decoder cache (#342)
  • Added interface name decoder with an example (#339)
  • Updated dependencies (#337, #341)
  • Added a link to a third party helm chart (#336)

v2.3.0

4 months ago

Highlights:

  • Added support for fanotify for a faster and more reliable cgroup monitoring (#244, #263, #264, #265, #266, #279, #288)
  • Added builds with built-in libbpf (now preferred) and system provided libbpf (#286)
  • Started publishing Docker images on GitHub (#271, #290, #291, #292, #293, #294, #295)

New examples:

  • Added icmp-ip example with inet_ip decoder (#251)
  • Added pci_vendor, pci_device, pci_class, pci_subclass decoders with examples (#255, #274)
  • Added kstack decoder with an example (#313)
  • Added unix-socket-backlog example (#284)
  • Added softirq-latency example (#300, #304)
  • Added softirq-latency-net-rx example that's an array based version of softirq-latency (#310)
  • Added cfs-throttling example (#311)
  • Added tcp-retransmit example (#318, #335)

Changes to examples:

  • Added jsonschema for examples and cleaned up unused keys (#314)
  • Added exp2zero histogram type for cases when 0 is a significant outcome and added tcp-syn-backlog-exp2zero example (#280)
  • Fixed uint decoder for very large numbers (#296)
  • Removed copy-pasted division by 50 in tcp-syn-backlog-exp2zero example (#301)
  • Added increment_exp2zero_histogram helper macro for examples (#302)
  • Added {increment_map,increment_{exp2,exp2zero}_histogram}_nosync helper macros (#303, #305)
  • Fixed tcp-syn-backlog example with linear histogram (#306)
  • Added example rebuild if any of the headers change (#307)
  • Simplified header includes in examples (#312)
  • Fixed shrinklat example failure due to wrongly sized key (#319)
  • Suppressed BTF warning in the shrinklat example due to type mismatch (#327)
  • Fixed biolatency kernel version check after an upstream LTS backport (#309)
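
The exp2zero histogram type mentioned above reserves a bucket for zero. Here is a minimal sketch of one way such bucketing can work, assuming bucket 0 holds the value 0 and bucket n holds values in [2^(n-1), 2^n). This mapping and the function names are illustrative assumptions written as plain C, not the exporter's actual helper macros:

```c
#include <stdint.h>

/* floor(log2(v)) for v > 0, computed with a simple shift loop. */
static uint32_t log2_floor(uint64_t v)
{
    uint32_t r = 0;
    while (v >>= 1)
        r++;
    return r;
}

/* Hypothetical exp2zero bucket index: 0 for the value 0, otherwise
 * floor(log2(v)) + 1, clamped to max_bucket. */
static uint32_t exp2zero_bucket(uint64_t v, uint32_t max_bucket)
{
    uint32_t b;

    if (v == 0)
        return 0;
    b = log2_floor(v) + 1;
    return b > max_bucket ? max_bucket : b;
}
```

With a plain exp2 layout, 0 and 1 would typically land in the same bucket; giving zero its own bucket keeps "nothing happened" observations distinguishable.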

Build changes:

  • Bumped Go to 1.20 and dependencies to latest (#249, #258, #281, #282, #283, #289, #317, #333, #334)
  • Added build-dynamic and build-static make goals (#241)
  • Expanded linting from golangci-lint and fixed uncovered issues (#256, #259, #269, #270)
  • Added configuration loading checks for existing configs to CI (#322, #326)
  • Added export of built examples in CI jobs to attach them to releases (#308, #325)
  • Suppressed errors when building outside of a git repo (#242)
  • Added checks for libbpf version on startup to prevent runtime errors (#247)
  • Clarified libbpf instructions (#262)
  • Started running tests with -race if available (#267)
  • Added checks that produced binaries work in CI (#268)
  • Switched from dbhi/qus/action to more official docker/setup-qemu-action for CI builds (#272)
  • Split Docker image into multiple variants: ebpf_exporter and ebpf_exporter_with_examples (#273)
  • Optimized libbpf dependencies in CI (#275)
  • Added clang-format output diff to CI failures (#328)

Other changes:

  • Styling and typo fixes (#252, #276, #329)
  • Added map value size validation to startup config checks (#257, #321, #322)
  • Added linguist ignores for vmlinux.h files that were skewing language stats (#248)
  • Added .dockerignore for libbpf and built examples (#298)
  • Removed unused perf_event from config definitions (#315)
  • Added support for external BTF information (#320, #323)
  • Added a uprobe benchmark (it's slow!) (#331)

v2.2.0

9 months ago

The best release yet! Syscalls, per-cpu maps, running with no elevated capabilities at runtime — it has it all.

  • Added capability dropping and documented necessary capabilities (#231)
  • Added support for systemd socket activation (#237)
  • Added tracepoints and empty probes benchmark (#236)
  • Added support for reading percpu maps (#226)
  • Added support for XDP attachment with an example (#215, thanks @huseyinsaatci)
  • Added syscall decoder with an example (#214, thanks @huseyinsaatci)
  • Added udp receive packet drops example (#213, #229)
  • Added kfree_skb example (#233, #234)
  • Simplified oomkill example (#230)
  • Replaced tracepoints with tp_btf in examples to remove the need for tracefs (#227)
  • Reduced libbpf logging unless --debug is enabled (#216)
  • Allowed suppressing timestamps in logs with --log.no-timestamps (#239)
  • Added clang-format config to enforce formatting on C code (#222)
  • Formatted examples uniformly (#228)
  • Added default build goals to Makefiles (#225)
  • Updated ubuntu in CI from 20.04 to 22.04 (#223)
  • Updated vmlinux.h from 5.15.0-25 to 6.3.0-7 and generation instructions (#224)
  • Updated dependencies to latest (#197, #202, #203, #204, #205, #206, #207, #210, #211, #212, #218, #238)

v2.1.0

1 year ago
  • Enabled pre-aggregation for label sets to allow duplicate labels (#180)
  • Added tcp-window-clamp example (#172)
  • Enabled passing CFLAGS to examples (#172)
  • Added a note about supported distros to README (#174)
  • Updated module path to v2 to make Go happy (#177)
  • Cleaned up HistogramBucketType (#178)
  • Added a link to libbpf program types and SEC names (#181)
  • Switched to consistent indentation in examples (#194)
  • Updated used bpf instruction set via -mcpu to v3 (#182)
  • Updated Go to 1.20 (#193, #195)
  • Updated golangci-lint to latest (#192)
  • Updated dependencies to latest (#169, #173, #175, #176, #183, #187, #188, #190, #191, #195)

v2.0.0

1 year ago

ebpf_exporter v2 is here!

This release comes with a bunch of breaking changes (all for the better!), so be sure to read the release notes below.

First and foremost, we migrated from BCC to libbpf. BCC has served us well over the years, but it has a major drawback: it compiles eBPF programs at runtime, which requires a compiler and kernel headers, and can fail due to kernel discrepancies between hosts and kernel versions. It was hard to do static linking with BCC, so we ended up providing a binary linked against an older libc, for which you had to provide your own libbcc (which could also break due to an unstable ABI).

With libbpf all these problems go away:

  1. Programs (now called configs) are compiled in advance, and for each config you have an eBPF ELF object and a yaml config describing how to extract metrics out of it.
  2. Thanks to libbpf and CO-RE you can COmpile once and Run Everywhere, worrying less about runtime failures.
  3. It's easy to statically compile in libbpf, so we now provide a statically compiled binary that you can use anywhere with no dependencies. We also have a Dockerfile in the repo (not yet published on Docker Hub) if you're inclined to use that, and it's easier to run than ever.

Big thanks to @wenlxie for doing the bulk of the work on porting to libbpf in #130. Another big thanks to @aquasecurity for their work on libbpfgo, which made it a lot easier for us to switch.

In the BCC repo itself there's an effort to migrate programs from BCC to libbpf, and you can see it here:

The programs above can be used as inspiration for what ebpf_exporter can provide for you as metrics.

Now to config changes. Previously you needed to make one big yaml config with all your metric descriptions and metrics intermingled. Now each logical program is called a config (a .yaml file) and each config has a dedicated eBPF ELF object (a .bpf.o file compiled from a .bpf.c file). When you start ebpf_exporter, you need to give it the path to the directory with your configs and tell it which configs to load. This allowed us to greatly flatten and simplify the configs, and it lets you use simpler tooling to configure what ebpf_exporter should enable.

Having eBPF C code in separate files also allows you to use your regular tooling to build eBPF ELF objects. In the examples directory you'll find a collection of our example configs along with a Makefile to build eBPF code. The expectation is that you would replicate something similar for your internal configs, and you have all the needed bits and pieces provided for you to copy and adapt. We provide vmlinux.h for both x86_64 (aka amd64) and aarch64 (aka arm64).

Having a separate .bpf.o allows you to compile not just C code, but anything that would provide a valid eBPF ELF object. We tried with Rust, but without success. Please feel free to send a PR if you have better luck with it. We still expect that the majority of people will use plain old C, since that's what libbpf mainly supports and has a lot of examples for.

Since programs for configs need to be compiled in advance, we compile them as part of a CI job, allowing us to spot mistakes early.

You no longer need to describe how to attach your eBPF programs in the config; it all happens in code. Take the timers code as an example:

SEC("tracepoint/timer/timer_start")
int do_count(struct trace_event_raw_timer_start* ctx)

We use the libbpf-provided SEC macro to say what to attach to, which in this case is the timer:timer_start tracepoint. You can use any SEC that libbpf provides (there are many) and it should work out of the box, including uprobe, usdt and fentry (the latter currently requires a kernel patch on aarch64).

We piggyback on libbpf for most of the stuff with SEC, with the only exception being perf_event. For that we have a custom handler allowing you to set the type, config, and frequency of the event you want to trace. Below is type=HARDWARE, config=PERF_COUNT_HW_CACHE_MISSES at 1Hz from the llcstat example:

SEC("perf_event/type=0,config=3,frequency=1")
int on_cache_miss(struct bpf_perf_event_data *ctx)
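
To make the section name format concrete, here is a minimal sketch in plain userspace C that pulls the type, config, and frequency fields out of such a string. It is not ebpf_exporter's actual handler; the struct and function names are hypothetical and only the string format comes from the example above:

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical container for the three fields encoded in the section name. */
struct perf_event_spec {
    uint32_t type;      /* e.g. 0 for HARDWARE */
    uint64_t config;    /* e.g. 3 for PERF_COUNT_HW_CACHE_MISSES */
    uint64_t frequency; /* sampling frequency in Hz */
};

/* Parses "perf_event/type=T,config=C,frequency=F".
 * Returns 0 on success, -1 if the string does not match exactly. */
static int parse_perf_event_sec(const char *sec, struct perf_event_spec *out)
{
    unsigned int type;
    unsigned long long config, frequency;
    char trailing;

    /* The extra %c catches trailing garbage: exactly 3 conversions must succeed. */
    if (sscanf(sec, "perf_event/type=%u,config=%llu,frequency=%llu%c",
               &type, &config, &frequency, &trailing) != 3)
        return -1;

    out->type = type;
    out->config = config;
    out->frequency = frequency;
    return 0;
}
```

Encoding the parameters in the section name keeps the attachment details next to the program itself, in line with the rest of the SEC-based attachment described here.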

With uprobe support we also provide a way for you to run some code when your program is attached:

SEC("uprobe//proc/self/exe:post_attach_mark")
int do_init()

There's a post_attach_mark() function in ebpf_exporter that runs immediately after all configs are attached. In the bpf-jit example we use it to initialize a metric that would otherwise require a probe to run, which might take a while.

We now allow loose program attachment. Previously all programs had to be attached successfully for ebpf_exporter to run; now we allow failures and export a metric showing whether each program was attached. This way you can use alerting to detect when this happens, while not sacrificing unrelated configs. This is handy if your programs attach to something that might be missing from some kernels, like a static function that is sometimes not visible. We used it in our cachestat example.

Speaking of metrics, if you have kernel.bpf_stats_enabled sysctl enabled, we now also report how many times each of your eBPF programs ran and how long it spent running, which might be handy if you want to get an idea of how long things take.

In code and for the debug endpoint we renamed "tables" to "maps" to match eBPF terminology. If you were using /tables for debugging, you should switch to /maps. Previously configs needed to specify which table metrics came from; now it's automatically inferred from the metric name itself.

We have updated our benchmark, which now includes fentry, so you can see how much faster it is than good old kprobe and how much overhead you should expect in general (it's not much).

All of these changes are reflected in the README, so if you start from scratch, you shouldn't worry. If you are currently using ebpf_exporter v1, it will take some work to upgrade. The good news is that the metrics you export do not need to change. Internally at Cloudflare we upgraded without any issues.

You may have noticed that previously ebpf_exporter took some time to start up due to the need to compile programs. Since this is no longer the case, you should expect much faster startup times now. For complex configs like biolatency you should also expect lower memory usage (we observed ~250MiB -> ~30MiB drop during the upgrade).

If you need some reading to get up to speed with libbpf and CO-RE, here are three great blog posts from libbpf maintainer @anakryiko:

We hope you'll enjoy these changes. As usual, please let us know if you run into any issues.

v1.2.5

2 years ago
  • Run linter on Go 1.17 (#116)
  • Bump golangci-lint from v1.38 to v1.42.1 (#117)
  • Publish binaries for x86_64 and aarch64 (#118, #122)
  • Register build info in prometheus (#120, #123, #124)
  • Remove default config path to make --version work (#121)

The binaries in this release require glibc 2.27 or newer. You need to have libbcc.so installed to run the binaries, and on Ubuntu or Debian it's in the libbpfcc package.