Netdata Versions

The open-source observability platform everyone needs!

v1.43.0

6 months ago

Groundbreaking: systemd-journal logs release!

Right on schedule, this is another great Netdata release!

Netdata Growth

  • 65.5 k GitHub Stars ⭐

    Since October 2023, Netdata has led the observability category in the CNCF landscape, surpassing Elasticsearch. Thank you for your love ❤️! Give Netdata a ⭐ on GitHub too!

  • 595 M Docker Hub pulls

    Netdata is pulled from Docker Hub about 200k times per day. Since June 2023 we have been a Verified Publisher, so Netdata pulls don't count against Docker Hub pull limits, allowing all our users to integrate Netdata into their CI/CD toolchains.

Release Summary

This release is the most robust and reliable Netdata we have ever built.

These are the main areas Netdata has improved since the last release:

  1. Logs

    Today we release an almost completely rewritten version of systemd-journal, to improve its performance and visualization capabilities. systemd-journal holds critical system and security information and, given the lack of systemd-journal visualization tools, we focused first on filling this gap. At the same time, we are standardizing how logs are handled in Netdata, enabling us to support more log management engines, like Loki and Elasticsearch.

  2. Instances Slice and Dice

    Given the capabilities of the new Netdata Agent UI (v2), we are changing the way some of our collectors collect and expose metrics, to allow easier slicing and dicing of the data and to be more OpenTelemetry compatible in terms of specifications. So, in this release we changed the way apps.plugin exposes charts in the Applications section of the dashboard. Following the NIDL framework, each application group is now an instance, allowing better aggregation of process utilization across nodes. Similarly, our systemd units charts have been updated to have an instance for each systemd unit. For the same reasons, disk charts now have additional labels (id, model and serial) to help us identify disks from the charts. Unfortunately, such changes tend to make the older dashboards (v1, v0) less usable, especially on servers with many hundreds of instances.

  3. Stock Alerts

    A number of changes have been made to the Netdata Health engine, to allow better integration with the new dashboard. More changes in this area are coming in the next release: a) allow multi-node alerts on parents, b) allow evaluating and configuring alerts from the UI.

  4. Alerts Accuracy

    Netdata has by default 3 tiers of metrics, each with a different resolution. The Netdata query planner automatically picks the right tier to satisfy a query, based on the number of points requested in the response. For alerts this had a side effect: since alerts request only 1 point of data in the response, the query planner was picking the "easier" tier to query, which is of course one with a lower resolution. Now alerts always run on tier 0, the highest-resolution one.

  5. Lower Resources Utilization

    Several changes have been implemented so that Netdata takes better care of itself. These include lower memory usage, a lower disk footprint, self-vacuuming of SQLite databases, and more. Probably the most notable change is that Netdata now needs only 1 pointer (8 bytes on 64 bit, 4 bytes on 32 bit) for each use of a label name-value combination. This drastically reduces Netdata's memory requirements in setups like busy k8s clusters, where containers come and go all the time, significantly increasing label cardinality.

  6. 32bit Netdata on 64bit IoT machines

    A common request when Netdata is installed on 64bit IoT devices is to run a 32bit Netdata there. Before this release, this was not possible. Now a 32bit Netdata will nicely run on a 64bit operating system.

  7. Netdata Cloud on prem

    Netdata Cloud is now available to be installed on-prem! Several companies have already deployed it and are currently testing it. If you want to join them, submit this form.
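
The side effect described under Alerts Accuracy can be illustrated with a small sketch. The tier granularities (1 s, 60 s, 3600 s) and the `pick_tier` heuristic below are assumptions made for illustration only; they are not Netdata's actual query planner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    index: int
    seconds_per_point: int  # assumed granularity, for illustration only


# Hypothetical three-tier layout: tier 0 has the highest resolution.
TIERS = [Tier(0, 1), Tier(1, 60), Tier(2, 3600)]


def pick_tier(window_seconds: int, points: int, force_tier0: bool = False) -> Tier:
    """Pick the coarsest tier that can still supply `points` points for the window."""
    if force_tier0:
        return TIERS[0]
    for tier in reversed(TIERS):  # try the cheapest (coarsest) tier first
        if window_seconds // tier.seconds_per_point >= points:
            return tier
    return TIERS[0]


# A dashboard query asking for 600 points over 10 minutes needs tier 0.
chart_tier = pick_tier(window_seconds=600, points=600)
# An alert asks for 1 point over the same window, so a coarser tier qualifies.
alert_tier_before = pick_tier(window_seconds=600, points=1)
# Pinning alerts to tier 0 mirrors the behavior this release describes.
alert_tier_after = pick_tier(window_seconds=600, points=1, force_tier0=True)
```

In this toy model the alert query lands on tier 1 until it is pinned to tier 0, which is the kind of resolution loss the change removes.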

Release Highlights

systemd-journal

systemd-journal was first included in Netdata v1.42.0. Immediately after release, we recognized the wider need for this feature, so we've rewritten the plugin almost entirely, to provide the best possible experience. This work is also fundamental for supporting more log monitoring integrations - stay tuned!

The major improvements to the systemd-journal logs function are:

  • a histogram of log entries over time, with a breakdown per field value, for any field and any time-frame
  • a PLAY mode that provides the same experience as journalctl -f, showing new log entries immediately after they are received
  • filtering on any journal field or field value, for any time-frame
  • support for coloring log entries, the same way journalctl does
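
To make the histogram idea concrete, here is a minimal sketch of counting entries per time bucket, broken down by the values of one field. It assumes entries shaped like the output of `journalctl -o json` (string values, `__REALTIME_TIMESTAMP` in microseconds); the `histogram` helper and the sample data are hypothetical and do not reflect how the plugin itself is implemented.

```python
from collections import Counter, defaultdict


def histogram(entries, field, start_usec, end_usec, buckets):
    """Count log entries per time bucket, broken down by the values of `field`."""
    width = (end_usec - start_usec) / buckets
    result = defaultdict(Counter)  # bucket index -> Counter of field values
    for entry in entries:
        ts = int(entry["__REALTIME_TIMESTAMP"])
        if not (start_usec <= ts < end_usec):
            continue  # outside the requested time-frame
        bucket = int((ts - start_usec) / width)
        result[bucket][entry.get(field, "[unset]")] += 1
    return {b: dict(c) for b, c in sorted(result.items())}


sample = [
    {"__REALTIME_TIMESTAMP": "1000000", "PRIORITY": "6"},
    {"__REALTIME_TIMESTAMP": "1500000", "PRIORITY": "3"},
    {"__REALTIME_TIMESTAMP": "2200000", "PRIORITY": "6"},
    {"__REALTIME_TIMESTAMP": "9000000", "PRIORITY": "6"},  # outside the window
]
per_second = histogram(sample, "PRIORITY", 1_000_000, 3_000_000, buckets=2)
```

Filtering on a field value fits the same shape: drop entries whose field does not match before counting.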

For a full presentation of the systemd-journal plugin, including how it works, how to take full advantage of it, and instructions for configuring a logs centralization server, check the documentation for the plugin.

You can experience the power of systemd-journal logs function in one of our Netdata demo rooms here or check our latest YouTube video on it.

Want to know why you should tap the full potential of systemd-journal logs? Check out the blog post on it by Netdata's founder, Costa Tsaousis (@ktsaou), here.

Virtual Machine monitoring (VMware vSphere)

In response to increased feedback and requests about the VMware vCenter Server collectors, we have:

  • Reviewed our out-of-the-box charts
  • Added labels to the charts, e.g. host, datacenter, cluster, vm
  • Reviewed the metadata on alerts
  • Added summary charts section

It is with this feedback from the Community that we can keep working on improving Netdata to ensure it meets your needs!

What is coming next

We are currently working on the following areas, which we hope to release next month:

  1. Logs Explorer for Loki and Elasticsearch

    Similar to systemd-journal, allow Netdata to explore, query and visualize logs from Loki and Elasticsearch.

  2. Collectors Configuration from the UI

    In the last release we presented the Integrations Marketplace. Since then, we have been working to make all integrations configurable via the dashboard. This will allow all of us to configure our Netdata servers directly from the UI, without touching configuration files, significantly improving Netdata's usability.

  3. Alerts Configuration from the UI

    Similarly, we are working to allow configuring alerts directly from the UI, without text file configurations, so that all of us can create powerful alerts on the spot.

  4. Netdata Mobile App

    We are at the final stage of releasing our Netdata Mobile App (iOS and Android) for receiving mobile push notifications and exploring alert statuses.

  5. Scalability

    Given the wide adoption of Netdata, we are committed to making Netdata scale better in larger environments. Especially when it comes to Netdata parents, we aim to provide the best scalability possible. We are currently finalizing the necessary changes to allow Netdata to achieve:

    • 1 CPU core per 1 million metrics/s for data collection
    • 1 CPU core per 1 million metrics/s for ML and health (alerts)
    • 1 CPU core per 1 million metrics/s for re-streaming (pushing metrics to another parent)

    Of course, the numbers depend on the CPU and its clock, but they shouldn't vary significantly on modern systems.

    At the same time, we are working to integrate Gorilla compression into our database. This will provide a significantly better overall memory footprint for Netdata.
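
As a rough illustration of what the scalability targets would mean for sizing a parent, the arithmetic can be sketched as follows. This assumes load scales linearly and that the targets are met exactly; the `estimate_cores` helper and the 2.5 million metrics/s figure are hypothetical.

```python
import math

# Targets quoted above: one CPU core per 1 million metrics/s for each workload.
METRICS_PER_CORE_PER_SEC = 1_000_000
WORKLOADS = ("collection", "ml_and_health", "re_streaming")


def estimate_cores(metrics_per_second: float, workloads=WORKLOADS) -> dict:
    """Estimate CPU cores per workload, assuming load scales linearly."""
    per_workload = metrics_per_second / METRICS_PER_CORE_PER_SEC
    estimate = {name: per_workload for name in workloads}
    estimate["total_rounded_up"] = math.ceil(per_workload * len(workloads))
    return estimate


# Hypothetical parent ingesting 2.5 million metrics/s and re-streaming them all.
parent = estimate_cores(2_500_000)
```

Under these assumptions such a parent would need about 2.5 cores per workload, or 8 cores in total after rounding up.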

Acknowledgments

We would like to thank our dedicated, talented contributors that make up this amazing community. The time and expertise that you volunteer are essential to our success. We thank you and look forward to continuing to grow together to build a remarkable product.

  • @MAH69IK for improving ntfy notification title.
  • @chpfm for fixing slave/user metrics collection stopping when query times out in go.d/mysql.
  • @k0ste for various installation improvements on CentOS-Stream.
  • @kylemanna for fixing an issue where a properly functioning sensor was skipped due to limits in python.d/sensors.
  • @miversen33 for adding access control configuration to ntfy notification method.
  • @novotnyJiri for fixing the wrong path in ansible-playbook deployment guide.
  • @theggs for adding installation description for Homebrew on Apple Silicon.
  • @vpnable for fixing counting UNDEF as users in go.d/openvpn_status_log.
  • @zhqu1148980644 for fixing docker-compose example.
  • @luisj1983 for implementing molecule tests in the netdata/ansible playbook.

Contributions

Collectors

Improvements

  • Improve exposing metrics by creating a chart for each app group/user/user group (apps.plugin) (#16095, @thiagoftsm)
  • Add env NETDATA_LOG_SEVERITY_LEVEL support to external collectors (#16089, @ilyam8)
  • Add env NETDATA_LOG_SEVERITY_LEVEL support (charts.d.plugin) (#16085, @ilyam8)
  • Add env NETDATA_LOG_SEVERITY_LEVEL support (python.d.plugin) (#16084, @ilyam8)
  • Improve performance by reading files sequentially (systemd-journal.plugin) (#16038, @ktsaou)
  • Add systemd-journal plugin to apps_groups.conf (apps.plugin) (#16024, @ilyam8)
  • Improve exposing metrics by creating a chart for each systemd service (cgroups.plugin) (#15975, @thiagoftsm)
  • Add disk labels (proc/diskstats) (#15949, @ktsaou)
  • Add support for opening journal files when running inside a container (systemd-journal.plugin) (#15830, @ktsaou)
  • Add env NETDATA_LOG_SEVERITY_LEVEL support (go.d.plugin) (#1351, @ilyam8)
  • Add "network" config option that allows configuration of DNS resolution (go.d/ping) (#1348, @ilyam8)
  • Add "custom_numeric_fields" config option (go.d/web_log) (#1343, @ilyam8)
  • Add upsd (NUT) collector (go.d/upsd) (#1341, @ilyam8)
  • Improve status chart by making it a dimension per status (go.d/vcsa) (#1332, @ilyam8)
  • Add label to vm/host charts (go.d/vsphere) (#1331, @ilyam8)
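
For authors of external collectors, honoring the NETDATA_LOG_SEVERITY_LEVEL environment variable could look roughly like the sketch below. The accepted values are not listed in these notes, so the names in `ASSUMED_LEVELS` are an assumption, as are the `level_from_env` helper and the collector name.

```python
import logging
import os

# Assumed mapping from severity names to Python logging levels; the accepted
# values of NETDATA_LOG_SEVERITY_LEVEL are not listed in these notes.
ASSUMED_LEVELS = {
    "debug": logging.DEBUG,
    "info": logging.INFO,
    "warning": logging.WARNING,
    "error": logging.ERROR,
}


def level_from_env(environ=os.environ, default=logging.INFO) -> int:
    """Translate NETDATA_LOG_SEVERITY_LEVEL into a logging level."""
    raw = environ.get("NETDATA_LOG_SEVERITY_LEVEL", "").strip().lower()
    return ASSUMED_LEVELS.get(raw, default)


logging.basicConfig(level=level_from_env())
log = logging.getLogger("example.collector")  # hypothetical collector name
log.debug("only shown when the variable requests debug output")
```

Falling back to a default when the variable is unset or unrecognized keeps the collector usable outside of Netdata.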

Bug fixes

  • Fix 1-second latency in play mode (systemd-journal.plugin) (#16123, @ktsaou)
  • Fix an issue where ipv4 metrics were exposed as ip (proc/netstat) (#16122, @ilyam8)
  • Fix an issue where OOMKill was created unconditionally (ebpf.plugin) (#16115, @thiagoftsm)
  • Fix an issue where ebpf threads did not respect the enable/disable value in the configuration (ebpf.plugin) (#16083, @thiagoftsm)
  • Fix using undefined var when loading job statuses (python.d.plugin) (#15965, @ilyam8)
  • Fix an issue where a properly functioning sensor was skipped due to limits (python.d/sensors) (#15905, @kylemanna)
  • Fix slave/user metrics collection stopping when query times out (go.d/mysql) (#1346, @chpfm)
  • Fix counting UNDEF as users (go.d/openvpn_status_log) (#1334, @vpnable)
  • Fix an issue where power metrics were not collected due to renaming (go.d/nvidia_smi) (#1310, @ilyam8)

Other

Packaging / Installation

  • Fix removing wrong directories when uninstalling on FreeBSD (#16167, @tkatsoulas)
  • Fix repo path for openSUSE 15.5 packages (#16161, @tkatsoulas)
  • Fix an issue running a Docker container when the default user was configured as a non-root user (#16156, @ilyam8)
  • Fix an issue where the uninstaller script doesn't clean up properly (#16148, @ilyam8)
  • Fix problem with the uninstaller script when executed as a regular user (#16146, @ilyam8)
  • Skip trying to preserve file owners when bundling external code (#15966, @Ferroin)
  • Cleanup Dockerfile (#15902, @Ferroin)
  • Skip copying environment/install-type files when checking existing installations (#15876, @Ferroin)
  • Add setuid fallback for perf and slabinfo plugins in the installer script (#15807, @ilyam8)
  • Fix an issue where cleanup was not performed during the kickstart.sh dry run (#15775, @ilyam8)
  • Add CentOS-Stream to distros (#15742, @k0ste)
  • Fix build with --disable-https (#15395, @MrZammler)
  • Enable building go.d plugin natively for CentOS-Stream (#14551, @k0ste)

Documentation

Health

Other Notable Changes

Improvements

Bug Fixes

Other

Deprecation notice

Changed in this release

In accordance with our previous deprecation notice, the following items in this release have been changed:

| Component | Type | Change | Action |
|---|---|---|---|
| apps.plugin | collector | a dimension for each group/user/user group => a chart for each group/user/user group | |
| cgroups.plugin | collector | a dimension for each systemd service => a chart for each systemd service | |
| proc.plugin | collector | all "Networking Stack" metrics except "tcp" have been moved to "IPv4 Networking" | |
| family attribute | alert configuration and Health API | deprecated | use chart labels |

Will be changed in the next release

We plan to change in the next release (v1.44.0):

| Component | Type | Change | Action |
|---|---|---|---|
| charts.d/nut | collector | deprecated | use go.d/upsd |

Netdata Release Meetup

Join the Netdata team on the 18th of October at 16:30 UTC for the Netdata Release Meetup.

Together we’ll cover:

  • Release Highlights.
  • Acknowledgments.
  • Q&A with the community.

RSVP now - we look forward to meeting you.

Support options

As we grow, we stay committed to providing the best support ever seen from an open-source solution. Should you encounter an issue with any of the changes made in this release or any feature in the Netdata Agent, feel free to contact us through one of the following channels:

  • Netdata Learn: Find documentation, guides, and reference material for monitoring and troubleshooting your systems with Netdata.
  • GitHub Issues: Make use of the Netdata repository to report bugs or open a new feature request.
  • GitHub Discussions: Join the conversation around the Netdata development process and be a part of it.
  • Community Forums: Visit the Community Forums and contribute to the collaborative knowledge base.
  • Discord Server: Jump into the Netdata Discord and hang out with like-minded sysadmins, DevOps, SREs, and other troubleshooters. More than 1700 engineers are already using it!

v1.42.4

7 months ago

Netdata v1.42.4 is a patch release to address issues discovered since v1.42.3.

This patch release provides the following bug fixes and updates:

  • Fixed alarm variables not being created for all chart dimensions (#15984, @MrZammler).

v1.42.3

7 months ago

Netdata v1.42.3 is a patch release to address issues discovered since v1.42.2.

This patch release provides the following bug fixes and updates:

Acknowledgements

We would like to thank our dedicated, talented contributors that make up this amazing community. The time and expertise that you volunteer are essential to our success. We thank you and look forward to continuing to grow together to build a remarkable product.

  • @moonbreon for improving handling of closed connections in streaming.

v1.42.2

8 months ago

Netdata v1.42.2 is a patch release to address issues discovered since v1.42.1.

This patch release provides the following bug fixes and updates:

Acknowledgements

We would like to thank our dedicated, talented contributors that make up this amazing community. The time and expertise that you volunteer are essential to our success. We thank you and look forward to continuing to grow together to build a remarkable product.

  • @kevin-fwu for adding an option to avoid duplicate labels when exporting in Prometheus format.
  • @k0ste for fixing permission attributes for conf.d dirs for RPM.

Support options

As we grow, we stay committed to providing the best support ever seen from an open-source solution. Should you encounter an issue with any of the changes made in this release or any feature in the Netdata Agent, feel free to contact us through one of the following channels:

  • Netdata Learn: Find documentation, guides, and reference material for monitoring and troubleshooting your systems with Netdata.
  • GitHub Issues: Make use of the Netdata repository to report bugs or open a new feature request.
  • GitHub Discussions: Join the conversation around the Netdata development process and be a part of it.
  • Community Forums: Visit the Community Forums and contribute to the collaborative knowledge base.
  • Discord Server: Jump into the Netdata Discord and hang out with like-minded sysadmins, DevOps, SREs, and other troubleshooters. More than 1600 engineers are already using it!

v1.42.1

8 months ago

Netdata v1.42.1 is a patch release to address issues discovered since v1.42.0.

This patch release provides the following bug fixes and updates:

  • Fixed issue with missing entries for Systemd-journal and Processes functions (#15814, @ktsaou)
  • Fixed linking health.log to stdout in Docker (#15813, @ilyam8)
  • Updated UI version to v6.28.0 (#15810, @ilyam8)
  • Fixed 401 when behind a proxy with Basic auth and signed in (#15808, @ktsaou)
  • Fixed Health Management API (#15806, @underhood)
  • Fixed build deps in DEB packages for systemd-journal.plugin (#15805, @Ferroin)
  • Cleaned up python deps for RPM packages (#15804, @Ferroin)
  • Added proper SUID fallback for DEB plugin packages (#15803, @Ferroin)
  • Fixed an issue where the nd_journal_process column was not populated for the Systemd-journal function (#15798, @ktsaou)
  • Fixed negative retention when database is empty in /api/v2/info (#15796, @ktsaou)
  • Fixed handling of unassigned drives for python.d/hpssa (#15793, @ilyam8)
  • Fixed an issue that prevented systemd-journal.plugin from restarting (#15787, @ktsaou)
  • Fixed publishing of openSUSE 15.5 packages (#15781, @tkatsoulas)
  • Updated OpenSSL version of static builds to 1.1.1v (#15779, @tkatsoulas)

v1.42.0

8 months ago

Right on schedule, this is another great Netdata release!

Netdata Growth

  • 64.5 k GitHub Stars ⭐

    Netdata reached the top trending repos on GitHub after the last release. ❤️ Thank you for your love! 🚀 You rock!

    Give Netdata a ⭐ on GitHub too!

  • 580+ M docker hub pulls, running at 200+ k per day.

    Netdata is a verified publisher on Docker Hub, and our users enjoy free unlimited Docker Hub pulls!

Release Highlights

Integrations Marketplace

A beta version of the Netdata Marketplace is included in this release.

More than 800 integrations are available, directly from the dashboard. For each integration, all the information required to get it up and running is included.

Integrations are still in beta. We improve them every day, but we think they are already quite useful.

SystemD Journal

A new Netdata Function has been added to query the systemd journal logs.

The function respects the current date-time picker, so it can query any possible timeframe the systemd journal has data for.

IMPORTANT
Netdata Functions are available only when you are signed in to Netdata and your Netdata Agent is claimed. This has been done to protect your privacy. Netdata Cloud checks that the users of the Agent dashboard are allowed to view this information.

IMPORTANT
The systemd-journal function is currently available only on Netdata Agents that have been installed from source, or with native packages of the Linux distribution (RPM, DEB). For users running static builds of Netdata or running Netdata in a Docker container, we are working to bring systemd-journal to them too. Stay tuned...

Claiming via the UI

You can now connect your agents to Netdata Cloud via the dashboard.

The UI verifies that you are the owner of a Netdata Agent by asking you to provide a random key that is saved to a file on disk. Once you provide the right key, the agent is automatically claimed to your space at Netdata Cloud.
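
The ownership check described above follows a familiar pattern: only someone with access to the machine can read the key file. A minimal sketch of that pattern, not Netdata's actual implementation, might look like this. The file location, key length and the `create_challenge` and `verify_owner` helpers are all hypothetical.

```python
import hmac
import secrets
import tempfile
from pathlib import Path


def create_challenge(key_file: Path) -> None:
    """Agent side: store a fresh random key where only the machine owner can read it."""
    key_file.write_text(secrets.token_hex(16))
    key_file.chmod(0o600)


def verify_owner(key_file: Path, provided_key: str) -> bool:
    """Accept the claim only if the user supplies the key saved on disk."""
    expected = key_file.read_text().strip()
    return hmac.compare_digest(expected, provided_key.strip())


# Hypothetical file location, used only for this demonstration.
demo_file = Path(tempfile.mkdtemp()) / "claim_key"
create_challenge(demo_file)
owner_ok = verify_owner(demo_file, demo_file.read_text())
stranger_ok = verify_owner(demo_file, "not-the-key")
```

Comparing with `hmac.compare_digest` avoids leaking how much of the key matched through timing differences.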

Easily Spot Anomalies

The UI has an AR button above the menu. When you press it, the dashboard queries the Netdata Metrics Scoring Engine to find the anomaly rates for the visible timeframe, across the metrics included in the dashboard. Then it adds a badge next to each category and subcategory, showing its anomaly rate.

This way, you can quickly spot what is anomalous on the current view of the dashboard.
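
One way to picture the per-category badges is as the share of samples flagged anomalous within the visible timeframe. The sketch below works under that assumption; the input shape, the `anomaly_rate_by_category` helper and the sample data are hypothetical, and the scoring engine itself may compute rates differently.

```python
from collections import defaultdict


def anomaly_rate_by_category(samples):
    """Return the percentage of anomalous samples per category.

    `samples` maps (category, metric) -> list of 0/1 anomaly bits collected
    over the visible timeframe (a hypothetical input shape).
    """
    flagged = defaultdict(int)
    total = defaultdict(int)
    for (category, _metric), bits in samples.items():
        flagged[category] += sum(bits)
        total[category] += len(bits)
    return {c: 100.0 * flagged[c] / total[c] for c in total if total[c]}


visible = {
    ("disk", "disk.io.sda"): [0, 0, 1, 1],
    ("disk", "disk.io.sdb"): [0, 0, 0, 0],
    ("network", "net.eth0"): [1, 0, 0, 0, 0],
}
badges = anomaly_rate_by_category(visible)
```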

Acknowledgements

We would like to thank our dedicated, talented contributors that make up this amazing community. The time and expertise that you volunteer are essential to our success. We thank you and look forward to continuing to grow together to build a remarkable product.

  • @Leny1996 for fixing Docker bind-mount stock files creation.
  • @fhriley for adding Linux power cap Intel RAPL metrics collector.
  • @icy17 for fixing potential crash in the h2o server.
  • @kiela for fixing typos and images placement in the Deployment Strategies doc.
  • @zeylos for fixing non-interactive options for apt-get and zypper.

Contributions

Collectors

New

  • Add AMD GPU collector (proc.plugin) (#15515, @Dim-P)
  • Add PCI Advanced Error Reporting metrics collector (proc.plugin) (#15488, @ktsaou)
  • Add Linux power cap Intel RAPL metrics collector (proc.plugin) (#15364, @fhriley)
  • Add systemd-journal plugin (systemd-journal.plugin) (#15363, @ktsaou)

Improvements

  • Collect EDAC metrics per-memory controller (MC) and DIMM (proc.plugin) (#15473, @ktsaou)

Bug fixes

Other

  • Change restart message to info (freeipmi.plugin) (#15664, @ilyam8)
  • Filter out systemd-udevd.service/udevd cgroup (cgroups.plugin) (#15571, @ilyam8)
  • Improve FD limit issue tracing (apps.plugin) (#15504, @ktsaou)
  • Add hash table charts for internal monitoring (ebpf.plugin) (#15323, @thiagoftsm)

Documentation

Packaging / Installation

Health

  • Disable systemdunits alarms (#15726, @ilyam8)
  • Remove the noise by silencing alerts that don't need to wake up people (#15590, @ktsaou)

Other Notable Changes

Improvements

Bug Fixes

Code organization

Deprecation notice

We plan to change the following items in the next release (v1.43.0):

| Component | Type | Change | Action |
|---|---|---|---|
| apps.plugin | collector | a dimension for each group/user/user group => a chart for each group/user/user group | |
| cgroups.plugin | collector | a dimension for each systemd service => a chart for each systemd service | |
| proc.plugin | collector | all "Networking Stack" metrics except "tcp" => "IPv4 Networking" | |
| python.d/nvidia_smi | collector | deprecated | use go.d/nvidia_smi |
| family attribute | alert configuration and Health API | deprecated | use chart labels |

Netdata Release Meetup

Join the Netdata team on the 11th of August at 17:00 UTC for the Netdata Release Meetup.

Together we’ll cover:

  • Release Highlights.
  • Acknowledgements.
  • Q&A with the community.

RSVP now - we look forward to meeting you.

v1.41.0

9 months ago

Check out the v1.41 release meetup recording or read on to learn more about the new UI and other features in this release.

Right on schedule, this is another great Netdata release!

Netdata Growth

  • 64 k GitHub Stars ⭐
  • 1.7 M monitored nodes
  • 570+ M docker hub pulls

Give Netdata a ⭐ on GitHub too!

❤️ Thank you for your love! 🚀 You rock!

Release Highlights

New Agent Dashboard

Netdata Agents and Parents now have a new UI!

New CHARTS 🟢 New SUMMARIES 🟢 MACHINE-LEARNING FIRST 🟢 INFRASTRUCTURE LEVEL DASHBOARDS 🟢 FILTER, SLICE, and DICE any dataset 🟢 ANOMALY ADVISOR 🟢 METRICS CORRELATIONS 🟢 NETDATA FUNCTIONS 🟢 EVENTS FEED 🟢 HEATMAPS 🟢

Netdata Agent

In the last few months, we have ported and open-sourced all Netdata Cloud APIs to the Netdata Agent, allowing Netdata Parents to drive the same multi-node / infrastructure level dashboards Netdata Cloud provides!

So, as of today, Netdata Agents and Parents present the same UI, exactly the same dashboard, charts and features with Netdata Cloud!

Single Node Dashboard Changes

Apart from the entirely new look, single-node dashboards now group similar charts together. So, all disk drives, network interfaces, cgroups (containers and VMs), are now a single set of charts.

This allows Netdata to aggregate a vast amount of datasets in a chart, like the following, where almost 20k containers are now manageable:

image

To make it easier for you to navigate, filter, slice, and dice the data, the menus above each chart give you easy access to all the data of the chart.

Multi Node Dashboards

When Netdata Agents are configured as Parents (multiple other agents stream metrics to them), they now present multi-node and multi-instance charts. At the top right corner of the dashboard, there is the global nodes filter, from which you can slice the entire dashboard for one or a few of your nodes.

image

Want to know more?

Get a firsthand walkthrough with Costa Tsaousis, Netdata's Founder, on the rationale for this change and the path Netdata is taking by checking the video from Netdata Office Hours on YouTube.

The old dashboards are still accessible

You can still access all versions of the dashboards, as follows:

  • http://your.server:19999/

    The default dashboard is now a live version of the new UI. The dashboard static files are served by Cloudflare and are automatically updated when we release a new version of the UI, so that your Netdata agent is always up to date.

  • http://your.server:19999/v2/

    A local copy of the latest dashboard, as it was at the time the agent was released. This is distributed with Netdata under the Netdata Cloud UI License v1.0. The local copy is automatically used if for any reason the web browser cannot download the live version of it.

  • http://your.server:19999/v1/

    The previous single-node version of the Netdata Agent dashboard.

  • http://your.server:19999/v0/

    The now ancient, original version of the Netdata Agent dashboard.

Netdata Assistant

Netdata Assistant: Your AI-Powered Troubleshooting Sidekick

The Netdata Assistant is an AI-powered tool that uses large language models and our community's knowledge to guide you during troubleshooting and help you get to the root cause sooner.

The goal of the Netdata Assistant is straightforward: to make your troubleshooting process easier. It's here to save you from the hassle of sifting through tons of information so you can focus on solving the problem at hand.

It will give you the lowdown on the alert, why it's happening, and why you should care. It'll also guide you on how to troubleshoot it and even offer some handy web links for more info if you're interested.

image

Read more about it on the Netdata blog here.

New FreeIPMI collector for monitoring enterprise hardware

Netdata got a new FreeIPMI collector. The new collector is able to collect IPMI sensors at a much better data collection rate, and it is more reliable and robust compared to the previous one.

We have also categorized all sensors based on the component they monitor:

image

We also provide, as labels, the exact sensor name each metric refers to:

image

Netdata Detects FDs Leaking

"FD" stands for "file descriptor". A file descriptor is an integer that the operating system assigns to an open file to track it. This includes regular data files, directories, network sockets, pipes, and other types of I/O streams.

In Linux, everything is treated as a file, which includes hardware devices, directories, and sockets. Each open file is assigned a file descriptor. When a file is closed, its file descriptor is freed up for reuse. However, if an application doesn't close a file when it's done with it, that's called a "file descriptor leak".

File descriptor leaks can cause several problems:

  1. Resource exhaustion: Each process has a limit to the number of file descriptors it can open. If a process continually leaks file descriptors without closing them, it will eventually hit this limit and won't be able to open any more files, which often causes the process to crash.

  2. Unexpected behavior: Open file descriptors hold resources, like network sockets, that might be expected to be available for other uses. If these resources are tied up due to a leak, it can cause unexpected behavior.

  3. Security issues: File descriptors can sometimes be used to gain unauthorized access to data if they're not properly managed.

apps.plugin is now able to track the usage of FDs against the limits set for each application. We have added an fds category in the Applications section of the dashboard. The first chart shows the percentage of FDs used by each application against its limits:

image
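As a rough illustration of the percentage that chart reports, the sketch below compares the number of open descriptors of a process with its soft limit. It is not how apps.plugin is implemented; the function names are hypothetical, and the measurement helper assumes a Linux host where /proc/self/fd is available.

```python
import os
import resource


def fd_usage_percent(open_fds: int, soft_limit: int) -> float:
    """Return the open descriptors as a percentage of the soft limit."""
    if soft_limit <= 0:
        raise ValueError("soft_limit must be positive")
    return 100.0 * open_fds / soft_limit


def current_process_fd_usage() -> float:
    """Measure this process's own usage (Linux-only: reads /proc/self/fd)."""
    # Listing the directory briefly opens one extra descriptor, so the
    # count can be one higher than the steady-state value.
    open_fds = len(os.listdir("/proc/self/fd"))
    soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft_limit == resource.RLIM_INFINITY:
        # No finite limit to compare against (assumption: report 0%).
        return 0.0
    return fd_usage_percent(open_fds, soft_limit)
```

A process holding 256 descriptors under a soft limit of 1024 would be at 25%. A value that keeps climbing over time without dropping is the typical signature of a leak.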

Acknowledgements

We would like to thank our dedicated, talented contributors that make up this amazing community. The time and expertise that you volunteer are essential to our success. We thank you and look forward to continuing to grow together to build a remarkable product.

  • @k0ste for improving Prometheus exporting doc.
  • @carlocab for replacing info macro with a less generic name.
  • @MYanello for updating the pfSense package installation instructions.

Contributions

Collectors

Improvements

  • Improve fds monitoring (apps.plugin) (#15437, @ktsaou)
  • Add application groups file descriptor limit monitoring (apps.plugin) (#15417, @ktsaou)
  • Re-create sdr cache on start (freeipmi.plugin) (#15361, @ktsaou)
  • Add sensor state chart, create a per-sensor chart instead of a per-sensor dimension (freeipmi.plugin) (#15327, @ktsaou)
  • Expose CmdLine in apps function (apps.plugin) (#15275, @ilyam8)
  • Remove pod_uid and container_id labels in k8s (cgroups.plugin) (#15216, @ilyam8)
  • Add cluster mode (go.d/elasticsearch) (#1227, @ilyam8)
  • Add 'fallback_type' config option to match Untyped (go.d/prometheus) (#1225, @ilyam8)

Bug fixes

  • Fix sensor state updates (freeipmi.plugin) (#15360, @ilyam8)
  • Fix tc.plugin charts labels (tc.plugin) (#15262, @ilyam8)
  • Fix collecting hostgroup from stats_mysql_connection_pool (go.d/proxysql) (#1226, @ilyam8)

Other

Documentation

Packaging / Installation

Health

Exporting

  • Hide charts that are not available to viewers when exporting in the shell format (#15309, @ilyam8)
  • Fix slow exporting in Prometheus format (#15276, @ilyam8)

Other Notable Changes

Improvements

  • Enrichment of /api/v2, buildinfo improvements and code cleanup (#15294, @ktsaou)

Bug fixes

Code organization

Deprecation notice

There is no obvious list of items that will be deprecated in the upcoming release (v1.42.0). Feel free to check and elaborate on the upcoming backlog.

Deprecated in this release

In accordance with our previous deprecation notice, the following items have been removed in this release:

| Component | Type | Will be replaced by |
|---|---|---|
| python.d/nvidia_smi | collector | go.d/nvidia_smi |
| family attribute | alert configuration and Health API | chart labels attribute (more details on netdata#15030) |

Netdata Release Meetup

Join the Netdata team on the 21st of July at 17:00 UTC for the Netdata Release Meetup.

Together we’ll cover:

  • Release Highlights.
  • Acknowledgements.
  • Q&A with the community.

RSVP now - we look forward to meeting you.

Support options

As we grow, we stay committed to providing the best support ever seen from an open-source solution. Should you encounter an issue with any of the changes made in this release or any feature in the Netdata Agent, feel free to contact us through one of the following channels:

  • Netdata Learn: Find documentation, guides, and reference material for monitoring and troubleshooting your systems with Netdata.
  • GitHub Issues: Make use of the Netdata repository to report bugs or open a new feature request.
  • GitHub Discussions: Join the conversation around the Netdata development process and be a part of it.
  • Community Forums: Visit the Community Forums and contribute to the collaborative knowledge base.
  • Discord Server: Jump into the Netdata Discord and hang out with like-minded sysadmins, DevOps, SREs, and other troubleshooters. More than 1400 engineers are already using it!

v1.40.1

10 months ago

Netdata v1.40.1 is a patch release to address issues discovered since v1.40.0.

This patch release provides the following bug fixes:

  • Fixed ebpf sync thread crash (#15174, thiagoftsm).
  • Fixed ebpf threads taking too long to terminate (#15187, thiagoftsm).
  • Fixed building with eBPF on RPM systems due to missing build dependency (#15192, k0ste).
  • Fixed building on macOS due to incorrect include directive (#15195, nandahkrishna).
  • Fixed a crash during health log entry processing (#15209, stelfrag).
  • Fixed architecture detection on i386 when building native packages (#15218, ilyam8).
  • Fixed SSL non-blocking retry handling in the web server (#15222, ktsaou).
  • Fixed handling of plugin ownership in static builds (#15230, Ferroin).
  • Fixed an exception in python.d/nvidia_smi due to not handling N/A value (#15231, ilyam8).
  • Fixed installing the wrong systemd unit file on older RPM systems (#15240, Ferroin).
  • Fixed creation of charts for network interfaces of virtual machines/containers as normal network interface charts (#15244, ilyam8).
  • Fixed building on openSUSE Leap 15.4 due to incorrect $(libh2o_dir) expansion (#15253, Dim-P).

Acknowledgements

We would like to thank our dedicated, talented contributors that make up this amazing community. The time and expertise that you volunteer are essential to our success. We thank you and look forward to continuing to grow together to build a remarkable product.

  • @k0ste for fixing building with eBPF on RPM systems.
  • @nandahkrishna for fixing building on macOS.


v1.40.0

10 months ago

Netdata Growth

🚀 Our community growth is increasing steadily. ❤️ Thank you! Your love and acceptance give us the energy and passion to work harder to simplify and make monitoring easier, more effective and more fun to use.

  • Over 63,000 GitHub Stars ⭐
  • Over 1.5 million online nodes
  • Almost 94 million sessions served
  • Over 600 thousand total nodes in Netdata Cloud
    Wow! Netdata Cloud is about to become the biggest and most scalable monitoring infra ever created!

Let the world know you love Netdata. Give Netdata a ⭐ on GitHub now. Motivate us to keep pushing forward!

Unlimited Docker Hub Pulls!

To help our community use Netdata more broadly, we just signed an agreement with Docker for the purchase of Rate Limit Removal, which will remove all Docker Hub pull limits for the Netdata repos at Docker Hub. We expect this add-on to be applied to our repos in the following few days, so that you will enjoy unlimited Docker Hub pulls of Netdata Docker images for free!

Release Highlights

Dashboard Sections' Summary Tiles

Netdata Cloud dashboards have been improved to provide instant summary tiles for most of their sections. This includes system overview, disks, network interfaces, memory, mysql, postgresql, nginx, apache, and dozens more.

To accomplish this, we extended the query engine of Netdata to support multiple grouping passes, so that queries like "sum metrics by label X, and then average by node" are now possible. At the same time we made room for presenting anomaly rates on them (vertical purple bar on the right) and significantly improved the tile placement algorithm to support multi-line summary headers and precise sizing and positioning, providing a look and feel like this:

image
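To make the idea of multiple grouping passes concrete, here is a minimal sketch of the example query "sum metrics by label X, and then average by node". It is not the Netdata query engine: the sample data and function name are hypothetical, and it assumes the first pass sums per (node, label) pair while the second pass averages those sums within each node.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical samples: (node, value of label X, metric value).
samples = [
    ("node1", "a", 10.0),
    ("node1", "a", 5.0),
    ("node1", "b", 1.0),
    ("node2", "a", 20.0),
    ("node2", "b", 3.0),
]


def sum_by_label_then_average_by_node(rows):
    # Pass 1: sum the values of each (node, label) pair.
    sums = defaultdict(float)
    for node, label, value in rows:
        sums[(node, label)] += value

    # Pass 2: average the per-label sums within each node.
    per_node = defaultdict(list)
    for (node, _label), total in sums.items():
        per_node[node].append(total)
    return {node: mean(totals) for node, totals in per_node.items()}


result = sum_by_label_then_average_by_node(samples)
print(result)  # {'node1': 8.0, 'node2': 11.5}
```

The second pass works on the output of the first, which is why a single grouping step cannot express this kind of query.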

The following chart tile types have been added:

  • Donut
  • Gauge
  • Bar
  • Trendline
  • Number
  • Pie chart

To improve the efficiency of using these tiles, each of these tiles supports the following interactive actions:

  1. Clicking the title of the tile scrolls the dashboard to the data source chart, where you can slice, dice and filter the data on which the tile is based.
  2. Hovering over the tile with your mouse pointer reveals the NIDL (Nodes, Instances, Dimensions, Labels) framework buttons, allowing you to explore and filter the data set right on the tile.

Some examples that you can see from the Netdata Demo space:

Silencing of Cloud Alert Notifications

Although Netdata Agent alerts support silencing, centrally dispatched alert notifications from Netdata Cloud were missing that feature. Today, we release alert notifications silencing rules for Netdata Cloud!

Silencing rules are applied on any combination of the following: users, rooms, nodes, host labels, contexts (charts), alert name, alert role. For the matching alerts, silencing can optionally have a starting date and time and/or an ending date and time.

With this feature you can now easily set up silencing rules, which can be applied immediately or on a defined schedule, allowing you to plan for upcoming scheduled maintenance windows - see some examples here.

Image
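The matching behavior described above can be sketched as follows. This is a hypothetical model for illustration only, not Netdata Cloud's implementation or API: the class, its field names and the alert attribute keys are all assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, Optional


@dataclass
class SilencingRule:
    # Hypothetical model: each criterion maps an alert attribute to the
    # value it must have. An empty dict matches every alert.
    criteria: Dict[str, str] = field(default_factory=dict)
    starts_at: Optional[datetime] = None  # None: active immediately
    ends_at: Optional[datetime] = None    # None: no end time

    def silences(self, alert: Dict[str, str], now: datetime) -> bool:
        """Return True if the rule is active at `now` and matches the alert."""
        if self.starts_at is not None and now < self.starts_at:
            return False
        if self.ends_at is not None and now >= self.ends_at:
            return False
        return all(alert.get(key) == value for key, value in self.criteria.items())


# A maintenance window on one node, for one alert name.
rule = SilencingRule(
    criteria={"node": "db1", "alert_name": "disk_full"},
    starts_at=datetime(2023, 6, 1, 22, 0),
    ends_at=datetime(2023, 6, 2, 2, 0),
)
alert = {"node": "db1", "alert_name": "disk_full", "role": "sysadmin"}
```

With this rule, the alert is silenced at 23:00 on the 1st of June but not an hour before the window starts or after it ends. Leaving both times unset makes the rule apply immediately and indefinitely.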

Read more about Silencing Alert notifications in our documentation.

Machine Learning - Extended Training to 24 Hours

Netdata trains ML models for each metric, using its past data. This allows Netdata to detect anomalous behaviors in metrics, based exclusively on the recent past data of the metric itself.

Before this release, Netdata trained one model for each metric, learning the behavior of the metric during the last 4 hours. In the previous release we introduced persisting these models to disk and loading them back when Netdata restarts.

In this release we change the default ML settings to maintain multiple trained models per metric, covering the behavior of each metric for the last 24 hours. All these models are now consulted automatically to decide whether a data collection point is anomalous.

This has been implemented in a way that avoids introducing additional CPU overhead on Netdata agents. So, instead of training one model for 24 hours, which would introduce significant query overhead on the server, we train each metric every 3 hours using the last 6 hours of data, and we keep 9 models per metric. The most recent model is consulted first during anomaly detection. Additional models are consulted as long as the previous ones predict an anomaly. So only when all 9 models agree that a collected sample is anomalous do we mark it as anomalous in the database.
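The consultation order described above can be sketched like this. It is a simplified illustration, not Netdata's ML code: the threshold "models" stand in for trained models, and all names are hypothetical.

```python
from collections import deque
from typing import Callable, Deque

# A model returns True when it considers the sample anomalous.
Model = Callable[[float], bool]

MAX_MODELS = 9  # models kept per metric, as stated in the text


def add_model(models: Deque[Model], new_model: Model) -> None:
    """Store the newest model first; a bounded deque drops the oldest one."""
    models.appendleft(new_model)


def is_anomalous(models: Deque[Model], sample: float) -> bool:
    """Consult models newest-first and stop at the first that says 'normal'."""
    if not models:
        return False
    # all() short-circuits, so older models are only consulted while
    # every newer one has predicted an anomaly.
    return all(model(sample) for model in models)


def threshold_model(upper: float) -> Model:
    # Stand-in for a trained model: anomalous when the sample exceeds `upper`.
    return lambda sample: sample > upper


models: Deque[Model] = deque(maxlen=MAX_MODELS)
for upper in range(100, 0, -10):  # ten trainings: thresholds 100, 90, ..., 10
    add_model(models, threshold_model(float(upper)))
```

After ten trainings only nine models remain, with thresholds from 10 (newest) to 90 (oldest). A sample of 50 is rejected as normal by the fifth model, so the remaining four are never consulted, while a sample of 95 is flagged because all nine agree.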

The impact of these changes is more accurate anomaly detection out of the box, with much fewer false positives.

You can read more about it in this deck presented during a recent office hours session (office hours recording).

Rewritten SSL Support for the Agent

The SSL support in the Netdata Agent has been completely rewritten. The new code now reliably supports SSL connections for both the Netdata internal web server and streaming. It is also easier to understand, troubleshoot and expand. At the same time, performance has been improved by removing redundant checks.

During this process a long-standing bug on streaming connection timeouts has been identified and fixed, making streaming reliable and robust overall.

Alerts and Notifications

Mattermost notifications for Business Plan users

As part of our commitment to expanding our set of alert notification methods, we added Mattermost as another notification integration option on Netdata Cloud. It provides another reliable way to deliver alerts to your team, helping ensure the continuity and reliability of your services.

Business Plan users can now configure Netdata Cloud to send alert notifications to their team on Mattermost.

image

Visualizations / Charts and Dashboards

Netdata Functions

This builds on the work done in release v1.38, where we introduced real-time functions that enable you to trigger specific routines to be executed by a given Agent on demand. Our initial function provided detailed information on currently running processes on the node, effectively replacing top and iotop.

We have now added the capability to group your results by specific attributes. For example, on the Processes function you are now able to group the results by: Category, Cmd or User. With this capability you can now get a consolidated view of your reported statistics over any of these attributes.

image

External plugin integration

The agent core has been improved when it comes to integration with external plugins. Under certain conditions, a failed plugin would not be correctly acknowledged by the agent, resulting in a defunct (i.e. zombie) plugin process. This is now fixed.
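For readers unfamiliar with defunct processes: a child that has exited stays in the process table as a zombie until its parent collects its exit status. The sketch below shows that general pattern with Python's standard subprocess module. It is not the agent's code (the agent is written in C), and the function name is hypothetical.

```python
import subprocess
import sys
from typing import Sequence


def run_and_reap(argv: Sequence[str], timeout: float = 10.0) -> int:
    """Start a child process, wait for it to exit, and return its exit code."""
    child = subprocess.Popen(list(argv))
    try:
        # wait() collects the exit status, so no zombie entry is left behind.
        return child.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # The child is still running: terminate it, then reap it.
        child.kill()
        return child.wait()


# A stand-in for a plugin that fails right after starting.
exit_code = run_and_reap([sys.executable, "-c", "import sys; sys.exit(3)"])
```

Here the parent observes exit code 3 and the child is fully cleaned up. Skipping the wait step for a failed child is what leaves a defunct entry behind.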

Preliminary steps to split native packages

Starting with this release, our official DEB/RPM packages have been split so that each external data collection plugin is in its own package instead of having everything bundled into a single package. We have previously had our CUPS and FreeIPMI collectors split out like this, but this change extends that to almost all of our external data collectors. This is the first step towards making these external collectors optional on installs that use our native packages, which will in turn allow users to avoid installing things they don’t actually need.

Short-term, these external collectors are listed as required dependencies to ensure that updates work correctly. At some point in the future almost all of them will be changed to be optional dependencies so that users can pick and choose which ones they want installed.

This change also includes a large number of fixes for minor issues in our native packages, including better handling of user accounts and file permissions and more prevalent usage of file capabilities to improve the security of our native packages.

Acknowledgements

We would like to thank our dedicated, talented contributors that make up this amazing community. The time and expertise that you volunteer are essential to our success. We thank you and look forward to continuing to grow together to build a remarkable product.

  • @n0099 for fixing typos in the documentation.
  • @mochaaP for fixing cross-compiling issues.
  • @jmphilippe for making control address configurable in python.d/tor.
  • @TougeAI for documenting the "age" configuration option in python.d/smartd_log.
  • @mochaaP for adding support of python-oracledb to python.d/oracledb.

Contributions

Collectors

Improvements

Bug fixes

  • Fix handling of newlines in HELP (go.d/prometheus) (#1196, @ilyam8)
  • Fix collection of bind mounts (diskspace.plugin) (#14831, @MrZammler)
  • Fix collection of zero metrics if Zswap is disabled (debugfs.plugin) (#15054, @ilyam8)

Other

  • Document the "age" configuration option (python.d/smartd_log) (#15171, @TougeAI)
  • Send EXIT before exiting in (freeipmi.plugin, debugfs.plugin) (#15140, @ilyam8)

Documentation

Packaging / Installation

Streaming

  • Streaming improvements and rewrite of SSL support in Netdata (#15113, @ktsaou)

Health

Exporting

ML

Other Notable Changes

Improvements

Bug fixes

Code organization

Deprecation notice

The following items will be removed in our next minor release (v1.41.0):

Patch releases (if any) will not be affected.

| Component | Type | Will be replaced by |
|---|---|---|
| python.d/nvidia_smi | collector | go.d/nvidia_smi |
| family attribute | alert configuration and Health API | chart labels attribute (more details on netdata#15030) |

When using Netdata Cloud, the agent version required to benefit fully from the latest features is one version before the latest stable release. With this release, that becomes v1.39.1. If you are running agents on lower versions, you will be notified and guided in the UI to take action.

Check here for details on how to Update Netdata agents.

Netdata Release Meetup

Join the Netdata team on the 19th of June at 16:00 UTC for the Netdata Release Meetup.

Together we’ll cover:

  • Release Highlights.
  • Acknowledgements.
  • Q&A with the community.

RSVP now - we look forward to meeting you.


Running survey

Help us make Netdata even greater! We are trying to gather valuable information that is key for us to better position Netdata and ensure we keep bringing more value to you.

We would appreciate it if you could take some time to answer this short survey (4 questions only).

v1.39.1

11 months ago

This patch release provides the following bug fixes:

  • We noticed that claiming and enabling auto-updates have been failing due to incorrect permissions when kickstart.sh was doing a static installation. The issue has affected all static installations, including the one done from the Windows MSI installer. The permissions have now been corrected.

  • The recipient lists of agent alert notifications are configurable via the health_alarm_notify.conf file. A stock file with default configurations can be modified using edit-config. @jamgregory noticed that the default settings in that file can make changing role recipients confusing. Unless the edited configuration file included every setting of the original stock file, the resulting behavior was unintuitive. @jamgregory kindly added a PR to fix the handling of custom role recipient configurations.

  • A bug in our collection and reporting of Infiniband bandwidth was discovered and fixed.

  • We noticed memory buffer overflows under some very specific conditions. We adjusted the relevant buffers and the calls to strncpyz to prevent such overflows.

  • A memory leak that occurred in certain circumstances was found in the ACLK code. We fixed the incorrect data handling that caused it.

  • An unrelated memory leak was discovered in the ACLK code and has also been fixed.

  • Exposing the anomaly rate right on top of each chart in Netdata Cloud surfaced an issue of bad ML models on some very noisy metrics. We addressed the issue by suppressing the indications that these noisy metrics would produce. This change gives the ML model a chance to improve, based on additional collected data.

  • Finally, we improved the handling of errors during ML transactions, so that transactions are properly rolled back, instead of failing in the middle.

Support options

As we grow, we stay committed to providing the best support ever seen from an open-source solution. Should you encounter an issue with any of the changes made in this release or any feature in the Netdata Agent, feel free to contact us through one of the following channels:

  • Netdata Learn: Find documentation, guides, and reference material for monitoring and troubleshooting your systems with Netdata.
  • GitHub Issues: Make use of the Netdata repository to report bugs or open a new feature request.
  • GitHub Discussions: Join the conversation around the Netdata development process and be a part of it.
  • Community Forums: Visit the Community Forums and contribute to the collaborative knowledge base.
  • Discord: Jump into the Netdata Discord and hang out with like-minded sysadmins, DevOps, SREs and other troubleshooters. More than 1300 engineers are already using it!