The open-source observability platform everyone needs!
systemd-journal logs release!
Right on schedule, this is another great Netdata release!
65.5 k GitHub Stars ⭐
Since October 2023, Netdata has been leading the observability category in the CNCF landscape, surpassing Elasticsearch. Thank you for your love ❤️! Give Netdata a ⭐ on GitHub too!
595 M Docker Hub pulls
Netdata gets about 200k Docker Hub downloads per day. Since June 2023 we have been a Verified Publisher, so Netdata pulls don't count against Docker Hub pull limits for our users, allowing all our users to integrate Netdata into their CI/CD toolchains.
This release is the most robust and reliable Netdata we have ever built.
These are the main areas Netdata has improved since the last release:
Logs
Today we release an almost entirely rewritten version of systemd-journal, improving its performance and visualization capabilities. systemd-journal holds critical system and security information, and given the lack of systemd-journal visualization tools, we focused on filling this gap first. At the same time, we are standardizing how logs are handled within Netdata, enabling us to support more log management engines, like Loki and Elasticsearch.
Instances Slice and Dice
Given the capabilities of the new Netdata Agent UI (v2), we are changing the way some of our collectors collect and expose metrics, to allow easier slicing and dicing of the data and to be more compatible with the OpenTelemetry specifications. So, in this release we changed the way apps.plugin exposes charts in the Applications section of the dashboard. Following the NIDL framework, each application group is now an instance, allowing better aggregation of process utilization across nodes. Similarly, our systemd unit charts have been updated to have an instance for each systemd unit. For the same reasons, disk charts now have additional labels (id, model and serial) to help identify disks from the charts. Unfortunately, such changes tend to make the older dashboards (v1, v0) less usable, especially on servers with many hundreds of instances.
Stock Alerts
A number of changes have been implemented in the Netdata Health engine to allow better integration with the new dashboard. More changes in this area are coming in the next release: a) allow multi-node alerts on parents, b) allow evaluating and configuring alerts from the UI.
Alerts Accuracy
By default, Netdata has 3 tiers of metrics, each with a different resolution. The Netdata query planner automatically picks the right tier to satisfy a query, based on the number of points requested in the response. For alerts this had a side effect: since alerts request only 1 point of data in the response, the query planner was picking the "easiest" tier to query, which is of course the one with the lowest resolution. Now alerts always run on tier 0, the highest-resolution tier.
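A simplified sketch of the behavior described above: pick the coarsest tier that still satisfies the requested number of points, except for alerts, which are pinned to tier 0. The tier resolutions (1 s, 60 s, 3600 s) and the selection rule are assumptions for illustration, not Netdata's actual query planner code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tier:
    index: int
    seconds_per_point: int  # storage resolution of this tier

# Assumed resolutions for illustration: per-second, per-minute, per-hour.
TIERS = [Tier(0, 1), Tier(1, 60), Tier(2, 3600)]

def pick_tier(window_seconds: int, points: int, for_alert: bool = False) -> Tier:
    """Choose a tier for a query over `window_seconds` that returns `points` points."""
    if points <= 0:
        raise ValueError("points must be positive")
    if for_alert:
        # New behavior: alerts always query the highest-resolution tier.
        return TIERS[0]
    seconds_per_output_point = window_seconds / points
    # Old behavior for every query: the coarsest (cheapest) tier whose
    # resolution is still fine enough for each output point.
    best = TIERS[0]
    for tier in TIERS:
        if tier.seconds_per_point <= seconds_per_output_point:
            best = tier
    return best
```

An alert over the last hour asks for a single point, so under the old rule `pick_tier(3600, 1)` lands on the per-hour tier, while `pick_tier(3600, 1, for_alert=True)` returns tier 0.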
Lower Resource Utilization
Several changes have been implemented for Netdata to better take care of itself. That includes lower memory usage, a lower disk footprint, self-vacuuming of SQLite databases, and more. Probably the most notable change is that Netdata now needs only 1 pointer (8 bytes on 64-bit, 4 bytes on 32-bit) for each use of a label name-value combination. This drastically improves Netdata's memory requirements in setups like busy k8s clusters, where containers come and go all the time, significantly increasing label cardinality.
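The saving comes from storing each distinct label name-value pair once and letting every user hold only a reference to it. Below is a minimal Python sketch of that interning idea; `LabelRegistry`, `acquire` and `per_use_bytes` are hypothetical names, and Netdata's C implementation (including how unused pairs are released) is not shown.

```python
class LabelRegistry:
    """Stores each unique (name, value) pair once; users keep references to it."""

    def __init__(self) -> None:
        self._pairs: dict[tuple[str, str], tuple[str, str]] = {}

    def acquire(self, name: str, value: str) -> tuple[str, str]:
        key = (name, value)
        # setdefault returns the already-stored tuple when the pair was seen
        # before, so every user of the same pair shares a single object.
        return self._pairs.setdefault(key, key)

    def __len__(self) -> int:
        return len(self._pairs)

def per_use_bytes(uses: int, pointer_size: int = 8) -> int:
    """Memory for `uses` label uses when each use costs exactly one pointer."""
    return uses * pointer_size
```

With this layout, 1,000,000 label uses cost 8,000,000 bytes of pointers on a 64-bit system and 4,000,000 bytes on a 32-bit one, regardless of how long the label strings are.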
32-bit Netdata on 64-bit IoT machines
A common request when Netdata is installed on 64-bit IoT devices is to have a 32-bit Netdata running there. Before this release, this was not possible. Now a 32-bit Netdata runs nicely on a 64-bit operating system.
Netdata Cloud On-Prem
Netdata Cloud is now available to be installed on-prem! Several companies have already deployed it and are currently testing it. If you want to join them, submit this form.
systemd-journal
systemd-journal was first included in Netdata v1.42.0. Immediately after the release, we recognized the wider need for this feature, so we rewrote the plugin almost entirely to provide the best possible experience. This work is also fundamental for supporting more log monitoring integrations - stay tuned!
The major improvements to the systemd-journal logs function were:
journalctl -f, showing new log entries immediately after they are received
journalctl does
If you want to see a full presentation of the systemd-journal plugin, how it works, how you can take full advantage of it, and even instructions on configuring a log centralization server, check the documentation for the plugin.
You can experience the power of the systemd-journal logs function in one of our Netdata demo rooms here, or check our latest YouTube video on it.
Want to know why you should tap into the full potential of systemd-journal logs? Check out the blog post on it by Netdata's founder, Costa Tsaousis (@ktsaou), here.
With the increased feedback and requests on VMware vCenter Server collectors, we have:
host, datacenter, cluster, vm
It is thanks to this feedback from the Community that we can keep improving Netdata to ensure it meets your needs!
We are currently working on the following areas, which we hope to release next month:
Logs Explorer for Loki and Elasticsearch
Similar to systemd-journal, this will allow Netdata to explore, query and visualize logs from Loki and Elasticsearch.
Collectors Configuration from the UI
In the last release we presented the Integrations Marketplace. Since then, we have been working to make all integrations configurable via the dashboard. This will allow all of us to configure our Netdata servers directly from the UI, without touching configuration files, significantly improving Netdata's usability.
Alerts Configuration from the UI
Similarly, we are working to allow configuring alerts directly from the UI, without text file configurations, so that all of us can create powerful alerts on the spot.
Netdata Mobile App
We are at the final stage of releasing our Netdata Mobile App (iOS and Android) for receiving mobile push notifications and exploring alert statuses.
Scalability
Given the wide adoption of Netdata, we are committed to making Netdata scale better in larger environments. Especially when it comes to Netdata Parents, we aim to provide the best scalability possible. We are currently finalizing the changes necessary to allow Netdata to achieve:
Of course, the numbers depend on the CPU and its clock, but they shouldn't vary significantly on modern systems.
At the same time, we are working to integrate Gorilla compression into our database. This will provide a significantly better overall memory footprint for Netdata.
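Gorilla compression (from Facebook's time-series database of the same name) encodes each floating-point value as the XOR with its predecessor, so repeated or slowly changing values need very few bits. The sketch below only estimates encoded size under a simplified form of that scheme on 64-bit doubles; it is not Netdata's implementation, and how Netdata adapts Gorilla to its own storage format is not covered here.

```python
import struct

def float_bits(x: float) -> int:
    """Reinterpret a double as its 64-bit IEEE 754 bit pattern."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def xor_cost_bits(prev: float, curr: float) -> int:
    """Approximate bits needed to store `curr` after `prev` (simplified XOR encoding)."""
    xored = float_bits(prev) ^ float_bits(curr)
    if xored == 0:
        return 1  # same value as before: a single '0' control bit
    leading = min(64 - xored.bit_length(), 31)  # leading-zero count fits in 5 bits
    trailing = (xored & -xored).bit_length() - 1
    meaningful = 64 - leading - trailing
    # '1' control bit + 5 bits (leading zeros) + 6 bits (length) + meaningful bits.
    # The full scheme also has a cheaper path that reuses the previous
    # leading/trailing window; it is omitted to keep the sketch short.
    return 1 + 5 + 6 + meaningful

def series_cost_bits(values: list[float]) -> int:
    """First value stored raw (64 bits); each later value XOR-encoded against its predecessor."""
    if not values:
        return 0
    return 64 + sum(xor_cost_bits(a, b) for a, b in zip(values, values[1:]))
```

For example, `series_cost_bits([1.0, 1.0, 1.0])` is 64 + 1 + 1 = 66 bits instead of 192 for three raw doubles, which is why flat or slowly changing metrics compress so well.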
We would like to thank our dedicated, talented contributors that make up this amazing community. The time and expertise that you volunteer are essential to our success. We thank you and look forward to continuing to grow together to build a remarkable product.
netdata/ansible playbook
delete old models param to ML readme (#15873, @andrewm4894)
netdata/ansible (netdata/ansible#6, @luisj1983)
anomaly_detection.detector_events chart (#16028, @andrewm4894)
anomaly_detection.type_anomaly_rate stacked (#15895, @andrewm4894)
In accordance with our previous deprecation notice, the following items in this release have been changed:
Component | Type | Change | Action |
---|---|---|---|
apps.plugin | collector | a dimension for each group/user/user group => a chart for each group/user/user group | |
cgroups.plugin | collector | a dimension for each systemd service => a chart for each systemd service | |
proc.plugin | collector | all "Networking Stack" metrics except "tcp" have been moved to "IPv4 Networking" | |
family attribute | alert configuration and Health API | deprecated | use chart labels |
We plan to change the following items in the next release (v1.44.0):
Component | Type | Change | Action |
---|---|---|---|
charts.d/nut | collector | deprecated | use go.d/upsd |
Join the Netdata team on the 18th of October at 16:30 UTC for the Netdata Release Meetup.
Together we’ll cover:
RSVP now - we look forward to meeting you.
As we grow, we stay committed to providing the best support ever seen from an open-source solution. Should you encounter an issue with any of the changes made in this release or any feature in the Netdata Agent, feel free to contact us through one of the following channels:
Netdata v1.42.4 is a patch release to address issues discovered since v1.42.3.
This patch release provides the following bug fixes and updates:
Netdata v1.42.3 is a patch release to address issues discovered since v1.42.2.
This patch release provides the following bug fixes and updates:
Netdata v1.42.2 is a patch release to address issues discovered since v1.42.1.
This patch release provides the following bug fixes and updates:
Netdata v1.42.1 is a patch release to address issues discovered since v1.42.0.
This patch release provides the following bug fixes and updates:
Right on schedule, this is another great Netdata release!
64.5 k GitHub Stars ⭐
Netdata made it to the top of GitHub's trending repos after the last release. ❤️ Thank you for your love! 🚀 You rock!
580+ M Docker Hub pulls, running at 200+ k per day.
Netdata is a verified publisher on Docker Hub, and our users enjoy free unlimited Docker Hub pulls!
A beta version of the Netdata Marketplace is included in this release:
More than 800 integrations are available, directly from the dashboard. For each integration, all the information required to get it up and running is included:
Integrations are still in beta. We improve them every day, but we think they are already quite useful.
A new Netdata Function has been added to query the systemd journal logs:
The function respects the current date-time picker, so it can query any possible timeframe the systemd journal has data for.
IMPORTANT
Netdata Functions are available only when you are signed in to Netdata and your Netdata Agent is claimed. This has been done to protect your privacy. Netdata Cloud checks that the users of the Agent dashboard are allowed to view this information.
IMPORTANT
The systemd-journal function is currently available only on Netdata Agents that have been installed from source, or with native packages of the Linux distribution (RPM, DEB). For users running static builds of Netdata or running Netdata in a Docker container, we are working to bring systemd-journal to them too. Stay tuned...
You can now connect your agents to Netdata Cloud, via the dashboard:
The UI verifies that you are the owner of a Netdata Agent by asking you to provide a random key that is saved to a file on disk. Once you provide the right key, the Netdata Agent is automatically claimed to your space at Netdata Cloud.
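A minimal sketch of this kind of proof of ownership: write a random key to a file only the server owner can read, then compare what the user pastes against it. The file name, key length and function names are hypothetical; this illustrates the idea, not how the Netdata Agent actually generates or checks its key.

```python
import hmac
import os
import secrets
from pathlib import Path

def write_claim_key(path: Path) -> None:
    """Create a random key file readable and writable only by its owner (mode 0600)."""
    key = secrets.token_hex(16)  # 32 hex characters
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(key)

def verify_claim_key(path: Path, provided: str) -> bool:
    """True if the key pasted into the UI matches the key stored on disk."""
    expected = path.read_text().strip().encode()
    # compare_digest is designed to avoid leaking, through timing,
    # how much of the provided key matched.
    return hmac.compare_digest(expected, provided.strip().encode())
```

Only someone with read access to the file on the server can supply the key, which is what ties the claim to the machine's owner.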
The UI has an AR button above the menu. When you press it, the dashboard queries the Netdata Metrics Scoring Engine to find the anomaly rates for the visible timeframe across the metrics included in the dashboard. Then it adds a badge next to each category and subcategory, showing its anomaly rate.
This way, you can quickly spot what is anomalous on the current view of the dashboard.
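The badge values can be pictured as an aggregation of per-metric anomaly counts for the visible timeframe. The sketch below assumes a category's rate is its anomalous points divided by its total points; that formula, `MetricWindow` and `anomaly_rate_badges` are hypothetical, not the Metrics Scoring Engine's actual method.

```python
from collections import defaultdict
from typing import NamedTuple

class MetricWindow(NamedTuple):
    category: str          # dashboard menu section
    subcategory: str       # submenu within that section
    anomalous_points: int  # points flagged anomalous in the visible timeframe
    total_points: int      # all points in the visible timeframe

def anomaly_rate_badges(metrics: list[MetricWindow]) -> dict[str, float]:
    """Anomaly rate (%) for each category and each 'category/subcategory'."""
    counts: dict[str, list[int]] = defaultdict(lambda: [0, 0])
    for m in metrics:
        # Every metric contributes to both its category and its subcategory badge.
        for key in (m.category, f"{m.category}/{m.subcategory}"):
            counts[key][0] += m.anomalous_points
            counts[key][1] += m.total_points
    return {key: 100.0 * a / t for key, (a, t) in counts.items() if t}
```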
diskquota collector to third party collectors list (#15524, @andrewm4894)
We plan to change the following items in the next release (v1.43.0):
Component | Type | Change | Action |
---|---|---|---|
apps.plugin | collector | a dimension for each group/user/user group => a chart for each group/user/user group | |
cgroups.plugin | collector | a dimension for each systemd service => a chart for each systemd service | |
proc.plugin | collector | all "Networking Stack" metrics except "tcp" => "IPv4 Networking" | |
python.d/nvidia_smi | collector | deprecated | use go.d/nvidia_smi |
family attribute | alert configuration and Health API | deprecated | use chart labels |
Join the Netdata team on the 11th of August at 17:00 UTC for the Netdata Release Meetup.
Together we’ll cover:
RSVP now - we look forward to meeting you.
Check out the v1.41 release meetup recording or read on to learn more about the new UI and other features in this release.
Right on schedule, this is another great Netdata release!
❤️ Thank you for your love! 🚀 You rock!
Netdata Agents and Parents now have a new UI!
New CHARTS 🟢 New SUMMARIES 🟢 MACHINE-LEARNING FIRST 🟢 INFRASTRUCTURE LEVEL DASHBOARDS 🟢 FILTER, SLICE, and DICE any dataset 🟢 ANOMALY ADVISOR 🟢 METRICS CORRELATIONS 🟢 NETDATA FUNCTIONS 🟢 EVENTS FEED 🟢 HEATMAPS 🟢
In the last few months, we have ported and open-sourced all Netdata Cloud APIs to the Netdata Agent, allowing Netdata Parents to drive the same multi-node / infrastructure level dashboards Netdata Cloud provides!
So, as of today, Netdata Agents and Parents present the same UI as Netdata Cloud, with exactly the same dashboard, charts and features!
Apart from the entirely new look, single-node dashboards now group similar charts together. So all disk drives, network interfaces and cgroups (containers and VMs) are now each presented as a single set of charts.
This allows Netdata to aggregate a vast number of datasets in a single chart, like the following, where almost 20k containers are now manageable:
To make it easier for you to navigate, filter, slice, and dice the data, the menus above each chart give you easy access to all the data of the chart:
When Netdata Agents are configured as Parents (multiple other agents stream metrics to them), they now present multi-node and multi-instance charts. At the top right corner of the dashboard, there is the global nodes filter, from which you can slice the entire dashboard for one or a few of your nodes.
Get a firsthand walkthrough from Costa Tsaousis, Netdata's Founder, on the rationale for this change and the path Netdata is taking, by watching the video from Netdata Office Hours on YouTube.
You can still access all versions of the dashboards, as follows:
http://your.server:19999/
The default dashboard is now a live version of the new UI. The dashboard static files are served by Cloudflare and are automatically updated when we release a new version of the UI, so that your Netdata agent is always up to date.
http://your.server:19999/v2/
A local copy of the latest dashboard, as it was at the time the agent was released. This is distributed with Netdata under the Netdata Cloud UI License v1.0. The local copy is automatically used if for any reason the web browser cannot download the live version of it.
http://your.server:19999/v1/
The previous single-node version of the Netdata Agent dashboard.
http://your.server:19999/v0/
The now ancient, original version of the Netdata Agent dashboard.
Netdata Assistant: Your AI-Powered Troubleshooting Sidekick
The Netdata Assistant is an AI-powered tool that uses large language models and our community's knowledge to guide you during troubleshooting and help you get to the root cause sooner.
The goal of the Netdata Assistant is straightforward: to make your troubleshooting process easier. It's here to save you from the hassle of sifting through tons of information so you can focus on solving the problem at hand.
It will give you the lowdown on the alert, why it's happening, and why you should care. It'll also guide you on how to troubleshoot it and even offer some handy web links for more info if you're interested.
Read more about it on the Netdata blog here.
Netdata got a new FreeIPMI collector. The new collector is able to collect IPMI sensor data at a much higher data collection rate, and it is more reliable and robust than the previous one.
We have also categorized all sensors based on the component they monitor:
And provided as labels the exact sensor name each metric refers to:
"FD" stands for "file descriptor". A file descriptor is an integer that the operating system assigns to an open file to track it. This includes regular data files, directories, network sockets, pipes, and other types of I/O streams.
In Linux, everything is treated as a file, which includes hardware devices, directories, and sockets. Each open file is assigned a file descriptor. When a file is closed, its file descriptor is freed up for reuse. However, if an application doesn't close a file when it's done with it, that's called a "file descriptor leak".
File descriptor leaks can cause several problems:
Resource exhaustion: Each process has a limit to the number of file descriptors it can open. If a process continually leaks file descriptors without closing them, it will eventually hit this limit and won't be able to open any more files, which often causes the process to crash.
Unexpected behavior: Open file descriptors hold resources, like network sockets, that might be expected to be available for other uses. If these resources are tied up due to a leak, it can cause unexpected behavior.
Security issues: File descriptors can sometimes be used to gain unauthorized access to data if they're not properly managed.
apps.plugin is now able to track the usage of FDs against the limits set for each application. We have added an fds category in the Applications section of the dashboard. The first chart shows the percentage of FDs used by each application against its limits:
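The percentage shown is open FDs divided by the FD limit. As a minimal Linux-only illustration for the current process, the sketch below counts entries in /proc/self/fd and compares them with the soft RLIMIT_NOFILE; it is not how apps.plugin is implemented, and the function names are hypothetical.

```python
import os
import resource

def fd_usage_percent(open_fds: int, soft_limit: int) -> float:
    """Percentage of the file descriptor limit currently in use."""
    if soft_limit in (resource.RLIM_INFINITY, 0):
        return 0.0  # no meaningful limit to compare against
    return 100.0 * open_fds / soft_limit

def current_process_fd_usage() -> float:
    """Linux only: this process's open FDs versus its soft RLIMIT_NOFILE."""
    # Listing /proc/self/fd briefly opens one extra descriptor for the
    # directory itself, so the count may include it.
    open_fds = len(os.listdir("/proc/self/fd"))
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return fd_usage_percent(open_fds, soft)
```

A process with 512 open descriptors and a soft limit of 1024 sits at 50%; as it approaches 100%, new `open()` or `accept()` calls start failing.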
info macro with a less generic name.
nan (#15348, @ilyam8)
error function (#15296, @thiagoftsm)
info macro with a less generic name (#15266, @carlocab)
There is no definitive list of items that will be deprecated in the upcoming release (v1.42.0). Feel free to check and elaborate on the upcoming backlog.
In accordance with our previous deprecation notice, the following items are affected in this release:
Component | Type | Will be replaced by |
---|---|---|
python.d/nvidia_smi | collector | go.d/nvidia_smi |
family attribute | alert configuration and Health API | chart labels attribute (more details on netdata#15030) |
Join the Netdata team on the 21st of July at 17:00 UTC for the Netdata Release Meetup.
Together we’ll cover:
RSVP now - we look forward to meeting you.
Netdata v1.40.1 is a patch release to address issues discovered since v1.40.0.
This patch release provides the following bug fixes:
🚀 Our community growth is increasing steadily. ❤️ Thank you! Your love and acceptance give us the energy and passion to work harder to simplify and make monitoring easier, more effective and more fun to use.
Let the world know you love Netdata. Give Netdata a ⭐ on GitHub now. Motivate us to keep pushing forward!
To help our community use Netdata more broadly, we just signed an agreement with Docker for the purchase of Rate Limit Removal, which will remove all Docker Hub pull limits for the Netdata repos at Docker Hub. We expect this add-on to be applied to our repos in the next few days, so that you will enjoy unlimited Docker Hub pulls of Netdata Docker images for free!
Netdata Cloud dashboards have been improved to provide instant summary tiles for most of their sections. This includes system overview, disks, network interfaces, memory, mysql, postgresql, nginx, apache, and dozens more.
To accomplish this, we extended the query engine of Netdata to support multiple grouping passes, so that queries like "sum metrics by label X, and then average by node" are now possible. At the same time we made room for presenting anomaly rates on them (vertical purple bar on the right) and significantly improved the tile placement algorithm to support multi-line summary headers and precise sizing and positioning, providing a look and feel like this:
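One reading of the example query "sum metrics by label X, and then average by node" is: first sum per (node, label value), then average those per-node sums across nodes. The sketch below implements that reading on a hypothetical in-memory data model; it is not the Netdata query engine.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical data model: (node, labels, value) for a single timestamp.
Sample = tuple[str, dict[str, str], float]

def sum_by_label_then_avg_by_node(samples: list[Sample], label: str) -> dict[str, float]:
    # Pass 1: sum values per (node, label value).
    sums: dict[tuple[str, str], float] = defaultdict(float)
    for node, labels, value in samples:
        sums[(node, labels.get(label, ""))] += value
    # Pass 2: for each label value, average the per-node sums across nodes.
    per_label: dict[str, list[float]] = defaultdict(list)
    for (_node, label_value), total in sums.items():
        per_label[label_value].append(total)
    return {lv: mean(totals) for lv, totals in per_label.items()}
```

With two nodes whose `mode="user"` sums are 30 and 10, the second pass reports 20 for `user`: a single grouping pass cannot express this, because the sum and the average apply to different dimensions.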
The following chart tile types have been added:
To improve the efficiency of using these tiles, each of these tiles supports the following interactive actions:
Some examples that you can see from the Netdata Demo space:
Although Netdata Agent alerts support silencing, centrally dispatched alert notifications from Netdata Cloud were missing that feature. Today, we release alert notifications silencing rules for Netdata Cloud!
Silencing rules can be applied to any combination of the following: users, rooms, nodes, host labels, contexts (charts), alert name, alert role. For the matching alerts, silencing can optionally have a starting date and time and/or an ending date and time.
With this feature you can now easily set up silencing rules, which can be applied immediately or on a defined schedule, allowing you to plan for upcoming scheduled maintenance windows - see some examples here.
Read more about Silencing Alert notifications on our documentation.
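A sketch of how such a rule could be matched against an alert, assuming that an empty criterion matches everything and that the end time is exclusive. `Alert`, `SilencingRule` and their fields are hypothetical; users and rooms are left out for brevity, and this is not Netdata Cloud's implementation.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

@dataclass
class Alert:
    node: str
    context: str
    name: str
    role: str
    host_labels: dict[str, str] = field(default_factory=dict)

@dataclass
class SilencingRule:
    # An empty set or dict means "match anything" for that criterion.
    nodes: set[str] = field(default_factory=set)
    contexts: set[str] = field(default_factory=set)
    alert_names: set[str] = field(default_factory=set)
    roles: set[str] = field(default_factory=set)
    host_labels: dict[str, str] = field(default_factory=dict)
    starts_at: Optional[datetime] = None  # optional start of the silencing window
    ends_at: Optional[datetime] = None    # optional (exclusive) end of the window

    def silences(self, alert: Alert, now: datetime) -> bool:
        if self.starts_at is not None and now < self.starts_at:
            return False
        if self.ends_at is not None and now >= self.ends_at:
            return False

        def ok(criteria: set[str], value: str) -> bool:
            return not criteria or value in criteria

        return (
            ok(self.nodes, alert.node)
            and ok(self.contexts, alert.context)
            and ok(self.alert_names, alert.name)
            and ok(self.roles, alert.role)
            and all(alert.host_labels.get(k) == v for k, v in self.host_labels.items())
        )
```

A maintenance window for one database node, for example, is a rule with `nodes={"db-1"}` plus `starts_at`/`ends_at`; outside that window, or for other nodes, notifications flow as usual.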
Netdata trains ML models for each metric, using its past data. This allows Netdata to detect anomalous behaviors in metrics, based exclusively on the recent past data of the metric itself.
Before this release, Netdata was training one model for each metric, learning the behavior of each metric during the last 4 hours. In the previous release we introduced persisting these models to disk and loading them back when Netdata restarts.
In this release we changed the default ML settings to maintain multiple trained models per metric, covering the behavior of each metric over the last 24 hours. All these models are now consulted automatically to decide whether a data collection point is anomalous.
This has been implemented in a way that avoids introducing additional CPU overhead on Netdata agents. So, instead of training one model on 24 hours of data, which would introduce significant query overhead on the server, we train a new model for each metric every 3 hours using the last 6 hours of data, and we keep 9 models per metric. The most recent model is consulted first during anomaly detection. Additional models are consulted only as long as the previous ones predict an anomaly. So only when all 9 models agree that a collected sample is anomalous do we mark it as anomalous in the database.
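The consensus rule above maps naturally onto a bounded queue of models checked newest-first with a short-circuiting `all()`. In this sketch the "models" are toy threshold functions standing in for Netdata's trained models, and `MultiModelDetector` is a hypothetical name; training itself is not shown.

```python
from collections import deque
from typing import Callable

# A "model" here is any callable returning True if it considers a value anomalous.
Model = Callable[[float], bool]

class MultiModelDetector:
    """Keeps the N most recent models; a value is anomalous only if all of them agree."""

    def __init__(self, max_models: int = 9) -> None:
        # appendleft + maxlen: the newest model sits first and, once the deque
        # is full, the oldest model is discarded from the other end.
        self.models: deque[Model] = deque(maxlen=max_models)

    def add_model(self, model: Model) -> None:
        self.models.appendleft(model)

    def is_anomalous(self, value: float) -> bool:
        if not self.models:
            return False
        # all() short-circuits: the newest model is consulted first, and older
        # models are consulted only while every model so far predicts an anomaly.
        return all(model(value) for model in self.models)
```

With a new model every 3 hours and 9 models kept, the oldest model was trained 8 × 3 = 24 hours before the newest one, which is how the set spans roughly the last day of behavior.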
The impact of these changes is more accurate anomaly detection out of the box, with far fewer false positives.
You can read more about it in this deck presented during a recent office hours (office hours recording).
The SSL support in the Netdata Agent has been completely rewritten. The new code now reliably supports SSL connections for both the Netdata internal web server and streaming. It is also easier to understand, troubleshoot and expand. At the same time, performance has been improved by removing redundant checks.
During this process a long-standing bug on streaming connection timeouts has been identified and fixed, making streaming reliable and robust overall.
To keep building on our set of existing alert notification methods, we added Mattermost as another notification integration option on Netdata Cloud. Mattermost provides another reliable way to deliver alerts to your team, ensuring the continuity and reliability of your services.
Business Plan users can now configure Netdata Cloud to send alert notifications to their team on Mattermost.
This builds on the work done in release v1.38, where we introduced real-time functions that let you trigger specific routines to be executed by a given Agent on demand. Our initial function provided detailed information on currently running processes on the node, effectively replacing top and iotop.
We have now added the capability to group your results by specific attributes. For example, on the Processes function you are now able to group the results by: Category, Cmd or User. With this capability you can now get a consolidated view of your reported statistics over any of these attributes.
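Grouping amounts to collapsing rows that share a value in the chosen column and aggregating a numeric column. The sketch below sums one metric per group; the row layout and the "CPU" column are hypothetical stand-ins for the Processes function's real columns.

```python
from collections import defaultdict

def group_processes(rows: list[dict], key: str, metric: str) -> dict[str, float]:
    """Sum `metric` over all rows sharing the same value of `key` (e.g. 'User')."""
    totals: dict[str, float] = defaultdict(float)
    for row in rows:
        totals[row[key]] += row[metric]
    return dict(totals)
```

Grouping the same rows by "Cmd", "User" or "Category" gives the consolidated per-command, per-user or per-category view described above.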
The agent core has been improved when it comes to integration with external plugins. Under certain conditions, a failed plugin would not be correctly acknowledged by the agent, resulting in a defunct (i.e. zombie) plugin process. This is now fixed.
Starting with this release, our official DEB/RPM packages have been split so that each external data collection plugin is in its own package instead of having everything bundled into a single package. We have previously had our CUPS and FreeIPMI collectors split out like this, but this change extends that to almost all of our external data collectors. This is the first step towards making these external collectors optional on installs that use our native packages, which will in turn allow users to avoid installing things they don’t actually need.
Short-term, these external collectors are listed as required dependencies to ensure that updates work correctly. At some point in the future almost all of them will be changed to be optional dependencies so that users can pick and choose which ones they want installed.
This change also includes a large number of fixes for minor issues in our native packages, including better handling of user accounts and file permissions and more prevalent usage of file capabilities to improve the security of our native packages.
The following items will be removed in our next minor release (v1.41.0):
Patch releases (if any) will not be affected.
Component | Type | Will be replaced by |
---|---|---|
python.d/nvidia_smi | collector | go.d/nvidia_smi |
family attribute | alert configuration and Health API | chart labels attribute (more details on netdata#15030) |
When using Netdata Cloud, the agent version required to get the most out of the latest features is one version before the latest stable release.
With this release, this becomes v1.39.1, and you'll be notified and guided to take action on the UI if you are running agents on lower versions.
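A minimal sketch of the version comparison this implies, assuming plain vMAJOR.MINOR.PATCH stable version strings (nightly build suffixes are not handled). The function names are hypothetical, not part of Netdata or Netdata Cloud.

```python
def parse_version(v: str) -> tuple[int, ...]:
    """Turn 'v1.39.1' into (1, 39, 1) so versions compare numerically."""
    return tuple(int(part) for part in v.lstrip("v").split("."))

def needs_update(agent_version: str, required: str = "v1.39.1") -> bool:
    """True if the agent is older than the required version."""
    return parse_version(agent_version) < parse_version(required)
```

Comparing integer tuples rather than strings matters: as strings, "v1.9.0" would sort after "v1.39.1", but as tuples (1, 9, 0) is correctly older than (1, 39, 1).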
Check here for details on how to Update Netdata agents.
Join the Netdata team on the 19th of June at 16:00 UTC for the Netdata Release Meetup.
Together we’ll cover:
RSVP now - we look forward to meeting you.
Help us make Netdata even greater! We are trying to gather valuable information that is key for us to better position Netdata and ensure we keep bringing more value to you.
We would appreciate it if you could take some time to answer this short survey (4 questions only).
This patch release provides the following bug fixes:
We noticed that claiming and enabling auto-updates were failing due to incorrect permissions when kickstart.sh was doing a static installation. The issue affected all static installations, including the one done from the Windows MSI installer. The permissions have now been corrected.
The recipient lists of agent alert notifications are configurable via the health_alarm_notify.conf file. A stock file with default configurations can be modified using edit-config. @jamgregory noticed that the default settings in that file can make changing role recipients confusing. Unless the edited configuration file included every setting of the original stock file, the resulting behavior was unintuitive. @jamgregory kindly added a PR to fix the handling of custom role recipient configurations.
A bug in our collection and reporting of Infiniband bandwidth was discovered and fixed.
We noticed memory buffer overflows under some very specific conditions. We adjusted the relevant buffers and the calls to strncpyz to prevent such overflows.
A memory leak in certain circumstances was found in the ACLK code. We fixed the incorrect data handling that caused it.
An unrelated memory leak was discovered in the ACLK code and has also been fixed.
Exposing the anomaly rate right on top of each chart in Netdata Cloud surfaced an issue of bad ML models on some very noisy metrics. We addressed the issue by suppressing the indications that these noisy metrics would produce. This change gives the ML model a chance to improve, based on additional collected data.
Finally, we improved the handling of errors during ML transactions, so that transactions are properly rolled back, instead of failing in the middle.