Kubernetes observability and automation, with an awesome Prometheus integration
Reduce alert fatigue by grouping similar alerts and summarizing them in a Slack thread. Grouping is fully customizable based on severity, alert type, labels, and more.
    - slack_sink:
        # other slack sink params
        grouping:
          group_by:
          - cluster
          interval: 86400
          notification_mode:
            summary:
              threaded: true
              by:
              - identifier
              - severity
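To make the grouping settings above more concrete, here is a minimal Python sketch of the idea, not Robusta's implementation. It assumes alerts are plain dictionaries, ignores the `interval` window entirely, and uses a hypothetical helper named `summarize`: alerts are bucketed by the `group_by` attributes and counted per combination of the `by` attributes.

```python
from collections import Counter, defaultdict


def summarize(alerts, group_by, summary_by):
    """Count alerts per group, broken down by the summary attributes.

    Hypothetical helper for illustration only; it does not model the
    time interval or how the summary is posted to Slack.
    """
    groups = defaultdict(Counter)
    for alert in alerts:
        group_key = tuple(alert.get(attr) for attr in group_by)
        summary_key = tuple(alert.get(attr) for attr in summary_by)
        groups[group_key][summary_key] += 1
    return {key: dict(counts) for key, counts in groups.items()}


# Made-up sample alerts, shaped like the attributes used in the config above.
alerts = [
    {"cluster": "prod", "identifier": "CrashLoopBackoff", "severity": "HIGH"},
    {"cluster": "prod", "identifier": "CrashLoopBackoff", "severity": "HIGH"},
    {"cluster": "prod", "identifier": "ImagePullBackoff", "severity": "LOW"},
    {"cluster": "dev", "identifier": "CrashLoopBackoff", "severity": "HIGH"},
]

summary = summarize(alerts, group_by=["cluster"], summary_by=["identifier", "severity"])
for group, counts in summary.items():
    print(group, counts)
```

Each outer key corresponds to one summary message (one per cluster here), and the inner counts correspond to the rows of that summary.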
Receive Robusta alerts in Zulip. Contributed by community member @oscgu. See here for detailed instructions.
Added nameOverride and fullnameOverride to the Helm chart by @kristeey in https://github.com/robusta-dev/robusta/pull/1388
Added namespace_labels support to the sink scope mechanism by @RobertSzefler in https://github.com/robusta-dev/robusta/pull/1390
Full Changelog: https://github.com/robusta-dev/robusta/compare/0.11.0...0.12.0
Full Changelog: https://github.com/robusta-dev/robusta/compare/0.11.0...0.11.1-alpha
Trigger a playbook when a resource field changes. Learn more
    customPlaybooks:
    - name: "NotifyOnImageChange"
      triggers:
      - on_deployment_update:
          change_filters:
            include:
            - image
      actions:
      - resource_babysitter: {}
Triggers now have an extensive include/exclude definition. Explore examples
    customPlaybooks:
    - name: "PostgresWarning"
      triggers:
      - on_event_create:
          scope:
            include:
            - attributes:
              - "type=Warning, involvedObject.name=postgres, event.reason=FailedScheduling"
      actions:
      - create_finding:
          title: "Failed scheduling postgres"
          aggregation_key: "FailedScheduling"
Users can now monitor and manage Argo rollouts directly within the Robusta UI. See PR
Mattermost sink by @IdeoG in #1374.
Identifiers were renamed (for example, image_pull_backoff_reporter is now ImagePullBackoff).
If you're using Sink Matchers based on the identifier, you will need to update your sinks configuration.
For example:
image_pull_backoff_reporter → ImagePullBackoff
job_failure → JobFailure
krr_report → KrrReport
pod_oom_killer_enricher → PodOOMKilled
report_crash_loop → CrashLoopBackoff
job_failure -> JobFailure
image_pull_backoff_reporter -> ImagePullBackoff
krr_report -> KrrReport
pod_oom_killer_enricher -> PodOOMKilled
report_crash_loop -> CrashLoopBackoff
show_stackoverflow_search -> ShowStackoverflowSearch
argo_app_sync -> ArgoAppSync
scale_hpa_callback -> ScaleHpaAction
alert_on_hpa_reached_limit -> HpaReachedMaximum
daemonset_fix_config -> DaemonsetFixConfig
daemonset_silence_false_alarm -> DaemonsetSilenceFalseAlarm
report_rendering_task -> GrafanaReport
disk_benchmark -> DiskBenchmark
report_image_changes -> ReportImageChanges
http_get -> HttpGet
http_post -> http_post
http_put -> HttpPut
java_process_inspector -> JavaProcessInspector
pod_jmap_pid -> PodJmapPid
pod_jstack_pid -> PodJstackPid
job_restart_on_oomkilled_community -> JobRestartOnOomkilledCommunity
node_not_ready -> NodeNotReady
count_pod_creations -> CountPodCreations
volume_analysis -> VolumeAnalysis
python_profiler -> PythonProfiler
pod_processes -> PodProcesses
python_memory_allocations -> PythonMemoryAllocations
debugger_stack_trace -> DebuggerStackTrace
python_process_inspector -> PythonProcessInspector
python_debugger -> PythonDebugger
popeye_report -> PopeyeReport
volume_snapshot_error -> VolumeSnapshotError
volume_snapshot -> VolumeSnapshot
restart_loop_reporter -> CrashLoopBackoff
http_stress_test -> HttpStressTest
Generic finding key -> GenericFindingKey
Generic Change -> GenericChange
General scheduled task -> GeneralScheduledTask
crash_loop -> CrashLoop
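If your sink matchers reference many old identifiers, a small script can apply the renames for you. The following is a minimal sketch, assuming you have already extracted the identifier strings from your own configuration; the `RENAMED` table copies only a few entries from the list above, and `migrate_identifiers` is a hypothetical helper, not part of Robusta.

```python
# A few of the renames listed above; extend the table with the rest as needed.
RENAMED = {
    "job_failure": "JobFailure",
    "image_pull_backoff_reporter": "ImagePullBackoff",
    "krr_report": "KrrReport",
    "pod_oom_killer_enricher": "PodOOMKilled",
    "report_crash_loop": "CrashLoopBackoff",
}


def migrate_identifiers(identifiers):
    """Return the identifiers with old names replaced by their new names.

    Identifiers that are not in the table (for example, custom ones)
    are passed through unchanged.
    """
    return [RENAMED.get(name, name) for name in identifiers]


old = ["report_crash_loop", "job_failure", "my_custom_identifier"]
new = migrate_identifiers(old)
print(new)
```

Leaving unknown identifiers untouched keeps the script safe to run on configurations that mix built-in and custom identifiers.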
fields_to_monitor was removed from the resource_babysitter action. If you added a custom playbook with this action, you may need to update the action configuration. See here about how to be alerted on custom configuration changes.
Full Changelog: Compare Versions
Full Changelog: https://github.com/robusta-dev/robusta/compare/0.10.31...0.10.32-alpha.1
Full Changelog: https://github.com/robusta-dev/robusta/compare/0.10.31...0.10.32-alpha
Full Changelog: https://github.com/robusta-dev/robusta/compare/0.10.30...0.10.31
Full Changelog: https://github.com/robusta-dev/robusta/compare/0.10.30...0.10.31-alpha
You can now define a scope for each sink, using a list of include and/or exclude conditions.
    sinksConfig:
    - slack_sink:
        name: prod_slack_sink
        slack_channel: prod-notifications
        api_key: secret-key
        scope:
          # AND between namespace and labels, but OR within each selector
          include:
          - namespace: default
            labels: "instance=1,foo!=x.*"
          - namespace: bla
            name:
            - foo
            - qux
          exclude:
          - type: ISSUE
            title: .*crash.*
          - name: bar[a-z]*
Find out more here
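The include/exclude semantics can be easier to follow as code. The sketch below is a simplified model, not Robusta's implementation, and rests on several assumptions: findings are plain dictionaries, every value is treated as a regular expression that must match the whole attribute, exclude rules are checked before include rules, and the label expression syntax (such as "instance=1,foo!=x.*") is left out. The helper names `_selector_matches` and `in_scope` are hypothetical.

```python
import re


def _selector_matches(selector, finding):
    """AND across attributes; OR across the patterns listed for one attribute."""
    for attr, patterns in selector.items():
        if isinstance(patterns, str):
            patterns = [patterns]
        value = str(finding.get(attr, ""))
        if not any(re.fullmatch(pattern, value) for pattern in patterns):
            return False
    return True


def in_scope(scope, finding):
    """Assumed order: any matching exclude rejects, then any include accepts."""
    if any(_selector_matches(s, finding) for s in scope.get("exclude", [])):
        return False
    include = scope.get("include", [])
    return not include or any(_selector_matches(s, finding) for s in include)


# The scope from the example above, minus the label expression.
scope = {
    "include": [
        {"namespace": "default"},
        {"namespace": "bla", "name": ["foo", "qux"]},
    ],
    "exclude": [
        {"type": "ISSUE", "title": ".*crash.*"},
        {"name": "bar[a-z]*"},
    ],
}

accepted = in_scope(scope, {"namespace": "bla", "name": "qux", "type": "ISSUE", "title": "high cpu"})
rejected_by_name = in_scope(scope, {"namespace": "default", "name": "barista", "type": "ISSUE", "title": "x"})
rejected_by_title = in_scope(scope, {"namespace": "default", "name": "web", "type": "ISSUE", "title": "pod crash loop"})
not_included = in_scope(scope, {"namespace": "other", "name": "foo"})
```

Under these assumptions, the first finding is accepted, the next two are dropped by an exclude rule, and the last one is dropped because no include rule matches it.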
All built-in playbooks now have names and can easily be disabled or overridden.
    disabledPlaybooks:
    - ImagePullBackOff

    customPlaybooks:
    - name: "CustomImagePullBackOff"
      triggers:
      - on_image_pull_backoff:
          fire_delay: 300  # fire only if failing to pull the image for 5 min
      actions:
      - image_pull_backoff_reporter: {}
Find out more here
And much more...
Full Changelog: https://github.com/robusta-dev/robusta/compare/0.10.29...0.10.30
    sinksConfig:
    - service_now_sink:
        name: some_name
        instance: abcd123
        username: admin
        password: blah-blah
        caller_id: robusta
...And many small improvements and bug-fixes.
This is relevant only if you're using the mention_enricher (to mention Slack users/groups).
The configuration was changed from ["<@U44V9P1JJ1Z>", "<!subteam^S22H3Q3Q111>"] to ["U44V9P1JJ1Z", "S22H3Q3Q111"].
The change was made because Kubernetes labels don't allow special characters.
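If you have many mentions to convert, the old format can be stripped mechanically. This is a minimal sketch, assuming the old values always look like the two examples above (a user mention wrapped in <@...> or a group mention wrapped in <!subteam^...>); `to_bare_id` is a hypothetical helper, not part of Robusta.

```python
import re

# Matches the two old formats shown above and captures the bare ID.
_OLD_MENTION = re.compile(r"^<(?:@|!subteam\^)([A-Z0-9]+)>$")


def to_bare_id(mention):
    """Strip the Slack mention wrapper; values already in the new format pass through."""
    match = _OLD_MENTION.match(mention)
    return match.group(1) if match else mention


old = ["<@U44V9P1JJ1Z>", "<!subteam^S22H3Q3Q111>"]
new = [to_bare_id(m) for m in old]
print(new)
```

Because already-converted values are returned unchanged, the helper can be run more than once on the same configuration without harm.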
Full Changelog: https://github.com/robusta-dev/robusta/compare/0.10.28...0.10.29