Grafana Mimir Versions

Grafana Mimir provides horizontally scalable, highly available, multi-tenant, long-term storage for Prometheus.

mimir-2.12.0

1 month ago

mimir-2.12.0-rc.1

1 month ago

This release contains 525 PRs from 60 authors, including new contributors Benoit Schipper, Derek Cadzow, Edwin, Itay Kalfon, Ivan Farré Vicente, Jan O. Rundshagen, Jorge Turrado Ferrero, Lukas Monkevicius, Mickaël Canévet, Rafael Sathler, Rajakavitha Kodhandapani, Tim Kotowski, Vladimir Varankin, Zach, Zach Day, Zirko, blut, github-actions[bot], ncharaf, zhehao-grafana. Thank you!

Grafana Mimir version 2.12.0-rc.1 release notes

Grafana Labs is excited to announce version 2.12 of Grafana Mimir.

The highlights that follow include the top features, enhancements, and bug fixes in this release. For the complete list of changes, refer to the CHANGELOG.

Features and enhancements

  • Added support to only count series that are considered active through the Cardinality API endpoint /api/v1/cardinality/label_names by passing the count_method parameter. If set to active it counts only series that are considered active according to the -ingester.active-series-metrics-idle-timeout flag setting rather than counting all in-memory series.

  • The "Store-gateway: bucket tenant blocks" admin page contains a new column "No Compact". If a block's no-compaction marker is set, the column shows the reason and the date the marker was added.

  • The estimated number of compaction jobs based on the current bucket-index is now computed by the compactor. The result is tracked by the new cortex_bucket_index_compaction_jobs metric. If this computation fails, the cortex_bucket_index_compaction_jobs_errors_total metric is updated instead. The estimated number of compaction jobs is also shown in Top tenants, Tenants, and Compactor dashboards.

  • Added mimir-distroless container image built upon a distroless image (gcr.io/distroless/static-debian12). This improvement minimizes attack surfaces and potential CVEs by trimming down the dependencies within the image. After comprehensive testing, the Mimir maintainers plan to shift from the current image to the distroless version.
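
As a rough illustration of the count_method parameter on the Cardinality API described above, the following Python sketch builds (but does not send) a request that asks for active series only. The base URL, the tenant ID, and the helper name are assumptions made for this example; only the endpoint path and the count_method parameter come from these notes, and X-Scope-OrgID is the tenant header Mimir normally expects.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: Mimir's Prometheus-compatible API is reachable under this base URL.
BASE_URL = "http://localhost:8080/prometheus"


def label_names_cardinality_request(tenant: str, count_method: str = "active") -> Request:
    """Build a request for the label-names cardinality endpoint.

    Hypothetical helper: it only constructs the request and never sends it.
    """
    query = urlencode({"count_method": count_method})
    url = f"{BASE_URL}/api/v1/cardinality/label_names?{query}"
    return Request(url, headers={"X-Scope-OrgID": tenant})


req = label_names_cardinality_request("tenant-a")
print(req.full_url)
```

Passing the request to urllib.request.urlopen would perform the call against a running cluster.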

Additionally, the following previously experimental features are now considered stable:

  • The number of pre-allocated workers used to forward push requests to the ingesters, configurable via the -distributor.reusable-ingester-push-workers CLI flag on distributors. It now defaults to 2000. Note that this is a performance optimization, and not a limiting feature. If not enough workers are available, new goroutines will be spawned.

  • The number of gRPC server workers used to serve the requests, configurable via the -server.grpc.num-workers CLI flag. It now defaults to 100. Note that this is the number of pre-allocated long-lived workers, and not a limiting feature. If not enough workers are available, new goroutines will be spawned.

  • The maximum number of concurrent index header loads across all tenants, configurable via the -blocks-storage.bucket-store.index-header.lazy-loading-concurrency CLI flag on store-gateways. It defaults to 4.

  • The maximum time to wait for the query-frontend to become ready before rejecting requests, configurable via the -query-frontend.not-running-timeout CLI flag on query-frontends. It now defaults to 2s.

  • The CLI flag that allows queriers to reduce pressure on ingesters by initially querying only the minimum set of ingesters required to reach quorum, -querier.minimize-ingester-requests. It is now enabled by default.

  • Spread-minimizing token-related CLI flags: -ingester.ring.token-generation-strategy, -ingester.ring.spread-minimizing-zones and -ingester.ring.spread-minimizing-join-ring-in-order. You can read more about this feature in our blog post.
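
The "pre-allocated workers, not a limit" behavior described above for -distributor.reusable-ingester-push-workers and -server.grpc.num-workers can be pictured with a small sketch. Mimir is written in Go and uses goroutines; the Python class below uses threads, and its names are hypothetical. It only illustrates the idea of reusing idle workers and spawning a fresh one when none is idle, and is not Mimir's implementation.

```python
import queue
import threading


class ReusableWorkerPool:
    """Pre-started workers with a fallback, so a task is never rejected."""

    def __init__(self, workers: int) -> None:
        self._tasks: queue.SimpleQueue = queue.SimpleQueue()
        self._idle = workers          # workers not reserved for a task
        self._lock = threading.Lock()
        self.spawned = 0              # tasks that needed a fresh thread
        for _ in range(workers):
            threading.Thread(target=self._run, daemon=True).start()

    def _run(self) -> None:
        while True:
            task = self._tasks.get()
            try:
                task()
            except Exception:
                pass  # a real pool would log this; the worker must survive
            finally:
                with self._lock:
                    self._idle += 1

    def submit(self, task) -> None:
        with self._lock:
            use_pool = self._idle > 0
            if use_pool:
                self._idle -= 1       # reserve an idle worker
            else:
                self.spawned += 1
        if use_pool:
            self._tasks.put(task)
        else:
            # No idle worker: run on a new short-lived thread instead of waiting.
            threading.Thread(target=task, daemon=True).start()


pool = ReusableWorkerPool(workers=2)
release = threading.Event()
done = threading.Semaphore(0)


def job() -> None:
    release.wait()   # keep the worker busy until released
    done.release()


for _ in range(3):
    pool.submit(job)
release.set()
for _ in range(3):
    done.acquire()
print(pool.spawned)  # the third job found both workers busy
```

The pool size therefore bounds how much work runs on reused workers, not how much work runs in total.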

Important changes

In Grafana Mimir 2.12 the following behavior has changed:

  • Store-gateway now persists a sparse version of the index-header to disk on construction and loads sparse index-headers from disk instead of the whole index-header. This improves the speed at which index headers are lazy-loaded from disk by up to 90%. The added disk usage is in the order of 1-2%.

  • Alertmanager deprecated the v1 API. All v1 API endpoints now respond with a JSON deprecation notice and a status code of 410. All endpoints have a v2 equivalent. The list of endpoints is:

    • <alertmanager-web.external-url>/api/v1/alerts
    • <alertmanager-web.external-url>/api/v1/receivers
    • <alertmanager-web.external-url>/api/v1/silence/{id}
    • <alertmanager-web.external-url>/api/v1/silences
    • <alertmanager-web.external-url>/api/v1/status
  • Exemplar's label traceID has been changed to trace_id to be consistent with the OpenTelemetry standard.

  • Errors returned by ingesters now contain only gRPC status codes. Previously they contained both gRPC and HTTP status codes. To guarantee backwards compatibility when migrating from a version prior to 2.11, it's necessary to first migrate to version 2.11, and then to version 2.12. Otherwise, it might happen that during the migration, some ingester errors with HTTP status code 4xx won't be recognized, and the corresponding request will be repeated.

  • Responses with gRPC status codes are now reported as status_code labels in the cortex_request_duration_seconds and cortex_ingester_client_request_duration_seconds metrics.

  • Responses with HTTP 4xx status codes are now treated as errors and used in the status_code label of the request duration metric.
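
For clients that still call the deprecated Alertmanager v1 endpoints listed above, a first step is to find those paths and point them at v2. The sketch below assumes that each v2 equivalent lives at the same path with v2 in place of v1; the helper name is hypothetical, and request and response payloads may differ between the two API versions, so a path rewrite alone is not a complete migration.

```python
import re

# The five v1 endpoints listed above, relative to the Alertmanager external URL.
_V1_ENDPOINT = re.compile(r"/api/v1/(alerts|receivers|silences|status|silence/[^/]+)$")


def to_v2_path(path: str) -> str:
    """Rewrite a deprecated v1 Alertmanager path to its assumed v2 counterpart."""
    if not _V1_ENDPOINT.search(path):
        raise ValueError(f"not a known v1 Alertmanager endpoint: {path}")
    return _V1_ENDPOINT.sub(r"/api/v2/\1", path)


print(to_v2_path("/alertmanager/api/v1/silence/abc123"))
```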

The default values of the following CLI flags have been changed:

  • -blocks-storage.tsdb.head-postings-for-matchers-cache-max-bytes from 10MB to 100MB.
  • -blocks-storage.tsdb.block-postings-for-matchers-cache-max-bytes from 10MB to 100MB.
  • -blocks-storage.bucket-store.tenant-sync-concurrency from 10 to 1.
  • -query-frontend.max-cache-freshness from 1m to 10m.
  • -distributor.write-requests-buffer-pooling-enabled from false to true.
  • -blocks-storage.bucket-store.block-sync-concurrency from 20 to 4.
  • -memberlist.stream-timeout from 10s to 2s.
  • -server.report-grpc-codes-in-instrumentation-label-enabled from false to true.
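
When planning an upgrade, it can help to know which of these changed defaults will actually take effect, that is, which flags your deployment does not set explicitly. The following sketch hard-codes the list above; the function name and the example flag set are hypothetical.

```python
# Defaults that changed in 2.12, as (old, new) pairs taken from the list above.
CHANGED_DEFAULTS = {
    "-blocks-storage.tsdb.head-postings-for-matchers-cache-max-bytes": ("10MB", "100MB"),
    "-blocks-storage.tsdb.block-postings-for-matchers-cache-max-bytes": ("10MB", "100MB"),
    "-blocks-storage.bucket-store.tenant-sync-concurrency": ("10", "1"),
    "-query-frontend.max-cache-freshness": ("1m", "10m"),
    "-distributor.write-requests-buffer-pooling-enabled": ("false", "true"),
    # Flag name spelled as in changelog entry #7136.
    "-blocks-storage.bucket-store.block-sync-concurrency": ("20", "4"),
    "-memberlist.stream-timeout": ("10s", "2s"),
    "-server.report-grpc-codes-in-instrumentation-label-enabled": ("false", "true"),
}


def unpinned_changes(explicit_flags: set) -> dict:
    """Return the changed defaults that are not overridden by an explicit flag."""
    return {
        flag: change
        for flag, change in CHANGED_DEFAULTS.items()
        if flag not in explicit_flags
    }


# Hypothetical deployment that pins only the memberlist timeout.
changes = unpinned_changes({"-memberlist.stream-timeout"})
for flag, (old, new) in changes.items():
    print(f"{flag}: {old} -> {new}")
```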

The following deprecated configuration options are removed in Grafana Mimir 2.12:

  • The YAML setting frontend.cache_unaligned_requests.
  • Experimental CLI flag -querier.prefer-streaming-chunks-from-ingesters.

The following configuration options are deprecated and will be removed in Grafana Mimir 2.14:

  • The CLI flag -ingester.limit-inflight-requests-using-grpc-method-limiter. It now defaults to true.

  • The CLI flag -ingester.return-only-grpc-errors. It now defaults to true. To guarantee backwards compatibility when migrating from a version prior to 2.11, it's necessary to first migrate to version 2.11, and then to version 2.12. Otherwise, it might happen that during the migration, some ingester errors with HTTP status code 4xx won't be recognized, and the corresponding request will be repeated.

  • The CLI flag -ingester.client.report-grpc-codes-in-instrumentation-label-enabled. It now defaults to true.

  • The CLI flag -distributor.limit-inflight-requests-using-grpc-method-limiter. It now defaults to true.

  • The CLI flag -distributor.enable-otlp-metadata-storage. It now defaults to true.

  • The CLI flag -querier.max-query-into-future.
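
A deployment's command-line arguments can be checked against the removed and deprecated CLI flags listed above before upgrading. The sketch below is one way to do that; the function name and the sample arguments are hypothetical. The removed YAML setting frontend.cache_unaligned_requests is not a CLI flag and is therefore not covered.

```python
REMOVED_IN_2_12 = {
    "-querier.prefer-streaming-chunks-from-ingesters",
}
DEPRECATED_REMOVAL_IN_2_14 = {
    "-ingester.limit-inflight-requests-using-grpc-method-limiter",
    "-ingester.return-only-grpc-errors",
    "-ingester.client.report-grpc-codes-in-instrumentation-label-enabled",
    "-distributor.limit-inflight-requests-using-grpc-method-limiter",
    "-distributor.enable-otlp-metadata-storage",
    "-querier.max-query-into-future",
}


def audit_args(args: list) -> dict:
    """Report removed and deprecated flags found in a list of CLI arguments."""
    names = set()
    for arg in args:
        if arg.startswith("-"):
            # Go's flag package accepts both -flag and --flag; normalize to one dash.
            names.add("-" + arg.lstrip("-").split("=", 1)[0])
    return {
        "removed": sorted(names & REMOVED_IN_2_12),
        "deprecated": sorted(names & DEPRECATED_REMOVAL_IN_2_14),
    }


report = audit_args([
    "-target=ingester",
    "-ingester.return-only-grpc-errors=false",
    "--querier.prefer-streaming-chunks-from-ingesters=true",
])
print(report)
```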

The following metrics are removed or deprecated:

  • cortex_bucket_store_blocks_loaded_by_duration has been removed.
  • cortex_distributor_sample_delay_seconds has been deprecated and will be removed in Mimir 2.14.

Experimental features

Grafana Mimir 2.12 includes new features that are considered experimental and disabled by default. Use them with caution and report any issues you encounter:

  • The maximum number of tenant IDs allowed in a federated query can be configured via the -tenant-federation.max-tenants CLI flag on query-frontends. By default, it's 0, meaning that the limit is disabled.

  • Sharding of active series queries can be enabled via the -query-frontend.shard-active-series-queries CLI flag on query-frontends.

  • Timely head compaction can be enabled via the -blocks-storage.tsdb.timely-head-compaction-enabled CLI flag on ingesters. If enabled, the head compaction happens when the min block range can no longer be appended, without requiring 1.5x the chunk range worth of data in the head.

  • Streaming of responses from querier to query-frontend can be enabled via the -querier.response-streaming-enabled CLI flag on queriers. This is currently supported only for responses from the /api/v1/cardinality/active_series endpoint.

  • The maximum response size for active series queries, in bytes, can be set via the -querier.active-series-results-max-size-bytes CLI flag on queriers.

  • Metric relabeling on a per-tenant basis can be forcefully disabled via the -distributor.metric-relabeling-enabled CLI flag on rulers. Metric relabeling is enabled by default.

  • Query Queue Load Balancing by Query Component. Tenant query queues in the query-scheduler can now be split into subqueues by the query component expected to be used to complete the query: ingesters, store-gateways, both, or uncategorized. Dequeuing queries for a given tenant rotates through the query component subqueues via simple round-robin. If one of the query components (ingesters or store-gateways) experiences a slowdown, queries that only use the other query component can continue to be serviced. Enabling this feature is recommended. The following CLI flags must be set to true for the feature to take effect:

    • -query-frontend.additional-query-queue-dimensions-enabled on the query-frontend.
    • -query-scheduler.additional-query-queue-dimensions-enabled on the query-scheduler.
  • Owned series tracking in ingesters can be enabled via the -ingester.track-ingester-owned-series CLI flag. When enabled, ingesters will track the number of in-memory series that still map to the ingester based on the ring state. These counts are more reactive to ring and shard changes than in-memory series, and can be used when enforcing tenant series limits by enabling the -ingester.use-ingester-owned-series-for-limits CLI flag. This feature requires zone-aware replication to be enabled, and the replication factor to be equal to the number of zones.
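
The round-robin dequeuing across query component subqueues described above can be sketched as follows. This is a simplified model under stated assumptions, not the query-scheduler's code: the class, the component names, and the query labels are hypothetical, and a single tenant is modeled.

```python
from collections import deque
from typing import Optional

# The four subqueue categories named in the feature description.
COMPONENTS = ("ingester", "store-gateway", "both", "uncategorized")


class TenantQueue:
    """One tenant's queue, split into per-component subqueues."""

    def __init__(self) -> None:
        self._subqueues = {c: deque() for c in COMPONENTS}
        self._next = 0  # index of the subqueue to try first on the next dequeue

    def enqueue(self, component: str, query: str) -> None:
        self._subqueues[component].append(query)

    def dequeue(self) -> Optional[str]:
        # Try each subqueue once, starting after the one served last time.
        for offset in range(len(COMPONENTS)):
            idx = (self._next + offset) % len(COMPONENTS)
            subqueue = self._subqueues[COMPONENTS[idx]]
            if subqueue:
                self._next = (idx + 1) % len(COMPONENTS)
                return subqueue.popleft()
        return None  # all subqueues are empty


tq = TenantQueue()
for name in ("q1", "q2", "q3"):
    tq.enqueue("ingester", name)
tq.enqueue("store-gateway", "s1")

order = [tq.dequeue() for _ in range(4)]
print(order)
```

In this run the store-gateway query is served second rather than waiting behind all three ingester queries, which is the property that keeps one slow component from starving queries that only need the other.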

Bug fixes

  • Distributor: fixed an issue where -distributor.metric-relabeling-enabled could cause distributors to panic.
  • Distributor: fixed an issue where -distributor.metric-relabeling-enabled could cause distributors to write unsorted labels and corrupt blocks.
  • Ingester: errors encountered while iterating through chunks or samples in response to a query request aren't ignored anymore.
  • Compactor: out-of-order blocks aren't allowed to prevent timely compaction anymore.
  • Querier: requests to store-gateway when a query gets canceled aren't retried anymore.
  • Querier: status code 499 is now returned instead of 500 when a request to remote read endpoint gets canceled.
  • Querier: fixed an issue where -querier.max-fetched-series-per-query wasn't applied to the /series endpoint when the series were loaded from ingesters.
  • Querier: fixed an issue with the remote-read requests HTTP status code translations. Previously, remote-read had conflicting behaviours: when returning samples all internal errors were translated to HTTP 400, while when returning chunks all internal errors were translated to HTTP 500. With this fix, all validation errors will be translated into HTTP 400 errors, while all other errors will be translated into HTTP 500 errors.
  • Query-frontend: the cortex_query_frontend_queries_total metric incorrectly reported op="query" for any request which wasn't a range query. Now the op label value can be one of the following:
    • query: instant query
    • query_range: range query
    • cardinality: cardinality query
    • label_names_and_values: label names / values query
    • active_series: active series query
    • other: any other request
  • Ruler: fixed an issue where "failed to remotely evaluate query expression, will retry" messages were logged without context such as the trace ID and didn't appear in trace events.
  • Ruler: requests to remote querier when server's response exceeds its configured max payload size aren't retried anymore.
  • Ruler: fixed a regression that caused client errors to be tracked in cortex_ruler_write_requests_failed_total metric.
  • Ruler: fixed an issue where recording rule results were corrupted due to the use of a bad native histogram pointer.
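
The remote-read status code behavior after the querier fixes above can be summarized in a few lines. The exception classes below are hypothetical stand-ins for whatever error types the querier distinguishes internally; only the resulting status codes come from these notes.

```python
class ValidationError(Exception):
    """Hypothetical marker for errors caused by an invalid request."""


class RequestCanceled(Exception):
    """Hypothetical marker for a request canceled by the caller."""


def remote_read_status(err: Exception) -> int:
    """Map an error to the HTTP status described in the bug fixes above."""
    if isinstance(err, RequestCanceled):
        return 499  # canceled requests no longer surface as 500
    if isinstance(err, ValidationError):
        return 400  # validation errors, for samples and chunks alike
    return 500      # every other error


print(remote_read_status(ValidationError("bad matcher")))
```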

Helm chart improvements

The Grafana Mimir and Grafana Enterprise Metrics Helm charts are released independently. Refer to the Grafana Mimir Helm chart documentation.

Changelog

2.12.0-rc.1

Grafana Mimir

  • [BUGFIX] Query-frontend: Fix memory leak on every request. #7654

All changes in this release: https://github.com/grafana/mimir/compare/mimir-2.12.0-rc.0...mimir-2.12.0-rc.1

mimir-2.12.0-rc.0

2 months ago

This release contains 525 PRs from 60 authors, including new contributors Benoit Schipper, Derek Cadzow, Edwin, Itay Kalfon, Ivan Farré Vicente, Jan O. Rundshagen, Jorge Turrado Ferrero, Lukas Monkevicius, Mickaël Canévet, Rafael Sathler, Rajakavitha Kodhandapani, Tim Kotowski, Vladimir Varankin, Zach, Zach Day, Zirko, blut, github-actions[bot], ncharaf, zhehao-grafana. Thank you!

Grafana Mimir version 2.12.0-rc.0 release notes

Grafana Labs is excited to announce version 2.12 of Grafana Mimir.

The highlights that follow include the top features, enhancements, and bug fixes in this release. For the complete list of changes, refer to the CHANGELOG.

Features and enhancements

  • Added support to only count series that are considered active through the Cardinality API endpoint /api/v1/cardinality/label_names by passing the count_method parameter. If set to active it counts only series that are considered active according to the -ingester.active-series-metrics-idle-timeout flag setting rather than counting all in-memory series.

  • The "Store-gateway: bucket tenant blocks" admin page contains a new column "No Compact". If a block's no-compaction marker is set, the column shows the reason and the date the marker was added.

  • The estimated number of compaction jobs based on the current bucket-index is now computed by the compactor. The result is tracked by the new cortex_bucket_index_compaction_jobs metric. If this computation fails, the cortex_bucket_index_compaction_jobs_errors_total metric is updated instead. The estimated number of compaction jobs is also shown in Top tenants, Tenants, and Compactor dashboards.

  • Added mimir-distroless container image built upon a distroless image (gcr.io/distroless/static-debian12). This improvement minimizes attack surfaces and potential CVEs by trimming down the dependencies within the image. After comprehensive testing, the Mimir maintainers plan to shift from the current image to the distroless version.

Additionally, the following previously experimental features are now considered stable:

  • The number of pre-allocated workers used to forward push requests to the ingesters, configurable via the -distributor.reusable-ingester-push-workers CLI flag on distributors. It now defaults to 2000. Note that this is a performance optimization, and not a limiting feature. If not enough workers are available, new goroutines will be spawned.

  • The number of gRPC server workers used to serve the requests, configurable via the -server.grpc.num-workers CLI flag. It now defaults to 100. Note that this is the number of pre-allocated long-lived workers, and not a limiting feature. If not enough workers are available, new goroutines will be spawned.

  • The maximum number of concurrent index header loads across all tenants, configurable via the -blocks-storage.bucket-store.index-header.lazy-loading-concurrency CLI flag on store-gateways. It defaults to 4.

  • The maximum time to wait for the query-frontend to become ready before rejecting requests, configurable via the -query-frontend.not-running-timeout CLI flag on query-frontends. It now defaults to 2s.

  • Spread-minimizing token-related CLI flags: -ingester.ring.token-generation-strategy, -ingester.ring.spread-minimizing-zones and -ingester.ring.spread-minimizing-join-ring-in-order. You can read more about this feature in our blog post.

Important changes

In Grafana Mimir 2.12 the following behavior has changed:

  • Store-gateway now persists a sparse version of the index-header to disk on construction and loads sparse index-headers from disk instead of the whole index-header. This improves the speed at which index headers are lazy-loaded from disk by up to 90%. The added disk usage is in the order of 1-2%.

  • Alertmanager deprecated the v1 API. All v1 API endpoints now respond with a JSON deprecation notice and a status code of 410. All endpoints have a v2 equivalent. The list of endpoints is:

    • <alertmanager-web.external-url>/api/v1/alerts
    • <alertmanager-web.external-url>/api/v1/receivers
    • <alertmanager-web.external-url>/api/v1/silence/{id}
    • <alertmanager-web.external-url>/api/v1/silences
    • <alertmanager-web.external-url>/api/v1/status
  • Exemplar's label traceID has been changed to trace_id to be consistent with the OpenTelemetry standard.

  • Errors returned by ingesters now contain only gRPC status codes. Previously they contained both gRPC and HTTP status codes. To guarantee backwards compatibility when migrating from a version prior to 2.11, it's necessary to first migrate to version 2.11, and then to version 2.12. Otherwise, it might happen that during the migration, some ingester errors with HTTP status code 4xx won't be recognized, and the corresponding request will be repeated.

  • Responses with gRPC status codes are now reported as status_code labels in the cortex_request_duration_seconds and cortex_ingester_client_request_duration_seconds metrics.

  • Responses with HTTP 4xx status codes are now treated as errors and used in the status_code label of the request duration metric.
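
Tooling that reads exemplars written both before and after the traceID to trace_id rename may need to accept either label for a while. A minimal sketch, assuming exemplar labels are available as a plain dictionary (the function name is hypothetical):

```python
from typing import Optional


def exemplar_trace_id(labels: dict) -> Optional[str]:
    """Return the trace ID, preferring the new label over the legacy one."""
    for key in ("trace_id", "traceID"):
        if key in labels:
            return labels[key]
    return None


print(exemplar_trace_id({"traceID": "4bf92f35"}))
```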

The default values of the following CLI flags have been changed:

  • -blocks-storage.tsdb.head-postings-for-matchers-cache-max-bytes from 10MB to 100MB.
  • -blocks-storage.tsdb.block-postings-for-matchers-cache-max-bytes from 10MB to 100MB.
  • -blocks-storage.bucket-store.tenant-sync-concurrency from 10 to 1.
  • -query-frontend.max-cache-freshness from 1m to 10m.
  • -distributor.write-requests-buffer-pooling-enabled from false to true.
  • -blocks-storage.bucket-store.block-sync-concurrency from 20 to 4.
  • -memberlist.stream-timeout from 10s to 2s.
  • -server.report-grpc-codes-in-instrumentation-label-enabled from false to true.

The following deprecated configuration options are removed in Grafana Mimir 2.12:

  • The YAML setting frontend.cache_unaligned_requests.

The following configuration options are deprecated and will be removed in Grafana Mimir 2.14:

  • The CLI flag -ingester.limit-inflight-requests-using-grpc-method-limiter. It now defaults to true.

  • The CLI flag -ingester.return-only-grpc-errors. It now defaults to true. To guarantee backwards compatibility when migrating from a version prior to 2.11, it's necessary to first migrate to version 2.11, and then to version 2.12. Otherwise, it might happen that during the migration, some ingester errors with HTTP status code 4xx won't be recognized, and the corresponding request will be repeated.

  • The CLI flag -ingester.client.report-grpc-codes-in-instrumentation-label-enabled. It now defaults to true.

  • The CLI flag -distributor.limit-inflight-requests-using-grpc-method-limiter. It now defaults to true.

  • The CLI flag -distributor.enable-otlp-metadata-storage. It now defaults to true.

  • The CLI flag -querier.max-query-into-future.

The following metrics are removed or deprecated:

  • cortex_bucket_store_blocks_loaded_by_duration has been removed.
  • cortex_distributor_sample_delay_seconds has been deprecated and will be removed in Mimir 2.14.
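
Dashboards and alerting rules that reference the two metrics above will need attention. The sketch below scans a PromQL expression for them; it is a simple identifier match rather than a real PromQL parser, and the function name and sample expression are hypothetical.

```python
import re

REMOVED_METRICS = {"cortex_bucket_store_blocks_loaded_by_duration"}
DEPRECATED_METRICS = {"cortex_distributor_sample_delay_seconds"}

_IDENTIFIER = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def affected_metrics(expr: str) -> dict:
    """Find removed or deprecated metric names mentioned in an expression."""
    names = set()
    for ident in _IDENTIFIER.findall(expr):
        names.add(ident)
        # Histogram series carry _bucket, _sum, or _count suffixes.
        names.add(re.sub(r"_(bucket|sum|count)$", "", ident))
    return {
        "removed": sorted(names & REMOVED_METRICS),
        "deprecated": sorted(names & DEPRECATED_METRICS),
    }


expr = (
    "histogram_quantile(0.99, sum by (le) "
    "(rate(cortex_distributor_sample_delay_seconds_bucket[5m])))"
)
print(affected_metrics(expr))
```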

Experimental features

Grafana Mimir 2.12 includes new features that are considered experimental and disabled by default. Use them with caution and report any issues you encounter:

  • The maximum number of tenant IDs allowed in a federated query can be configured via the -tenant-federation.max-tenants CLI flag on query-frontends. By default, it's 0, meaning that the limit is disabled.

  • Sharding of active series queries can be enabled via the -query-frontend.shard-active-series-queries CLI flag on query-frontends.

  • Timely head compaction can be enabled via the -blocks-storage.tsdb.timely-head-compaction-enabled CLI flag on ingesters. If enabled, the head compaction happens when the min block range can no longer be appended, without requiring 1.5x the chunk range worth of data in the head.

  • Streaming of responses from querier to query-frontend can be enabled via the -querier.response-streaming-enabled CLI flag on queriers. This is currently supported only for responses from the /api/v1/cardinality/active_series endpoint.

  • The maximum response size for active series queries, in bytes, can be set via the -querier.active-series-results-max-size-bytes CLI flag on queriers.

  • Metric relabeling on a per-tenant basis can be forcefully disabled via the -distributor.metric-relabeling-enabled CLI flag on rulers. Metric relabeling is enabled by default.

  • Query Queue Load Balancing by Query Component. Tenant query queues in the query-scheduler can now be split into subqueues by the query component expected to be used to complete the query: ingesters, store-gateways, both, or uncategorized. Dequeuing queries for a given tenant rotates through the query component subqueues via simple round-robin. If one of the query components (ingesters or store-gateways) experiences a slowdown, queries that only use the other query component can continue to be serviced. Enabling this feature is recommended. The following CLI flags must be set to true for the feature to take effect:

    • -query-frontend.additional-query-queue-dimensions-enabled on the query-frontend.
    • -query-scheduler.additional-query-queue-dimensions-enabled on the query-scheduler.
  • Owned series tracking in ingesters can be enabled via the -ingester.track-ingester-owned-series CLI flag. When enabled, ingesters will track the number of in-memory series that still map to the ingester based on the ring state. These counts are more reactive to ring and shard changes than in-memory series, and can be used when enforcing tenant series limits by enabling the -ingester.use-ingester-owned-series-for-limits CLI flag. This feature requires zone-aware replication to be enabled, and the replication factor to be equal to the number of zones.
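
The -tenant-federation.max-tenants limit described above can be pictured as a check on the tenant IDs of a federated request, which are sent as a "|"-separated list in the X-Scope-OrgID header. The sketch below is an illustration only: the function name is hypothetical, and sorting and de-duplicating the IDs before counting is an assumption of this example.

```python
def check_federated_tenants(org_id_header: str, max_tenants: int = 0) -> list:
    """Split a federated X-Scope-OrgID value and enforce a tenant limit.

    A max_tenants value of 0 means the limit is disabled.
    """
    # Assumption: duplicates count once and order is irrelevant.
    tenants = sorted(set(org_id_header.split("|")))
    if max_tenants > 0 and len(tenants) > max_tenants:
        raise ValueError(
            f"federated query spans {len(tenants)} tenants, limit is {max_tenants}"
        )
    return tenants


print(check_federated_tenants("team-b|team-a|team-b", max_tenants=2))
```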

Bug fixes

  • Distributor: fixed an issue where -distributor.metric-relabeling-enabled could cause distributors to panic.
  • Distributor: fixed an issue where -distributor.metric-relabeling-enabled could cause distributors to write unsorted labels and corrupt blocks.
  • Ingester: errors encountered while iterating through chunks or samples in response to a query request aren't ignored anymore.
  • Compactor: out-of-order blocks aren't allowed to prevent timely compaction anymore.
  • Querier: requests to store-gateway when a query gets canceled aren't retried anymore.
  • Querier: status code 499 is now returned instead of 500 when a request to remote read endpoint gets canceled.
  • Querier: fixed an issue where -querier.max-fetched-series-per-query wasn't applied to the /series endpoint when the series were loaded from ingesters.
  • Querier: fixed an issue with the remote-read requests HTTP status code translations. Previously, remote-read had conflicting behaviours: when returning samples all internal errors were translated to HTTP 400, while when returning chunks all internal errors were translated to HTTP 500. With this fix, all validation errors will be translated into HTTP 400 errors, while all other errors will be translated into HTTP 500 errors.
  • Query-frontend: the cortex_query_frontend_queries_total metric incorrectly reported op="query" for any request which wasn't a range query. Now the op label value can be one of the following:
    • query: instant query
    • query_range: range query
    • cardinality: cardinality query
    • label_names_and_values: label names / values query
    • active_series: active series query
    • other: any other request
  • Ruler: fixed an issue where "failed to remotely evaluate query expression, will retry" messages were logged without context such as the trace ID and didn't appear in trace events.
  • Ruler: requests to remote querier when server's response exceeds its configured max payload size aren't retried anymore.
  • Ruler: fixed a regression that caused client errors to be tracked in cortex_ruler_write_requests_failed_total metric.
  • Ruler: fixed an issue where recording rule results were corrupted due to the use of a bad native histogram pointer.

Helm chart improvements

The Grafana Mimir and Grafana Enterprise Metrics Helm charts are released independently. Refer to the Grafana Mimir Helm chart documentation.

Changelog

2.12.0-rc.0

Grafana Mimir

  • [CHANGE] Alertmanager: Deprecates the v1 API. All v1 API endpoints now respond with a JSON deprecation notice and a status code of 410. All endpoints have a v2 equivalent. The list of endpoints is: #7103
    • <alertmanager-web.external-url>/api/v1/alerts
    • <alertmanager-web.external-url>/api/v1/receivers
    • <alertmanager-web.external-url>/api/v1/silence/{id}
    • <alertmanager-web.external-url>/api/v1/silences
    • <alertmanager-web.external-url>/api/v1/status
  • [CHANGE] Ingester: Increase default value of -blocks-storage.tsdb.head-postings-for-matchers-cache-max-bytes and -blocks-storage.tsdb.block-postings-for-matchers-cache-max-bytes to 100 MiB (previous default value was 10 MiB). #6764
  • [CHANGE] Validate tenant IDs according to documented behavior even when tenant federation is not enabled. Note that this will cause some previously accepted tenant IDs to be rejected such as those longer than 150 bytes or containing | characters. #6959
  • [CHANGE] Ruler: don't use backoff retry on remote evaluation in case of 4xx errors. #7004
  • [CHANGE] Server: responses with HTTP 4xx status codes are now treated as errors and used in status_code label of request duration metric. #7045
  • [CHANGE] Memberlist: change default for -memberlist.stream-timeout from 10s to 2s. #7076
  • [CHANGE] Memcached: remove legacy thanos_cache_memcached_* and thanos_memcached_* prefixed metrics. Instead, Memcached and Redis cache clients now emit thanos_cache_* prefixed metrics with a backend label. #7076
  • [CHANGE] Ruler: the following metrics, exposed when the ruler is configured to discover Alertmanager instances via service discovery, have been renamed: #7057
    • prometheus_sd_failed_configs renamed to cortex_prometheus_sd_failed_configs
    • prometheus_sd_discovered_targets renamed to cortex_prometheus_sd_discovered_targets
    • prometheus_sd_received_updates_total renamed to cortex_prometheus_sd_received_updates_total
    • prometheus_sd_updates_delayed_total renamed to cortex_prometheus_sd_updates_delayed_total
    • prometheus_sd_updates_total renamed to cortex_prometheus_sd_updates_total
    • prometheus_sd_refresh_failures_total renamed to cortex_prometheus_sd_refresh_failures_total
    • prometheus_sd_refresh_duration_seconds renamed to cortex_prometheus_sd_refresh_duration_seconds
  • [CHANGE] Query-frontend: the default value for -query-frontend.not-running-timeout has been changed from 0 (disabled) to 2s. The configuration option has also been moved from "experimental" to "advanced". #7126
  • [CHANGE] Store-gateway: to reduce disk contention on HDDs the default value for -blocks-storage.bucket-store.tenant-sync-concurrency has been changed from 10 to 1 and the default value for -blocks-storage.bucket-store.block-sync-concurrency has been changed from 20 to 4. #7136
  • [CHANGE] Store-gateway: Remove deprecated CLI flags -blocks-storage.bucket-store.index-header-lazy-loading-enabled and -blocks-storage.bucket-store.index-header-lazy-loading-idle-timeout and their corresponding YAML settings. Instead, use -blocks-storage.bucket-store.index-header.lazy-loading-enabled and -blocks-storage.bucket-store.index-header.lazy-loading-idle-timeout. #7521
  • [CHANGE] Store-gateway: Mark experimental CLI flag -blocks-storage.bucket-store.index-header.lazy-loading-concurrency and its corresponding YAML settings as advanced. #7521
  • [CHANGE] Store-gateway: Remove experimental CLI flag -blocks-storage.bucket-store.index-header.sparse-persistence-enabled since this is now the default behavior. #7535
  • [CHANGE] All: set -server.report-grpc-codes-in-instrumentation-label-enabled to true by default, which enables reporting gRPC status codes as status_code labels in the cortex_request_duration_seconds metric. #7144
  • [CHANGE] Distributor: report gRPC status codes as status_code labels in the cortex_ingester_client_request_duration_seconds metric by default. #7144
  • [CHANGE] Distributor: CLI flag -ingester.client.report-grpc-codes-in-instrumentation-label-enabled has been deprecated, and its default value is set to true. #7144
  • [CHANGE] Ingester: CLI flag -ingester.return-only-grpc-errors has been deprecated, and its default value is set to true. To ensure backwards compatibility, during a migration from a version prior to 2.11.0 to 2.12 or later, -ingester.return-only-grpc-errors should be set to false. Once all the components are migrated, the flag can be removed. #7151
  • [CHANGE] Ingester: the following CLI flags have been moved from "experimental" to "advanced": #7169
    • -ingester.ring.token-generation-strategy
    • -ingester.ring.spread-minimizing-zones
    • -ingester.ring.spread-minimizing-join-ring-in-order
  • [CHANGE] Query-frontend: the default value of the CLI flag -query-frontend.max-cache-freshness (and its respective YAML configuration parameter) has been changed from 1m to 10m. #7161
  • [CHANGE] Distributor: default the optimization -distributor.write-requests-buffer-pooling-enabled to true. #7165
  • [CHANGE] Tracing: Move query information to span attributes instead of span logs. #7046
  • [CHANGE] Distributor: the default value of circuit breaker's CLI flag -ingester.client.circuit-breaker.cooldown-period has been changed from 1m to 10s. #7310
  • [CHANGE] Store-gateway: remove cortex_bucket_store_blocks_loaded_by_duration. cortex_bucket_store_series_blocks_queried is better suited for detecting when compactors are not able to keep up with the number of blocks to compact. #7309
  • [CHANGE] Ingester, Distributor: the support for rejecting push requests received via gRPC before reading them into memory, enabled via -ingester.limit-inflight-requests-using-grpc-method-limiter and -distributor.limit-inflight-requests-using-grpc-method-limiter, is now stable and enabled by default. The configuration options have been deprecated and will be removed in Mimir 2.14. #7360
  • [CHANGE] Distributor: Change -distributor.enable-otlp-metadata-storage flag's default to true, and deprecate it. The flag will be removed in Mimir 2.14. #7366
  • [CHANGE] Store-gateway: Use a shorter TTL for cached items related to temporary blocks. #7407 #7534
  • [CHANGE] Standardise exemplar label as "trace_id". #7475
  • [CHANGE] The configuration option -querier.max-query-into-future has been deprecated and will be removed in Mimir 2.14. #7496
  • [CHANGE] Distributor: the metric cortex_distributor_sample_delay_seconds has been deprecated and will be removed in Mimir 2.14. #7516
  • [CHANGE] Query-frontend: The deprecated YAML setting frontend.cache_unaligned_requests has been moved to limits.cache_unaligned_requests. #7519
  • [FEATURE] Introduce -server.log-source-ips-full option to log all IPs from Forwarded, X-Real-IP, X-Forwarded-For headers. #7250
  • [FEATURE] Introduce -tenant-federation.max-tenants option to limit the max number of tenants allowed for requests when federation is enabled. #6959
  • [FEATURE] Cardinality API: added a new count_method parameter which enables counting active label values. #7085
  • [FEATURE] Querier / query-frontend: added -querier.promql-experimental-functions-enabled CLI flag (and respective YAML config option) to enable experimental PromQL functions. The experimental functions introduced are: mad_over_time(), sort_by_label() and sort_by_label_desc(). #7057
  • [FEATURE] Alertmanager API: added -alertmanager.grafana-alertmanager-compatibility-enabled CLI flag (and respective YAML config option) to enable experimental API endpoints that support the migration of the Grafana Alertmanager. #7057
  • [FEATURE] Alertmanager: Added -alertmanager.utf8-strict-mode-enabled to control support for any UTF-8 character as part of Alertmanager configuration/API matchers and labels. Its default value is set to false. #6898
  • [FEATURE] Querier: added histogram_avg() function support to PromQL. #7293
  • [FEATURE] Ingester: added -blocks-storage.tsdb.timely-head-compaction flag, which enables more timely head compaction, and defaults to false. #7372
  • [FEATURE] Compactor: Added /compactor/tenants and /compactor/tenant/{tenant}/planned_jobs endpoints that provide the functionality previously offered by tools/compaction-planner: listing planned compaction jobs based on tenants' bucket index. #7381
  • [FEATURE] Add experimental support for streaming response bodies from queriers to frontends via -querier.response-streaming-enabled. This is currently only supported for the /api/v1/cardinality/active_series endpoint. #7173
  • [FEATURE] Release: Added mimir distroless docker image. #7371
  • [FEATURE] Add support for the new grammar of {"metric_name", "l1"="val"} to promql and some of the exposition formats. #7475 #7541
  • [ENHANCEMENT] Distributor: Add a new metric cortex_distributor_otlp_requests_total to track the total number of OTLP requests. #7385
  • [ENHANCEMENT] Vault: add lifecycle manager for token used to authenticate to Vault. This ensures the client token is always valid. Includes a gauge (cortex_vault_token_lease_renewal_active) to check whether token renewal is active, and the counters cortex_vault_token_lease_renewal_success_total and cortex_vault_auth_success_total to see the total number of successful lease renewals / authentications. #7337
  • [ENHANCEMENT] Store-gateway: add no-compact details column on store-gateway tenants admin UI. #6848
  • [ENHANCEMENT] PromQL: ignore small errors for bucketQuantile #6766
  • [ENHANCEMENT] Distributor: improve efficiency of some errors #6785
  • [ENHANCEMENT] Ruler: exclude vector queries from being tracked in cortex_ruler_queries_zero_fetched_series_total. #6544
  • [ENHANCEMENT] Ruler: local storage backend now supports reading a rule group via /config/api/v1/rules/{namespace}/{groupName} configuration API endpoint. #6632
  • [ENHANCEMENT] Query-Frontend and Query-Scheduler: split tenant query request queues by query component with -query-frontend.additional-query-queue-dimensions-enabled and -query-scheduler.additional-query-queue-dimensions-enabled. #6772
  • [ENHANCEMENT] Distributor: support disabling metric relabel rules per-tenant via the flag -distributor.metric-relabeling-enabled or associated YAML. #6970
  • [ENHANCEMENT] Distributor: -distributor.remote-timeout is now accounted from the first ingester push request being sent. #6972
  • [ENHANCEMENT] Storage Provider: -<prefix>.s3.sts-endpoint sets a custom endpoint for AWS Security Token Service (AWS STS) in s3 storage provider. #6172
  • [ENHANCEMENT] Querier: add cortex_querier_queries_storage_type_total metric that indicates how many queries have executed for each source (ingesters or store-gateways). Add cortex_querier_query_storegateway_chunks_total metric to count the number of chunks fetched from a store gateway. #7099 #7145
  • [ENHANCEMENT] Query-frontend: add experimental support for sharding active series queries via -query-frontend.shard-active-series-queries. #6784
  • [ENHANCEMENT] Distributor: set -distributor.reusable-ingester-push-workers=2000 by default and mark feature as advanced. #7128
  • [ENHANCEMENT] All: set -server.grpc.num-workers=100 by default and mark feature as advanced. #7131
  • [ENHANCEMENT] Distributor: invalid metric name error message gets cleaned up to not include non-ascii strings. #7146
  • [ENHANCEMENT] Store-gateway: add source, level, and out_of_order labels to the cortex_bucket_store_series_blocks_queried metric, which indicates the number of blocks that were queried from store gateways by block metadata. #7112 #7262 #7267
  • [ENHANCEMENT] Compactor: After updating bucket-index, compactor now also computes estimated number of compaction jobs based on current bucket-index, and reports the result in cortex_bucket_index_estimated_compaction_jobs metric. If computation of jobs fails, cortex_bucket_index_estimated_compaction_jobs_errors_total is updated instead. #7299
  • [ENHANCEMENT] Mimir: Integrate profiling into tracing instrumentation. #7363
  • [ENHANCEMENT] Alertmanager: Adds metric cortex_alertmanager_notifications_suppressed_total that counts the total number of notifications suppressed for being silenced, inhibited, outside of active time intervals or within muted time intervals. #7384
  • [ENHANCEMENT] Query-scheduler: added more buckets to cortex_query_scheduler_queue_duration_seconds histogram metric, in order to better track queries staying in the queue for longer than 10s. #7470
  • [ENHANCEMENT] A type label is added to prometheus_tsdb_head_out_of_order_samples_appended_total metric. #7475
  • [ENHANCEMENT] Distributor: Optimize OTLP endpoint. #7475
  • [ENHANCEMENT] API: Use github.com/klauspost/compress for faster gzip and deflate compression of API responses. #7475
  • [ENHANCEMENT] Ingester: Limiting on owned series (-ingester.use-ingester-owned-series-for-limits) now prevents discards in cases where a tenant is sharded across all ingesters (or shuffle sharding is disabled) and the ingester count increases. #7411
  • [ENHANCEMENT] Block upload: include converted timestamps in the error message if block is from the future. #7538
  • [ENHANCEMENT] Query-frontend: Introduce -query-frontend.active-series-write-timeout to allow configuring the server-side write timeout for active series requests. #7553 #7569
  • [BUGFIX] Ingester: don't ignore errors encountered while iterating through chunks or samples in response to a query request. #6451
  • [BUGFIX] Fix issue where queries can fail or omit OOO samples if OOO head compaction occurs between creating a querier and reading chunks #6766
  • [BUGFIX] Fix issue where concatenatingChunkIterator can obscure errors #6766
  • [BUGFIX] Fix panic during tsdb Commit #6766
  • [BUGFIX] tsdb/head: wlog exemplars after samples #6766
  • [BUGFIX] Ruler: fix issue where "failed to remotely evaluate query expression, will retry" messages are logged without context such as the trace ID and do not appear in trace events. #6789
  • [BUGFIX] Ruler: do not retry requests to remote querier when server's response exceeds its configured max payload size. #7216
  • [BUGFIX] Querier: fix issue where spans in query request traces were not nested correctly. #6893
  • [BUGFIX] Fix issue where all incoming HTTP requests have duplicate trace spans. #6920
  • [BUGFIX] Querier: do not retry requests to store-gateway when a query gets canceled. #6934
  • [BUGFIX] Querier: return 499 status code instead of 500 when a request to remote read endpoint gets canceled. #6934
  • [BUGFIX] Querier: fix issue where -querier.max-fetched-series-per-query is not applied to /series endpoint if the series are loaded from ingesters. #7055
  • [BUGFIX] Distributor: fix issue where -distributor.metric-relabeling-enabled may cause distributors to panic #7176
  • [BUGFIX] Distributor: fix issue where -distributor.metric-relabeling-enabled may cause distributors to write unsorted labels and corrupt blocks #7326
  • [BUGFIX] Query-frontend: the cortex_query_frontend_queries_total metric incorrectly reported op="query" for any request which wasn't a range query. Now the op label value can be one of the following: #7207
    • query: instant query
    • query_range: range query
    • cardinality: cardinality query
    • label_names_and_values: label names / values query
    • active_series: active series query
    • other: any other request
  • [BUGFIX] Fix performance regression introduced in Mimir 2.11.0 when uploading blocks to AWS S3. #7240
  • [BUGFIX] Query-frontend: fix race condition when sharding active series is enabled (see above) and response is compressed with snappy. #7290
  • [BUGFIX] Query-frontend: "query stats" log unsuccessful replies from downstream as "failed". #7296
  • [BUGFIX] Packaging: remove reload from systemd file as mimir does not take into account SIGHUP. #7345
  • [BUGFIX] Compactor: do not allow out-of-order blocks to prevent timely compaction. #7342
  • [BUGFIX] Update google.golang.org/grpc to resolve occasional issues with gRPC server closing its side of connection before it was drained by the client. #7380
  • [BUGFIX] Query-frontend: abort response streaming for active_series requests when the request context is canceled. #7378
  • [BUGFIX] Compactor: improve compaction of sporadic blocks. #7329
  • [BUGFIX] Ruler: fix regression that caused client errors to be tracked in cortex_ruler_write_requests_failed_total metric. #7472
  • [BUGFIX] promql: Fix issue where range selectors with an @ modifier are wrongly scoped in range queries. #7475
  • [BUGFIX] Fix metadata API using wrong JSON field names. #7475
  • [BUGFIX] Ruler: fix native histogram recording rule result corruption. #7552
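
As an illustration of the count_method parameter added to the Cardinality API in the list above, the following minimal Python sketch builds a request URL that asks for active series only. The host name, the /prometheus prefix, and the helper name are assumptions for illustration, not part of the release; adjust them to your deployment.

```python
from urllib.parse import urlencode


def label_names_cardinality_url(base_url: str, count_method: str = "active") -> str:
    """Build a label-names cardinality URL carrying the count_method parameter.

    base_url is assumed to already include the Prometheus HTTP prefix
    (hypothetically "/prometheus" in the example below).
    """
    query = urlencode({"count_method": count_method})
    return f"{base_url.rstrip('/')}/api/v1/cardinality/label_names?{query}"


if __name__ == "__main__":
    # Hypothetical host and prefix, used only to show the resulting URL shape.
    print(label_names_cardinality_url("http://mimir.example:8080/prometheus"))
```

In a multi-tenant setup the request would normally also carry the tenant in the X-Scope-OrgID header; that part is omitted here to keep the sketch short.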

Mixin

  • [CHANGE] The job label matchers for the distributor and gateway have been extended to include any deployment matching distributor.* and cortex-gw.* respectively. This change also matches custom and multi-zone distributor and gateway deployments. #6817
  • [ENHANCEMENT] Dashboards: Add panels for alertmanager activity of a tenant #6826
  • [ENHANCEMENT] Dashboards: Add graphs to "Slow Queries" dashboard. #6880
  • [ENHANCEMENT] Dashboards: Update all deprecated "graph" panels to "timeseries" panels. #6864 #7413 #7457
  • [ENHANCEMENT] Dashboards: Make most columns in "Slow Queries" sortable. #7000
  • [ENHANCEMENT] Dashboards: Render graph panels at full resolution as opposed to at half resolution. #7027
  • [ENHANCEMENT] Dashboards: show query-scheduler queue length on "Reads" and "Remote Ruler Reads" dashboards. #7088
  • [ENHANCEMENT] Dashboards: Add estimated number of compaction jobs to "Compactor", "Tenants" and "Top tenants" dashboards. #7449 #7481
  • [ENHANCEMENT] Recording rules: add native histogram recording rules to cortex_request_duration_seconds. #7528
  • [ENHANCEMENT] Dashboards: Add total owned series, and per-ingester in-memory and owned series to "Tenants" dashboard. #7511
  • [BUGFIX] Dashboards: drop step parameter from targets as it is not supported. #7157
  • [BUGFIX] Recording rules: drop rules for metrics removed in 2.0: cortex_memcache_request_duration_seconds and cortex_cache_request_duration_seconds. #7514

Jsonnet

  • [CHANGE] Distributor: Increase JAEGER_REPORTER_MAX_QUEUE_SIZE from the default (100) to 1000, to avoid dropping tracing spans. #7259
  • [CHANGE] Querier: Increase JAEGER_REPORTER_MAX_QUEUE_SIZE from 1000 to 5000, to avoid dropping tracing spans. #6764
  • [CHANGE] rollout-operator: remove default CPU limit. #7066
  • [CHANGE] Store-gateway: Increase JAEGER_REPORTER_MAX_QUEUE_SIZE from the default (100) to 1000, to avoid dropping tracing spans. #7068
  • [CHANGE] Query-frontend, ingester, ruler, backend and write instances: Increase JAEGER_REPORTER_MAX_QUEUE_SIZE from the default (100), to avoid dropping tracing spans. #7086
  • [CHANGE] Ring: relaxed the hash ring heartbeat period and timeout for distributor, ingester, store-gateway and compactor: #6860
    • -distributor.ring.heartbeat-period set to 1m
    • -distributor.ring.heartbeat-timeout set to 4m
    • -ingester.ring.heartbeat-period set to 2m
    • -store-gateway.sharding-ring.heartbeat-period set to 1m
    • -store-gateway.sharding-ring.heartbeat-timeout set to 4m
    • -compactor.ring.heartbeat-period set to 1m
    • -compactor.ring.heartbeat-timeout set to 4m
  • [CHANGE] Ruler-querier: the topology spread constraint max skew is now configured through the configuration option ruler_querier_topology_spread_max_skew instead of querier_topology_spread_max_skew. #7204
  • [CHANGE] Distributor: -server.grpc.keepalive.max-connection-age lowered from 2m to 60s and configured -shutdown-delay=90s and termination grace period to 100 seconds in order to reduce the chances of failed gRPC write requests when distributors shut down gracefully. #7361
  • [FEATURE] Added support for the following root-level settings to configure the list of matchers to apply to node affinity: #6782 #6829
    • alertmanager_node_affinity_matchers
    • compactor_node_affinity_matchers
    • continuous_test_node_affinity_matchers
    • distributor_node_affinity_matchers
    • ingester_node_affinity_matchers
    • ingester_zone_a_node_affinity_matchers
    • ingester_zone_b_node_affinity_matchers
    • ingester_zone_c_node_affinity_matchers
    • mimir_backend_node_affinity_matchers
    • mimir_backend_zone_a_node_affinity_matchers
    • mimir_backend_zone_b_node_affinity_matchers
    • mimir_backend_zone_c_node_affinity_matchers
    • mimir_read_node_affinity_matchers
    • mimir_write_node_affinity_matchers
    • mimir_write_zone_a_node_affinity_matchers
    • mimir_write_zone_b_node_affinity_matchers
    • mimir_write_zone_c_node_affinity_matchers
    • overrides_exporter_node_affinity_matchers
    • querier_node_affinity_matchers
    • query_frontend_node_affinity_matchers
    • query_scheduler_node_affinity_matchers
    • rollout_operator_node_affinity_matchers
    • ruler_node_affinity_matchers
    • ruler_querier_node_affinity_matchers
    • ruler_query_frontend_node_affinity_matchers
    • ruler_query_scheduler_node_affinity_matchers
    • store_gateway_node_affinity_matchers
    • store_gateway_zone_a_node_affinity_matchers
    • store_gateway_zone_b_node_affinity_matchers
    • store_gateway_zone_c_node_affinity_matchers
  • [FEATURE] Ingester: Allow automated zone-by-zone downscaling, that can be enabled via the ingester_automated_downscale_enabled flag. It is disabled by default. #6850
  • [ENHANCEMENT] Alerts: Add MimirStoreGatewayTooManyFailedOperations warning alert that triggers when the Mimir store-gateway reports errors while interacting with the object storage. #6831
  • [ENHANCEMENT] Querier HPA: improved scaling metric and scaling policies, in order to scale up and down more gradually. #6971
  • [ENHANCEMENT] Rollout-operator: upgraded to v0.13.0. #7469
  • [ENHANCEMENT] Rollout-operator: add tracing configuration to rollout-operator container (when tracing is enabled and configured). #7469
  • [ENHANCEMENT] Query-frontend: configured -shutdown-delay, -server.grpc.keepalive.max-connection-age and termination grace period to reduce the likelihood of queries hitting terminated query-frontends. #7129
  • [ENHANCEMENT] Autoscaling: add support for KEDA's ignoreNullValues option for Prometheus scaler. #7471
  • [BUGFIX] Update memcached-exporter to 0.14.1 due to CVE-2023-39325. #6861
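
The relaxed ring settings in the list above pair a 1m heartbeat period with a 4m heartbeat timeout for most components. As a rough mental model, the Python sketch below treats an instance as healthy while its last heartbeat is no older than the timeout. This is a simplified assumption about how the timeout is applied, and the function name is hypothetical.

```python
from datetime import datetime, timedelta


def is_heartbeat_healthy(
    last_heartbeat: datetime,
    now: datetime,
    timeout: timedelta = timedelta(minutes=4),
) -> bool:
    """Return True while the last heartbeat is within the timeout window.

    Simplified model: the only input is the age of the last heartbeat.
    """
    return now - last_heartbeat <= timeout


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 12, 0, 0)
    # Three minutes after the last heartbeat the instance still counts as healthy.
    print(is_heartbeat_healthy(t0, t0 + timedelta(minutes=3)))
    # Five minutes after the last heartbeat it no longer does.
    print(is_heartbeat_healthy(t0, t0 + timedelta(minutes=5)))
```

Under this model, an instance that stops sending heartbeats is still treated as healthy for up to 4m after its last heartbeat.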

Mimirtool

  • [FEATURE] Add command migrate-utf8 to migrate Alertmanager configurations for Alertmanager versions 0.27.0 and later. #7383
  • [ENHANCEMENT] Add template render command to render a template locally. #7325
  • [ENHANCEMENT] Add --extra-headers option to mimirtool rules command to add extra headers to requests for auth. #7141
  • [ENHANCEMENT] Analyze Prometheus: set tenant header. #6737
  • [ENHANCEMENT] Add argument --output-dir to mimirtool alertmanager get where the config and templates will be written to and can be loaded via mimirtool alertmanager load #6760
  • [BUGFIX] Analyze rule-file: .metricsUsed field wasn't populated. #6953

Mimir Continuous Test

  • [ENHANCEMENT] Include comparison of all expected and actual values when any float sample does not match. #6756

Query-tee

  • [BUGFIX] Fix issue where Host HTTP header was not being correctly changed for the proxy targets. #7386
  • [ENHANCEMENT] Allow using the value of X-Scope-OrgID for basic auth username in the forwarded request if URL username is set as __REQUEST_HEADER_X_SCOPE_ORGID__. #7452
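
The placeholder behaviour described in the enhancement above can be sketched in a few lines of Python. This is an illustrative model only: the function name is hypothetical, headers are assumed to be a plain dict with the exact key X-Scope-OrgID, and returning an empty username when the header is missing is an assumption, not documented query-tee behaviour.

```python
PLACEHOLDER = "__REQUEST_HEADER_X_SCOPE_ORGID__"


def forwarded_basic_auth_username(url_username: str, request_headers: dict) -> str:
    """Pick the basic auth username for the forwarded request.

    If the backend URL username is the placeholder, use the tenant from the
    incoming X-Scope-OrgID header; otherwise keep the URL username unchanged.
    """
    if url_username == PLACEHOLDER:
        # Assumption: fall back to an empty username if the header is absent.
        return request_headers.get("X-Scope-OrgID", "")
    return url_username


if __name__ == "__main__":
    headers = {"X-Scope-OrgID": "tenant-1"}
    print(forwarded_basic_auth_username(PLACEHOLDER, headers))
    print(forwarded_basic_auth_username("static-user", headers))
```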

Documentation

  • [CHANGE] No longer mark OTLP distributor endpoint as experimental. #7348
  • [ENHANCEMENT] Added runbook for KubePersistentVolumeFillingUp alert. #7297
  • [ENHANCEMENT] Add Grafana Cloud recommendations to OTLP documentation. #7375
  • [BUGFIX] Fixed typo on single zone->zone aware replication Helm page. #7327

Tools

  • [CHANGE] copyblocks: The flags for copyblocks have been changed to align more closely with other tools. #6607
  • [CHANGE] undelete-blocks: undelete-blocks-gcs has been removed and replaced with undelete-blocks, which supports recovering deleted blocks in versioned buckets from ABS, GCS, and S3-compatible object storage. #6607
  • [FEATURE] copyprefix: Add tool to copy objects between prefixes. Supports ABS, GCS, and S3-compatible object storage. #6607

All changes in this release: https://github.com/grafana/mimir/compare/mimir-2.11.0...mimir-2.12.0-rc.0

mimir-2.11.0

4 months ago

This release contains 532 PRs from 55 authors, including new contributors Benjamin, Dominik Kepinski, Jonathan Donzallaz, Juraj Michálek, Kai.Ke, Ludovic Terrier, Luke, Maciej Lech, Matthew Penner, Michael Potter, Mihai Țimbota-Belin, Rasmus Werner Salling, Ying WANG, chencs, fayzal-g, kalle (jag), sarthaktyagi-505, whoami. Thank you!

Grafana Mimir version 2.11.0 release notes

Grafana Labs is excited to announce version 2.11 of Grafana Mimir.

The highlights that follow include the top features, enhancements, and bugfixes in this release. For the complete list of changes, see the changelog.

Features and enhancements

  • Sampled logging of errors in the ingester. A high-traffic Mimir cluster can occasionally become bogged down logging high volumes of repeated errors. You can now reduce the number of errors written to the logs by setting a sample rate via the -ingester.error-sample-rate CLI flag.
  • Add total request size instance limit for ingesters. This limit protects the ingesters against requests that together may cause an OOM. Enable this feature by setting the -ingester.instance-limits.max-inflight-push-requests-bytes CLI flag in combination with the -ingester.limit-inflight-requests-using-grpc-method-limiter CLI flag.
  • Reduce the resolution of incoming native histograms samples if the incoming sample has too many buckets compared to -validation.max-native-histogram-buckets. This is enabled by default but can be turned off by setting the -validation.reduce-native-histogram-over-max-buckets CLI flag to false.
  • Improved query-scheduler performance under load. This is particularly apparent for clusters with large numbers of queriers.
  • Ingester to querier chunks streaming reduces the memory utilization of queriers and reduces the likelihood of OOMs.
  • Ingester query request minimization reduces the number of query requests to ingesters, improving performance and resource utilization for both ingesters and queriers.
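
To make the idea behind sampled error logging concrete, here is a minimal Python sketch of a logger that records one error out of every N. It is not Mimir's implementation: the class name, the single shared counter, the choice to log the first error of each window, and treating a rate of 0 or less as "log everything" are all assumptions for illustration.

```python
class SampledErrorLogger:
    """Keep one error message out of every `sample_rate` occurrences."""

    def __init__(self, sample_rate: int) -> None:
        self.sample_rate = sample_rate
        self.seen = 0      # total errors observed
        self.logged = []   # messages that were actually kept

    def log(self, message: str) -> bool:
        """Record the error; return True if it was written to the log."""
        self.seen += 1
        # Assumption: a rate <= 0 disables sampling, so every error is logged.
        if self.sample_rate <= 0 or (self.seen - 1) % self.sample_rate == 0:
            self.logged.append(message)
            return True
        return False


if __name__ == "__main__":
    logger = SampledErrorLogger(sample_rate=10)
    for i in range(25):
        logger.log(f"sample out of bounds #{i}")
    # Errors 1, 11 and 21 are kept; the other 22 are skipped.
    print(len(logger.logged), "of", logger.seen, "errors logged")
```

Every error is still counted even when it is not logged, so the log volume drops while the total stays available.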

Experimental features

Grafana Mimir 2.11 includes new features that are considered experimental and disabled by default. Please use them with caution and report any issue you encounter:

  • Block specified queries on a per-tenant basis. This is configured via the blocked_queries limit. See the docs for more information.
  • Store metadata when ingesting metrics via OTLP. This makes metric description and type available when ingesting metrics via OTLP. You can enable this feature by setting the CLI flag -distributor.enable-otlp-metadata-storage to true.
  • Reject gRPC push requests that the ingester/distributor is unable to accept before reading them into memory. You can enable this feature by using the -ingester.limit-inflight-requests-using-grpc-method-limiter and/or the -distributor.limit-inflight-requests-using-grpc-method-limiter CLI flags for the ingester and/or the distributor, respectively.
  • Customize the memcached client write and read buffer size. The buffer allocated for each memcached connection can be configured via the following CLI flags:
    • For the blocks storage:
      • -blocks-storage.bucket-store.chunks-cache.memcached.read-buffer-size-bytes
      • -blocks-storage.bucket-store.chunks-cache.memcached.write-buffer-size-bytes
      • -blocks-storage.bucket-store.index-cache.memcached.read-buffer-size-bytes
      • -blocks-storage.bucket-store.index-cache.memcached.write-buffer-size-bytes
      • -blocks-storage.bucket-store.metadata-cache.memcached.read-buffer-size-bytes
      • -blocks-storage.bucket-store.metadata-cache.memcached.write-buffer-size-bytes
    • For the query frontend:
      • -query-frontend.results-cache.memcached.read-buffer-size-bytes
      • -query-frontend.results-cache.memcached.write-buffer-size-bytes
    • For the ruler storage:
      • -ruler-storage.cache.memcached.read-buffer-size-bytes
      • -ruler-storage.cache.memcached.write-buffer-size-bytes
  • Configure the number of long-living workers used to process gRPC requests. This can decrease CPU usage by reducing the number of stack allocations. Configure this feature by using the -server.grpc.num-workers CLI flag.
  • Enforce a limit in bytes on the PostingsForMatchers cache used by ingesters. This limit can be configured via the -blocks-storage.tsdb.head-postings-for-matchers-cache-max-bytes and -blocks-storage.tsdb.block-postings-for-matchers-cache-max-bytes CLI flags.
  • Pre-allocate the pool of workers in the distributor that are used to send push requests to ingesters. This can decrease CPU usage by reducing the number of stack allocations. You can enable this feature by using the -distributor.reusable-ingester-push-worker flag.
  • Include a Retry-After header in recoverable error responses from the distributor. This can protect your Mimir cluster from clients including Prometheus that default to retrying very quickly. Enable this feature by setting the -distributor.retry-after-header.enabled CLI flag.
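
On the client side, a Retry-After header such as the one described in the last item above is only useful if the sender honours it. The following Python sketch shows one way a client might derive its wait time. The function name, the default, and the cap are hypothetical, and only the delta-seconds form of the header is handled; the HTTP-date form falls back to the default.

```python
def retry_delay_seconds(headers: dict, default: float = 1.0, cap: float = 60.0) -> float:
    """Return how long to wait before retrying, based on Retry-After.

    Only an integer number of seconds is understood; anything else
    (missing header, HTTP-date, garbage) yields the default delay.
    """
    raw = headers.get("Retry-After")
    if raw is None:
        return default
    try:
        seconds = int(raw.strip())
    except ValueError:
        return default
    # Clamp to [0, cap] so a bad value cannot stall or hot-loop the client.
    return float(max(0, min(seconds, cap)))


if __name__ == "__main__":
    print(retry_delay_seconds({"Retry-After": "5"}))
    print(retry_delay_seconds({}))
```

Capping the delay is a design choice of this sketch, so that a single unexpected header value cannot pause a sender indefinitely.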

Helm chart improvements

The Grafana Mimir and Grafana Enterprise Metrics Helm chart is now released independently. See the Grafana Mimir Helm chart documentation.

Important changes

In Grafana Mimir 2.11 the following behavior has changed:

  • The utilization-based read path limiter now operates on Go heap size instead of RSS from the Linux proc file system.

The following configuration options were previously deprecated and are removed in Grafana Mimir 2.11:

  • The CLI flag -querier.iterators.
  • The CLI flag -querier.batch-iterators.
  • The CLI flag -blocks-storage.bucket-store.bucket-index.enabled.
  • The CLI flag -blocks-storage.bucket-store.chunk-pool-min-bucket-size-bytes.
  • The CLI flag -blocks-storage.bucket-store.chunk-pool-max-bucket-size-bytes.
  • The CLI flag -blocks-storage.bucket-store.max-chunk-pool-bytes.

The following configuration options are deprecated and will be removed in Grafana Mimir 2.13:

  • The CLI flag -log.buffered; this is now the default behavior.

The following metrics are removed:

  • cortex_query_frontend_workers_enqueued_requests_total; use cortex_query_frontend_enqueue_duration_seconds_count instead.

The following configuration option defaults were changed:

  • The CLI flag -blocks-storage.bucket-store.index-header.sparse-persistence-enabled now defaults to true.
  • The default value for the CLI flag -blocks-storage.bucket-store.index-header.lazy-loading-concurrency was changed from 0 to 4.
  • The default value for the CLI flag -blocks-storage.tsdb.series-hash-cache-max-size-bytes was changed from 1GB to 350MB.
  • The default value for the CLI flag -blocks-storage.tsdb.early-head-compaction-min-estimated-series-reduction-percentage was changed from 10 to 15.

Bug fixes

  • Ingester: Respect context cancelation during query execution. PR 6085
  • Distributor: Return 529 when ingestion rate limit is hit and the distributor.service_overload_status_code_on_rate_limit_enabled flag is active. PR 6549
  • Query-scheduler: Prevent accumulation of stale querier connections. PR 6100
  • Packaging: Fix preremove script preventing upgrades on RHEL based OS. PR 6067

Changelog

2.11.0

Grafana Mimir

  • [CHANGE] The following deprecated configurations have been removed: #6673 #6779 #6808 #6814
    • -querier.iterators
    • -querier.batch-iterators
    • -blocks-storage.bucket-store.max-chunk-pool-bytes
    • -blocks-storage.bucket-store.chunk-pool-min-bucket-size-bytes
    • -blocks-storage.bucket-store.chunk-pool-max-bucket-size-bytes
    • -blocks-storage.bucket-store.bucket-index.enabled
  • [CHANGE] Querier: Split worker GRPC config into separate client configs for the frontend and scheduler to allow TLS to be configured correctly when specifying the tls_server_name. The GRPC config specified under -querier.frontend-client.* will no longer apply to the scheduler client, and will need to be set explicitly under -querier.scheduler-client.*. #6445 #6573
  • [CHANGE] Store-gateway: enable sparse index headers by default. Sparse index headers reduce the time to load an index header up to 90%. #6005
  • [CHANGE] Store-gateway: lazy-loading concurrency limit default value is now 4. #6004
  • [CHANGE] General: enabled -log.buffered by default. The -log.buffered flag has been deprecated and will be removed in Mimir 2.13. #6131
  • [CHANGE] Ingester: changed default -blocks-storage.tsdb.series-hash-cache-max-size-bytes setting from 1GB to 350MB. The new default cache size is enough to store the hashes for all series in an ingester, assuming up to 2M in-memory series per ingester and using the default 13h retention period for local TSDB blocks in the ingesters. #6130
  • [CHANGE] Query-frontend: removed cortex_query_frontend_workers_enqueued_requests_total. Use cortex_query_frontend_enqueue_duration_seconds_count instead. #6121
  • [CHANGE] Ingester / querier: enable ingester to querier chunks streaming by default and mark it as stable. #6174
  • [CHANGE] Ingester / querier: enable ingester query request minimisation by default and mark it as stable. #6174
  • [CHANGE] Ingester: changed the default value for the experimental configuration parameter -blocks-storage.tsdb.early-head-compaction-min-estimated-series-reduction-percentage from 10 to 15. #6186
  • [CHANGE] Ingester: /ingester/push HTTP endpoint has been removed. This endpoint was added for testing and troubleshooting, but was never documented or used for anything. #6299
  • [CHANGE] Experimental setting -log.rate-limit-logs-per-second-burst renamed to -log.rate-limit-logs-burst-size. #6230
  • [CHANGE] Distributor: instead of errors with HTTP status codes, Push() now returns errors with gRPC codes: #6377
    • http.StatusAccepted (202) code is replaced with codes.AlreadyExists.
    • http.StatusBadRequest (400) code is replaced with codes.FailedPrecondition.
    • http.StatusTooManyRequests (429) and the non-standard 529 (The service is overloaded) codes are replaced with codes.ResourceExhausted.
  • [CHANGE] Ingester: by setting the newly introduced experimental CLI flag -ingester.return-only-grpc-errors to true, ingester will return only gRPC errors. This feature changes the following status codes: #6443 #6680 #6723
    • http.StatusBadRequest (400) is replaced with codes.FailedPrecondition on the write path.
    • http.StatusServiceUnavailable (503) is replaced with codes.Internal on the write path, and with codes.ResourceExhausted on the read path.
    • codes.Unknown is replaced with codes.Internal on both write and read path.
  • [CHANGE] Upgrade Node.js to v20. #6540
  • [CHANGE] Querier: cortex_querier_blocks_consistency_checks_failed_total is now incremented when a block couldn't be queried from any attempted store-gateway as opposed to incremented after each attempt. Also cortex_querier_blocks_consistency_checks_total is incremented once per query as opposed to once per attempt (with 3 attempts). #6590
  • [CHANGE] Ingester: Modify utilization based read path limiter to base memory usage on Go heap size. #6584
  • [FEATURE] Distributor: added option -distributor.retry-after-header.enabled to include the Retry-After header in recoverable error responses. #6608
  • [FEATURE] Query-frontend: add experimental support for query blocking. Queries are blocked on a per-tenant basis, configured via the blocked_queries limit. #5609
  • [FEATURE] Vault: Added support for new Vault authentication methods: AppRole, Kubernetes, UserPass and Token. #6143
  • [FEATURE] Add experimental endpoint /api/v1/cardinality/active_series to return the set of active series for a given selector. #6536 #6619 #6651 #6667
  • [FEATURE] Added -<prefix>.s3.part-size flag to configure the S3 minimum file size in bytes used for multipart uploads. #6592
  • [FEATURE] Add the experimental -<prefix>.s3.send-content-md5 flag (defaults to false) to configure S3 Put Object requests to send a Content-MD5 header. Setting this flag is not recommended unless your object storage does not support checksums. #6622
  • [FEATURE] Distributor: add an experimental flag -distributor.reusable-ingester-push-worker that can be used to pre-allocate a pool of workers to be used to send push requests to the ingesters. #6660
  • [FEATURE] Distributor: Support enabling of automatically generated name suffixes for metrics ingested via OTLP, through the flag -distributor.otel-metric-suffixes-enabled. #6542
  • [ENHANCEMENT] Query-frontend: don't treat cancel as an error. #4648
  • [ENHANCEMENT] Ingester: exported summary cortex_ingester_inflight_push_requests_summary tracking total number of inflight requests in percentile buckets. #5845
  • [ENHANCEMENT] Query-scheduler: add cortex_query_scheduler_enqueue_duration_seconds metric that records the time taken to enqueue or reject a query request. #5879
  • [ENHANCEMENT] Query-frontend: add cortex_query_frontend_enqueue_duration_seconds metric that records the time taken to enqueue or reject a query request. When query-scheduler is in use, the metric has the scheduler_address label to differentiate the enqueue duration by query-scheduler backend. #5879 #6087 #6120
  • [ENHANCEMENT] Store-gateway: add metric cortex_bucket_store_blocks_loaded_by_duration for counting the loaded number of blocks based on their duration. #6074 #6129
  • [ENHANCEMENT] Expose /sync/mutex/wait/total:seconds Go runtime metric as go_sync_mutex_wait_total_seconds_total from all components. #5879
  • [ENHANCEMENT] Query-scheduler: improve latency with many concurrent queriers. #5880
  • [ENHANCEMENT] Ruler: add new per-tenant cortex_ruler_queries_zero_fetched_series_total metric to track rules that fetched no series. #5925
  • [ENHANCEMENT] Implement support for limit, limit_per_metric and metric parameters for <Prometheus HTTP prefix>/api/v1/metadata endpoint. #5890
  • [ENHANCEMENT] Distributor: add experimental support for storing metadata when ingesting metrics via OTLP. This makes metrics description and type available when ingesting metrics via OTLP. Enable with -distributor.enable-otlp-metadata-storage=true. #5693 #6035 #6254
  • [ENHANCEMENT] Ingester: added support for sampling errors, which can be enabled by setting -ingester.error-sample-rate. When sampling is enabled, each error is logged once per the configured number of occurrences. All the discarded samples will still be tracked by the cortex_discarded_samples_total metric. #5584 #6014
  • [ENHANCEMENT] Ruler: Fetch secrets used to configure TLS on the Alertmanager client from Vault when -vault.enabled is true. #5239
  • [ENHANCEMENT] Query-frontend: added query-sharding support for group by aggregation queries. #6024
  • [ENHANCEMENT] Fetch secrets used to configure server-side TLS from Vault when -vault.enabled is true. #6052
  • [ENHANCEMENT] Packaging: add logrotate config file. #6142
  • [ENHANCEMENT] Ingester: add the experimental configuration options -blocks-storage.tsdb.head-postings-for-matchers-cache-max-bytes and -blocks-storage.tsdb.block-postings-for-matchers-cache-max-bytes to enforce a limit in bytes on the PostingsForMatchers() cache used by ingesters (the cache limit is per TSDB head and block basis, not a global one). The experimental configuration options -blocks-storage.tsdb.head-postings-for-matchers-cache-size and -blocks-storage.tsdb.block-postings-for-matchers-cache-size have been deprecated. #6151
  • [ENHANCEMENT] Ingester: use the PostingsForMatchers() in-memory cache for label values queries with matchers too. #6151
  • [ENHANCEMENT] Ingester / store-gateway: optimized regex matchers. #6168 #6250
  • [ENHANCEMENT] Distributor: Include ingester IDs in circuit breaker related metrics and logs. #6206
  • [ENHANCEMENT] Querier: improve errors and logging when streaming chunks from ingesters and store-gateways. #6194 #6309
  • [ENHANCEMENT] Querier: Add cortex_querier_federation_exemplar_tenants_queried and cortex_querier_federation_tenants_queried metrics to track the number of tenants queried by multi-tenant queries. #6374 #6409
  • [ENHANCEMENT] All: added an experimental -server.grpc.num-workers flag that configures the number of long-living workers used to process gRPC requests. This could decrease the CPU usage by reducing the number of stack allocations. #6311
  • [ENHANCEMENT] All: improved IPv6 support by using the proper host:port formatting. #6311
  • [ENHANCEMENT] Querier: always return the error encountered during chunks streaming, rather than a "the stream has already been exhausted" error. #6345 #6433
  • [ENHANCEMENT] Query-frontend: add instance_enable_ipv6 to support IPv6. #6111
  • [ENHANCEMENT] Store-gateway: return same detailed error messages as queriers when chunks or series limits are reached. #6347
  • [ENHANCEMENT] Querier: reduce memory consumed for queries that hit store-gateways. #6348
  • [ENHANCEMENT] Ruler: include corresponding trace ID with log messages associated with rule evaluation. #6379 #6520
  • [ENHANCEMENT] Querier: clarify log messages and span events emitted while querying ingesters, and include both ingester name and address when relevant. #6381
  • [ENHANCEMENT] Memcached: introduce new experimental configuration parameters -<prefix>.memcached.write-buffer-size-bytes and -<prefix>.memcached.read-buffer-size-bytes to customise the memcached client write and read buffer size (the buffer is allocated for each memcached connection). #6468
  • [ENHANCEMENT] Ingester, Distributor: added experimental support for rejecting push requests received via gRPC before reading them into memory, if ingester or distributor is unable to accept the request. This is activated by using -ingester.limit-inflight-requests-using-grpc-method-limiter for ingester, and -distributor.limit-inflight-requests-using-grpc-method-limiter for distributor. #5976 #6300
  • [ENHANCEMENT] Store-gateway: add the ability to configure the number of tokens via -store-gateway.sharding-ring.num-tokens (default: 512). #4863
  • [ENHANCEMENT] Query-frontend: return warnings generated during query evaluation. #6391
  • [ENHANCEMENT] Server: Add the option -server.http-read-header-timeout to enable specifying a timeout for reading HTTP request headers. It defaults to 0, in which case reading the headers can take up to -server.http-read-timeout, leaving no time for reading the body, if there is one. #6517
  • [ENHANCEMENT] Add connection-string option, -<prefix>.azure.connection-string, for Azure Blob Storage. #6487
  • [ENHANCEMENT] Ingester: Add -ingester.instance-limits.max-inflight-push-requests-bytes. This limit protects the ingester against requests that together may cause an OOM. #6492
  • [ENHANCEMENT] Ingester: add new per-tenant cortex_ingester_local_limits metric to expose the calculated local per-tenant limits seen at each ingester. Exports the local per-tenant series limit with label {limit="max_global_series_per_user"} #6403
  • [ENHANCEMENT] Query-frontend: added "queue_time_seconds" field to "query stats" log. This is the total time that the query and its subqueries spent in the queue before queriers picked them up. #6537
  • [ENHANCEMENT] Server: Add -server.report-grpc-codes-in-instrumentation-label-enabled CLI flag to specify whether gRPC status codes should be used in status_code label of cortex_request_duration_seconds metric. It defaults to false, meaning that successful and erroneous gRPC status codes are represented with success and error respectively. #6562
  • [ENHANCEMENT] Server: Add -ingester.client.report-grpc-codes-in-instrumentation-label-enabled CLI flag to specify whether gRPC status codes should be used in status_code label of cortex_ingester_client_request_duration_seconds metric. It defaults to false, meaning that successful and erroneous gRPC status codes are represented with 2xx and error respectively. #6562
  • [ENHANCEMENT] Server: Add -server.http-log-closed-connections-without-response-enabled option to log details about connections to the HTTP server that were closed before any data was sent back. This can happen if the client doesn't manage to send complete HTTP headers before the timeout. #6612
  • [ENHANCEMENT] Query-frontend: include length of query, time since the earliest and latest points of a query, and cached/uncached bytes in "query stats" logs. Time parameters (start/end/time) are always formatted as RFC3339 now. #6473 #6477 #6709 #6710
  • [ENHANCEMENT] Distributor: added support for reducing the resolution of native histogram samples upon ingestion if the sample has too many buckets compared to -validation.max-native-histogram-buckets. This is enabled by default and can be turned off by setting -validation.reduce-native-histogram-over-max-buckets to false. #6535
  • [ENHANCEMENT] Query-frontend: optionally wait for the frontend to complete startup if requests are received while the frontend is still starting. Disabled by default, set -query-frontend.not-running-timeout to a non-zero value to enable. #6621
  • [ENHANCEMENT] Distributor: Include source IPs in OTLP push handler logs. #6652
  • [ENHANCEMENT] Query-frontend: return clearer error message when a query request is received while shutting down. #6675
  • [ENHANCEMENT] Querier: return clearer error message when a query request is cancelled by the caller. #6697
  • [BUGFIX] Distributor: return server overload error in the event of exceeding the ingestion rate limit. #6549
  • [BUGFIX] Ring: Ensure network addresses used for component hash rings are formatted correctly when using IPv6. #6068
  • [BUGFIX] Query-scheduler: don't retain connections from queriers that have shut down, leading to gradually increasing enqueue latency over time. #6100 #6145
  • [BUGFIX] Ingester: prevent query logic from continuing to execute after queries are canceled. #6085
  • [BUGFIX] Ensure correct nesting of children of the querier.Select tracing span. #6085
  • [BUGFIX] Packaging: fix preremove script preventing upgrades on RHEL based OS. #6067
  • [BUGFIX] Querier: return the actual error rather than "attempted to read series at index XXX from stream, but the stream has already been exhausted" (or even no error at all) when streaming chunks from ingesters or store-gateways is enabled and an error occurs while streaming chunks. #6346
  • [BUGFIX] Querier: reduce log volume when querying ingesters with zone-awareness enabled and one or more instances in a single zone unavailable. #6381
  • [BUGFIX] Querier: don't try to query further ingesters if ingester query request minimization is enabled and a query limit is reached as a result of the responses from the initial set of ingesters. #6402
  • [BUGFIX] Ingester: Don't cache context cancellation error when querying. #6446
  • [BUGFIX] Ingester: don't ignore errors encountered while iterating through chunks or samples in response to a query request. #6469
  • [BUGFIX] All: fix issue where traces for some inter-component gRPC calls would incorrectly show the call as failing due to cancellation. #6470
  • [BUGFIX] Querier: correctly mark streaming requests to ingesters or store-gateways as successful, not cancelled, in metrics and traces. #6471 #6505
  • [BUGFIX] Querier: fix issue where queries fail with "context canceled" error when an ingester or store-gateway fails healthcheck while the query is in progress. #6550
  • [BUGFIX] Tracing: When creating an OpenTelemetry tracing span, add it to the context for later retrieval. #6614
  • [BUGFIX] Querier: always report query results to query-frontends, even when cancelled, to ensure query-frontends don't wait for results that will otherwise never arrive. #6703
  • [BUGFIX] Querier: attempt to query ingesters in PENDING state, to reduce the likelihood that scaling up the number of ingesters in multiple zones simultaneously causes a read outage. #6726 #6727
  • [BUGFIX] Querier: don't cancel inflight queries from a query-scheduler if the stream between the querier and query-scheduler is broken. #6728
  • [BUGFIX] Store-gateway: Fix double-counting of some duration metrics. #6616
  • [BUGFIX] Fixed possible series matcher corruption leading to wrong series being included in query results. #6884
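
As a rough illustration of the limit, limit_per_metric and metric parameters that the metadata endpoint accepts (see the enhancement for #5890 above), the following Python sketch builds a request URL. The host name, the /prometheus default for the HTTP prefix, and the helper name metadata_url are assumptions made for this example only; the sketch builds a URL and does not contact a server.

```python
from urllib.parse import urlencode


def metadata_url(base_url, http_prefix="/prometheus",
                 limit=None, limit_per_metric=None, metric=None):
    """Build a metadata query URL, including only the parameters that are set.

    http_prefix stands in for "<Prometheus HTTP prefix>"; "/prometheus" is an
    assumed value, so adjust it to match your deployment.
    """
    params = {}
    if limit is not None:
        params["limit"] = limit                        # max number of metrics returned
    if limit_per_metric is not None:
        params["limit_per_metric"] = limit_per_metric  # max metadata entries per metric
    if metric is not None:
        params["metric"] = metric                      # restrict results to one metric name

    url = f"{base_url.rstrip('/')}{http_prefix}/api/v1/metadata"
    query = urlencode(params)
    return f"{url}?{query}" if query else url


# Hypothetical host, used only for illustration.
print(metadata_url("http://mimir.example:8080", limit=10, metric="up"))
```

Parameters that are left unset are omitted from the query string, so the same helper also covers the unparameterized form of the endpoint.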

Mixin

  • [CHANGE] Dashboards: enabled reporting gRPC codes as status_code label in Mimir dashboards. In case of gRPC calls, the successful status_code label on cortex_request_duration_seconds and gRPC client request duration metrics has changed from 'success' and '2xx' to 'OK'. #6561
  • [CHANGE] Alerts: remove MimirGossipMembersMismatch alert and replace it with MimirGossipMembersTooHigh and MimirGossipMembersTooLow alerts that should have a higher signal-to-noise ratio. #6508
  • [ENHANCEMENT] Dashboards: Optionally show rejected requests on Mimir Writes dashboard. Useful when used together with "early request rejection" in ingester and distributor. #6132 #6556
  • [ENHANCEMENT] Alerts: added a critical alert for CompactorSkippedBlocksWithOutOfOrderChunks when multiple blocks are affected. #6410
  • [ENHANCEMENT] Dashboards: Added the min-replicas for autoscaling dashboards. #6528
  • [BUGFIX] Alerts: fixed issue where GossipMembersMismatch warning message referred to per-instance labels that were not produced by the alert query. #6146
  • [BUGFIX] Dashboards: Fix autoscaling dashboard panels for KEDA > 2.9. Requires scraping the KEDA operator for metrics since they moved. #6528
  • [BUGFIX] Alerts: Fix autoscaling alerts for KEDA > 2.9. Requires scraping the KEDA operator for metrics since they moved. #6528

Jsonnet

  • [CHANGE] Ingester: reduce -server.grpc-max-concurrent-streams to 500. #5666
  • [CHANGE] Changed default _config.cluster_domain from "cluster.local" to "cluster.local." (with a trailing dot) to reduce the number of DNS lookups made by Mimir. #6389
  • [CHANGE] Query-frontend: changed default _config.autoscaling_query_frontend_cpu_target_utilization from 1 to 0.75. #6395
  • [CHANGE] Distributor: Increase HPA scale down period such that distributors are slower to scale down after autoscaling up. #6589
  • [FEATURE] Store-gateway: Allow automated zone-by-zone downscaling, that can be enabled via the store_gateway_automated_downscale_enabled flag. It is disabled by default. #6149
  • [FEATURE] Ingester: Allow configuring TSDB Head early compaction using the following _config parameters: #6181
    • ingester_tsdb_head_early_compaction_enabled (disabled by default)
    • ingester_tsdb_head_early_compaction_reduction_percentage
    • ingester_tsdb_head_early_compaction_min_in_memory_series
  • [ENHANCEMENT] Double the amount of rule groups for each user tier. #5897
  • [ENHANCEMENT] Set maxUnavailable to 0 for distributor, overrides-exporter, querier, query-frontend, query-scheduler, ruler-querier, ruler-query-frontend, ruler-query-scheduler and consul deployments, to ensure they don't become completely unavailable during a rollout. #5924
  • [ENHANCEMENT] Update rollout-operator to v0.9.0. #6022 #6110 #6558 #6681
  • [ENHANCEMENT] Update memcached to memcached:1.6.22-alpine. #6585
  • [ENHANCEMENT] Store-gateway: replaced the following deprecated CLI flags: #6319
    • -blocks-storage.bucket-store.index-header-lazy-loading-enabled replaced with -blocks-storage.bucket-store.index-header.lazy-loading-enabled
    • -blocks-storage.bucket-store.index-header-lazy-loading-idle-timeout replaced with -blocks-storage.bucket-store.index-header.lazy-loading-idle-timeout
  • [ENHANCEMENT] Store-gateway: Allow selective enablement of store-gateway automated scaling on a per-zone basis. #6302
  • [BUGFIX] Autoscaling: KEDA > 2.9 removed the ability to set metricName in the trigger metadata. To help discern which metric is used by the HPA, we set the trigger name to what was the metricName. This is available as the scaler label on keda_* metrics. #6528

Mimirtool

  • [ENHANCEMENT] Analyze Grafana: Improve support for variables in range. #6657
  • [BUGFIX] Fix out of bounds error on export with large timespans and/or series count. #5700
  • [BUGFIX] Fix the issue where --read-timeout was applied to the entire mimirtool analyze grafana invocation rather than to individual Grafana API calls. #5915
  • [BUGFIX] Fix incorrect remote-read path joining for mimirtool remote-read commands on Windows. #6011
  • [BUGFIX] Fix template files full path being sent in mimirtool alertmanager load command. #6138
  • [BUGFIX] Analyze rule-file: .metricsUsed field wasn't populated. #6953

Mimir Continuous Test

Query-tee

Documentation

  • [ENHANCEMENT] Document the concept of native histograms and how to send them to Mimir, migration path. #5956 #6488 #6539
  • [ENHANCEMENT] Document native histograms query and visualization. #6231

Tools

  • [CHANGE] tsdb-index: Rename tool to tsdb-series. #6317
  • [FEATURE] tsdb-labels: Add tool to print label names and values of a TSDB block. #6317
  • [ENHANCEMENT] trafficdump: Trafficdump can now parse OTEL requests. The entire request is dumped to the output; no filtering of fields or matching of series is done. #6108

All changes in this release: https://github.com/grafana/mimir/compare/mimir-2.10.5...mimir-2.11.0

mimir-2.11.0-rc.0

4 months ago

This release contains 531 PRs from 55 authors, including new contributors Benjamin, Dominik Kepinski, Jonathan Donzallaz, Juraj Michálek, Kai.Ke, Ludovic Terrier, Luke, Maciej Lech, Matthew Penner, Michael Potter, Mihai Țimbota-Belin, Rasmus Werner Salling, Ying WANG, chencs, fayzal-g, kalle (jag), renovate[bot], sarthaktyagi-505, whoami. Thank you!

Grafana Mimir version 2.11.0-rc.0 release notes

Grafana Labs is excited to announce version 2.11 of Grafana Mimir.

The highlights that follow include the top features, enhancements, and bugfixes in this release. For the complete list of changes, see the changelog.

Features and enhancements

  • Sampled logging of errors in the ingester. A high-traffic Mimir cluster can occasionally become bogged down logging high volumes of repeated errors. You can now reduce the number of errors written to logs by setting a sample rate via the -ingester.error-sample-rate CLI flag.
  • Add total request size instance limit for ingesters. This limit protects the ingesters against requests that together may cause an OOM. Enable this feature by setting the -ingester.instance-limits.max-inflight-push-requests-bytes CLI flag in combination with the -ingester.limit-inflight-requests-using-grpc-method-limiter CLI flag.
  • Reduce the resolution of incoming native histogram samples if the incoming sample has too many buckets compared to -validation.max-native-histogram-buckets. This is enabled by default but can be turned off by setting the -validation.reduce-native-histogram-over-max-buckets CLI flag to false.
  • Improved query-scheduler performance under load. This is particularly apparent for clusters with large numbers of queriers.
  • Ingester to querier chunks streaming reduces the memory utilization of queriers and reduces the likelihood of OOMs.
  • Ingester query request minimization reduces the number of query requests to ingesters, improving performance and resource utilization for both ingesters and queriers.
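
The sampled error logging described in the first item above can be pictured with a small Python sketch. This is not Mimir's implementation: the class name, the choice to log every Nth occurrence, and treating a rate of 0 or 1 as "log everything" are all assumptions made for illustration. The separate counter mirrors the idea that every error is still counted even when it is not logged.

```python
class SampledErrorLogger:
    """Log one error out of every `sample_rate` occurrences (illustrative only)."""

    def __init__(self, sample_rate, sink):
        if sample_rate < 0:
            raise ValueError("sample_rate must be non-negative")
        self.sample_rate = sample_rate
        self.sink = sink          # callable that receives the messages that are logged
        self.total_errors = 0     # every error is counted, whether logged or not

    def log(self, message):
        """Record an error; return True if it was actually written to the sink."""
        self.total_errors += 1
        # Assumption of this sketch: a rate of 0 or 1 disables sampling.
        if self.sample_rate <= 1 or self.total_errors % self.sample_rate == 0:
            self.sink(message)
            return True
        return False


logged = []
logger = SampledErrorLogger(sample_rate=10, sink=logged.append)
for i in range(100):
    logger.log(f"push failed ({i})")

print(len(logged), logger.total_errors)
```

With a sample rate of 10, only a tenth of the errors reach the log sink, while the total count remains available for metrics.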

Experimental features

Grafana Mimir 2.11 includes new features that are considered experimental and disabled by default. Please use them with caution and report any issue you encounter:

  • Block specified queries on a per-tenant basis. This is configured via the blocked_queries limit. See the docs for more information.
  • Store metadata when ingesting metrics via OTLP. This makes metric description and type available when ingesting metrics via OTLP. You can enable this feature by setting the CLI flag -distributor.enable-otlp-metadata-storage to true.
  • Reject gRPC push requests that the ingester/distributor is unable to accept before reading them into memory. You can enable this feature by using the -ingester.limit-inflight-requests-using-grpc-method-limiter and/or the -distributor.limit-inflight-requests-using-grpc-method-limiter CLI flags for the ingester and/or the distributor, respectively.
  • Customize the memcached client write and read buffer size. The buffer allocated for each memcached connection can be configured via the following CLI flags:
    • For the blocks storage:
      • -blocks-storage.bucket-store.chunks-cache.memcached.read-buffer-size-bytes
      • -blocks-storage.bucket-store.chunks-cache.memcached.write-buffer-size-bytes
      • -blocks-storage.bucket-store.index-cache.memcached.read-buffer-size-bytes
      • -blocks-storage.bucket-store.index-cache.memcached.write-buffer-size-bytes
      • -blocks-storage.bucket-store.metadata-cache.memcached.read-buffer-size-bytes
      • -blocks-storage.bucket-store.metadata-cache.memcached.write-buffer-size-bytes
    • For the query frontend:
      • -query-frontend.results-cache.memcached.read-buffer-size-bytes
      • -query-frontend.results-cache.memcached.write-buffer-size-bytes
    • For the ruler storage:
      • -ruler-storage.cache.memcached.read-buffer-size-bytes
      • -ruler-storage.cache.memcached.write-buffer-size-bytes
  • Configure the number of long-living workers used to process gRPC requests. This can decrease CPU usage by reducing the number of stack allocations. Configure this feature by using the -server.grpc.num-workers CLI flag.
  • Enforce a limit in bytes on the PostingsForMatchers cache used by ingesters. This limit can be configured via the -blocks-storage.tsdb.head-postings-for-matchers-cache-max-bytes and -blocks-storage.tsdb.block-postings-for-matchers-cache-max-bytes CLI flags.
  • Pre-allocate the pool of workers in the distributor that are used to send push requests to ingesters. This can decrease CPU usage by reducing the number of stack allocations. You can enable this feature by using the -distributor.reusable-ingester-push-worker flag.
  • Include a Retry-After header in recoverable error responses from the distributor. This can protect your Mimir cluster from clients including Prometheus that default to retrying very quickly. Enable this feature by setting the -distributor.retry-after-header.enabled CLI flag.
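
To show how a client might honor the Retry-After header from the last item above, here is a minimal Python sketch. The function name, the default and maximum delays, and the decision to treat 429 and 5xx responses as recoverable are assumptions for this example; the release notes do not specify which status codes carry the header. Only the delay-in-seconds form of Retry-After is handled.

```python
def retry_delay_seconds(status_code, headers, default_delay=1.0, max_delay=60.0):
    """Return how long to wait before retrying, or None if the error is not retryable."""
    # Assumption of this sketch: 429 and 5xx responses are the recoverable ones.
    retryable = status_code == 429 or 500 <= status_code < 600
    if not retryable:
        return None

    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    raw = normalized.get("retry-after")
    try:
        delay = float(int(raw))   # only the integer-seconds form is handled here
    except (TypeError, ValueError):
        delay = default_delay     # header missing or not a plain number of seconds

    # Clamp the delay so a bad header value cannot stall the client indefinitely.
    return min(max(delay, 0.0), max_delay)


print(retry_delay_seconds(429, {"Retry-After": "5"}))
print(retry_delay_seconds(400, {"Retry-After": "5"}))
```

A client that waits for the returned delay before resending avoids the rapid retry loops that the header is meant to prevent.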

Helm chart improvements

The Grafana Mimir and Grafana Enterprise Metrics Helm chart is now released independently. See the Grafana Mimir Helm chart documentation.

Important changes

In Grafana Mimir 2.11 the following behavior has changed:

  • The utilization-based read path limiter now operates on Go heap size instead of RSS from the Linux proc file system.

The following configuration options had been previously deprecated and are removed in Grafana Mimir 2.11:

  • The CLI flag -querier.iterators.
  • The CLI flag -querier.batch-iterators.
  • The CLI flag -blocks-storage.bucket-store.bucket-index.enabled.
  • The CLI flag -blocks-storage.bucket-store.chunk-pool-min-bucket-size-bytes.
  • The CLI flag -blocks-storage.bucket-store.chunk-pool-max-bucket-size-bytes.
  • The CLI flag -blocks-storage.bucket-store.max-chunk-pool-bytes.

The following configuration options are deprecated and will be removed in Grafana Mimir 2.13:

  • The CLI flag -log.buffered; this is now the default behavior.

The following metrics are removed:

  • cortex_query_frontend_workers_enqueued_requests_total; use cortex_query_frontend_enqueue_duration_seconds_count instead.

The following configuration option defaults were changed:

  • The CLI flag -blocks-storage.bucket-store.index-header.sparse-persistence-enabled now defaults to true.
  • The default value for the CLI flag -blocks-storage.bucket-store.index-header.lazy-loading-concurrency was changed from 0 to 4.
  • The default value for the CLI flag -blocks-storage.tsdb.series-hash-cache-max-size-bytes was changed from 1GB to 350MB.
  • The default value for the CLI flag -blocks-storage.tsdb.early-head-compaction-min-estimated-series-reduction-percentage was changed from 10 to 15.

Bug fixes

  • Ingester: Respect context cancelation during query execution. PR 6085
  • Distributor: Return 529 when ingestion rate limit is hit and the distributor.service_overload_status_code_on_rate_limit_enabled flag is active. PR 6549
  • Query-scheduler: Prevent accumulation of stale querier connections. PR 6100
  • Packaging: Fix preremove script preventing upgrades on RHEL based OS. PR 6067

Changelog

2.11.0-rc.0

Grafana Mimir

  • [CHANGE] The following deprecated configurations have been removed: #6673 #6779 #6808 #6814
    • -querier.iterators
    • -querier.batch-iterators
    • -blocks-storage.bucket-store.max-chunk-pool-bytes
    • -blocks-storage.bucket-store.chunk-pool-min-bucket-size-bytes
    • -blocks-storage.bucket-store.chunk-pool-max-bucket-size-bytes
    • -blocks-storage.bucket-store.bucket-index.enabled
  • [CHANGE] Querier: Split worker gRPC config into separate client configs for the frontend and scheduler to allow TLS to be configured correctly when specifying the tls_server_name. The gRPC config specified under -querier.frontend-client.* will no longer apply to the scheduler client, and will need to be set explicitly under -querier.scheduler-client.*. #6445 #6573
  • [CHANGE] Store-gateway: enable sparse index headers by default. Sparse index headers reduce the time to load an index header up to 90%. #6005
  • [CHANGE] Store-gateway: lazy-loading concurrency limit default value is now 4. #6004
  • [CHANGE] General: enabled -log.buffered by default. The -log.buffered flag has been deprecated and will be removed in Mimir 2.13. #6131
  • [CHANGE] Ingester: changed default -blocks-storage.tsdb.series-hash-cache-max-size-bytes setting from 1GB to 350MB. The new default cache size is enough to store the hashes for all series in an ingester, assuming up to 2M in-memory series per ingester and using the default 13h retention period for local TSDB blocks in the ingesters. #6130
  • [CHANGE] Query-frontend: removed cortex_query_frontend_workers_enqueued_requests_total. Use cortex_query_frontend_enqueue_duration_seconds_count instead. #6121
  • [CHANGE] Ingester / querier: enable ingester to querier chunks streaming by default and mark it as stable. #6174
  • [CHANGE] Ingester / querier: enable ingester query request minimisation by default and mark it as stable. #6174
  • [CHANGE] Ingester: changed the default value for the experimental configuration parameter -blocks-storage.tsdb.early-head-compaction-min-estimated-series-reduction-percentage from 10 to 15. #6186
  • [CHANGE] Ingester: /ingester/push HTTP endpoint has been removed. This endpoint was added for testing and troubleshooting, but was never documented or used for anything. #6299
  • [CHANGE] Experimental setting -log.rate-limit-logs-per-second-burst renamed to -log.rate-limit-logs-burst-size. #6230
  • [CHANGE] Distributor: instead of errors with HTTP status codes, Push() now returns errors with gRPC codes: #6377
    • http.StatusAccepted (202) code is replaced with codes.AlreadyExists.
    • http.StatusBadRequest (400) code is replaced with codes.FailedPrecondition.
    • http.StatusTooManyRequests (429) and the non-standard 529 (The service is overloaded) codes are replaced with codes.ResourceExhausted.
  • [CHANGE] Ingester: by setting the newly introduced experimental CLI flag -ingester.return-only-grpc-errors to true, ingester will return only gRPC errors. This feature changes the following status codes: #6443 #6680 #6723
    • http.StatusBadRequest (400) is replaced with codes.FailedPrecondition on the write path.
    • http.StatusServiceUnavailable (503) is replaced with codes.Internal on the write path, and with codes.ResourceExhausted on the read path.
    • codes.Unknown is replaced with codes.Internal on both write and read path.
  • [CHANGE] Upgrade Node.js to v20. #6540
  • [CHANGE] Querier: cortex_querier_blocks_consistency_checks_failed_total is now incremented when a block couldn't be queried from any attempted store-gateway as opposed to incremented after each attempt. Also cortex_querier_blocks_consistency_checks_total is incremented once per query as opposed to once per attempt (with 3 attempts). #6590
  • [CHANGE] Ingester: Modify utilization based read path limiter to base memory usage on Go heap size. #6584
  • [FEATURE] Distributor: added option -distributor.retry-after-header.enabled to include the Retry-After header in recoverable error responses. #6608
  • [FEATURE] Query-frontend: add experimental support for query blocking. Queries are blocked on a per-tenant basis, and blocking is configured via the blocked_queries limit. #5609
  • [FEATURE] Vault: Added support for new Vault authentication methods: AppRole, Kubernetes, UserPass and Token. #6143
  • [FEATURE] Add experimental endpoint /api/v1/cardinality/active_series to return the set of active series for a given selector. #6536 #6619 #6651 #6667
  • [FEATURE] Added -<prefix>.s3.part-size flag to configure the S3 minimum file size in bytes used for multipart uploads. #6592
  • [FEATURE] Add the experimental -<prefix>.s3.send-content-md5 flag (defaults to false) to configure S3 Put Object requests to send a Content-MD5 header. Setting this flag is not recommended unless your object storage does not support checksums. #6622
  • [FEATURE] Distributor: add an experimental flag -distributor.reusable-ingester-push-worker that can be used to pre-allocate a pool of workers to be used to send push requests to the ingesters. #6660
  • [FEATURE] Distributor: Support enabling of automatically generated name suffixes for metrics ingested via OTLP, through the flag -distributor.otel-metric-suffixes-enabled. #6542
  • [ENHANCEMENT] Query-frontend: don't treat cancel as an error. #4648
  • [ENHANCEMENT] Ingester: exported summary cortex_ingester_inflight_push_requests_summary tracking total number of inflight requests in percentile buckets. #5845
  • [ENHANCEMENT] Query-scheduler: add cortex_query_scheduler_enqueue_duration_seconds metric that records the time taken to enqueue or reject a query request. #5879
  • [ENHANCEMENT] Query-frontend: add cortex_query_frontend_enqueue_duration_seconds metric that records the time taken to enqueue or reject a query request. When query-scheduler is in use, the metric has the scheduler_address label to differentiate the enqueue duration by query-scheduler backend. #5879 #6087 #6120
  • [ENHANCEMENT] Store-gateway: add metric cortex_bucket_store_blocks_loaded_by_duration for counting the loaded number of blocks based on their duration. #6074 #6129
  • [ENHANCEMENT] Expose /sync/mutex/wait/total:seconds Go runtime metric as go_sync_mutex_wait_total_seconds_total from all components. #5879
  • [ENHANCEMENT] Query-scheduler: improve latency with many concurrent queriers. #5880
  • [ENHANCEMENT] Ruler: add new per-tenant cortex_ruler_queries_zero_fetched_series_total metric to track rules that fetched no series. #5925
  • [ENHANCEMENT] Implement support for limit, limit_per_metric and metric parameters for <Prometheus HTTP prefix>/api/v1/metadata endpoint. #5890
  • [ENHANCEMENT] Distributor: add experimental support for storing metadata when ingesting metrics via OTLP. This makes metrics description and type available when ingesting metrics via OTLP. Enable with -distributor.enable-otlp-metadata-storage=true. #5693 #6035 #6254
  • [ENHANCEMENT] Ingester: added support for sampling errors, which can be enabled by setting -ingester.error-sample-rate. When enabled, each error is logged only once per the configured number of occurrences. All discarded samples are still tracked by the cortex_discarded_samples_total metric. #5584 #6014
  • [ENHANCEMENT] Ruler: Fetch secrets used to configure TLS on the Alertmanager client from Vault when -vault.enabled is true. #5239
  • [ENHANCEMENT] Query-frontend: added query-sharding support for group by aggregation queries. #6024
  • [ENHANCEMENT] Fetch secrets used to configure server-side TLS from Vault when -vault.enabled is true. #6052
  • [ENHANCEMENT] Packaging: add logrotate config file. #6142
  • [ENHANCEMENT] Ingester: add the experimental configuration options -blocks-storage.tsdb.head-postings-for-matchers-cache-max-bytes and -blocks-storage.tsdb.block-postings-for-matchers-cache-max-bytes to enforce a limit in bytes on the PostingsForMatchers() cache used by ingesters (the cache limit is per TSDB head and block basis, not a global one). The experimental configuration options -blocks-storage.tsdb.head-postings-for-matchers-cache-size and -blocks-storage.tsdb.block-postings-for-matchers-cache-size have been deprecated. #6151
  • [ENHANCEMENT] Ingester: use the PostingsForMatchers() in-memory cache for label values queries with matchers too. #6151
  • [ENHANCEMENT] Ingester / store-gateway: optimized regex matchers. #6168 #6250
  • [ENHANCEMENT] Distributor: Include ingester IDs in circuit breaker related metrics and logs. #6206
  • [ENHANCEMENT] Querier: improve errors and logging when streaming chunks from ingesters and store-gateways. #6194 #6309
  • [ENHANCEMENT] Querier: Add cortex_querier_federation_exemplar_tenants_queried and cortex_querier_federation_tenants_queried metrics to track the number of tenants queried by multi-tenant queries. #6374 #6409
  • [ENHANCEMENT] All: added an experimental -server.grpc.num-workers flag that configures the number of long-living workers used to process gRPC requests. This could decrease the CPU usage by reducing the number of stack allocations. #6311
  • [ENHANCEMENT] All: improved IPv6 support by using the proper host:port formatting. #6311
  • [ENHANCEMENT] Querier: always return the error encountered during chunks streaming, rather than a "the stream has already been exhausted" error. #6345 #6433
  • [ENHANCEMENT] Query-frontend: add instance_enable_ipv6 to support IPv6. #6111
  • [ENHANCEMENT] Store-gateway: return same detailed error messages as queriers when chunks or series limits are reached. #6347
  • [ENHANCEMENT] Querier: reduce memory consumed for queries that hit store-gateways. #6348
  • [ENHANCEMENT] Ruler: include corresponding trace ID with log messages associated with rule evaluation. #6379 #6520
  • [ENHANCEMENT] Querier: clarify log messages and span events emitted while querying ingesters, and include both ingester name and address when relevant. #6381
  • [ENHANCEMENT] Memcached: introduce new experimental configuration parameters -<prefix>.memcached.write-buffer-size-bytes and -<prefix>.memcached.read-buffer-size-bytes to customise the memcached client write and read buffer size (the buffer is allocated for each memcached connection). #6468
  • [ENHANCEMENT] Ingester, Distributor: added experimental support for rejecting push requests received via gRPC before reading them into memory, if ingester or distributor is unable to accept the request. This is activated by using -ingester.limit-inflight-requests-using-grpc-method-limiter for ingester, and -distributor.limit-inflight-requests-using-grpc-method-limiter for distributor. #5976 #6300
  • [ENHANCEMENT] Store-gateway: add the ability to configure the number of tokens through -store-gateway.sharding-ring.num-tokens (default: 512). #4863
  • [ENHANCEMENT] Query-frontend: return warnings generated during query evaluation. #6391
  • [ENHANCEMENT] Server: Add the option -server.http-read-header-timeout to enable specifying a timeout for reading HTTP request headers. It defaults to 0, in which case reading of headers can take up to -server.http-read-timeout, leaving no time for reading body, if there's any. #6517
  • [ENHANCEMENT] Add connection-string option, -<prefix>.azure.connection-string, for Azure Blob Storage. #6487
  • [ENHANCEMENT] Ingester: Add -ingester.instance-limits.max-inflight-push-requests-bytes. This limit protects the ingester against requests that together may cause an OOM. #6492
  • [ENHANCEMENT] Ingester: add new per-tenant cortex_ingester_local_limits metric to expose the calculated local per-tenant limits seen at each ingester. Exports the local per-tenant series limit with label {limit="max_global_series_per_user"} #6403
  • [ENHANCEMENT] Query-frontend: added "queue_time_seconds" field to "query stats" log. This is the total time that the query and its subqueries spent in the queue before queriers picked them up. #6537
  • [ENHANCEMENT] Server: Add -server.report-grpc-codes-in-instrumentation-label-enabled CLI flag to specify whether gRPC status codes should be used in status_code label of cortex_request_duration_seconds metric. It defaults to false, meaning that successful and erroneous gRPC status codes are represented with success and error respectively. #6562
  • [ENHANCEMENT] Server: Add -ingester.client.report-grpc-codes-in-instrumentation-label-enabled CLI flag to specify whether gRPC status codes should be used in status_code label of cortex_ingester_client_request_duration_seconds metric. It defaults to false, meaning that successful and erroneous gRPC status codes are represented with 2xx and error respectively. #6562
  • [ENHANCEMENT] Server: Add -server.http-log-closed-connections-without-response-enabled option to log details about connections to HTTP server that were closed before any data was sent back. This can happen if client doesn't manage to send complete HTTP headers before timeout. #6612
  • [ENHANCEMENT] Query-frontend: include length of query, time since the earliest and latest points of a query, and cached/uncached bytes in "query stats" logs. Time parameters (start/end/time) are always formatted as RFC3339 now. #6473 #6477 #6709 #6710
  • [ENHANCEMENT] Distributor: added support for reducing the resolution of native histogram samples upon ingestion if the sample has too many buckets compared to -validation.max-native-histogram-buckets. This is enabled by default and can be turned off by setting -validation.reduce-native-histogram-over-max-buckets to false. #6535
  • [ENHANCEMENT] Query-frontend: optionally wait for the frontend to complete startup if requests are received while the frontend is still starting. Disabled by default, set -query-frontend.not-running-timeout to a non-zero value to enable. #6621
  • [ENHANCEMENT] Distributor: Include source IPs in OTLP push handler logs. #6652
  • [ENHANCEMENT] Query-frontend: return clearer error message when a query request is received while shutting down. #6675
  • [ENHANCEMENT] Querier: return clearer error message when a query request is cancelled by the caller. #6697
  • [BUGFIX] Distributor: return server overload error in the event of exceeding the ingestion rate limit. #6549
  • [BUGFIX] Ring: Ensure network addresses used for component hash rings are formatted correctly when using IPv6. #6068
  • [BUGFIX] Query-scheduler: don't retain connections from queriers that have shut down, leading to gradually increasing enqueue latency over time. #6100 #6145
  • [BUGFIX] Ingester: prevent query logic from continuing to execute after queries are canceled. #6085
  • [BUGFIX] Ensure correct nesting of children of the querier.Select tracing span. #6085
  • [BUGFIX] Packaging: fix preremove script preventing upgrades on RHEL based OS. #6067
  • [BUGFIX] Querier: return the actual error rather than "attempted to read series at index XXX from stream, but the stream has already been exhausted" (or even no error at all) when streaming chunks from ingesters or store-gateways is enabled and an error occurs while streaming chunks. #6346
  • [BUGFIX] Querier: reduce log volume when querying ingesters with zone-awareness enabled and one or more instances in a single zone unavailable. #6381
  • [BUGFIX] Querier: don't try to query further ingesters if ingester query request minimization is enabled and a query limit is reached as a result of the responses from the initial set of ingesters. #6402
  • [BUGFIX] Ingester: Don't cache context cancellation error when querying. #6446
  • [BUGFIX] Ingester: don't ignore errors encountered while iterating through chunks or samples in response to a query request. #6469
  • [BUGFIX] All: fix issue where traces for some inter-component gRPC calls would incorrectly show the call as failing due to cancellation. #6470
  • [BUGFIX] Querier: correctly mark streaming requests to ingesters or store-gateways as successful, not cancelled, in metrics and traces. #6471 #6505
  • [BUGFIX] Querier: fix issue where queries fail with "context canceled" error when an ingester or store-gateway fails healthcheck while the query is in progress. #6550
  • [BUGFIX] Tracing: When creating an OpenTelemetry tracing span, add it to the context for later retrieval. #6614
  • [BUGFIX] Querier: always report query results to query-frontends, even when cancelled, to ensure query-frontends don't wait for results that will otherwise never arrive. #6703
  • [BUGFIX] Querier: attempt to query ingesters in PENDING state, to reduce the likelihood that scaling up the number of ingesters in multiple zones simultaneously causes a read outage. #6726 #6727
  • [BUGFIX] Querier: don't cancel inflight queries from a query-scheduler if the stream between the querier and query-scheduler is broken. #6728
  • [BUGFIX] Store-gateway: Fix double-counting of some duration metrics. #6616
  • [BUGFIX] Fixed possible series matcher corruption leading to wrong series being included in query results. #6884
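
The native histogram entry above (#6535) describes reducing a sample's resolution when it has more buckets than -validation.max-native-histogram-buckets allows. The following is a minimal sketch of the general idea, not Mimir's implementation. It assumes the standard exponential bucket layout of Prometheus native histograms, where lowering the schema by one merges each pair of adjacent buckets. The function name, the plain index-to-count dictionary (instead of spans and deltas), and the handling of one bucket side only are simplifications made for illustration.

```python
from collections import defaultdict

# Assumption: standard exponential schemas range from -4 to 8.
MIN_SCHEMA = -4


def reduce_resolution(buckets, schema, max_buckets):
    """Halve the resolution until the bucket count fits within max_buckets.

    buckets maps an absolute bucket index to its observation count
    (hypothetical representation, one side of the histogram only).
    Returns the merged buckets and the resulting schema.
    """
    while len(buckets) > max_buckets:
        if schema <= MIN_SCHEMA:
            raise ValueError("cannot reduce resolution any further")
        merged = defaultdict(int)
        for index, count in buckets.items():
            # At schema - 1, bucket j covers buckets 2j-1 and 2j of the
            # current schema, so index i maps to ceil(i / 2).
            merged[(index + 1) // 2] += count
        buckets = dict(merged)
        schema -= 1
    return buckets, schema


sample = {1: 2, 2: 3, 3: 1, 4: 4}
print(reduce_resolution(sample, schema=3, max_buckets=2))
# prints ({1: 5, 2: 5}, 2)
```

The total observation count is preserved at every step. Only the bucket boundaries become coarser, which is why this kind of reduction can be applied at ingestion time without rejecting the sample.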

Mixin

  • [CHANGE] Dashboards: enabled reporting gRPC codes as status_code label in Mimir dashboards. In case of gRPC calls, the successful status_code label on cortex_request_duration_seconds and gRPC client request duration metrics has changed from 'success' and '2xx' to 'OK'. #6561
  • [CHANGE] Alerts: remove MimirGossipMembersMismatch alert and replace it with MimirGossipMembersTooHigh and MimirGossipMembersTooLow alerts that should have a higher signal-to-noise ratio. #6508
  • [ENHANCEMENT] Dashboards: Optionally show rejected requests on Mimir Writes dashboard. Useful when used together with "early request rejection" in ingester and distributor. #6132 #6556
  • [ENHANCEMENT] Alerts: added a critical alert for CompactorSkippedBlocksWithOutOfOrderChunks when multiple blocks are affected. #6410
  • [ENHANCEMENT] Dashboards: Added the min-replicas for autoscaling dashboards. #6528
  • [BUGFIX] Alerts: fixed issue where GossipMembersMismatch warning message referred to per-instance labels that were not produced by the alert query. #6146
  • [BUGFIX] Dashboards: Fix autoscaling dashboard panels for KEDA > 2.9. Requires scraping the KEDA operator for metrics since they moved. #6528
  • [BUGFIX] Alerts: Fix autoscaling alerts for KEDA > 2.9. Requires scraping the KEDA operator for metrics since they moved. #6528

Jsonnet

  • [CHANGE] Ingester: reduce -server.grpc-max-concurrent-streams to 500. #5666
  • [CHANGE] Changed default _config.cluster_domain from cluster.local to cluster.local. (note the trailing dot, which makes the name fully qualified) to reduce the number of DNS lookups made by Mimir. #6389
  • [CHANGE] Query-frontend: changed default _config.autoscaling_query_frontend_cpu_target_utilization from 1 to 0.75. #6395
  • [CHANGE] Distributor: Increase HPA scale down period such that distributors are slower to scale down after autoscaling up. #6589
  • [FEATURE] Store-gateway: Allow automated zone-by-zone downscaling, that can be enabled via the store_gateway_automated_downscale_enabled flag. It is disabled by default. #6149
  • [FEATURE] Ingester: Allow configuring TSDB Head early compaction using the following _config parameters: #6181
    • ingester_tsdb_head_early_compaction_enabled (disabled by default)
    • ingester_tsdb_head_early_compaction_reduction_percentage
    • ingester_tsdb_head_early_compaction_min_in_memory_series
  • [ENHANCEMENT] Double the amount of rule groups for each user tier. #5897
  • [ENHANCEMENT] Set maxUnavailable to 0 for distributor, overrides-exporter, querier, query-frontend, query-scheduler, ruler-querier, ruler-query-frontend, ruler-query-scheduler and consul deployments, to ensure they don't become completely unavailable during a rollout. #5924
  • [ENHANCEMENT] Update rollout-operator to v0.9.0. #6022 #6110 #6558 #6681
  • [ENHANCEMENT] Update memcached to memcached:1.6.22-alpine. #6585
  • [ENHANCEMENT] Store-gateway: replaced the following deprecated CLI flags: #6319
    • -blocks-storage.bucket-store.index-header-lazy-loading-enabled replaced with -blocks-storage.bucket-store.index-header.lazy-loading-enabled
    • -blocks-storage.bucket-store.index-header-lazy-loading-idle-timeout replaced with -blocks-storage.bucket-store.index-header.lazy-loading-idle-timeout
  • [ENHANCEMENT] Store-gateway: Allow selective enablement of store-gateway automated scaling on a per-zone basis. #6302
  • [BUGFIX] Autoscaling: KEDA > 2.9 removed the ability to set metricName in the trigger metadata. To help discern which metric is used by the HPA, we set the trigger name to what was the metricName. This is available as the scaler label on keda_* metrics. #6528
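
The cluster_domain change above (#6389) relies on how DNS search lists work. The following is a simplified model of typical resolver behavior, not Mimir or Jsonnet code. It assumes the common Kubernetes pod setting ndots:5, and the search list and service name are hypothetical. A name with a trailing dot is treated as fully qualified and looked up once, while a name without one may be tried against every search domain first.

```python
def candidate_lookups(name, search, ndots=5):
    """Return the names a resolver may try, in order (worst case).

    A real resolver stops at the first successful answer; this sketch
    only lists the candidates.
    """
    if name.endswith("."):
        # Fully qualified: the search list is skipped entirely.
        return [name]
    expanded = [f"{name}.{domain}." for domain in search]
    absolute = name + "."
    if name.count(".") >= ndots:
        return [absolute] + expanded
    return expanded + [absolute]


# Hypothetical search list for a pod in a namespace called "mimir".
search = ["mimir.svc.cluster.local", "svc.cluster.local", "cluster.local"]

print(len(candidate_lookups("memcached.mimir.svc.cluster.local", search)))
# prints 4
print(len(candidate_lookups("memcached.mimir.svc.cluster.local.", search)))
# prints 1
```

In this model, the name without the trailing dot has only four dots, so the three search-domain expansions are tried before the name itself.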

Mimirtool

  • [ENHANCEMENT] Analyze Grafana: Improve support for variables in range. #6657
  • [BUGFIX] Fix out of bounds error on export with large timespans and/or series count. #5700
  • [BUGFIX] Fix the issue where --read-timeout was applied to the entire mimirtool analyze grafana invocation rather than to individual Grafana API calls. #5915
  • [BUGFIX] Fix incorrect remote-read path joining for mimirtool remote-read commands on Windows. #6011
  • [BUGFIX] Fix template files full path being sent in mimirtool alertmanager load command. #6138
  • [BUGFIX] Analyze rule-file: .metricsUsed field wasn't populated. #6953

Mimir Continuous Test

Query-tee

Documentation

  • [ENHANCEMENT] Document the concept of native histograms and how to send them to Mimir, migration path. #5956 #6488 #6539
  • [ENHANCEMENT] Document native histograms query and visualization. #6231

Tools

  • [CHANGE] tsdb-index: Rename tool to tsdb-series. #6317
  • [FEATURE] tsdb-labels: Add tool to print label names and values of a TSDB block. #6317
  • [ENHANCEMENT] trafficdump: Trafficdump can now parse OTEL requests. The entire request is dumped to output; no filtering of fields or matching of series is done. #6108

All changes in this release: https://github.com/grafana/mimir/compare/mimir-2.10.4...mimir-2.11.0-rc.0

mimir-2.10.5

5 months ago

Changelog

2.10.5

Grafana Mimir

  • [ENHANCEMENT] Update Docker base images from alpine:3.18.3 to alpine:3.18.5. #6897
  • [BUGFIX] Fixed possible series matcher corruption leading to wrong series being included in query results. #6886

Documentation

  • [ENHANCEMENT] Document the concept of native histograms and how to send them to Mimir, migration path. #6757
  • [ENHANCEMENT] Document native histograms query and visualization. #6757

All changes in this release: https://github.com/grafana/mimir/compare/mimir-2.10.4...mimir-2.10.5

mimir-2.9.4

5 months ago

Changelog

2.9.4

Grafana Mimir

  • [ENHANCEMENT] Update Docker base images from alpine:3.18.3 to alpine:3.18.5. #6895

All changes in this release: https://github.com/grafana/mimir/compare/mimir-2.9.3...mimir-2.9.4

mimir-2.9.3

6 months ago

This release contains 1 PR from 1 author. Thank you!

Changelog

2.9.3

  • [BUGFIX] Update go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp to 0.44 which includes a fix for CVE-2023-45142. #6637

All changes in this release: https://github.com/grafana/mimir/compare/mimir-2.9.2...mimir-2.9.3

mimir-2.10.4

6 months ago

This release contains 3 PRs from 1 author. Thank you!

Changelog

2.10.4

Grafana Mimir

  • [BUGFIX] Update otelhttp library to v0.44.0 as a mitigation for CVE-2023-45142. #6634

All changes in this release: https://github.com/grafana/mimir/compare/mimir-2.10.3...mimir-2.10.4

mimir-2.10.3

7 months ago

This release contains 1 PR from 1 author. Thank you!

Changelog

2.10.3

Grafana Mimir

  • [BUGFIX] Update grpc-go library to 1.57.2-dev that includes a fix for a bug introduced in 1.57.1. #6419

All changes in this release: https://github.com/grafana/mimir/compare/mimir-2.10.2...mimir-2.10.3