Grafana Mimir Versions

Grafana Mimir provides horizontally scalable, highly available, multi-tenant, long-term storage for Prometheus.

mimir-2.9.2

8 months ago

This release contains 5 PRs from 3 authors. Thank you!

Grafana Mimir version 2.9.2 release notes

Changelog

2.9.2

  • [BUGFIX] Update grpc-go library to 1.56.3 and golang.org/x/net to 0.17, which include fix for CVE-2023-44487. #6353 #6364

All changes in this release: https://github.com/grafana/mimir/compare/mimir-2.9.1...mimir-2.9.2

mimir-2.10.2

8 months ago

This release contains 2 PRs from 1 author. Thank you!

Warning

This release contains a known bug in the grpc-go library that drastically affects network performance of the servers. Mimir 2.10.3 was released fixing this issue.

Changelog

2.10.2

Grafana Mimir

  • [BUGFIX] Update grpc-go library to 1.57.1 and golang.org/x/net to 0.17, which include fix for CVE-2023-44487. #6349

All changes in this release: https://github.com/grafana/mimir/compare/mimir-2.10.1...mimir-2.10.2

mimir-2.10.1

8 months ago

This release contains 6 PRs from 4 authors. Thank you!

Changelog

2.10.1

Grafana Mimir

  • [CHANGE] Update Go version to 1.21.3. #6244 #6325
  • [BUGFIX] Query-frontend: Don't retry read requests rejected by the ingester due to utilization based read path limiting. #6032
  • [BUGFIX] Ingester: fix panic in WAL replay of certain native histograms. #6086

All changes in this release: https://github.com/grafana/mimir/compare/mimir-2.10.0...mimir-2.10.1

mimir-2.10.0

9 months ago

This release contains 455 PRs from 54 authors, including new contributors Aaron Sanders, Alexander Proschek, Aljoscha Pörtner, balazs92117, Francois Gouteroux, Franco Posa, Heather Yuan, jingyang, kendrickclark, m4r1u2, Milan Plžík, Samir Teymurov, Sven Haardiek, Thomas Schaaf, Tiago Posse. Thank you!

Grafana Mimir version 2.10.0 release notes

Grafana Labs is excited to announce version 2.10 of Grafana Mimir.

The highlights that follow include the top features, enhancements, and bugfixes in this release. For the complete list of changes, see the changelog.

Features and enhancements

  • Added support for rule filtering by passing file, rule_group and rule_name parameters to the ruler endpoint /api/v1/rules.
  • Added support to only count series that are considered active through the Cardinality API endpoint /api/v1/cardinality/label_values by passing the count_method parameter. You can set it to active to count only series that are considered active according to the -ingester.active-series-metrics-idle-timeout flag setting rather than counting all in-memory series.
  • Reduced the overall memory consumption by changing the internal data structure for labels. Expect ingesters to use around 15% less memory with this change, depending on the pattern of labels used, number of tenants, etc.
  • Reduced the memory usage of the Active Series Tracker in the ingester.
  • Added a buffered logging implementation that can be enabled through the -log.buffered CLI flag. This should reduce contention and resource usage under heavy usage patterns.
  • Improved the performance of OTLP ingestion and added more detailed information to the traces to make troubleshooting easier.
  • Improved the performance of series matching in the store-gateway by always including the __name__ posting group causing a reduction in the number of object storage API calls.
  • Improved the performance of label values with matchers calls when the number of matched series is small. If you're using Grafana to query Grafana Mimir, make sure your Prometheus data source configuration has the Prometheus type set to Mimir and the Version set correctly in order to benefit from this improvement.
  • Added support for caching cardinality, label names and label values query responses in the query-frontend. The cache is used when -query-frontend.cache-results is enabled, and -query-frontend.results-cache-ttl-for-cardinality-query or -query-frontend.results-cache-ttl-for-labels-query is set to a value greater than 0.
  • Reduced wasted effort spent computing results that won't be used by having queriers cancel the requests sent to the ingesters in a zone upon receiving the first error from that zone.
  • Reduced object storage use by enhancing the compactor to remove the bucket index, markers, and debug files when it detects zero remaining blocks in the bucket index. This cleanup process can be enabled by setting the -compactor.no-blocks-file-cleanup-enabled option to true.
  • Added new debug HTTP endpoints /ingester/tenants and /ingester/tsdb/{tenant} to the ingester that provide debug information about tenants and their TSDBs.
  • Added new metrics for tracking native histograms in active series: cortex_ingester_active_native_histogram_series, cortex_ingester_active_native_histogram_series_custom_tracker, cortex_ingester_active_native_histogram_buckets, cortex_ingester_active_native_histogram_buckets_custom_tracker. The first 2 are the subsets of the existing and unmodified cortex_ingester_active_series and cortex_ingester_active_series_custom_tracker respectively, only tracking native histogram series, and the last 2 are the equivalent for tracking the number of buckets in native histogram series.
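As a rough illustration of the two query-parameter additions above, the following Go sketch builds request URLs for the rule-filtering endpoint and for the cardinality endpoint with active-series counting. The base URL, the helper names and the sample values are hypothetical. The parameter names (file, rule_group, rule_name, count_method) are taken from these release notes and the changelog. Your deployment may serve these paths under a different HTTP prefix, and any other parameters the cardinality endpoint requires are left out here.

```go
package main

import (
	"fmt"
	"net/url"
)

// rulesURL builds a URL for the ruler endpoint /api/v1/rules.
// Empty filter arguments are left out of the query string.
// base is a hypothetical address, not a Mimir default.
func rulesURL(base, file, group, name string) string {
	q := url.Values{}
	if file != "" {
		q.Set("file", file)
	}
	if group != "" {
		q.Set("rule_group", group)
	}
	if name != "" {
		q.Set("rule_name", name)
	}
	u := base + "/api/v1/rules"
	if enc := q.Encode(); enc != "" {
		u += "?" + enc
	}
	return u
}

// cardinalityURL builds a URL for /api/v1/cardinality/label_values.
// When active is true it asks for active series only via count_method.
// Other parameters the endpoint may need are intentionally omitted.
func cardinalityURL(base string, active bool) string {
	u := base + "/api/v1/cardinality/label_values"
	if active {
		q := url.Values{}
		q.Set("count_method", "active")
		u += "?" + q.Encode()
	}
	return u
}

func main() {
	base := "http://mimir.example" // hypothetical host
	fmt.Println(rulesURL(base, "alerts.yaml", "cpu", "HighCPU"))
	fmt.Println(cardinalityURL(base, true))
}
```

url.Values.Encode sorts keys alphabetically, so the generated query strings are stable across runs.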

Additionally, the following previously experimental features are now considered stable:

  • Support for a ruler storage cache. This cache should reduce the number of "list objects" API calls issued to the object storage when there are 2+ ruler replicas running in a Mimir cluster. The cache can be configured by setting the -ruler-storage.cache.* CLI flags or their respective YAML config options.
  • Query sharding cardinality estimation. This feature allows query sharding to take into account cardinality of similar requests executed previously when computing the maximum number of shards to use. You can enable it through the advanced CLI configuration flag -query-frontend.query-sharding-target-series-per-shard; we recommend starting with a value of 2500.
  • Query expression size limit. You can limit the size in bytes of the queries allowed to be processed through the CLI configuration flag -query-frontend.max-query-expression-size-bytes.
  • Peer discovery / tenant sharding for overrides exporters. You can enable it through the CLI configuration flag -overrides-exporter.ring.enabled.
  • Overrides exporter enabled metrics selection. You can select which metrics the overrides exporter should export through the CLI configuration flag -overrides-exporter.enabled-metrics.
  • Per-tenant results cache TTL. The time-to-live duration for cached query results can be configured using the results_cache_ttl and results_cache_ttl_for_out_of_order_time_window parameters.

Experimental features

Grafana Mimir 2.10 includes new features that are considered experimental and are disabled by default. Use them with caution and report any issues you encounter:

  • Support for ingesting exponential histograms in OpenTelemetry format. The exponential histograms that are over the native histogram scale limit of 8 are downscaled to allow their ingestion.
  • Store-gateway index-header loading improvements, which include the ability to persist the sparse index-header to disk instead of reconstructing it on every restart (-blocks-storage.bucket-store.index-header-sparse-persistence-enabled) as well as the ability to persist the list of block IDs that were lazy-loaded while running to eagerly load them upon startup to prevent starting up with no loaded blocks (-blocks-storage.bucket-store.index-header.eager-loading-startup-enabled) and an option to limit the number of concurrent index-header loads when lazy-loading (-blocks-storage.bucket-store.index-header-lazy-loading-concurrency).
  • Option to allow queriers to reduce pressure on ingesters by initially querying only the minimum set of ingesters required to reach quorum. (-querier.minimize-ingester-requests).
  • Early TSDB Head compaction in the ingesters to reduce in-memory series when a certain threshold is reached. Useful to deal with high series churning rate. (-blocks-storage.tsdb.early-head-compaction-min-in-memory-series).
  • Spread-minimizing token generation algorithm for the ingesters. This new method drastically reduces the difference in series pushed to different ingesters. Note that a migration process is required to switch from the previous random generation algorithm; it will be detailed once the feature is declared stable.
  • Support for chunks streaming from store-gateways to queriers that should reduce the memory usage in the queriers. Can be enabled through the -querier.prefer-streaming-chunks-from-store-gateways option.
  • Support for circuit-breaking the distributor write requests to the ingesters. This can be enabled through the -ingester.client.circuit-breaker.* configuration options and should serve to let ingesters recover when under high pressure.
  • Support to limit read requests based on CPU/memory utilization. This should alleviate pressure on the ingesters after receiving heavy queries and reduce the likelihood of disrupting the write path. (-ingester.read-path-cpu-utilization-limit, -ingester.read-path-memory-utilization-limit, -ingester.log-utilization-based-limiter-cpu-samples).

Helm chart improvements

The Grafana Mimir and Grafana Enterprise Metrics Helm chart is now released independently. See the Grafana Mimir Helm chart documentation.

Important changes

In Grafana Mimir 2.10 we have changed the following behaviors:

  • Query requests are initiated only to ingesters in the ACTIVE state in the ring. This is not expected to introduce any degradation in terms of query results correctness or high-availability.
  • Per-instance limit errors are not logged anymore, to reduce resource usage when ingesters are under pressure. We encourage you to use metrics and alerting to monitor them instead. The following metrics have been added to count the number of requests rejected for hitting per-instance limits:
    • cortex_distributor_instance_rejected_requests_total
    • cortex_ingester_instance_rejected_requests_total
  • The CLI flag -validation.create-grace-period is now enforced in the ingester. If you've configured -validation.create-grace-period, make sure the configuration is applied to ingesters too.
  • The CLI flag -validation.create-grace-period is now enforced for exemplars. The cortex_discarded_exemplars_total{reason="exemplar_too_far_in_future",user="..."} series is incremented when exemplars are dropped because their timestamp is greater than "now + grace_period".
  • The CLI flag -validation.create-grace-period is now enforced in the query-frontend even when the configured value is 0. When the value is 0, the query end time range is truncated to the current real-world time.

The following metrics were removed:

  • cortex_ingester_shipper_dir_syncs_total
  • cortex_ingester_shipper_dir_sync_failures_total

The following configuration options are deprecated and will be removed in Grafana Mimir 2.12:

  • The CLI flag -blocks-storage.bucket-store.index-header-lazy-loading-enabled is deprecated, use the new configuration -blocks-storage.bucket-store.index-header.lazy-loading-enabled.
  • The CLI flag -blocks-storage.bucket-store.index-header-lazy-loading-idle-timeout is deprecated, use the new configuration -blocks-storage.bucket-store.index-header.lazy-loading-idle-timeout.
  • The CLI flag -blocks-storage.bucket-store.index-header-lazy-loading-concurrency is deprecated, use the new configuration -blocks-storage.bucket-store.index-header.lazy-loading-concurrency.

The following configuration options that were deprecated in Grafana Mimir 2.8 are removed:

  • The CLI flag -blocks-storage.tsdb.max-tsdb-opening-concurrency-on-startup.

The following experimental configuration options were renamed or removed:

  • The CLI flag -querier.prefer-streaming-chunks was renamed to -querier.prefer-streaming-chunks-from-ingesters.
  • The CLI flag -blocks-storage.bucket-store.chunks-cache.fine-grained-chunks-caching-enabled was removed.
  • The CLI flag -blocks-storage.bucket-store.fine-grained-chunks-caching-ranges-per-series was removed.

The following experimental options are now stable:

  • The CLI flag -shutdown-delay.
  • The CLI flag -ingester.ring.excluded-zones.

The following configuration option defaults were changed:

  • The default value for the CLI flag -querier.streaming-chunks-per-ingester-buffer-size was changed from 512 to 256.
  • The default value for gRPC clients connect timeout was set to 5s (default inherited from gRPC client was 20s) with a default max backoff delay of 5s (default inherited from gRPC client was 120s).

Bug fixes

  • Ruler: fixed graceful shutdown for rule evaluations.
  • Ingester: fixed ingesters getting stuck when previous state is LEAVING and the number of tokens has changed upon restarting.
  • Querier: fixed an issue where queries using the timestamp() function failed with "execution: attempted to read series at index 0 from stream, but the stream has already been exhausted" when the experimental feature to stream chunks from ingesters to queriers is enabled.
  • Memberlist: brought back memberlist_client_kv_store_count metric that used to exist in Cortex, but got lost during grafana/dskit updates before Mimir 2.0.
  • Store-gateway: fixed an issue where stopping a store-gateway could cause all store-gateways to unload all blocks.
  • Ingester: prevented setting the "last update time" of a TSDB into the future when opening the TSDB. Previously, ingesting a sample with a timestamp in the distant future could prevent an idle TSDB from being detected for a long time.
  • General: changed ballast to allocate smaller blocks to avoid a problem where the entire ballast was kept in the memory working set.

Changelog

2.10.0

Grafana Mimir

  • [CHANGE] Store-gateway: skip verifying index header integrity upon loading. To enable verification set blocks_storage.bucket_store.index_header.verify_on_load: true. #5174
  • [CHANGE] Querier: change the default value of the experimental -querier.streaming-chunks-per-ingester-buffer-size flag to 256. #5203
  • [CHANGE] Querier: only initiate query requests to ingesters in the ACTIVE state in the ring. #5342
  • [CHANGE] Querier: renamed -querier.prefer-streaming-chunks to -querier.prefer-streaming-chunks-from-ingesters to enable streaming chunks from ingesters to queriers. #5182
  • [CHANGE] Querier: -query-frontend.cache-unaligned-requests has been moved from a global flag to a per-tenant override. #5312
  • [CHANGE] Ingester: removed cortex_ingester_shipper_dir_syncs_total and cortex_ingester_shipper_dir_sync_failures_total metrics. The former metric was not very useful, and the latter was never incremented. #5396
  • [CHANGE] Ingester: removed logging of errors related to hitting per-instance limits to reduce resource usage when ingesters are under pressure. #5585
  • [CHANGE] gRPC clients: use default connect timeout of 5s, and therefore enable default connect backoff max delay of 5s. #5562
  • [CHANGE] Ingester: the -validation.create-grace-period is now enforced in the ingester too, other than distributor and query-frontend. If you've configured -validation.create-grace-period then make sure the configuration is applied to ingesters too. #5712
  • [CHANGE] Distributor: the -validation.create-grace-period is now enforced for exemplars too in the distributor. If an exemplar has a timestamp greater than "now + grace_period", then the exemplar is dropped and the metric cortex_discarded_exemplars_total{reason="exemplar_too_far_in_future",user="..."} is increased. #5761
  • [CHANGE] Query-frontend: the -validation.create-grace-period is now enforced in the query-frontend even when the configured value is 0. When the value is 0, the query end time range is truncated to the current real-world time. #5829
  • [CHANGE] Store-gateway: deprecated the index-header configuration parameters under blocks-storage.bucket-store in favor of new parameters under blocks-storage.bucket-store.index-header. The deprecated configuration will be removed in Mimir 2.12. Configuration changes: #5726
    • -blocks-storage.bucket-store.index-header-lazy-loading-enabled is deprecated, use the new configuration -blocks-storage.bucket-store.index-header.lazy-loading-enabled
    • -blocks-storage.bucket-store.index-header-lazy-loading-idle-timeout is deprecated, use the new configuration -blocks-storage.bucket-store.index-header.lazy-loading-idle-timeout
    • -blocks-storage.bucket-store.index-header-lazy-loading-concurrency is deprecated, use the new configuration -blocks-storage.bucket-store.index-header.lazy-loading-concurrency
  • [CHANGE] Store-gateway: remove experimental fine-grained chunks caching. The following experimental configuration parameters have been removed -blocks-storage.bucket-store.chunks-cache.fine-grained-chunks-caching-enabled, -blocks-storage.bucket-store.fine-grained-chunks-caching-ranges-per-series. #5816 #5875
  • [CHANGE] Ingester: remove deprecated blocks-storage.tsdb.max-tsdb-opening-concurrency-on-startup. #5850
  • [FEATURE] Introduced -distributor.service-overload-status-code-on-rate-limit-enabled flag for configuring status code to 529 instead of 429 upon rate limit exhaustion. #5752
  • [FEATURE] Cardinality API: added a new count_method parameter which enables counting active series. #5136
  • [FEATURE] Query-frontend: added experimental support to cache cardinality, label names and label values query responses. The cache will be used when -query-frontend.cache-results is enabled, and -query-frontend.results-cache-ttl-for-cardinality-query or -query-frontend.results-cache-ttl-for-labels-query set to a value greater than 0. The following metrics have been added to track the query results cache hit ratio per request_type: #5212 #5235 #5426 #5524
    • cortex_frontend_query_result_cache_requests_total{request_type="query_range|cardinality|label_names_and_values"}
    • cortex_frontend_query_result_cache_hits_total{request_type="query_range|cardinality|label_names_and_values"}
  • [FEATURE] Added -<prefix>.s3.list-objects-version flag to configure the S3 list objects version. #5099
  • [FEATURE] Ingester: add optional CPU/memory utilization based read request limiting, considered experimental. Disabled by default, enable by configuring limits via both of the following flags: #5012 #5392 #5394 #5526 #5508 #5704
    • -ingester.read-path-cpu-utilization-limit
    • -ingester.read-path-memory-utilization-limit
    • -ingester.log-utilization-based-limiter-cpu-samples
  • [FEATURE] Ruler: support filtering results from rule status endpoint by file, rule_group and rule_name. #5291
  • [FEATURE] Ingester: add experimental support for creating tokens by using spread minimizing strategy. This can be enabled with -ingester.ring.token-generation-strategy: spread-minimizing and -ingester.ring.spread-minimizing-zones: <all available zones>. In that case -ingester.ring.tokens-file-path must be empty. #5308 #5324
  • [FEATURE] Store-gateway: Persist sparse index-headers to disk and read from disk on index-header loads instead of reconstructing. #5465 #5651 #5726
  • [FEATURE] Ingester: add experimental CLI flag -ingester.ring.spread-minimizing-join-ring-in-order that allows an ingester to register tokens in the ring only after all previous ingesters (with ID lower than its own ID) have already been registered. #5541
  • [FEATURE] Ingester: add experimental support to compact the TSDB Head when the number of in-memory series is equal or greater than -blocks-storage.tsdb.early-head-compaction-min-in-memory-series, and the ingester estimates that the per-tenant TSDB Head compaction will reduce in-memory series by at least -blocks-storage.tsdb.early-head-compaction-min-estimated-series-reduction-percentage. #5371
  • [FEATURE] Ingester: add new metrics for tracking native histograms in active series: cortex_ingester_active_native_histogram_series, cortex_ingester_active_native_histogram_series_custom_tracker, cortex_ingester_active_native_histogram_buckets, cortex_ingester_active_native_histogram_buckets_custom_tracker. The first 2 are the subsets of the existing and unmodified cortex_ingester_active_series and cortex_ingester_active_series_custom_tracker respectively, only tracking native histogram series, and the last 2 are the equivalents for tracking the number of buckets in native histogram series. #5318
  • [FEATURE] Add experimental CLI flag -<prefix>.s3.native-aws-auth-enabled that enables the default credentials provider chain of the AWS SDK. #5636
  • [FEATURE] Distributor: add experimental support for circuit breaking when writing to ingesters via -ingester.client.circuit-breaker.enabled, -ingester.client.circuit-breaker.failure-threshold, and -ingester.client.circuit-breaker.cooldown-period, or their corresponding YAML options. #5650
  • [FEATURE] The following features are no longer considered experimental. #5701 #5872
    • Ruler storage cache (-ruler-storage.cache.*)
    • Exclude ingesters running in specific zones (-ingester.ring.excluded-zones)
    • Cardinality-based query sharding (-query-frontend.query-sharding-target-series-per-shard)
    • Cardinality query result caching (-query-frontend.results-cache-ttl-for-cardinality-query)
    • Label names and values query result caching (-query-frontend.results-cache-ttl-for-labels-query)
    • Query expression size limit (-query-frontend.max-query-expression-size-bytes)
    • Peer discovery / tenant sharding for overrides exporters (-overrides-exporter.ring.enabled)
    • Configuring enabled metrics in overrides exporter (-overrides-exporter.enabled-metrics)
    • Per-tenant results cache TTL (-query-frontend.results-cache-ttl, -query-frontend.results-cache-ttl-for-out-of-order-time-window)
    • Shutdown delay (-shutdown-delay)
  • [FEATURE] Querier: add experimental CLI flag -tenant-federation.max-concurrent to adjust the max number of per-tenant queries that can be run at a time when executing a single multi-tenant query. #5874
  • [FEATURE] Alertmanager: add Microsoft Teams as a supported integration. #5840
  • [ENHANCEMENT] Overrides-exporter: Add new metrics for write path and alertmanager (max_global_metadata_per_user, max_global_metadata_per_metric, request_rate, request_burst_size, alertmanager_notification_rate_limit, alertmanager_max_dispatcher_aggregation_groups, alertmanager_max_alerts_count, alertmanager_max_alerts_size_bytes) and added flag -overrides-exporter.enabled-metrics to explicitly configure desired metrics, e.g. -overrides-exporter.enabled-metrics=request_rate,ingestion_rate. Default value for this flag is: ingestion_rate,ingestion_burst_size,max_global_series_per_user,max_global_series_per_metric,max_global_exemplars_per_user,max_fetched_chunks_per_query,max_fetched_series_per_query,ruler_max_rules_per_rule_group,ruler_max_rule_groups_per_tenant. #5376
  • [ENHANCEMENT] Cardinality API: when zone aware replication is enabled, the label values cardinality API can now tolerate single zone failure #5178
  • [ENHANCEMENT] Distributor: optimize sending requests to ingesters when incoming requests don't need to be modified. For now this feature can be disabled by setting -timeseries-unmarshal-caching-optimization-enabled=false. #5137
  • [ENHANCEMENT] Add advanced CLI flags to control gRPC client behaviour: #5161
    • -<prefix>.connect-timeout
    • -<prefix>.connect-backoff-base-delay
    • -<prefix>.connect-backoff-max-delay
    • -<prefix>.initial-stream-window-size
    • -<prefix>.initial-connection-window-size
  • [ENHANCEMENT] Query-frontend: added "response_size_bytes" field to "query stats" log. #5196
  • [ENHANCEMENT] Querier: refine error messages for per-tenant query limits, informing the user of the preferred strategy for not hitting the limit, in addition to how they may tweak the limit. #5059
  • [ENHANCEMENT] Distributor: optimize sending of requests to ingesters by reusing memory buffers for marshalling requests. This optimization can be enabled by setting -distributor.write-requests-buffer-pooling-enabled to true. #5195 #5805 #5830
  • [ENHANCEMENT] Querier: add experimental -querier.minimize-ingester-requests option to initially query only the minimum set of ingesters required to reach quorum. #5202 #5259 #5263
  • [ENHANCEMENT] Querier: improve error message when streaming chunks from ingesters to queriers and a query limit is reached. #5245
  • [ENHANCEMENT] Use new data structure for labels, to reduce memory consumption. #3555 #5731
  • [ENHANCEMENT] Update alpine base image to 3.18.2. #5276
  • [ENHANCEMENT] Ruler: add cortex_ruler_sync_rules_duration_seconds metric, tracking the time spent syncing all rule groups owned by the ruler instance. #5311
  • [ENHANCEMENT] Store-gateway: add experimental blocks-storage.bucket-store.index-header-lazy-loading-concurrency config option to limit the number of concurrent index-headers loads when lazy loading. #5313 #5605
  • [ENHANCEMENT] Ingester and querier: improve level of detail in traces emitted for queries that hit ingesters. #5315
  • [ENHANCEMENT] Querier: add cortex_querier_queries_rejected_total metric that counts the number of queries rejected due to hitting a limit (e.g. max series per query or max chunks per query). #5316 #5440 #5450
  • [ENHANCEMENT] Querier: add experimental -querier.minimize-ingester-requests-hedging-delay option to initiate requests to further ingesters when request minimisation is enabled and not all initial requests have completed. #5368
  • [ENHANCEMENT] Clarify docs for -ingester.client.* flags to make it clear that these are used by both queriers and distributors. #5375
  • [ENHANCEMENT] Querier and store-gateway: add experimental support for streaming chunks from store-gateways to queriers while evaluating queries. This can be enabled with -querier.prefer-streaming-chunks-from-store-gateways=true. #5182
  • [ENHANCEMENT] Querier: enforce max-chunks-per-query limit earlier in query processing when streaming chunks from ingesters to queriers to avoid unnecessarily consuming resources for queries that will be aborted. #5369 #5447
  • [ENHANCEMENT] Ingester: added cortex_ingester_shipper_last_successful_upload_timestamp_seconds metric tracking the last successful TSDB block uploaded to the bucket (unix timestamp in seconds). #5396
  • [ENHANCEMENT] Ingester: add two metrics tracking resource utilization calculated by utilization based limiter: #5496
    • cortex_ingester_utilization_limiter_current_cpu_load: The current exponential weighted moving average of the ingester's CPU load
    • cortex_ingester_utilization_limiter_current_memory_usage_bytes: The current ingester memory utilization
  • [ENHANCEMENT] Ruler: added insight=true field to ruler's prometheus component for rule evaluation logs. #5510
  • [ENHANCEMENT] Distributor, ingester: add metrics to count the number of requests rejected for hitting per-instance limits, cortex_distributor_instance_rejected_requests_total and cortex_ingester_instance_rejected_requests_total respectively. #5551
  • [ENHANCEMENT] Distributor: add support for ingesting exponential histograms that are over the native histogram scale limit of 8 in OpenTelemetry format by downscaling them. #5532 #5607
  • [ENHANCEMENT] General: buffered logging: #5506
    • The -log.buffered CLI flag enables buffered logging.
  • [ENHANCEMENT] Distributor: add more detailed information to traces generated while processing OTLP write requests. #5539
  • [ENHANCEMENT] Distributor: improve performance ingesting OTLP payloads. #5531 #5607 #5616
  • [ENHANCEMENT] Ingester: optimize label-values with matchers call when number of matched series is small. #5600
  • [ENHANCEMENT] Compactor: delete bucket-index, markers and debug files if there are no blocks left in the bucket index. This cleanup must be enabled by using -compactor.no-blocks-file-cleanup-enabled option. #5648
  • [ENHANCEMENT] Ingester: reduce memory usage of active series tracker. #5665
  • [ENHANCEMENT] Store-gateway: added -store-gateway.sharding-ring.auto-forget-enabled configuration parameter to control whether store-gateway auto-forget feature should be enabled or disabled (enabled by default). #5702
  • [ENHANCEMENT] Compactor: added per tenant block upload counters cortex_block_upload_api_blocks_total, cortex_block_upload_api_bytes_total, and cortex_block_upload_api_files_total. #5738
  • [ENHANCEMENT] Compactor: verify time range of compacted block(s) matches the time range of input blocks. #5760
  • [ENHANCEMENT] Querier: improved observability of calls to ingesters during queries. #5724
  • [ENHANCEMENT] Compactor: block backfilling logging is now more verbose. #5711
  • [ENHANCEMENT] Added support to rate limit application logs: #5764
    • -log.rate-limit-enabled
    • -log.rate-limit-logs-per-second
    • -log.rate-limit-logs-per-second-burst
  • [ENHANCEMENT] Ingester: added cortex_ingester_tsdb_head_min_timestamp_seconds and cortex_ingester_tsdb_head_max_timestamp_seconds metrics which return min and max time of all TSDB Heads open in an ingester. #5786 #5815
  • [ENHANCEMENT] Querier: cancel query requests to ingesters in a zone upon first error received from the zone, to reduce wasted effort spent computing results that won't be used #5764
  • [ENHANCEMENT] All: improve tracing of internal HTTP requests sent over httpgrpc. #5782
  • [ENHANCEMENT] Querier: add experimental per-query chunks limit based on an estimate of the number of chunks that will be sent from ingesters and store-gateways that is enforced earlier during query evaluation. This limit is disabled by default and can be configured with -querier.max-estimated-fetched-chunks-per-query-multiplier. #5765
  • [ENHANCEMENT] Ingester: add UI for listing tenants with a TSDB on a given ingester and viewing details of a tenant's TSDB on that ingester. #5803 #5824
  • [ENHANCEMENT] Querier: improve observability of calls to store-gateways during queries. #5809
  • [ENHANCEMENT] Query-frontend: improve tracing of interactions with query-scheduler. #5818
  • [ENHANCEMENT] Query-scheduler: improve tracing of requests when request is rejected by query-scheduler. #5848
  • [ENHANCEMENT] Ingester: avoid logging some errors that could cause logging contention. #5494 #5581
  • [ENHANCEMENT] Store-gateway: wait for query gate after loading blocks. #5507
  • [ENHANCEMENT] Store-gateway: always include __name__ posting group in selection in order to reduce the number of object storage API calls. #5246
  • [ENHANCEMENT] Ingester: track active series by ref instead of hash/labels to reduce memory usage. #5134 #5193
  • [ENHANCEMENT] Go: updated to 1.21.1. #5955 #5960
  • [ENHANCEMENT] Alertmanager: updated to alertmanager 0.26.0. #5840
  • [BUGFIX] Ingester: Handle when previous ring state is leaving and the number of tokens has changed. #5204
  • [BUGFIX] Querier: fix issue where queries that use the timestamp() function fail with execution: attempted to read series at index 0 from stream, but the stream has already been exhausted if streaming chunks from ingesters to queriers is enabled. #5370
  • [BUGFIX] memberlist: bring back memberlist_client_kv_store_count metric that used to exist in Cortex, but got lost during dskit updates before Mimir 2.0. #5377
  • [BUGFIX] Querier: pass on HTTP 503 query response code. #5364
  • [BUGFIX] Store-gateway: Fix issue where stopping a store-gateway could cause all store-gateways to unload all blocks. #5464
  • [BUGFIX] Allocate ballast in smaller blocks to avoid a problem where the entire ballast was kept in the memory working set. #5565
  • [BUGFIX] Querier: retry frontend result notification when an error is returned. #5591
  • [BUGFIX] Querier: fix issue where cortex_ingester_client_request_duration_seconds metric did not include streaming query requests that did not return any series. #5695
  • [BUGFIX] Ingester: fix ActiveSeries tracker double-counting series that have been deleted from the Head while still being active and then recreated again. #5678
  • [BUGFIX] Ingester: don't set "last update time" of TSDB into the future when opening TSDB. This could prevent an idle TSDB from being detected for a long time if a sample with a timestamp in the distant future was ingested. #5787
  • [BUGFIX] Store-gateway: fix bug where a lazy index header could be closed prematurely even when still in use. #5795
  • [BUGFIX] Ruler: gracefully shut down rule evaluations. #5778
  • [BUGFIX] Querier: fix performance when ingesters stream samples. #5836
  • [BUGFIX] Ingester: fix spurious not found errors on label values API during head compaction. #5957
  • [BUGFIX] All: updated Minio object storage client from 7.0.62 to 7.0.63 to fix auto-detection of AWS GovCloud environments. #5905

Mixin

  • [CHANGE] Dashboards: show all workloads in selected namespace on "rollout progress" dashboard. #5113
  • [CHANGE] Dashboards: show the number of updated and ready pods for each workload in the "rollout progress" panel on the "rollout progress" dashboard. #5113
  • [CHANGE] Dashboards: removed "Query results cache misses" panel on the "Mimir / Queries" dashboard. #5423
  • [CHANGE] Dashboards: default to shared crosshair on all dashboards. #5489
  • [CHANGE] Dashboards: sort variable drop-down lists from A to Z, rather than Z to A. #5490
  • [CHANGE] Alerts: removed MimirProvisioningTooManyActiveSeries alert. You should configure -ingester.instance-limits.max-series and rely on MimirIngesterReachingSeriesLimit alert instead. #5593
  • [CHANGE] Alerts: removed MimirProvisioningTooManyWrites alert. The alerting threshold used in this alert was chosen arbitrarily and ingesters receiving a higher number of samples / sec don't necessarily have any issue. You should rely on SLO metrics and alerts instead. #5706
  • [CHANGE] Alerts: don't raise MimirRequestErrors or MimirRequestLatency alert for the /debug/pprof endpoint. #5826
  • [ENHANCEMENT] Dashboards: adjust layout of "rollout progress" dashboard panels so that the "rollout progress" panel doesn't require scrolling. #5113
  • [ENHANCEMENT] Dashboards: show container name first in "pods count per version" panel on "rollout progress" dashboard. #5113
  • [ENHANCEMENT] Dashboards: show time spent waiting for turn when lazy loading index headers in the "index-header lazy load gate latency" panel on the "queries" dashboard. #5313
  • [ENHANCEMENT] Dashboards: split query results cache hit ratio by request type in "Query results cache hit ratio" panel on the "Mimir / Queries" dashboard. #5423
  • [ENHANCEMENT] Dashboards: add "rejected queries" panel to "queries" dashboard. #5429
  • [ENHANCEMENT] Dashboards: add native histogram active series and active buckets to "tenants" dashboard. #5543
  • [ENHANCEMENT] Dashboards: add panels to "Mimir / Writes" for requests rejected for per-instance limits. #5638
  • [ENHANCEMENT] Dashboards: rename "Blocks currently loaded" to "Blocks currently owned" in the "Mimir / Queries" dashboard. #5705
  • [ENHANCEMENT] Alerts: Add MimirIngestedDataTooFarInTheFuture warning alert that triggers when Mimir ingests a sample with a timestamp more than 1h in the future. #5822
  • [BUGFIX] Alerts: fix MimirIngesterRestarts to fire only when the ingester container is restarted, excluding cases where the pod is rescheduled. #5397
  • [BUGFIX] Dashboards: fix "unhealthy pods" panel on "rollout progress" dashboard showing only a number rather than the name of the workload and the number of unhealthy pods if only one workload has unhealthy pods. #5113 #5200
  • [BUGFIX] Alerts: fixed MimirIngesterHasNotShippedBlocks and MimirIngesterHasNotShippedBlocksSinceStart alerts. #5396
  • [BUGFIX] Alerts: Fix MimirGossipMembersMismatch to include admin-api and custom compactor pods. admin-api is a GEM component. #5641 #5797
  • [BUGFIX] Dashboards: fix autoscaling dashboard panels that could show multiple series for a single component. #5810
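
The condition behind the MimirIngestedDataTooFarInTheFuture alert listed above can be pictured as a simple threshold check. The Python sketch below is illustrative only: the function name and its inputs are hypothetical, and the real alert is an expression in the mixin that this changelog does not show.

```python
# Illustrative threshold check; names are hypothetical, not part of the Mimir mixin.
ONE_HOUR_SECONDS = 60 * 60


def ingested_too_far_in_future(max_sample_ts: float, now: float,
                               threshold: float = ONE_HOUR_SECONDS) -> bool:
    """Return True when the newest ingested sample is more than
    `threshold` seconds ahead of `now` (both Unix timestamps in seconds)."""
    return max_sample_ts - now > threshold
```

A sample exactly one hour ahead does not trip the check in this sketch, matching the "more than 1h" wording.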

Jsonnet

  • [CHANGE] Removed _config.querier.concurrency configuration option and replaced it with _config.querier_max_concurrency and _config.ruler_querier_max_concurrency, so that concurrency can be fine-tuned separately for different querier deployments. #5322
  • [CHANGE] Change _config.multi_zone_ingester_max_unavailable to 50. #5327
  • [CHANGE] Change distributors rolling update strategy configuration: maxSurge and maxUnavailable are set to 15% and 0. #5714
  • [FEATURE] Alertmanager: Add horizontal pod autoscaler config, that can be enabled using autoscaling_alertmanager_enabled: true. #5194 #5249
  • [ENHANCEMENT] Enable the track_sizes feature for Memcached pods to help determine cache efficiency. #5209
  • [ENHANCEMENT] Add per-container map for environment variables. #5181
  • [ENHANCEMENT] Add PodDisruptionBudgets for compactor, continuous-test, distributor, overrides-exporter, querier, query-frontend, query-scheduler, rollout-operator, ruler, ruler-querier, ruler-query-frontend, ruler-query-scheduler, and all memcached workloads. #5098
  • [ENHANCEMENT] Ruler: configure the ruler storage cache when the metadata cache is enabled. #5326 #5334
  • [ENHANCEMENT] Shuffle-sharding: ingester shards in user-classes can now be configured to target different series and limit percentage utilization through _config.shuffle_sharding.target_series_per_ingester and _config.shuffle_sharding.target_utilization_percentage values. #5470
  • [ENHANCEMENT] Distributor: allow adjustment of the targeted CPU usage as a percentage of requested CPU. This can be adjusted with _config.autoscaling_distributor_cpu_target_utilization. #5525
  • [ENHANCEMENT] Ruler: add configuration option _config.ruler_remote_evaluation_max_query_response_size_bytes to easily set the maximum query response size allowed (in bytes). #5592
  • [ENHANCEMENT] Distributor: dynamically set GOMAXPROCS based on the CPU request. This should reduce distributor CPU utilization, assuming the CPU request is set to a value close to the actual utilization. #5588
  • [ENHANCEMENT] Querier: dynamically set GOMAXPROCS based on the CPU request. This should reduce noisy neighbour issues created by the querier, whose CPU utilization could eventually saturate the Kubernetes node if unbounded. #5646 #5658
  • [ENHANCEMENT] Allow removing an entry from the configured environment variables for a given component by setting the environment value to null in the *_env_map objects (e.g. store_gateway_env_map+:: { 'field': null}). #5599
  • [ENHANCEMENT] Allow overriding the default number of replicas for etcd. #5589
  • [ENHANCEMENT] Memcached: reduce memory request for results, chunks and metadata caches. The requested memory is 5% greater than the configured memcached max cache size. #5661
  • [ENHANCEMENT] Autoscaling: Add the following configuration options to fine tune autoscaler target utilization: #5679 #5682 #5689
    • autoscaling_querier_target_utilization (defaults to 0.75)
    • autoscaling_mimir_read_target_utilization (defaults to 0.75)
    • autoscaling_ruler_querier_cpu_target_utilization (defaults to 1)
    • autoscaling_distributor_memory_target_utilization (defaults to 1)
    • autoscaling_ruler_cpu_target_utilization (defaults to 1)
    • autoscaling_query_frontend_cpu_target_utilization (defaults to 1)
    • autoscaling_ruler_query_frontend_cpu_target_utilization (defaults to 1)
    • autoscaling_alertmanager_cpu_target_utilization (defaults to 1)
  • [ENHANCEMENT] Gossip-ring: add appProtocol for istio compatibility. #5680
  • [ENHANCEMENT] Add _config.commonConfig to allow adding common configuration parameters for all Mimir components. #5703
  • [ENHANCEMENT] Update rollout-operator to v0.7.0. #5718
  • [ENHANCEMENT] Increase the default rollout speed for store-gateway when lazy loading is disabled. #5823
  • [BUGFIX] Fix compilation when index, chunks or metadata caches are disabled. #5710
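
The null-removal behaviour of the *_env_map objects (#5599) can be modelled as a map merge in which a null override deletes the key. The Python sketch below only illustrates those semantics: resolve_env_map is a hypothetical helper, the variable names are placeholders, and the real logic lives in the Jsonnet library.

```python
from typing import Dict, Optional


def resolve_env_map(defaults: Dict[str, str],
                    overrides: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Merge overrides into defaults; an override of None (Jsonnet null)
    removes the variable from the resulting environment."""
    merged: Dict[str, Optional[str]] = {**defaults, **overrides}
    return {name: value for name, value in merged.items() if value is not None}


# Placeholder variable names, mirroring the 'field' example in the entry above.
defaults = {"field": "1", "other": "2"}
overrides = {"field": None}
print(resolve_env_map(defaults, overrides))  # {'other': '2'}
```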

Mimirtool

  • [ENHANCEMENT] Mimirtool uses paging to fetch all dashboards from Grafana when running mimirtool analyse grafana. This allows the tool to work correctly when running against Grafana instances with more than 1000 dashboards. #5825
  • [ENHANCEMENT] Extract metric name from queries that have a __name__ matcher. #5911
  • [BUGFIX] Mimirtool no longer parses label names as metric names when handling templating variables that are populated using label_values(<label_name>) when running mimirtool analyse grafana. #5832
  • [BUGFIX] Fix panic when analyzing a grafana dashboard with multiline queries in templating variables. #5911
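
Paging through a large result set, as described for mimirtool analyse grafana (#5825), generally follows the loop sketched below. This is not Mimirtool's code: fetch_page is a hypothetical callback standing in for one dashboard search request, and the page size of 1000 is an assumption chosen for illustration.

```python
from typing import Callable, Dict, List


def fetch_all_dashboards(fetch_page: Callable[[int, int], List[Dict]],
                         page_size: int = 1000) -> List[Dict]:
    """Collect every dashboard by requesting pages (numbered from 1)
    until a page comes back with fewer than page_size entries."""
    dashboards: List[Dict] = []
    page = 1
    while True:
        batch = fetch_page(page, page_size)
        dashboards.extend(batch)
        if len(batch) < page_size:
            return dashboards
        page += 1
```

Stopping on the first short page means one extra request is made when the total is an exact multiple of the page size; that last request simply returns an empty page.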

Query-tee

  • [CHANGE] Proxy Content-Type response header from backend. Previously Content-Type: text/plain; charset=utf-8 was returned on all requests. #5183
  • [CHANGE] Increase default value of -proxy.compare-skip-recent-samples to avoid racing with recording rule evaluation. #5561
  • [CHANGE] Add -backend.skip-tls-verify to optionally skip TLS verification on backends. #5656

Documentation

  • [CHANGE] Fix reference to get-started documentation directory. #5476
  • [CHANGE] Fix link to external OTLP/HTTP documentation.
  • [ENHANCEMENT] Improved MimirRulerTooManyFailedQueries runbook. #5586
  • [ENHANCEMENT] Improved "Recover accidentally deleted blocks" runbook. #5620
  • [ENHANCEMENT] Documented options and trade-offs to query label names and values. #5582
  • [ENHANCEMENT] Improved MimirRequestErrors runbook for alertmanager. #5694

Tools

  • [CHANGE] copyblocks: add support for S3 and the ability to copy between different object storage services. Due to this, the -source-service and -destination-service flags are now required and the -service flag has been removed. #5486
  • [FEATURE] undelete-block-gcs: Added new tool for undeleting blocks on GCS storage. #5610 #5855
  • [FEATURE] wal-reader: Added new tool for printing entries in TSDB WAL. #5780
  • [ENHANCEMENT] ulidtime: add -seconds flag to print timestamps as Unix timestamps. #5621
  • [ENHANCEMENT] ulidtime: exit with status code 1 if some ULIDs can't be parsed. #5621
  • [ENHANCEMENT] tsdb-index-toc: added index-header size estimates. #5652
  • [BUGFIX] Stop tools from panicking when -help flag is passed. #5412
  • [BUGFIX] Remove github.com/golang/glog command line flags from tools. #5413
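
For context on the ulidtime entries above: the first 10 characters of a ULID encode a 48-bit Unix timestamp in milliseconds, using Crockford base32. The Python sketch below decodes that timestamp. It is an independent illustration, not the tool's implementation, and the function names are hypothetical. The seconds argument mirrors the idea of the -seconds flag, and raising an error on bad input mirrors the non-zero exit status.

```python
CROCKFORD_ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def ulid_timestamp_ms(ulid: str) -> int:
    """Decode the millisecond Unix timestamp from a ULID.

    Only the length and the 10-character timestamp prefix are validated.
    """
    if len(ulid) != 26:
        raise ValueError(f"invalid ULID length: {ulid!r}")
    value = 0
    for char in ulid[:10].upper():
        index = CROCKFORD_ALPHABET.find(char)
        if index < 0:
            raise ValueError(f"invalid ULID character {char!r} in {ulid!r}")
        value = value * 32 + index
    if value >= 1 << 48:
        raise ValueError(f"ULID timestamp overflows 48 bits: {ulid!r}")
    return value


def ulid_time(ulid: str, seconds: bool = False) -> int:
    """Return the ULID timestamp in milliseconds, or whole seconds if requested."""
    millis = ulid_timestamp_ms(ulid)
    return millis // 1000 if seconds else millis
```

For example, ulid_time("01ARYZ6S41TSV4RRFFQ69G5FAV") yields 1469918176385, and passing seconds=True yields 1469918176.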

All changes in this release: https://github.com/grafana/mimir/compare/mimir-2.9.0...mimir-2.10.0

mimir-2.9.1

9 months ago

This release contains 2 PRs from 1 author. Thank you!

Changelog

2.9.1

Grafana Mimir

  • [ENHANCEMENT] Update alpine base image to 3.18.3. #6021

All changes in this release: https://github.com/grafana/mimir/compare/mimir-2.9.0...mimir-2.9.1

mimir-2.10.0-rc.2

9 months ago

This release contains 5 PRs from 3 authors. Thank you!

Changelog

2.10.0-rc.2

Grafana Mimir

  • [ENHANCEMENT] Go: updated to 1.21.1. #5955 #5960
  • [BUGFIX] Ingester: fix spurious not found errors on label values API during head compaction. #5957

All changes in this release: https://github.com/grafana/mimir/compare/mimir-2.10.0-rc.1...mimir-2.10.0-rc.2

mimir-2.10.0-rc.1

9 months ago

This release contains 12 PRs from 4 authors. Thank you!

Changelog

2.10.0-rc.1

Grafana Mimir

  • [FEATURE] The following features are no longer considered experimental. #5872
    • Ruler storage cache (-ruler-storage.cache.*)
    • Exclude ingesters running in specific zones (-ingester.ring.excluded-zones)
    • Cardinality-based query sharding (-query-frontend.query-sharding-target-series-per-shard)
    • Cardinality query result caching (-query-frontend.results-cache-ttl-for-cardinality-query)
    • Label names and values query result caching (-query-frontend.results-cache-ttl-for-labels-query)
    • Query expression size limit (-query-frontend.max-query-expression-size-bytes)
    • Peer discovery / tenant sharding for overrides exporters (-overrides-exporter.ring.enabled)
    • Configuring enabled metrics in overrides exporter (-overrides-exporter.enabled-metrics)
    • Per-tenant results cache TTL (-query-frontend.results-cache-ttl, -query-frontend.results-cache-ttl-for-out-of-order-time-window)
  • [FEATURE] Querier: add experimental CLI flag -tenant-federation.max-concurrent to adjust the max number of per-tenant queries that can be run at a time when executing a single multi-tenant query. #5874
  • [FEATURE] Alertmanager: Add Microsoft Teams as a supported integration. #5840
  • [ENHANCEMENT] Alertmanager: update to alertmanager 0.26.0. #5840
  • [BUGFIX] Store-gateway: fix chunks corruption bug introduced in rc.0. #5875
  • [BUGFIX] Update Minio object storage client from 7.0.62 to 7.0.63 to fix auto-detection of AWS GovCloud environments. #5905
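
The idea behind -tenant-federation.max-concurrent, capping how many per-tenant queries run at once within a single multi-tenant query, can be sketched with a bounded worker pool. The Python below is a generic illustration under that assumption; run_federated_query and query_tenant are hypothetical names, and the querier's actual implementation is not described in this changelog.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def run_federated_query(tenants: List[str],
                        query_tenant: Callable[[str], List[str]],
                        max_concurrent: int) -> Dict[str, List[str]]:
    """Run query_tenant once per tenant, with at most max_concurrent
    calls executing at the same time, and return results keyed by tenant."""
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        # pool.map yields results in the same order as the input tenants.
        results = pool.map(query_tenant, tenants)
        return dict(zip(tenants, results))
```

A lower limit trades total query latency for a smaller burst of simultaneous work.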

Mimirtool

  • [ENHANCEMENT] Mimirtool uses paging to fetch all dashboards from Grafana when running mimirtool analyse grafana. This allows the tool to work correctly when running against Grafana instances with more than 1000 dashboards. #5825
  • [ENHANCEMENT] Extract metric name from queries that have a __name__ matcher. #5911
  • [BUGFIX] Mimirtool no longer parses label names as metric names when handling templating variables that are populated using label_values(<label_name>) when running mimirtool analyse grafana. #5832
  • [BUGFIX] Fix panic when analyzing a grafana dashboard with multiline queries in templating variables. #5911

All changes in this release: https://github.com/grafana/mimir/compare/mimir-2.10.0-rc.0...mimir-2.10.0-rc.1

mimir-2.10.0-rc.0

9 months ago

This release contains 434 PRs from 54 authors, including new contributors Aaron Sanders, Alexander Proschek, Aljoscha Pörtner, balazs92117, Francois Gouteroux, Franco Posa, Heather Yuan, jingyang, kendrickclark, m4r1u2, Milan Plžík, Samir Teymurov, Sven Haardiek, Thomas Schaaf, Tiago Posse. Thank you!

Grafana Mimir version 2.10.0-rc.0 release notes

Pending, draft version can be seen at: https://github.com/grafana/mimir/pull/5873

Changelog

2.10.0-rc.0

Grafana Mimir

  • [CHANGE] Update Go version to 1.21.0. #5734
  • [CHANGE] Store-gateway: skip verifying index header integrity upon loading. To enable verification set blocks_storage.bucket_store.index_header.verify_on_load: true. #5174
  • [CHANGE] Querier: change the default value of the experimental -querier.streaming-chunks-per-ingester-buffer-size flag to 256. #5203
  • [CHANGE] Querier: only initiate query requests to ingesters in the ACTIVE state in the ring. #5342
  • [CHANGE] Querier: Renamed -querier.prefer-streaming-chunks to -querier.prefer-streaming-chunks-from-ingesters to enable streaming chunks from ingesters to queriers. #5182
  • [CHANGE] Querier: -query-frontend.cache-unaligned-requests has been moved from a global flag to a per-tenant override. #5312
  • [CHANGE] Ingester: removed cortex_ingester_shipper_dir_syncs_total and cortex_ingester_shipper_dir_sync_failures_total metrics. The former metric was not very useful, and the latter was never incremented. #5396
  • [CHANGE] Ingester: Do not log errors related to hitting per-instance limits to reduce resource usage when ingesters are under pressure. #5585
  • [CHANGE] gRPC clients: use default connect timeout of 5s, and therefore enable default connect backoff max delay of 5s. #5562
  • [CHANGE] The -shutdown-delay flag is no longer experimental. #5701
  • [CHANGE] The -validation.create-grace-period is now enforced in the ingester too, in addition to the distributor and query-frontend. If you've configured -validation.create-grace-period then make sure the configuration is applied to ingesters too. #5712
  • [CHANGE] The -validation.create-grace-period is now enforced for exemplars too in the distributor. If an exemplar has a timestamp greater than "now + grace_period", then the exemplar will be dropped and the metric cortex_discarded_exemplars_total{reason="exemplar_too_far_in_future",user="..."} increased. #5761
  • [CHANGE] The -validation.create-grace-period is now enforced in the query-frontend even when the configured value is 0. When the value is 0, the query end time range is truncated to the current real-world time. #5829
  • [CHANGE] Store-gateway: deprecate configuration parameters for index header under blocks-storage.bucket-store and use new configuration parameters in blocks-storage.bucket-store.index-header. The deprecated configuration will be removed in Mimir 2.12. Configuration changes: #5726
    • -blocks-storage.bucket-store.index-header-lazy-loading-enabled is deprecated, use the new configuration -blocks-storage.bucket-store.index-header.lazy-loading-enabled
    • -blocks-storage.bucket-store.index-header-lazy-loading-idle-timeout is deprecated, use the new configuration -blocks-storage.bucket-store.index-header.lazy-loading-idle-timeout
    • -blocks-storage.bucket-store.index-header-lazy-loading-concurrency is deprecated, use the new configuration -blocks-storage.bucket-store.index-header.lazy-loading-concurrency
  • [CHANGE] Store-gateway: remove experimental fine-grained chunks caching. The following experimental configuration parameters have been removed -blocks-storage.bucket-store.chunks-cache.fine-grained-chunks-caching-enabled, -blocks-storage.bucket-store.fine-grained-chunks-caching-ranges-per-series. #5816
  • [CHANGE] Ingester: remove deprecated blocks-storage.tsdb.max-tsdb-opening-concurrency-on-startup. #5850
  • [FEATURE] Introduced distributor.service_overload_status_code_on_rate_limit_enabled flag to return status code 529 instead of 429 upon rate limit exhaustion. #5752
  • [FEATURE] Cardinality API: Add a new count_method parameter which enables counting active series. #5136
  • [FEATURE] Query-frontend: added experimental support to cache cardinality, label names and label values query responses. The cache will be used when -query-frontend.cache-results is enabled, and -query-frontend.results-cache-ttl-for-cardinality-query or -query-frontend.results-cache-ttl-for-labels-query set to a value greater than 0. The following metrics have been added to track the query results cache hit ratio per request_type: #5212 #5235 #5426 #5524
    • cortex_frontend_query_result_cache_requests_total{request_type="query_range|cardinality|label_names_and_values"}
    • cortex_frontend_query_result_cache_hits_total{request_type="query_range|cardinality|label_names_and_values"}
  • [FEATURE] Added -<prefix>.s3.list-objects-version flag to configure the S3 list objects version. #5099
  • [FEATURE] Ingester: Add optional CPU/memory utilization based read request limiting, considered experimental. Disabled by default, enable by configuring limits via both of the following flags: #5012 #5392 #5394 #5526 #5508 #5704
    • -ingester.read-path-cpu-utilization-limit
    • -ingester.read-path-memory-utilization-limit
    • -ingester.log-utilization-based-limiter-cpu-samples
  • [FEATURE] Ruler: Support filtering results from rule status endpoint by file, rule_group and rule_name. #5291
  • [FEATURE] Ingester: add experimental support for creating tokens by using spread minimizing strategy. This can be enabled with -ingester.ring.token-generation-strategy: spread-minimizing and -ingester.ring.spread-minimizing-zones: <all available zones>. In that case -ingester.ring.tokens-file-path must be empty. #5308 #5324
  • [FEATURE] Storegateway: Persist sparse index-headers to disk and read from disk on index-header loads instead of reconstructing. #5465 #5651 #5726
  • [FEATURE] Ingester: add experimental CLI flag -ingester.ring.spread-minimizing-join-ring-in-order that allows an ingester to register tokens in the ring only after all previous ingesters (with ID lower than its own ID) have already been registered. #5541
  • [FEATURE] Ingester: add experimental support to compact the TSDB Head when the number of in-memory series is equal to or greater than -blocks-storage.tsdb.early-head-compaction-min-in-memory-series, and the ingester estimates that the per-tenant TSDB Head compaction will reduce in-memory series by at least -blocks-storage.tsdb.early-head-compaction-min-estimated-series-reduction-percentage. #5371
  • [FEATURE] Ingester: add new metrics for tracking native histograms in active series: cortex_ingester_active_native_histogram_series, cortex_ingester_active_native_histogram_series_custom_tracker, cortex_ingester_active_native_histogram_buckets, cortex_ingester_active_native_histogram_buckets_custom_tracker. The first 2 are the subsets of the existing and unmodified cortex_ingester_active_series and cortex_ingester_active_series_custom_tracker respectively, only tracking native histogram series, and the last 2 are the equivalents for tracking the number of buckets in native histogram series. #5318
  • [FEATURE] Add experimental CLI flag -<prefix>.s3.native-aws-auth-enabled that enables the default credentials provider chain of the AWS SDK. #5636
  • [FEATURE] Distributor: add experimental support for circuit breaking when writing to ingesters via -ingester.client.circuit-breaker.enabled, -ingester.client.circuit-breaker.failure-threshold, or -ingester.client.circuit-breaker.cooldown-period or their corresponding YAML. #5650
  • [ENHANCEMENT] Overrides-exporter: Add new metrics for write path and alertmanager (max_global_metadata_per_user, max_global_metadata_per_metric, request_rate, request_burst_size, alertmanager_notification_rate_limit, alertmanager_max_dispatcher_aggregation_groups, alertmanager_max_alerts_count, alertmanager_max_alerts_size_bytes) and added flag -overrides-exporter.enabled-metrics to explicitly configure desired metrics, e.g. -overrides-exporter.enabled-metrics=request_rate,ingestion_rate. Default value for this flag is: ingestion_rate,ingestion_burst_size,max_global_series_per_user,max_global_series_per_metric,max_global_exemplars_per_user,max_fetched_chunks_per_query,max_fetched_series_per_query,ruler_max_rules_per_rule_group,ruler_max_rule_groups_per_tenant. #5376
  • [ENHANCEMENT] Cardinality API: When zone-aware replication is enabled, the label values cardinality API can now tolerate a single zone failure. #5178
  • [ENHANCEMENT] Distributor: optimize sending requests to ingesters when incoming requests don't need to be modified. For now this feature can be disabled by setting -timeseries-unmarshal-caching-optimization-enabled=false. #5137
  • [ENHANCEMENT] Add advanced CLI flags to control gRPC client behaviour: #5161
    • -<prefix>.connect-timeout
    • -<prefix>.connect-backoff-base-delay
    • -<prefix>.connect-backoff-max-delay
    • -<prefix>.initial-stream-window-size
    • -<prefix>.initial-connection-window-size
  • [ENHANCEMENT] Query-frontend: added "response_size_bytes" field to "query stats" log. #5196
  • [ENHANCEMENT] Querier: Refine error messages for per-tenant query limits, informing the user of the preferred strategy for not hitting the limit, in addition to how they may tweak the limit. #5059
  • [ENHANCEMENT] Distributor: optimize sending of requests to ingesters by reusing memory buffers for marshalling requests. This optimization can be enabled by setting -distributor.write-requests-buffer-pooling-enabled to true. #5195 #5805 #5830
  • [ENHANCEMENT] Querier: add experimental -querier.minimize-ingester-requests option to initially query only the minimum set of ingesters required to reach quorum. #5202 #5259 #5263
  • [ENHANCEMENT] Querier: improve error message when streaming chunks from ingesters to queriers and a query limit is reached. #5245
  • [ENHANCEMENT] Use new data structure for labels, to reduce memory consumption. #3555 #5731
  • [ENHANCEMENT] Update alpine base image to 3.18.2. #5276
  • [ENHANCEMENT] Ruler: add cortex_ruler_sync_rules_duration_seconds metric, tracking the time spent syncing all rule groups owned by the ruler instance. #5311
  • [ENHANCEMENT] Store-gateway: add experimental blocks-storage.bucket-store.index-header-lazy-loading-concurrency config option to limit the number of concurrent index-headers loads when lazy loading. #5313 #5605
  • [ENHANCEMENT] Ingester and querier: improve level of detail in traces emitted for queries that hit ingesters. #5315
  • [ENHANCEMENT] Querier: add cortex_querier_queries_rejected_total metric that counts the number of queries rejected due to hitting a limit (eg. max series per query or max chunks per query). #5316 #5440 #5450
  • [ENHANCEMENT] Querier: add experimental -querier.minimize-ingester-requests-hedging-delay option to initiate requests to further ingesters when request minimisation is enabled and not all initial requests have completed. #5368
  • [ENHANCEMENT] Clarify docs for -ingester.client.* flags to make it clear that these are used by both queriers and distributors. #5375
  • [ENHANCEMENT] Querier and store-gateway: add experimental support for streaming chunks from store-gateways to queriers while evaluating queries. This can be enabled with -querier.prefer-streaming-chunks-from-store-gateways=true. #5182
  • [ENHANCEMENT] Querier: enforce max-chunks-per-query limit earlier in query processing when streaming chunks from ingesters to queriers to avoid unnecessarily consuming resources for queries that will be aborted. #5369 #5447
  • [ENHANCEMENT] Ingester: added cortex_ingester_shipper_last_successful_upload_timestamp_seconds metric tracking the last successful TSDB block uploaded to the bucket (unix timestamp in seconds). #5396
  • [ENHANCEMENT] Ingester: Add two metrics tracking resource utilization calculated by utilization based limiter: #5496
    • cortex_ingester_utilization_limiter_current_cpu_load: The current exponential weighted moving average of the ingester's CPU load
    • cortex_ingester_utilization_limiter_current_memory_usage_bytes: The current ingester memory utilization
  • [ENHANCEMENT] Ruler: added insight=true field to ruler's prometheus component for rule evaluation logs. #5510
  • [ENHANCEMENT] Distributor and ingester: Add metrics to count the number of requests rejected for hitting per-instance limits, cortex_distributor_instance_rejected_requests_total and cortex_ingester_instance_rejected_requests_total respectively. #5551
  • [ENHANCEMENT] Distributor: add support for ingesting exponential histograms that are over the native histogram scale limit of 8 in OpenTelemetry format by downscaling them. #5532 #5607
  • [ENHANCEMENT] General: buffered logging: #5506
    • -log.buffered: Enable buffered logging
  • [ENHANCEMENT] Distributor: add more detailed information to traces generated while processing OTLP write requests. #5539
  • [ENHANCEMENT] Distributor: improve performance ingesting OTLP payloads. #5531 #5607 #5616
  • [ENHANCEMENT] Ingester: optimize label-values with matchers call when number of matched series is small. #5600
  • [ENHANCEMENT] Compactor: Delete bucket-index, markers and debug files if there are no blocks left in the bucket index. This cleanup must be enabled by using -compactor.no-blocks-file-cleanup-enabled option. #5648
  • [ENHANCEMENT] Ingester: reduce memory usage of active series tracker. #5665
  • [ENHANCEMENT] Store-gateway: added -store-gateway.sharding-ring.auto-forget-enabled configuration parameter to control whether store-gateway auto-forget feature should be enabled or disabled (enabled by default). #5702
  • [ENHANCEMENT] Compactor: added per tenant block upload counters cortex_block_upload_api_blocks_total, cortex_block_upload_api_bytes_total, and cortex_block_upload_api_files_total. #5738
  • [ENHANCEMENT] Compactor: Verify time range of compacted block(s) matches the time range of input blocks. #5760
  • [ENHANCEMENT] Querier: improved observability of calls to ingesters during queries. #5724
  • [ENHANCEMENT] Compactor: block backfilling logging is now more verbose. #5711
  • [ENHANCEMENT] Added support to rate limit application logs: #5764
    • -log.rate-limit-enabled
    • -log.rate-limit-logs-per-second
    • -log.rate-limit-logs-per-second-burst
  • [ENHANCEMENT] Added cortex_ingester_tsdb_head_min_timestamp_seconds and cortex_ingester_tsdb_head_max_timestamp_seconds metrics which return min and max time of all TSDB Heads open in an ingester. #5786 #5815
  • [ENHANCEMENT] Querier: cancel query requests to ingesters in a zone upon first error received from the zone, to reduce wasted effort spent computing results that won't be used. #5764
  • [ENHANCEMENT] Improve tracing of internal HTTP requests sent over httpgrpc. #5782
  • [ENHANCEMENT] Querier: add experimental per-query chunks limit based on an estimate of the number of chunks that will be sent from ingesters and store-gateways that is enforced earlier during query evaluation. This limit is disabled by default and can be configured with -querier.max-estimated-fetched-chunks-per-query-multiplier. #5765
  • [ENHANCEMENT] Ingester: add UI for listing tenants with TSDB on given ingester and viewing details of a tenant's TSDB on given ingester. #5803 #5824
  • [ENHANCEMENT] Querier: improve observability of calls to store-gateways during queries. #5809
  • [ENHANCEMENT] Query-frontend: improve tracing of interactions with query-scheduler. #5818
  • [ENHANCEMENT] Query-scheduler: improve tracing of requests when request is rejected by query-scheduler. #5848
  • [ENHANCEMENT] Ingester: avoid logging some errors that could cause logging contention. #5494 #5581
  • [ENHANCEMENT] Store-gateway: wait for query gate after loading blocks. #5507
  • [ENHANCEMENT] Store-gateway: always include __name__ posting group in selection in order to reduce the number of object storage API calls. #5246
  • [ENHANCEMENT] Ingester: track active series by ref instead of hash/labels to reduce memory usage. #5134 #5193
  • [BUGFIX] Ingester: Handle when previous ring state is leaving and the number of tokens has changed. #5204
  • [BUGFIX] Querier: fix issue where queries that use the timestamp() function fail with "execution: attempted to read series at index 0 from stream, but the stream has already been exhausted" if streaming chunks from ingesters to queriers is enabled. #5370
  • [BUGFIX] memberlist: bring back memberlist_client_kv_store_count metric that used to exist in Cortex, but got lost during dskit updates before Mimir 2.0. #5377
  • [BUGFIX] Querier: Pass on HTTP 503 query response code. #5364
  • [BUGFIX] Store-gateway: Fix issue where stopping a store-gateway could cause all store-gateways to unload all blocks. #5464
  • [BUGFIX] Allocate ballast in smaller blocks to avoid a problem where the entire ballast was kept in the memory working set. #5565
  • [BUGFIX] Querier: Retry frontend result notification when an error is returned. #5591
  • [BUGFIX] Querier: fix issue where cortex_ingester_client_request_duration_seconds metric did not include streaming query requests that did not return any series. #5695
  • [BUGFIX] Ingester: Fix ActiveSeries tracker double-counting series that have been deleted from the Head while still being active and then recreated again. #5678
  • [BUGFIX] Ingester: Don't set "last update time" of TSDB into the future when opening TSDB. This could prevent an idle TSDB from being detected for a long time if a sample with a timestamp in the distant future was ingested. #5787
  • [BUGFIX] Store-gateway: fix bug where a lazy index header could be closed prematurely even when still in use. #5795
  • [BUGFIX] Ruler: gracefully shut down rule evaluations. #5778
  • [BUGFIX] Querier: fix performance when ingesters stream samples. #5836
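
The -validation.create-grace-period rules described in the [CHANGE] entries above (#5761, #5829) reduce to two comparisons against "now + grace_period". The Python sketch below is a minimal model of those rules, not Mimir's implementation: the function names are hypothetical, timestamps are assumed to be Unix seconds, and capping the query end at "now + grace_period" for non-zero values is an assumption extrapolated from the documented zero case.

```python
def exemplar_too_far_in_future(exemplar_ts: float, now: float,
                               grace_period: float) -> bool:
    """True when the exemplar timestamp is greater than now + grace_period,
    which is the case where the exemplar would be dropped."""
    return exemplar_ts > now + grace_period


def truncate_query_end(query_end: float, now: float,
                       grace_period: float) -> float:
    """Cap the query end time at now + grace_period.

    With grace_period == 0 the end time is truncated to the current time,
    as the changelog states; the non-zero behaviour is an assumption.
    """
    return min(query_end, now + grace_period)
```

With a grace period of 0, a query ending at 5000 evaluated when now is 1000 is truncated to 1000, while a query ending at 500 is left unchanged.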

Mixin

  • [CHANGE] Dashboards: show all workloads in selected namespace on "rollout progress" dashboard. #5113
  • [CHANGE] Dashboards: show the number of updated and ready pods for each workload in the "rollout progress" panel on the "rollout progress" dashboard. #5113
  • [CHANGE] Dashboards: removed "Query results cache misses" panel on the "Mimir / Queries" dashboard. #5423
  • [CHANGE] Dashboards: default to shared crosshair on all dashboards. #5489
  • [CHANGE] Dashboards: sort variable drop-down lists from A to Z, rather than Z to A. #5490
  • [CHANGE] Alerts: removed MimirProvisioningTooManyActiveSeries alert. You should configure -ingester.instance-limits.max-series and rely on MimirIngesterReachingSeriesLimit alert instead. #5593
  • [CHANGE] Alerts: removed MimirProvisioningTooManyWrites alert. The alerting threshold used in this alert was chosen arbitrarily and ingesters receiving a higher number of samples / sec don't necessarily have any issue. You should rely on SLO metrics and alerts instead. #5706
  • [CHANGE] Alerts: don't raise MimirRequestErrors or MimirRequestLatency alert for the /debug/pprof endpoint. #5826
  • [ENHANCEMENT] Dashboards: adjust layout of "rollout progress" dashboard panels so that the "rollout progress" panel doesn't require scrolling. #5113
  • [ENHANCEMENT] Dashboards: show container name first in "pods count per version" panel on "rollout progress" dashboard. #5113
  • [ENHANCEMENT] Dashboards: show time spent waiting for turn when lazy loading index headers in the "index-header lazy load gate latency" panel on the "queries" dashboard. #5313
  • [ENHANCEMENT] Dashboards: split query results cache hit ratio by request type in "Query results cache hit ratio" panel on the "Mimir / Queries" dashboard. #5423
  • [ENHANCEMENT] Dashboards: add "rejected queries" panel to "queries" dashboard. #5429
  • [ENHANCEMENT] Dashboards: add native histogram active series and active buckets to "tenants" dashboard. #5543
  • [ENHANCEMENT] Dashboards: add panels to "Mimir / Writes" for requests rejected for per-instance limits. #5638
  • [ENHANCEMENT] Dashboards: rename "Blocks currently loaded" to "Blocks currently owned" in the "Mimir / Queries" dashboard. #5705
  • [ENHANCEMENT] Alerts: Add MimirIngestedDataTooFarInTheFuture warning alert that triggers when Mimir ingests a sample with a timestamp more than 1h in the future. #5822
  • [BUGFIX] Alerts: fix MimirIngesterRestarts to fire only when the ingester container is restarted, excluding cases where the pod is rescheduled. #5397
  • [BUGFIX] Dashboards: fix "unhealthy pods" panel on "rollout progress" dashboard showing only a number rather than the name of the workload and the number of unhealthy pods if only one workload has unhealthy pods. #5113 #5200
  • [BUGFIX] Alerts: fixed MimirIngesterHasNotShippedBlocks and MimirIngesterHasNotShippedBlocksSinceStart alerts. #5396
  • [BUGFIX] Alerts: Fix MimirGossipMembersMismatch to include admin-api and custom compactor pods. admin-api is a GEM component. #5641 #5797
  • [BUGFIX] Dashboards: fix autoscaling dashboard panels that could show multiple series for a single component. #5810

Jsonnet

  • [CHANGE] Removed _config.querier.concurrency configuration option and replaced it with _config.querier_max_concurrency and _config.ruler_querier_max_concurrency to make it easy to fine-tune concurrency for different querier deployments. #5322
  • [CHANGE] Change _config.multi_zone_ingester_max_unavailable to 50. #5327
  • [CHANGE] Change distributors rolling update strategy configuration: maxSurge and maxUnavailable are set to 15% and 0. #5714
  • [FEATURE] Alertmanager: Add horizontal pod autoscaler config, that can be enabled using autoscaling_alertmanager_enabled: true. #5194 #5249
  • [ENHANCEMENT] Enable the track_sizes feature for Memcached pods to help determine cache efficiency. #5209
  • [ENHANCEMENT] Add per-container map for environment variables. #5181
  • [ENHANCEMENT] Add PodDisruptionBudgets for compactor, continuous-test, distributor, overrides-exporter, querier, query-frontend, query-scheduler, rollout-operator, ruler, ruler-querier, ruler-query-frontend, ruler-query-scheduler, and all memcached workloads. #5098
  • [ENHANCEMENT] Ruler: configure the ruler storage cache when the metadata cache is enabled. #5326 #5334
  • [ENHANCEMENT] Shuffle-sharding: ingester shards in user-classes can now be configured to target different series and limit percentage utilization through _config.shuffle_sharding.target_series_per_ingester and _config.shuffle_sharding.target_utilization_percentage values. #5470
  • [ENHANCEMENT] Distributor: allow adjustment of the targeted CPU usage as a percentage of requested CPU. This can be adjusted with _config.autoscaling_distributor_cpu_target_utilization. #5525
  • [ENHANCEMENT] Ruler: add configuration option _config.ruler_remote_evaluation_max_query_response_size_bytes to easily set the maximum query response size allowed (in bytes). #5592
  • [ENHANCEMENT] Distributor: dynamically set GOMAXPROCS based on the CPU request. This should reduce distributor CPU utilization, assuming the CPU request is set to a value close to the actual utilization. #5588
  • [ENHANCEMENT] Querier: dynamically set GOMAXPROCS based on the CPU request. This should reduce noisy neighbour issues created by the querier, whose CPU utilization could eventually saturate the Kubernetes node if unbounded. #5646 #5658
  • [ENHANCEMENT] Allow removing an environment variable configured for a given component by setting its value to null in the *_env_map objects (e.g. store_gateway_env_map+:: { 'field': null}). #5599
  • [ENHANCEMENT] Allow overriding the default number of replicas for etcd. #5589
  • [ENHANCEMENT] Memcached: reduce memory request for results, chunks and metadata caches. The requested memory is 5% greater than the configured memcached max cache size. #5661
  • [ENHANCEMENT] Autoscaling: Add the following configuration options to fine tune autoscaler target utilization: #5679 #5682 #5689
    • autoscaling_querier_target_utilization (defaults to 0.75)
    • autoscaling_mimir_read_target_utilization (defaults to 0.75)
    • autoscaling_ruler_querier_cpu_target_utilization (defaults to 1)
    • autoscaling_distributor_memory_target_utilization (defaults to 1)
    • autoscaling_ruler_cpu_target_utilization (defaults to 1)
    • autoscaling_query_frontend_cpu_target_utilization (defaults to 1)
    • autoscaling_ruler_query_frontend_cpu_target_utilization (defaults to 1)
    • autoscaling_alertmanager_cpu_target_utilization (defaults to 1)
  • [ENHANCEMENT] Gossip-ring: add appProtocol for istio compatibility. #5680
  • [ENHANCEMENT] Add _config.commonConfig to allow adding common configuration parameters for all Mimir components. #5703
  • [ENHANCEMENT] Update rollout-operator to v0.7.0. #5718
  • [ENHANCEMENT] Increase the default rollout speed for store-gateway when lazy loading is disabled. #5823
  • [BUGFIX] Fix compilation when index, chunks or metadata caches are disabled. #5710
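
To illustrate the distributor rolling update change listed above: Kubernetes rounds a percentage maxSurge up to a whole number of pods, so with maxSurge at 15% and maxUnavailable at 0, a rollout creates extra pods instead of taking ready ones away. The following Python sketch only reproduces that arithmetic. The function name and the replica counts are hypothetical and are not part of the Mimir jsonnet.

```python
def surge_pods(replicas: int, max_surge_percent: int = 15) -> int:
    """Extra pods a Deployment may create during a rolling update.

    Kubernetes rounds a percentage maxSurge up to a whole pod, so this
    uses integer ceiling division (no floating-point rounding involved).
    """
    if replicas < 0:
        raise ValueError("replicas must not be negative")
    return (replicas * max_surge_percent + 99) // 100


def max_pods_during_rollout(replicas: int) -> int:
    """Upper bound on distributor pods while a rollout is in progress."""
    return replicas + surge_pods(replicas)
```

For example, under these assumptions a deployment of 10 distributors can temporarily grow to 12 pods during a rollout, while maxUnavailable set to 0 keeps the number of available pods from dropping below 10.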

Query-tee

  • [CHANGE] Proxy Content-Type response header from backend. Previously Content-Type: text/plain; charset=utf-8 was returned on all requests. #5183
  • [CHANGE] Increase default value of -proxy.compare-skip-recent-samples to avoid racing with recording rule evaluation. #5561
  • [CHANGE] Add -backend.skip-tls-verify to optionally skip TLS verification on backends. #5656

Documentation

  • [CHANGE] Fix reference to get-started documentation directory. #5476
  • [CHANGE] Fix link to external OTLP/HTTP documentation.
  • [ENHANCEMENT] Improved MimirRulerTooManyFailedQueries runbook. #5586
  • [ENHANCEMENT] Improved "Recover accidentally deleted blocks" runbook. #5620
  • [ENHANCEMENT] Documented options and trade-offs to query label names and values. #5582
  • [ENHANCEMENT] Improved MimirRequestErrors runbook for alertmanager. #5694

Tools

  • [CHANGE] copyblocks: add support for S3 and the ability to copy between different object storage services. Due to this, the -source-service and -destination-service flags are now required and the -service flag has been removed. #5486
  • [FEATURE] undelete-block-gcs: Added new tool for undeleting blocks on GCS storage. #5610 #5855
  • [FEATURE] wal-reader: Added new tool for printing entries in TSDB WAL. #5780
  • [ENHANCEMENT] ulidtime: add -seconds flag to print timestamps as Unix timestamps. #5621
  • [ENHANCEMENT] ulidtime: exit with status code 1 if some ULIDs can't be parsed. #5621
  • [ENHANCEMENT] tsdb-index-toc: added index-header size estimates. #5652
  • [BUGFIX] Stop tools from panicking when -help flag is passed. #5412
  • [BUGFIX] Remove github.com/golang/glog command line flags from tools. #5413
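
As background for the ulidtime entries above: the first 10 characters of a ULID encode a 48-bit Unix timestamp in milliseconds, using Crockford's base32 alphabet. The Python sketch below decodes that timestamp. It illustrates the ULID format only and is not the tool's implementation. The function names are hypothetical, and how ulidtime formats its output with -seconds is not shown here.

```python
# Crockford base32 alphabet used by ULIDs (it omits I, L, O, and U).
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def ulid_unix_millis(ulid: str) -> int:
    """Return the Unix timestamp in milliseconds encoded in a ULID."""
    if len(ulid) != 26:
        raise ValueError("a ULID has exactly 26 characters")
    millis = 0
    # The first 10 characters hold the timestamp, 5 bits per character.
    for ch in ulid[:10].upper():
        # str.index raises ValueError for characters outside the alphabet.
        millis = millis * 32 + CROCKFORD.index(ch)
    return millis


def ulid_unix_seconds(ulid: str) -> int:
    """Return the encoded timestamp truncated to whole Unix seconds."""
    return ulid_unix_millis(ulid) // 1000
```

Raising an error on malformed input mirrors the idea behind the non-zero exit status mentioned above, although the real tool's error handling may differ.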

All changes in this release: https://github.com/grafana/mimir/compare/mimir-2.9.0...mimir-2.10.0-rc.0

mimir-2.9.0

11 months ago

This release contains 252 PRs from 46 authors, including new contributors Alex R, Alexander Soelberg Heidarsson, Alexander Weaver, Benjamin Lazarecki, Dhanu Saputra, Dominik Süß, Fiona Liao, Jonathan Halterman, Kristian Bremberg, MattiasSegerdahl, Salva Corts, Stephanie Closson, willychrisza. Thank you!

Grafana Mimir version 2.9.0 release notes

Grafana Labs is excited to announce version 2.9 of Grafana Mimir.

The highlights that follow include the top features, enhancements, and bugfixes in this release. For the complete list of changes, see the changelog.

Features and enhancements

  • Reduced store-gateway memory utilization on fetching series from long-term storage. For queries that include broad label matchers (e.g. datacenter="dc1"), Mimir 2.9 will fetch a reduced volume of index data, which leads to a significant reduction in memory allocations in the store-gateway.
  • Reduced CPU utilisation for some shuffle sharding scenarios. Mimir queriers will now use significantly less CPU in cases where shuffle sharding is enabled for tenants with a shard size that's large but lower than the total number of ingesters.
  • Reduced object storage API calls in compactors and rulers. Mimir 2.9 comes with optimizations that reduce the number of times compactors and rulers need to access rules stored in object storage.
    • This release adds experimental support for a ruler storage cache. This cache should reduce the number of "list objects" API calls issued to the object storage when there are 2+ ruler replicas running in a Mimir cluster. The cache can be configured by setting the -ruler-storage.cache.* CLI flags or their respective YAML config options.
    • We also introduced a new feature to trigger a synchronization of a tenant's rule groups as soon as changes to the rule configuration are made via API. This synchronization is in addition to the periodic syncing done every -ruler.poll-interval, which has therefore been relaxed from every 1m to every 10m. The new behaviour is enabled globally by default but can be disabled with -ruler.sync-rules-on-changes-enabled=false or tuned at a per-tenant level.
  • Experimental support for streaming chunks from ingester to querier. This is expected to greatly reduce querier memory consumption when evaluating queries that select a large number of series, because chunks streamed from the ingester can now be read into memory as needed.

Helm chart improvements

The Grafana Mimir and Grafana Enterprise Metrics Helm chart is now released independently. See the Grafana Mimir Helm chart documentation.

Important changes

In Grafana Mimir 2.9 we have removed the following previously deprecated or experimental metrics:

  • cortex_bucket_store_chunk_pool_requested_bytes_total
  • cortex_bucket_store_chunk_pool_returned_bytes_total

The following configuration options are deprecated and will be removed in Grafana Mimir 2.11:

  • The CLI flag -querier.query-ingesters-within. This configuration is moved to per-tenant overrides.
  • The CLI flag -blocks-storage.bucket-store.bucket-index.enabled.
  • The CLI flags -blocks-storage.bucket-store.chunk-pool-min-bucket-size-bytes, -blocks-storage.bucket-store.chunk-pool-max-bucket-size-bytes and -blocks-storage.bucket-store.max-chunk-pool-bytes.
  • The CLI flags -querier.iterators and -querier.batch-iterators.

The following configuration options that were deprecated in 2.7 are removed:

  • The CLI flag -blocks-storage.bucket-store.chunks-cache.subrange-size. A fixed value of 16000 is now always used.
  • The CLI flag -blocks-storage.bucket-store.consistency-delay.
  • The CLI flag -compactor.consistency-delay.
  • The CLI flag -ingester.ring.readiness-check-ring-health.

The following experimental options and features are now stable:

  • The CLI flag -query-frontend.query-sharding-max-regexp-size-bytes.
  • The CLI flag -query-scheduler.max-used-instances.
  • The CLI flags -(alertmanager|blocks|ruler)-storage.storage-prefix.
  • The CLI flag -compactor.first-level-compaction-wait-period.
  • The CLI flags -usage-stats.enabled and -usage-stats.installation-mode.
  • The CLI flag -query-frontend.query-sharding-target-series-per-shard.

The following configuration option defaults were changed:

  • The default value for the CLI flag -query-frontend.query-sharding-max-regexp-size-bytes was changed from 0 to 4096. As a result, queries with regex matchers exceeding this limit will not be sharded by default.
  • The default value for the CLI flag -compactor.partial-block-deletion-delay was changed from 0s to 1d. As a result, partial blocks resulting from a failed block upload or deletion will be cleaned up automatically.
  • The default value for the CLI flag -ruler.poll-interval was changed from 1m to 10m.

Bug fixes

  • Store-gateway: Detect collisions in the postings cache. PR 4770
  • Store-gateway: Fix panic caused by cached LabelValues responses with more than 655360 values. PR 5021

Changelog

2.9.0

Grafana Mimir

  • [CHANGE] Store-gateway: change expanded postings, postings, and label values index cache key format. These caches will be invalidated when rolling out the new Mimir version. #4770 #4978 #5037
  • [CHANGE] Distributor: remove the "forwarding" feature as it isn't necessary anymore. #4876
  • [CHANGE] Query-frontend: Change the default value of -query-frontend.query-sharding-max-regexp-size-bytes from 0 to 4096. #4932
  • [CHANGE] Querier: -querier.query-ingesters-within has been moved from a global flag to a per-tenant override. #4287
  • [CHANGE] Querier: Use -blocks-storage.tsdb.retention-period instead of -querier.query-ingesters-within for calculating the lookback period for shuffle sharded ingesters. Setting -querier.query-ingesters-within=0 no longer disables shuffle sharding on the read path. #4287
  • [CHANGE] Block upload: /api/v1/upload/block/{block}/files endpoint now allows file uploads with no Content-Length. #4956
  • [CHANGE] Store-gateway: deprecate configuration parameters for chunk pooling, they will be removed in Mimir 2.11. The following options are now also ignored: #4996
    • -blocks-storage.bucket-store.max-chunk-pool-bytes
    • -blocks-storage.bucket-store.chunk-pool-min-bucket-size-bytes
    • -blocks-storage.bucket-store.chunk-pool-max-bucket-size-bytes
  • [CHANGE] Store-gateway: remove metrics cortex_bucket_store_chunk_pool_requested_bytes_total and cortex_bucket_store_chunk_pool_returned_bytes_total. #4996
  • [CHANGE] Compactor: change default of -compactor.partial-block-deletion-delay to 1d. This will automatically clean up partial blocks that were a result of failed block upload or deletion. #5026
  • [CHANGE] Compactor: the deprecated configuration parameter -compactor.consistency-delay has been removed. #5050
  • [CHANGE] Store-gateway: the deprecated configuration parameter -blocks-storage.bucket-store.consistency-delay has been removed. #5050
  • [CHANGE] The configuration parameter -blocks-storage.bucket-store.bucket-index.enabled has been deprecated and will be removed in Mimir 2.11. Mimir has run with the bucket index enabled by default since version 2.0, and starting from version 2.11 it will not be possible to disable it. #5051
  • [CHANGE] The configuration parameters -querier.iterators and -querier.batch-iterators have been deprecated and will be removed in Mimir 2.11. Mimir runs by default with -querier.batch-iterators=true, and starting from version 2.11 it will not be possible to change this. #5114
  • [CHANGE] Compactor: change default of -compactor.first-level-compaction-wait-period to 25m. #5128
  • [CHANGE] Ruler: changed default of -ruler.poll-interval from 1m to 10m. Starting from this release, the configured rule groups will also be re-synced each time they're modified calling the ruler configuration API. #5170
  • [FEATURE] Query-frontend: add -query-frontend.log-query-request-headers to enable logging of request headers in query logs. #5030
  • [ENHANCEMENT] Add per-tenant limit -validation.max-native-histogram-buckets to be able to ignore native histogram samples that have too many buckets. #4765
  • [ENHANCEMENT] Store-gateway: reduce memory usage in some LabelValues calls. #4789
  • [ENHANCEMENT] Store-gateway: add a stage label to the metric cortex_bucket_store_series_data_touched. This label now applies to data_type="chunks" and data_type="series". The stage label has 2 values: processed (the number of series that were parsed) and returned (the number of series selected from the processed bytes to satisfy the query). #4797 #4830
  • [ENHANCEMENT] Distributor: make __meta_tenant_id label available in relabeling rules configured via metric_relabel_configs. #4725
  • [ENHANCEMENT] Compactor: added the configurable limit -compactor.block-upload-max-block-size-bytes or compactor_block_upload_max_block_size_bytes to limit the byte size of uploaded or validated blocks. #4680
  • [ENHANCEMENT] Querier: reduce CPU utilisation when shuffle sharding is enabled with large shard sizes. #4851
  • [ENHANCEMENT] Packaging: facilitate configuration management by instructing systemd to start mimir with a configuration file. #4810
  • [ENHANCEMENT] Store-gateway: reduce memory allocations when looking up postings from cache. #4861 #4869 #4962 #5047
  • [ENHANCEMENT] Store-gateway: retain only necessary bytes when reading series from the bucket. #4926
  • [ENHANCEMENT] Ingester, store-gateway: clear the shutdown marker after a successful shutdown to enable reusing their persistent volumes in case the ingester or store-gateway is restarted. #4985
  • [ENHANCEMENT] Store-gateway, query-frontend: Reduced memory allocations when looking up cached entries from Memcached. #4862
  • [ENHANCEMENT] Alertmanager: Add additional template function queryFromGeneratorURL that returns the URL-decoded query from the GeneratorURL field of an alert. #4301
  • [ENHANCEMENT] Ruler: added experimental ruler storage cache support. The cache should reduce the number of "list objects" API calls issued to the object storage when there are 2+ ruler replicas running in a Mimir cluster. The cache can be configured by setting -ruler-storage.cache.* CLI flags or their respective YAML config options. #4950 #5054
  • [ENHANCEMENT] Store-gateway: added HTTP /store-gateway/prepare-shutdown endpoint for gracefully scaling down store-gateways. A gauge cortex_store_gateway_prepare_shutdown_requested has been introduced for tracking this process. #4955
  • [ENHANCEMENT] Updated Kuberesolver dependency (github.com/sercand/kuberesolver) from v2.4.0 to v4.0.0 and gRPC dependency (google.golang.org/grpc) from v1.47.0 to v1.53.0. #4922
  • [ENHANCEMENT] Introduced new options for logging HTTP request headers: -server.log-request-headers enables logging HTTP request headers, -server.log-request-headers-exclude-list lists headers which should not be logged. #4922
  • [ENHANCEMENT] Block upload: /api/v1/upload/block/{block}/files endpoint now disables read and write HTTP timeout, overriding -server.http-read-timeout and -server.http-write-timeout values. This is done to allow large file uploads to succeed. #4956
  • [ENHANCEMENT] Alertmanager: Introduce new metrics from upstream. #4918
    • cortex_alertmanager_notifications_failed_total (added reason label)
    • cortex_alertmanager_nflog_maintenance_total
    • cortex_alertmanager_nflog_maintenance_errors_total
    • cortex_alertmanager_silences_maintenance_total
    • cortex_alertmanager_silences_maintenance_errors_total
  • [ENHANCEMENT] Add native histogram support for cortex_request_duration_seconds metric family. #4987
  • [ENHANCEMENT] Ruler: do not list rule groups in the object storage for disabled tenants. #5004
  • [ENHANCEMENT] Query-frontend and querier: add HTTP API endpoint <prometheus-http-prefix>/api/v1/format_query to format a PromQL query. #4373
  • [ENHANCEMENT] Query-frontend: Add cortex_query_frontend_regexp_matcher_count and cortex_query_frontend_regexp_matcher_optimized_count metrics to track optimization of regular expression label matchers. #4813
  • [ENHANCEMENT] Alertmanager: Add configuration option to enable or disable the deletion of alertmanager state from object storage. This is useful when migrating alertmanager tenants from one cluster to another, because it avoids a condition where the state object is copied but then deleted before the configuration object is copied. #4989
  • [ENHANCEMENT] Querier: only use the minimum set of chunks from ingesters when querying, and cancel unnecessary requests to ingesters sooner if we know their results won't be used. #5016
  • [ENHANCEMENT] Add -enable-go-runtime-metrics flag to expose all go runtime metrics as Prometheus metrics. #5009
  • [ENHANCEMENT] Ruler: trigger a synchronization of a tenant's rule groups as soon as they change the rules configuration via API. This synchronization is in addition to the periodic syncing done every -ruler.poll-interval. The new behavior is enabled by default, but can be disabled with -ruler.sync-rules-on-changes-enabled=false (configurable on a per-tenant basis too). If you disable the new behavior, then you may want to revert -ruler.poll-interval to 1m. #4975 #5053 #5115 #5170
  • [ENHANCEMENT] Distributor: Improve invalid tenant shard size error message. #5024
  • [ENHANCEMENT] Store-gateway: record index header loading time separately in cortex_bucket_store_series_request_stage_duration_seconds{stage="load_index_header"}. Now index header loading will be visible in the "Mimir / Queries" dashboard in the "Series request p99/average latency" panels. #5011 #5062
  • [ENHANCEMENT] Querier and ingester: add experimental support for streaming chunks from ingesters to queriers while evaluating queries. This can be enabled with -querier.prefer-streaming-chunks=true. #4886 #5078 #5094 #5126
  • [ENHANCEMENT] Update Docker base images from alpine:3.17.3 to alpine:3.18.0. #5065
  • [ENHANCEMENT] Compactor: reduced the number of "object exists" API calls issued by the compactor to the object storage when syncing block's meta.json files. #5063
  • [ENHANCEMENT] Distributor: Push request rate limits (-distributor.request-rate-limit and -distributor.request-burst-size) and their associated YAML configuration are now stable. #5124
  • [ENHANCEMENT] Go: updated to 1.20.5. #5185
  • [ENHANCEMENT] Update alpine base image to 3.18.2. #5274 #5276
  • [BUGFIX] Metadata API: Mimir will now return an empty object when no metadata is available, matching Prometheus. #4782
  • [BUGFIX] Store-gateway: add collision detection on expanded postings and individual postings cache keys. #4770
  • [BUGFIX] Ruler: Support the type=alert|record query parameter for the API endpoint <prometheus-http-prefix>/api/v1/rules. #4302
  • [BUGFIX] Backend: Check that alertmanager's data-dir doesn't overlap with bucket-sync dir. #4921
  • [BUGFIX] Alertmanager: Allow to rate-limit webex, telegram and discord notifications. #4979
  • [BUGFIX] Store-gateway: fix panics when decoding LabelValues responses that contain more than 655360 values. These responses are no longer cached. #5021
  • [BUGFIX] Querier: don't leak memory when processing query requests from query-frontends (i.e. when the query-scheduler is disabled). #5199

Documentation

  • [ENHANCEMENT] Improve MimirIngesterReachingTenantsLimit runbook. #4744 #4752
  • [ENHANCEMENT] Add symbol table size exceeds case to MimirCompactorHasNotSuccessfullyRunCompaction runbook. #4945
  • [ENHANCEMENT] Clarify which APIs use query sharding. #4948

Mixin

  • [CHANGE] Alerts: Remove MimirQuerierHighRefetchRate. #4980
  • [CHANGE] Alerts: Remove MimirTenantHasPartialBlocks. This is obsoleted by the changed default of -compactor.partial-block-deletion-delay to 1d, which will auto remediate this alert. #5026
  • [ENHANCEMENT] Alertmanager dashboard: display active aggregation groups #4772
  • [ENHANCEMENT] Alerts: MimirIngesterTSDBWALCorrupted now only fires when more than one WAL is corrupted in single-zone deployments and when more than two zones are affected in multi-zone deployments. #4920
  • [ENHANCEMENT] Alerts: added labels to duplicated MimirRolloutStuck and MimirCompactorHasNotUploadedBlocks rules in order to distinguish them. #5023
  • [ENHANCEMENT] Dashboards: fix holes in graph for lightly loaded clusters #4915
  • [ENHANCEMENT] Dashboards: allow configuring additional services for the Rollout Progress dashboard. #5007
  • [ENHANCEMENT] Alerts: do not fire MimirAllocatingTooMuchMemory alert for any matching container outside of namespaces where Mimir is running. #5089
  • [BUGFIX] Dashboards: show cancelled requests in a different color to successful requests in throughput panels on dashboards. #5039
  • [BUGFIX] Dashboards: fix dashboard panels that showed percentages with axes from 0 to 10000%. #5084

Jsonnet

  • [CHANGE] Ruler: changed ruler autoscaling policy, extended scale down period from 60s to 600s. #4786
  • [CHANGE] Update to v0.5.0 rollout-operator. #4893
  • [CHANGE] Backend: add alertmanager_args to mimir-backend when running in read-write deployment mode. Remove hardcoded filesystem alertmanager storage. This moves alertmanager's data-dir to /data/alertmanager by default. #4907 #4921
  • [CHANGE] Remove -pdb suffix from PodDisruptionBudget names. This will create new PodDisruptionBudget resources. Make sure to prune the old resources; otherwise, rollouts will be blocked. #5109
  • [CHANGE] Query-frontend: enable query sharding for cardinality estimation via -query-frontend.query-sharding-target-series-per-shard by default if the results cache is enabled. #5128
  • [ENHANCEMENT] Ingester: configure -blocks-storage.tsdb.head-compaction-interval=15m to spread TSDB head compaction over a wider time range. #4870
  • [ENHANCEMENT] Ingester: configure -blocks-storage.tsdb.wal-replay-concurrency to CPU request minus 1. #4864
  • [ENHANCEMENT] Compactor: configure -compactor.first-level-compaction-wait-period to TSDB head compaction interval plus 10 minutes. #4872
  • [ENHANCEMENT] Store-gateway: set GOMEMLIMIT to the memory request value. This should reduce the likelihood that the store-gateway runs out of memory, at the cost of higher CPU utilization due to more frequent garbage collections when memory utilization gets close to or above the configured memory request. #4971
  • [ENHANCEMENT] Store-gateway: dynamically set GOMAXPROCS based on the CPU request. This should reduce the likelihood a high load on the store-gateway will slow down the entire Kubernetes node. #5104
  • [ENHANCEMENT] Store-gateway: add store_gateway_lazy_loading_enabled configuration option which combines disabled lazy-loading and reducing blocks sync concurrency. Reducing blocks sync concurrency improves startup times with disabled lazy loading on HDDs. #5025
  • [ENHANCEMENT] Update rollout-operator image to v0.6.0. #5155
  • [BUGFIX] Backend: configure -ruler.alertmanager-url to mimir-backend when running in read-write deployment mode. #4892
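
The derived values in the ingester and compactor entries above are simple arithmetic. The sketch below, written in Python rather than Jsonnet, reproduces them under stated assumptions: the helper names are hypothetical, the CPU request is taken as a whole number of cores, and clamping the concurrency to at least 1 is an assumption that the changelog does not state.

```python
from datetime import timedelta

# Value configured for -blocks-storage.tsdb.head-compaction-interval.
HEAD_COMPACTION_INTERVAL = timedelta(minutes=15)


def first_level_compaction_wait_period(head_interval: timedelta) -> timedelta:
    """TSDB head compaction interval plus 10 minutes."""
    return head_interval + timedelta(minutes=10)


def wal_replay_concurrency(cpu_request_cores: int) -> int:
    """CPU request minus 1; the floor of 1 is an assumption of this sketch."""
    return max(cpu_request_cores - 1, 1)
```

With a 15m head compaction interval the wait period comes out at 25m, which matches the new default of -compactor.first-level-compaction-wait-period listed in the Grafana Mimir changes for this release.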

Mimirtool

  • [CHANGE] check rules: will fail on duplicate rules when --strict is provided. #5035
  • [FEATURE] sync/diff can now include/exclude namespaces based on a regular expression using --namespaces-regex and --ignore-namespaces-regex. #5100
  • [ENHANCEMENT] analyze prometheus: allow specifying -prometheus-http-prefix. #4966
  • [ENHANCEMENT] analyze grafana: allow to specify --folder-title to limit dashboards analysis based on their exact folder title. #4973

Tools

  • [CHANGE] copyblocks: copying between Azure Blob Storage buckets is now supported in addition to copying between Google Cloud Storage buckets. As a result, the --service flag is now required to be specified (accepted values are gcs or abs). #4756

All changes in this release: https://github.com/grafana/mimir/compare/mimir-2.8.0...mimir-2.9.0

mimir-2.9.0-rc.1

1 year ago

This release contains 260 PRs from 46 authors. Thank you!

Grafana Mimir version 2.9 release notes

Grafana Labs is excited to announce version 2.9 of Grafana Mimir.

The highlights that follow include the top features, enhancements, and bugfixes in this release. For the complete list of changes, see the changelog.

Features and enhancements

  • Reduced store-gateway memory utilization on fetching series from long-term storage. For queries that include broad label matchers (e.g. datacenter="dc1"), Mimir 2.9 will fetch a reduced volume of index data, which leads to a significant reduction in memory allocations in the store-gateway.
  • Reduced CPU utilisation for some shuffle sharding scenarios. Mimir queriers will now use significantly less CPU in cases where shuffle sharding is enabled for tenants with a shard size that's large but lower than the total number of ingesters.
  • Reduced object storage API calls in compactors and rulers. Mimir 2.9 comes with optimizations that reduce the number of times compactors and rulers need to access rules stored in object storage.
    • This release adds experimental support for a ruler storage cache. This cache should reduce the number of "list objects" API calls issued to the object storage when there are 2+ ruler replicas running in a Mimir cluster. The cache can be configured by setting the -ruler-storage.cache.* CLI flags or their respective YAML config options.
    • We also introduced a new feature to trigger a synchronization of a tenant's rule groups as soon as changes to the rule configuration are made via API. This synchronization is in addition to the periodic syncing done every -ruler.poll-interval and makes it possible to increase the polling interval. The new behavior is enabled globally by default but can be disabled with -ruler.sync-rules-on-changes-enabled=false or tuned at a per-tenant level.
  • Experimental support for streaming chunks from ingester to querier. This is expected to greatly reduce querier memory consumption when evaluating queries that select a large number of series, because chunks streamed from the ingester can now be read into memory as needed.

Helm chart improvements

The Grafana Mimir and Grafana Enterprise Metrics Helm chart is now released independently. See the Grafana Mimir Helm chart documentation.

Important changes

In Grafana Mimir 2.9 we have removed the following previously deprecated or experimental metrics:

  • cortex_bucket_store_chunk_pool_requested_bytes_total
  • cortex_bucket_store_chunk_pool_returned_bytes_total

The following configuration options are deprecated and will be removed in Grafana Mimir 2.11:

  • The CLI flag -querier.query-ingesters-within. This configuration is moved to per-tenant overrides.
  • The CLI flag -blocks-storage.bucket-store.bucket-index.enabled.
  • The CLI flags -blocks-storage.bucket-store.chunk-pool-min-bucket-size-bytes, -blocks-storage.bucket-store.chunk-pool-max-bucket-size-bytes and -blocks-storage.bucket-store.max-chunk-pool-bytes.
  • The CLI flags -querier.iterators and -querier.batch-iterators.

The following configuration options that were deprecated in 2.7 are removed:

  • The CLI flag -blocks-storage.bucket-store.chunks-cache.subrange-size. A fixed value of 16000 is now always used.
  • The CLI flag -blocks-storage.bucket-store.consistency-delay.
  • The CLI flag -compactor.consistency-delay.
  • The CLI flag -ingester.ring.readiness-check-ring-health.

The following experimental options and features are now stable:

  • The CLI flag -query-frontend.query-sharding-max-regexp-size-bytes.
  • The CLI flag -query-scheduler.max-used-instances.
  • The CLI flags -(alertmanager|blocks|ruler)-storage.storage-prefix.
  • The CLI flag -compactor.first-level-compaction-wait-period.
  • The CLI flags -usage-stats.enabled and -usage-stats.installation-mode.
  • The CLI flag -query-frontend.query-sharding-target-series-per-shard.

The following configuration option defaults were changed:

  • The default value for the CLI flag -query-frontend.query-sharding-max-regexp-size-bytes was changed from 0 to 4096. As a result, queries with regex matchers exceeding this limit will not be sharded by default.
  • The default value for the CLI flag -compactor.partial-block-deletion-delay was changed from 0s to 1d. As a result, partial blocks resulting from a failed block upload or deletion will be cleaned up automatically.
  • The default value for the CLI flag -ruler.poll-interval was changed from 1m to 10m.
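
To illustrate the effect of the new -query-frontend.query-sharding-max-regexp-size-bytes default, here is a rough model of the check. It is a sketch under stated assumptions rather than Mimir's implementation: the function name is hypothetical, size is measured on the UTF-8 bytes of each pattern, and a limit of 0 (the previous default) is assumed to disable the check.

```python
def exceeds_regexp_limit(regexp_matchers, max_size_bytes=4096):
    """Return True if any regular expression matcher is longer than the limit.

    Assumption: a limit of 0 (the pre-2.9 default) disables the check.
    Sizes are measured on the UTF-8 encoding of each pattern.
    """
    if max_size_bytes <= 0:
        return False
    return any(len(p.encode("utf-8")) > max_size_bytes for p in regexp_matchers)
```

Under this model, a query with a very long alternation such as instance=~"a|b|c|..." would no longer be sharded by default, while short matchers are unaffected.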

Bug fixes

  • Store-gateway: Detect collisions in the postings cache. PR 4770
  • Store-gateway: Fix panic caused by cached LabelValues responses with more than 655360 values. PR 5021

Changelog

2.9.0-rc.1

Grafana Mimir

  • [CHANGE] Store-gateway: change expanded postings, postings, and label values index cache key format. These caches will be invalidated when rolling out the new Mimir version. #4770 #4978 #5037
  • [CHANGE] Distributor: remove the "forwarding" feature as it isn't necessary anymore. #4876
  • [CHANGE] Query-frontend: Change the default value of -query-frontend.query-sharding-max-regexp-size-bytes from 0 to 4096. #4932
  • [CHANGE] Querier: -querier.query-ingesters-within has been moved from a global flag to a per-tenant override. #4287
  • [CHANGE] Querier: Use -blocks-storage.tsdb.retention-period instead of -querier.query-ingesters-within for calculating the lookback period for shuffle sharded ingesters. Setting -querier.query-ingesters-within=0 no longer disables shuffle sharding on the read path. #4287
  • [CHANGE] Block upload: /api/v1/upload/block/{block}/files endpoint now allows file uploads with no Content-Length. #4956
  • [CHANGE] Store-gateway: deprecate configuration parameters for chunk pooling, they will be removed in Mimir 2.11. The following options are now also ignored: #4996
    • -blocks-storage.bucket-store.max-chunk-pool-bytes
    • -blocks-storage.bucket-store.chunk-pool-min-bucket-size-bytes
    • -blocks-storage.bucket-store.chunk-pool-max-bucket-size-bytes
  • [CHANGE] Store-gateway: remove metrics cortex_bucket_store_chunk_pool_requested_bytes_total and cortex_bucket_store_chunk_pool_returned_bytes_total. #4996
  • [CHANGE] Compactor: change default of -compactor.partial-block-deletion-delay to 1d. This will automatically clean up partial blocks that were a result of failed block upload or deletion. #5026
  • [CHANGE] Compactor: the deprecated configuration parameter -compactor.consistency-delay has been removed. #5050
  • [CHANGE] Store-gateway: the deprecated configuration parameter -blocks-storage.bucket-store.consistency-delay has been removed. #5050
  • [CHANGE] The configuration parameter -blocks-storage.bucket-store.bucket-index.enabled has been deprecated and will be removed in Mimir 2.11. Mimir has run with the bucket index enabled by default since version 2.0, and starting from version 2.11 it will not be possible to disable it. #5051
  • [CHANGE] The configuration parameters -querier.iterators and -querier.batch-iterators have been deprecated and will be removed in Mimir 2.11. Mimir runs by default with -querier.batch-iterators=true, and starting from version 2.11 it will not be possible to change this. #5114
  • [CHANGE] Compactor: change default of -compactor.first-level-compaction-wait-period to 25m. #5128
  • [CHANGE] Ruler: changed default of -ruler.poll-interval from 1m to 10m. Starting from this release, the configured rule groups are also re-synced each time they're modified through the ruler configuration API. #5170
  • [FEATURE] Query-frontend: add -query-frontend.log-query-request-headers to enable logging of request headers in query logs. #5030
  • [ENHANCEMENT] Add per-tenant limit -validation.max-native-histogram-buckets to be able to ignore native histogram samples that have too many buckets. #4765
  • [ENHANCEMENT] Store-gateway: reduce memory usage in some LabelValues calls. #4789
  • [ENHANCEMENT] Store-gateway: add a stage label to the metric cortex_bucket_store_series_data_touched. This label now applies to data_type="chunks" and data_type="series". The stage label has 2 values: processed (the number of series that were parsed) and returned (the number of series selected from the processed bytes to satisfy the query). #4797 #4830
  • [ENHANCEMENT] Distributor: make __meta_tenant_id label available in relabeling rules configured via metric_relabel_configs. #4725
  • [ENHANCEMENT] Compactor: added the configurable limit -compactor.block-upload-max-block-size-bytes or compactor_block_upload_max_block_size_bytes to limit the byte size of uploaded or validated blocks. #4680
  • [ENHANCEMENT] Querier: reduce CPU utilisation when shuffle sharding is enabled with large shard sizes. #4851
  • [ENHANCEMENT] Packaging: facilitate configuration management by instructing systemd to start mimir with a configuration file. #4810
  • [ENHANCEMENT] Store-gateway: reduce memory allocations when looking up postings from cache. #4861 #4869 #4962 #5047
  • [ENHANCEMENT] Store-gateway: retain only necessary bytes when reading series from the bucket. #4926
  • [ENHANCEMENT] Ingester, store-gateway: clear the shutdown marker after a successful shutdown to enable reusing their persistent volumes in case the ingester or store-gateway is restarted. #4985
  • [ENHANCEMENT] Store-gateway, query-frontend: Reduced memory allocations when looking up cached entries from Memcached. #4862
  • [ENHANCEMENT] Alertmanager: Add the template function queryFromGeneratorURL, which returns the URL-decoded query from the GeneratorURL field of an alert. #4301
  • [ENHANCEMENT] Ruler: added experimental ruler storage cache support. The cache should reduce the number of "list objects" API calls issued to the object storage when there are 2+ ruler replicas running in a Mimir cluster. The cache can be configured by setting the -ruler-storage.cache.* CLI flags or their respective YAML config options. #4950 #5054
  • [ENHANCEMENT] Store-gateway: added HTTP /store-gateway/prepare-shutdown endpoint for gracefully scaling down store-gateways. A gauge cortex_store_gateway_prepare_shutdown_requested has been introduced for tracking this process. #4955
  • [ENHANCEMENT] Updated Kuberesolver dependency (github.com/sercand/kuberesolver) from v2.4.0 to v4.0.0 and gRPC dependency (google.golang.org/grpc) from v1.47.0 to v1.53.0. #4922
  • [ENHANCEMENT] Introduced new options for logging HTTP request headers: -server.log-request-headers enables logging HTTP request headers, -server.log-request-headers-exclude-list lists headers which should not be logged. #4922
  • [ENHANCEMENT] Block upload: /api/v1/upload/block/{block}/files endpoint now disables read and write HTTP timeout, overriding -server.http-read-timeout and -server.http-write-timeout values. This is done to allow large file uploads to succeed. #4956
  • [ENHANCEMENT] Alertmanager: Introduce new metrics from upstream. #4918
    • cortex_alertmanager_notifications_failed_total (added reason label)
    • cortex_alertmanager_nflog_maintenance_total
    • cortex_alertmanager_nflog_maintenance_errors_total
    • cortex_alertmanager_silences_maintenance_total
    • cortex_alertmanager_silences_maintenance_errors_total
  • [ENHANCEMENT] Add native histogram support for cortex_request_duration_seconds metric family. #4987
  • [ENHANCEMENT] Ruler: do not list rule groups in the object storage for disabled tenants. #5004
  • [ENHANCEMENT] Query-frontend and querier: add HTTP API endpoint <prometheus-http-prefix>/api/v1/format_query to format a PromQL query. #4373
  • [ENHANCEMENT] Query-frontend: Add cortex_query_frontend_regexp_matcher_count and cortex_query_frontend_regexp_matcher_optimized_count metrics to track optimization of regular expression label matchers. #4813
  • [ENHANCEMENT] Alertmanager: Add configuration option to enable or disable the deletion of alertmanager state from object storage. This is useful when migrating alertmanager tenants from one cluster to another, because it avoids a condition where the state object is copied but then deleted before the configuration object is copied. #4989
  • [ENHANCEMENT] Querier: only use the minimum set of chunks from ingesters when querying, and cancel unnecessary requests to ingesters sooner if we know their results won't be used. #5016
  • [ENHANCEMENT] Add -enable-go-runtime-metrics flag to expose all go runtime metrics as Prometheus metrics. #5009
  • [ENHANCEMENT] Ruler: trigger a synchronization of a tenant's rule groups as soon as the tenant changes the rules configuration via the API. This synchronization is in addition to the periodic syncing done every -ruler.poll-interval. The new behavior is enabled by default, but can be disabled with -ruler.sync-rules-on-changes-enabled=false (configurable on a per-tenant basis too). If you disable the new behavior, you may want to revert -ruler.poll-interval to 1m. #4975 #5053 #5115 #5170
  • [ENHANCEMENT] Distributor: Improve invalid tenant shard size error message. #5024
  • [ENHANCEMENT] Store-gateway: record index header loading time separately in cortex_bucket_store_series_request_stage_duration_seconds{stage="load_index_header"}. Now index header loading will be visible in the "Mimir / Queries" dashboard in the "Series request p99/average latency" panels. #5011 #5062
  • [ENHANCEMENT] Querier and ingester: add experimental support for streaming chunks from ingesters to queriers while evaluating queries. This can be enabled with -querier.prefer-streaming-chunks=true. #4886 #5078 #5094 #5126
  • [ENHANCEMENT] Update Docker base images from alpine:3.17.3 to alpine:3.18.0. #5065
  • [ENHANCEMENT] Compactor: reduced the number of "object exists" API calls issued by the compactor to the object storage when syncing block's meta.json files. #5063
  • [ENHANCEMENT] Distributor: Push request rate limits (-distributor.request-rate-limit and -distributor.request-burst-size) and their associated YAML configuration are now stable. #5124
  • [ENHANCEMENT] Go: updated to 1.20.5. #5185
  • [BUGFIX] Metadata API: Mimir will now return an empty object when no metadata is available, matching Prometheus. #4782
  • [BUGFIX] Store-gateway: add collision detection on expanded postings and individual postings cache keys. #4770
  • [BUGFIX] Ruler: Support the type=alert|record query parameter for the API endpoint <prometheus-http-prefix>/api/v1/rules. #4302
  • [BUGFIX] Backend: Check that alertmanager's data-dir doesn't overlap with bucket-sync dir. #4921
  • [BUGFIX] Alertmanager: Allow to rate-limit webex, telegram and discord notifications. #4979
  • [BUGFIX] Store-gateway: fix panic when decoding LabelValues responses that contain more than 655360 values. These responses are no longer cached. #5021
  • [BUGFIX] Querier: don't leak memory when processing query requests from query-frontends (i.e. when the query-scheduler is disabled). #5199
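
As an illustration of the new <prometheus-http-prefix>/api/v1/format_query endpoint listed above, the sketch below only builds the request URL. The helper name and the host are hypothetical, and two things are assumed rather than taken from these notes: that <prometheus-http-prefix> is set to /prometheus, and that the expression is passed in a query parameter, as in the Prometheus HTTP API.

```python
from urllib.parse import urlencode


def format_query_url(base_url, promql, http_prefix="/prometheus"):
    """Build the URL for the format_query endpoint.

    Assumptions: http_prefix is the configured <prometheus-http-prefix>
    and the PromQL expression is sent in a "query" parameter.
    """
    path = http_prefix.rstrip("/") + "/api/v1/format_query"
    return base_url.rstrip("/") + path + "?" + urlencode({"query": promql})


# Example with a hypothetical host name.
url = format_query_url("http://mimir.example:8080", "sum(rate(foo[5m]))")
```

The resulting URL can be requested with any HTTP client. Authentication and tenant headers, if your deployment needs them, are outside this sketch.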

Documentation

  • [ENHANCEMENT] Improve MimirIngesterReachingTenantsLimit runbook. #4744 #4752
  • [ENHANCEMENT] Add symbol table size exceeds case to MimirCompactorHasNotSuccessfullyRunCompaction runbook. #4945
  • [ENHANCEMENT] Clarify which APIs use query sharding. #4948

Mixin

  • [CHANGE] Alerts: Remove MimirQuerierHighRefetchRate. #4980
  • [CHANGE] Alerts: Remove MimirTenantHasPartialBlocks. This is obsoleted by the changed default of -compactor.partial-block-deletion-delay to 1d, which will auto remediate this alert. #5026
  • [ENHANCEMENT] Alertmanager dashboard: display active aggregation groups #4772
  • [ENHANCEMENT] Alerts: MimirIngesterTSDBWALCorrupted now only fires when there is more than one corrupted WAL in single-zone deployments and when there are more than two zones affected in multi-zone deployments. #4920
  • [ENHANCEMENT] Alerts: added labels to duplicated MimirRolloutStuck and MimirCompactorHasNotUploadedBlocks rules in order to distinguish them. #5023
  • [ENHANCEMENT] Dashboards: fix holes in graph for lightly loaded clusters #4915
  • [ENHANCEMENT] Dashboards: allow configuring additional services for the Rollout Progress dashboard. #5007
  • [ENHANCEMENT] Alerts: do not fire MimirAllocatingTooMuchMemory alert for any matching container outside of namespaces where Mimir is running. #5089
  • [BUGFIX] Dashboards: show cancelled requests in a different color to successful requests in throughput panels on dashboards. #5039
  • [BUGFIX] Dashboards: fix dashboard panels that showed percentages with axes from 0 to 10000%. #5084

Jsonnet

  • [CHANGE] Ruler: changed ruler autoscaling policy, extended scale down period from 60s to 600s. #4786
  • [CHANGE] Update to v0.5.0 rollout-operator. #4893
  • [CHANGE] Backend: add alertmanager_args to mimir-backend when running in read-write deployment mode. Remove hardcoded filesystem alertmanager storage. This moves alertmanager's data-dir to /data/alertmanager by default. #4907 #4921
  • [CHANGE] Remove -pdb suffix from PodDisruptionBudget names. This will create new PodDisruptionBudget resources. Make sure to prune the old resources; otherwise, rollouts will be blocked. #5109
  • [CHANGE] Query-frontend: enable query sharding for cardinality estimation via -query-frontend.query-sharding-target-series-per-shard by default if the results cache is enabled. #5128
  • [ENHANCEMENT] Ingester: configure -blocks-storage.tsdb.head-compaction-interval=15m to spread TSDB head compaction over a wider time range. #4870
  • [ENHANCEMENT] Ingester: configure -blocks-storage.tsdb.wal-replay-concurrency to CPU request minus 1. #4864
  • [ENHANCEMENT] Compactor: configure -compactor.first-level-compaction-wait-period to TSDB head compaction interval plus 10 minutes. #4872
  • [ENHANCEMENT] Store-gateway: set GOMEMLIMIT to the memory request value. This should reduce the likelihood that the store-gateway runs out of memory, at the cost of a higher CPU utilization due to more frequent garbage collections when the memory utilization gets closer to or above the configured memory request. #4971
  • [ENHANCEMENT] Store-gateway: dynamically set GOMAXPROCS based on the CPU request. This should reduce the likelihood a high load on the store-gateway will slow down the entire Kubernetes node. #5104
  • [ENHANCEMENT] Store-gateway: add store_gateway_lazy_loading_enabled configuration option which combines disabled lazy-loading and reducing blocks sync concurrency. Reducing blocks sync concurrency improves startup times with disabled lazy loading on HDDs. #5025
  • [ENHANCEMENT] Update rollout-operator image to v0.6.0. #5155
  • [BUGFIX] Backend: configure -ruler.alertmanager-url to mimir-backend when running in read-write deployment mode. #4892
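
Two of the Jsonnet changes above derive one setting from another: -blocks-storage.tsdb.wal-replay-concurrency from the CPU request, and -compactor.first-level-compaction-wait-period from the TSDB head compaction interval. The arithmetic can be sketched as follows. The function names are hypothetical, and the handling of fractional or very small CPU requests (truncate, then clamp to at least 1) is an assumption, not something these notes specify.

```python
from datetime import timedelta


def wal_replay_concurrency(cpu_request_cores):
    """CPU request minus 1.

    Assumption: fractional requests are truncated and the result is
    clamped to at least 1.
    """
    return max(1, int(cpu_request_cores) - 1)


def first_level_compaction_wait_period(head_compaction_interval):
    """TSDB head compaction interval plus 10 minutes."""
    return head_compaction_interval + timedelta(minutes=10)
```

With the 15m head compaction interval configured above, the wait period comes out at 25m, which matches the new default of -compactor.first-level-compaction-wait-period listed in the Grafana Mimir changes.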

Mimirtool

  • [CHANGE] check rules: will fail on duplicate rules when --strict is provided. #5035
  • [FEATURE] sync/diff can now include/exclude namespaces based on a regular expression using --namespaces-regex and --ignore-namespaces-regex. #5100
  • [ENHANCEMENT] analyze prometheus: allow to specify -prometheus-http-prefix. #4966
  • [ENHANCEMENT] analyze grafana: allow to specify --folder-title to limit dashboards analysis based on their exact folder title. #4973
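
The include/exclude behavior of --namespaces-regex and --ignore-namespaces-regex can be pictured with the sketch below. It is an illustration only, not mimirtool's code: the function name is hypothetical, patterns are assumed to match the whole namespace name, and the case where both patterns are given is resolved here by letting the ignore pattern win, which mimirtool may handle differently (it may not allow both flags together).

```python
import re


def select_namespaces(namespaces, include_regex=None, ignore_regex=None):
    """Filter rule namespaces by an include and/or an ignore pattern.

    Assumptions: patterns must match the whole namespace name, and a
    namespace matching the ignore pattern is dropped even if it also
    matches the include pattern.
    """
    include = re.compile(include_regex) if include_regex else None
    ignore = re.compile(ignore_regex) if ignore_regex else None
    selected = []
    for ns in namespaces:
        if include and not include.fullmatch(ns):
            continue  # not covered by the include pattern
        if ignore and ignore.fullmatch(ns):
            continue  # explicitly excluded
        selected.append(ns)
    return selected
```

For example, an include pattern of team-.* keeps only namespaces whose names start with team-, and an ignore pattern of infra keeps everything except a namespace named infra.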

Tools

  • [CHANGE] copyblocks: copying between Azure Blob Storage buckets is now supported in addition to copying between Google Cloud Storage buckets. As a result, the --service flag is now required to be specified (accepted values are gcs or abs). #4756

Full Changelog: https://github.com/grafana/mimir/compare/mimir-2.8.0...mimir-2.9.0-rc.1