Cortexproject Cortex Versions

A horizontally scalable, highly available, multi-tenant, long term Prometheus.

v1.17.0

1 month ago

This release contains 168 contributions from 29 contributors. We also have 16 new contributors. Thank you all for the contributions!

Some notable changes in this release are:

Experimental OTLP ingestion
Experimental minimize spread token generator strategy on Ingester
Advanced query scheduling with Query Priority
ListRules API high availability by rule group replication and backup
Various improvements on Store Gateway Index Cache
mem-ballast-size-bytes flag has been marked as deprecated and is no longer functional
-querier.ingester-streaming flag has been marked as deprecated and ingester streaming is always enabled now
querier.iterators and querier.batch-iterators flags have been marked as deprecated and batch iterator is always enabled in Querier now
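
The rule group replication behind the ListRules high availability item can be pictured roughly as follows. This is a minimal sketch, not the Cortex implementation: the hash-based placement, the helper names, and the ruler names are all hypothetical. It only illustrates the idea stated in this release's Ruler entry, that the first ruler in a replica set evaluates a rule group while the others hold a backup copy that can be served when a ruler is down.

```python
import hashlib


def replica_set(group, rulers, replication_factor):
    """Pick up to `replication_factor` rulers for a rule group (hypothetical placement)."""
    if not rulers:
        return []
    ordered = sorted(rulers)
    # Hash the group name to choose a deterministic starting position.
    start = int(hashlib.sha256(group.encode()).hexdigest(), 16) % len(ordered)
    count = min(replication_factor, len(ordered))
    return [ordered[(start + i) % len(ordered)] for i in range(count)]


def roles(group, rulers, replication_factor):
    """The first ruler evaluates the group; the others only keep a backup copy."""
    owners = replica_set(group, rulers, replication_factor)
    return {"evaluator": owners[0] if owners else None, "backups": owners[1:]}


def list_rules_sources(group, rulers, replication_factor, down):
    """Rulers that can still answer a list-rules request when some are down."""
    owners = replica_set(group, rulers, replication_factor)
    return [r for r in owners if r not in down]
```

With a replication factor of 2, losing the evaluating ruler still leaves one backup able to answer a list request for that group.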

Cortex

[CHANGE] Azure Storage: Upgraded objstore dependency and added support for Azure Workload Identity Authentication. Added connection_string to support authenticating via SAS token. Marked msi_resource config as deprecated. #5645
[CHANGE] Store Gateway: Add a new fastcache based inmemory index cache. #5619
[CHANGE] Index Cache: Multi level cache backfilling operation becomes async. Added -blocks-storage.bucket-store.index-cache.multilevel.max-async-concurrency and -blocks-storage.bucket-store.index-cache.multilevel.max-async-buffer-size configs and metric cortex_store_multilevel_index_cache_backfill_dropped_items_total for number of dropped items. #5661
[CHANGE] Ingester: Disable uploading compacted blocks and overlapping compaction in ingester. #5735
[CHANGE] Distributor: Count the number of rate-limited samples in distributor_samples_in_total. #5714
[CHANGE] Ruler: Remove cortex_ruler_write_requests_total, cortex_ruler_write_requests_failed_total, cortex_ruler_queries_total, cortex_ruler_queries_failed_total, and cortex_ruler_query_seconds_total metrics for the tenant when the ruler deletes the manager for the tenant. #5772
[CHANGE] Main: Mark mem-ballast-size-bytes flag as deprecated. #5816
[CHANGE] Querier: Mark -querier.ingester-streaming flag as deprecated. Now query ingester streaming is always enabled. #5817
[CHANGE] Compactor/Bucket Store: Added -blocks-storage.bucket-store.block-discovery-strategy to configure different block listing strategy. Reverted the current recursive block listing mechanism and use the strategy Concurrent as in 1.15. #5828
[CHANGE] Compactor: Don't halt compactor when overlapped source blocks detected. #5854
[CHANGE] S3 Bucket Client: Expose -blocks-storage.s3.send-content-md5 flag and set default checksum algorithm to MD5. #5870
[CHANGE] Querier: Mark querier.iterators and querier.batch-iterators flags as deprecated. Now querier always uses batch iterators. #5868
[FEATURE] Experimental: OTLP ingestion. #5813
[FEATURE] Ingester: Add per-tenant new metric cortex_ingester_tsdb_data_replay_duration_seconds. #5477
[FEATURE] Query Frontend/Scheduler: Add query priority support. #5605
[FEATURE] Tracing: Add kuberesolver to resolve endpoints address with kubernetes:// prefix as Kubernetes service. #5731
[FEATURE] Tracing: Add tracing.otel.round-robin flag to use round_robin gRPC client side LB policy for sending OTLP traces. #5731
[FEATURE] Ruler: Add ruler.concurrent-evals-enabled flag to enable concurrent evaluation within a single rule group for independent rules. Maximum concurrency can be configured via ruler.max-concurrent-evals. #5766
[FEATURE] Distributor Queryable: Experimental: Add config zone_results_quorum_metadata. When querying ingesters using metadata APIs such as label names and values, only results from quorum number of zones will be included and merged. #5779
[FEATURE] Storage Cache Clients: Add config set_async_circuit_breaker_config to utilize the circuit breaker pattern for dynamically thresholding asynchronous set operations. Implemented in both memcached and redis cache clients. #5789
[FEATURE] Ruler: Add experimental experimental.ruler.api-deduplicate-rules flag to remove duplicate rule groups from the Prometheus compatible rules API endpoint. Add experimental ruler.ring.replication-factor and ruler.ring.zone-awareness-enabled flags to configure rule group replication, but only the first ruler in the replicaset evaluates the rule group, the rest will just hold a copy as backup. Add experimental experimental.ruler.api-enable-rules-backup flag to configure rulers to send the rule group backups stored in the replicaset to handle events when a ruler is down during an API request to list rules. #5782
[ENHANCEMENT] Store Gateway: Added -store-gateway.enabled-tenants and -store-gateway.disabled-tenants to explicitly enable or disable store-gateway for specific tenants. #5638
[ENHANCEMENT] Compactor: Add new compactor metric cortex_compactor_start_duration_seconds. #5683
[ENHANCEMENT] Index Cache: Multi level cache adds config max_backfill_items to cap max items to backfill per async operation. #5686
[ENHANCEMENT] Query Frontend: Log number of split queries in query stats log. #5703
[ENHANCEMENT] Logging: Added new options for logging HTTP request headers: -server.log-request-headers enables logging HTTP request headers, -server.log-request-headers-exclude-list allows users to specify headers which should not be logged. #5744
[ENHANCEMENT] Query Frontend/Scheduler: Time check in query priority now considers overall data select time window (including range selectors, modifiers and lookback delta). #5758
[ENHANCEMENT] Querier: Added querier.store-gateway-query-stats-enabled to enable or disable store gateway query stats log. #5749
[ENHANCEMENT] AlertManager: Retrying AlertManager Delete Silence on error #5794
[ENHANCEMENT] Ingester: Add new ingester metric cortex_ingester_max_inflight_query_requests. #5798
[ENHANCEMENT] Query: Added query_storage_wall_time to Query Frontend and Ruler query stats log for wall time spent on fetching data from storage. Query evaluation is not included. #5799
[ENHANCEMENT] Query: Added additional max query length check at Query Frontend and Ruler. Added -querier.ignore-max-query-length flag to disable max query length check at Querier. #5808
[ENHANCEMENT] Querier: Add context error check when converting Metrics to SeriesSet for GetSeries on distributorQuerier. #5827
[ENHANCEMENT] Ruler: Improve GetRules response time by refactoring mutexes and introducing a temporary rules cache in ruler/manager.go. #5805
[ENHANCEMENT] Querier: Add context error check when merging slices from ingesters for GetLabel operations. #5837
[ENHANCEMENT] Ring: Add experimental -ingester.tokens-generator-strategy=minimize-spread flag to enable the new minimize spread token generator strategy. #5855
[ENHANCEMENT] Query Frontend: Ensure error response returned by Query Frontend follows Prometheus API error response format. #5811
[ENHANCEMENT] Ring Status Page: Add Ownership Diff From Expected column in the ring table to indicate the extent to which the ownership of a specific ingester differs from the expected ownership. #5889
[BUGFIX] Distributor: Do not use label with empty values for sharding #5717
[BUGFIX] Query Frontend: queries with negative offset should check whether it is cacheable or not. #5719
[BUGFIX] Redis Cache: pass cache_size config correctly. #5734
[BUGFIX] Distributor: Shuffle-Sharding with IngestionTenantShardSize == 0, default sharding strategy should be used #5189
[BUGFIX] Cortex: Fix GRPC stream clients not honoring overrides for call options. #5797
[BUGFIX] Ring DDB: Fix lifecycle for ring counting unhealthy pods as healthy. #5838
[BUGFIX] Ring DDB: Fix region assignment. #5842
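
The circuit breaker pattern mentioned for asynchronous cache set operations can be sketched as below. This is a generic illustration only, assuming a consecutive-failure threshold and a fixed open period; the class name, parameters, and thresholds are hypothetical and are not the fields of set_async_circuit_breaker_config.

```python
import time


class CircuitBreaker:
    """Opens after N consecutive failures, then allows a trial call after a pause."""

    def __init__(self, failure_threshold=3, open_seconds=5.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.open_seconds = open_seconds
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def allow(self):
        if self.opened_at is None:
            return True
        # Once the open period has elapsed, let a trial request through (half-open).
        return self.clock() - self.opened_at >= self.open_seconds

    def record(self, ok):
        if ok:
            self.failures = 0
            self.opened_at = None
        else:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()


def guarded_set(breaker, cache_set, key, value):
    """Skip the set while the breaker is open; return True if the value was stored."""
    if not breaker.allow():
        return False
    try:
        cache_set(key, value)
    except Exception:
        breaker.record(False)
        return False
    breaker.record(True)
    return True
```

Dropping set operations while the breaker is open keeps a struggling cache backend from being flooded with writes that are likely to fail anyway.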

New Contributors

@testwill made their first contribution in https://github.com/cortexproject/cortex/pull/5644
@dsabsay made their first contribution in https://github.com/cortexproject/cortex/pull/5684
@pawarpranav83 made their first contribution in https://github.com/cortexproject/cortex/pull/5719
@Kramer0x0 made their first contribution in https://github.com/cortexproject/cortex/pull/5743
@tesla59 made their first contribution in https://github.com/cortexproject/cortex/pull/5746
@blorby made their first contribution in https://github.com/cortexproject/cortex/pull/5767
@CharlieTLe made their first contribution in https://github.com/cortexproject/cortex/pull/5784
@lekaf974 made their first contribution in https://github.com/cortexproject/cortex/pull/5793
@euniceek made their first contribution in https://github.com/cortexproject/cortex/pull/5794
@mustafain117 made their first contribution in https://github.com/cortexproject/cortex/pull/5823
@availhang made their first contribution in https://github.com/cortexproject/cortex/pull/5826
@erlan-z made their first contribution in https://github.com/cortexproject/cortex/pull/5827
@yj-yoo made their first contribution in https://github.com/cortexproject/cortex/pull/5775
@kindknow made their first contribution in https://github.com/cortexproject/cortex/pull/5856
@momantech made their first contribution in https://github.com/cortexproject/cortex/pull/5863
@till made their first contribution in https://github.com/cortexproject/cortex/pull/5874

Full Changelog: https://github.com/cortexproject/cortex/compare/v1.16.1...v1.17.0

v1.17.0-rc.1

1 month ago

This release candidate builds on v1.17.0-rc.0 and includes one bug fix and one change.

[CHANGE] Ruler: Remove experimental.ruler.api-enable-rules-backup flag and use ruler.ring.replication-factor to check if rules backup is enabled. #5901
[BUGFIX] Fix random string used in test allocating 50MB in Cortex binary. #5903

v1.17.0-rc.0

1 month ago

This release contains 166 contributions from 29 contributors. We also have 16 new contributors. Thank you all for the contributions!

Some notable changes in this release are:

Experimental OTLP ingestion
Experimental minimize spread token generator strategy on Ingester
Advanced query scheduling with Query Priority
ListRules API high availability by rule group replication and backup
Various improvements on Store Gateway Index Cache
mem-ballast-size-bytes flag has been marked as deprecated and is no longer functional
-querier.ingester-streaming flag has been marked as deprecated and ingester streaming is always enabled now
querier.iterators and querier.batch-iterators flags have been marked as deprecated and batch iterator is always enabled in Querier now

Cortex

[CHANGE] Azure Storage: Upgraded objstore dependency and added support for Azure Workload Identity Authentication. Added connection_string to support authenticating via SAS token. Marked msi_resource config as deprecated. #5645
[CHANGE] Store Gateway: Add a new fastcache based inmemory index cache. #5619
[CHANGE] Index Cache: Multi level cache backfilling operation becomes async. Added -blocks-storage.bucket-store.index-cache.multilevel.max-async-concurrency and -blocks-storage.bucket-store.index-cache.multilevel.max-async-buffer-size configs and metric cortex_store_multilevel_index_cache_backfill_dropped_items_total for number of dropped items. #5661
[CHANGE] Ingester: Disable uploading compacted blocks and overlapping compaction in ingester. #5735
[CHANGE] Distributor: Count the number of rate-limited samples in distributor_samples_in_total. #5714
[CHANGE] Ruler: Remove cortex_ruler_write_requests_total, cortex_ruler_write_requests_failed_total, cortex_ruler_queries_total, cortex_ruler_queries_failed_total, and cortex_ruler_query_seconds_total metrics for the tenant when the ruler deletes the manager for the tenant. #5772
[CHANGE] Main: Mark mem-ballast-size-bytes flag as deprecated. #5816
[CHANGE] Querier: Mark -querier.ingester-streaming flag as deprecated. Now query ingester streaming is always enabled. #5817
[CHANGE] Compactor/Bucket Store: Added -blocks-storage.bucket-store.block-discovery-strategy to configure different block listing strategy. Reverted the current recursive block listing mechanism and use the strategy Concurrent as in 1.15. #5828
[CHANGE] Compactor: Don't halt compactor when overlapped source blocks detected. #5854
[CHANGE] S3 Bucket Client: Expose -blocks-storage.s3.send-content-md5 flag and set default checksum algorithm to MD5. #5870
[CHANGE] Querier: Mark querier.iterators and querier.batch-iterators flags as deprecated. Now querier always uses batch iterators. #5868
[CHANGE] Query Frontend: Error response returned by Query Frontend now follows Prometheus API error response format. #5811
[FEATURE] Experimental: OTLP ingestion. #5813
[FEATURE] Query Frontend/Scheduler: Add query priority support. #5605
[FEATURE] Tracing: Use kuberesolver to resolve OTLP endpoints address with kubernetes:// prefix as Kubernetes service. #5731
[FEATURE] Tracing: Add tracing.otel.round-robin flag to use round_robin gRPC client side LB policy for sending OTLP traces. #5731
[FEATURE] Ruler: Add ruler.concurrent-evals-enabled flag to enable concurrent evaluation within a single rule group for independent rules. Maximum concurrency can be configured via ruler.max-concurrent-evals. #5766
[FEATURE] Distributor Queryable: Experimental: Add config zone_results_quorum_metadata. When querying ingesters using metadata APIs such as label names and values, only results from quorum number of zones will be included and merged. #5779
[FEATURE] Storage Cache Clients: Add config set_async_circuit_breaker_config to utilize the circuit breaker pattern for dynamically thresholding asynchronous set operations. Implemented in both memcached and redis cache clients. #5789
[FEATURE] Ruler: Add experimental experimental.ruler.api-deduplicate-rules flag to remove duplicate rule groups from the Prometheus compatible rules API endpoint. Add experimental ruler.ring.replication-factor and ruler.ring.zone-awareness-enabled flags to configure rule group replication, but only the first ruler in the replicaset evaluates the rule group, the rest will just hold a copy as backup. Add experimental experimental.ruler.api-enable-rules-backup flag to configure rulers to send the rule group backups stored in the replicaset to handle events when a ruler is down during an API request to list rules. #5782
[FEATURE] Ring: Add experimental -ingester.tokens-generator-strategy=minimize-spread flag to enable the new minimize spread token generator strategy. #5855
[FEATURE] Ring Status Page: Add Ownership Diff From Expected column in the ring table to indicate the extent to which the ownership of a specific ingester differs from the expected ownership. #5889
[ENHANCEMENT] Ingester: Add per-tenant new metric cortex_ingester_tsdb_data_replay_duration_seconds. #5477
[ENHANCEMENT] Store Gateway: Added -store-gateway.enabled-tenants and -store-gateway.disabled-tenants to explicitly enable or disable store-gateway for specific tenants. #5638
[ENHANCEMENT] Query Frontend: Write service timing header in response even though there is an error. #5653
[ENHANCEMENT] Compactor: Add new compactor metric cortex_compactor_start_duration_seconds. #5683
[ENHANCEMENT] Index Cache: Multi level cache adds config max_backfill_items to cap max items to backfill per async operation. #5686
[ENHANCEMENT] Query Frontend: Log number of split queries in query stats log. #5703
[ENHANCEMENT] Compactor: Skip compaction retry when encountering a permission denied error. #5727
[ENHANCEMENT] Logging: Added new options for logging HTTP request headers: -server.log-request-headers enables logging HTTP request headers, -server.log-request-headers-exclude-list allows users to specify headers which should not be logged. #5744
[ENHANCEMENT] Query Frontend/Scheduler: Time check in query priority now considers overall data select time window (including range selectors, modifiers and lookback delta). #5758
[ENHANCEMENT] Querier: Added querier.store-gateway-query-stats-enabled to enable or disable store gateway query stats log. #5749
[ENHANCEMENT] Querier: Improve labels APIs latency by merging slices using K-way merge and more than 1 core. #5785
[ENHANCEMENT] AlertManager: Retrying AlertManager Delete Silence on error. #5794
[ENHANCEMENT] Ingester: Add new ingester metric cortex_ingester_max_inflight_query_requests. #5798
[ENHANCEMENT] Query: Added query_storage_wall_time to Query Frontend and Ruler query stats log for wall time spent on fetching data from storage. Query evaluation is not included. #5799
[ENHANCEMENT] Query: Added additional max query length check at Query Frontend and Ruler. Added -querier.ignore-max-query-length flag to disable max query length check at Querier. #5808
[ENHANCEMENT] Querier: Add context error check when converting Metrics to SeriesSet for GetSeries on distributorQuerier. #5827
[ENHANCEMENT] Ruler: Improve GetRules response time by reducing lock contention and introducing a temporary rules cache in ruler/manager.go. #5805
[ENHANCEMENT] Querier: Add context error check when merging slices from ingesters for GetLabel operations. #5837
[BUGFIX] Distributor: Do not use label with empty values for sharding #5717
[BUGFIX] Query Frontend: queries with negative offset should check whether it is cacheable or not. #5719
[BUGFIX] Redis Cache: pass cache_size config correctly. #5734
[BUGFIX] Distributor: Shuffle-Sharding with ingestion_tenant_shard_size set to 0, default sharding strategy should be used. #5189
[BUGFIX] Cortex: Fix GRPC stream clients not honoring overrides for call options. #5797
[BUGFIX] Ruler: Fix support for keep_firing_for field in alert rules. #5823
[BUGFIX] Ring DDB: Fix lifecycle for ring counting unhealthy pods as healthy. #5838
[BUGFIX] Ring DDB: Fix region assignment. #5842
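
The K-way merge mentioned for the labels APIs can be illustrated with a heap-based merge of already sorted label slices. This is a sketch of the general technique only, assuming each ingester returns a sorted list; it does not reproduce the multi-core part of the change, and the function name is hypothetical.

```python
import heapq


def merge_sorted_label_slices(slices):
    """Merge sorted string lists into one sorted list without duplicates."""
    merged = []
    # heapq.merge performs a lazy k-way merge over sorted inputs.
    for value in heapq.merge(*slices):
        # Equal values arrive next to each other, so comparing with the
        # last emitted value is enough to drop duplicates.
        if not merged or merged[-1] != value:
            merged.append(value)
    return merged
```

Compared with concatenating every slice and sorting the result, a k-way merge takes advantage of the inputs already being sorted.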

New Contributors

@testwill made their first contribution in https://github.com/cortexproject/cortex/pull/5644
@dsabsay made their first contribution in https://github.com/cortexproject/cortex/pull/5684
@pawarpranav83 made their first contribution in https://github.com/cortexproject/cortex/pull/5719
@Kramer0x0 made their first contribution in https://github.com/cortexproject/cortex/pull/5743
@tesla59 made their first contribution in https://github.com/cortexproject/cortex/pull/5746
@blorby made their first contribution in https://github.com/cortexproject/cortex/pull/5767
@CharlieTLe made their first contribution in https://github.com/cortexproject/cortex/pull/5784
@lekaf974 made their first contribution in https://github.com/cortexproject/cortex/pull/5793
@euniceek made their first contribution in https://github.com/cortexproject/cortex/pull/5794
@mustafain117 made their first contribution in https://github.com/cortexproject/cortex/pull/5823
@availhang made their first contribution in https://github.com/cortexproject/cortex/pull/5826
@erlan-z made their first contribution in https://github.com/cortexproject/cortex/pull/5827
@yj-yoo made their first contribution in https://github.com/cortexproject/cortex/pull/5775
@kindknow made their first contribution in https://github.com/cortexproject/cortex/pull/5856
@momantech made their first contribution in https://github.com/cortexproject/cortex/pull/5863
@till made their first contribution in https://github.com/cortexproject/cortex/pull/5874

Full Changelog: https://github.com/cortexproject/cortex/compare/v1.16.1...v1.17.0-rc.0

v1.16.1

1 month ago

This release includes two security fixes:

[ENHANCEMENT] Upgraded Docker base images to alpine:3.18. #5684
[ENHANCEMENT] Upgrade to go 1.21.9 #5879 #5882

v1.16.0

6 months ago

This release contains 227 contributions from 27 contributors. We also have 10 new contributors. Thank you all for the contributions!

Some notable changes in this release are:

Store Gateway multilevel index cache
Object storage backend for runtime config
Disable specific rule groups in Ruler
List rules supports filtering by rule name, rule group and file
Allow tenant shard size to be a percent of total instances for Querier and Store Gateway
Various improvements to metrics
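
A multilevel index cache can be thought of as a chain of caches checked from fastest to slowest, with a hit in a slower level copied back into the faster levels that missed. The sketch below is illustrative only: it uses plain dictionaries as cache levels, a hypothetical class name, and backfills inline for simplicity. It is not the Store Gateway implementation.

```python
class MultiLevelCache:
    """Checks cache levels in order and backfills faster levels on a hit."""

    def __init__(self, levels):
        # Levels are ordered fastest first, e.g. [in_memory, remote].
        self.levels = levels

    def store(self, key, value):
        for level in self.levels:
            level[key] = value

    def fetch(self, key):
        for i, level in enumerate(self.levels):
            if key in level:
                value = level[key]
                # Backfill only the faster levels that missed this key.
                for faster in self.levels[:i]:
                    faster[key] = value
                return value
        return None
```

After one fetch that hits the slower level, later fetches for the same key are served from the faster level.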

Cortex

[CHANGE] AlertManager: include reason label in cortex_alertmanager_notifications_failed_total. #5409
[CHANGE] Ruler: Added user label to cortex_ruler_write_requests_total, cortex_ruler_write_requests_failed_total, cortex_ruler_queries_total, and cortex_ruler_queries_failed_total metrics. #5312
[CHANGE] Alertmanager: Validating new fields on the PagerDuty AM config. #5290
[CHANGE] Ingester: Creating label native-histogram-sample on the cortex_discarded_samples_total to keep track of discarded native histogram samples. #5289
[CHANGE] Store Gateway: Rename cortex_bucket_store_cached_postings_compression_time_seconds to cortex_bucket_store_cached_postings_compression_time_seconds_total. #5431
[CHANGE] Store Gateway: Rename cortex_bucket_store_cached_series_fetch_duration_seconds to cortex_bucket_store_series_fetch_duration_seconds and cortex_bucket_store_cached_postings_fetch_duration_seconds to cortex_bucket_store_postings_fetch_duration_seconds. Add new metric cortex_bucket_store_chunks_fetch_duration_seconds. #5448
[CHANGE] Store Gateway: Remove idle_timeout, max_conn_age, pool_size, min_idle_conns fields for Redis index cache and caching bucket. #5448
[CHANGE] Store Gateway: Add flag -store-gateway.sharding-ring.zone-stable-shuffle-sharding to enable store gateway to use zone stable shuffle sharding. #5489
[CHANGE] Bucket Index: Add series_max_size and chunk_max_size to bucket index. #5489
[CHANGE] StoreGateway: Rename cortex_bucket_store_chunk_pool_returned_bytes_total and cortex_bucket_store_chunk_pool_requested_bytes_total to cortex_bucket_store_chunk_pool_operation_bytes_total. #5552
[CHANGE] Query Frontend/Querier: Make build info API disabled by default and add feature flag api.build-info-enabled to enable it. #5533
[CHANGE] Purger: Do not use S3 tenant kms key when uploading deletion marker. #5575
[CHANGE] Ingester: Shipper always allows uploading compacted blocks to ship OOO compacted blocks. #5625
[CHANGE] DDBKV: Change metric name from dynamodb_kv_read_capacity_total to dynamodb_kv_consumed_capacity_total and include Delete, Put, Batch dimension. #5487
[CHANGE] Compactor: Adding the userId on the compact dir path. #5524
[CHANGE] Ingester: Remove deprecated ingester metrics. #5472
[CHANGE] Query Frontend: Expose -querier.max-subquery-steps to configure subquery max steps check. By default, the limit is set to 0, which is disabled. #5656
[FEATURE] Store Gateway: Implementing multi level index cache. #5451
[FEATURE] Ruler: Add support for disabling rule groups. #5521
[FEATURE] Support object storage backends for runtime configuration file. #5292
[FEATURE] Ruler: Add support for Limit field on RuleGroup. #5528
[FEATURE] AlertManager: Add support for Webex, Discord and Telegram Receiver. #5493
[FEATURE] Ingester: Added -admin-limit-message to customize the message contained in limit errors. #5460
[FEATURE] AlertManager: Update version to v0.26.0 and bring in Microsoft Teams receiver. #5543
[FEATURE] Store Gateway: Support lazy expanded posting optimization. Added new flag blocks-storage.bucket-store.lazy-expanded-postings-enabled and new metrics cortex_bucket_store_lazy_expanded_postings_total, cortex_bucket_store_lazy_expanded_posting_size_bytes_total and cortex_bucket_store_lazy_expanded_posting_series_overfetched_size_bytes_total. #5556
[FEATURE] Store Gateway: Add max_downloaded_bytes_per_request to limit max bytes to download per store gateway request. #5179
[FEATURE] Added 2 flags -alertmanager.alertmanager-client.grpc-max-send-msg-size and -alertmanager.alertmanager-client.grpc-max-recv-msg-size to configure alert manager grpc client message size limits. #5338
[FEATURE] Querier/StoreGateway: Allow the tenant shard sizes to be a percent of total instances. #5393
[FEATURE] Added the flag -alertmanager.api-concurrency to configure alert manager api concurrency limit. #5412
[FEATURE] Store Gateway: Add -store-gateway.sharding-ring.keep-instance-in-the-ring-on-shutdown to skip unregistering instance from the ring in shutdown. #5421
[FEATURE] Ruler: Support for filtering rules in the API. #5417
[FEATURE] Compactor: Add -compactor.ring.tokens-file-path to store generated tokens locally. #5432
[FEATURE] Query Frontend: Add -frontend.retry-on-too-many-outstanding-requests to re-enqueue 429 requests if there are multiple query-schedulers available. #5496
[FEATURE] Store Gateway: Add -blocks-storage.bucket-store.max-inflight-requests for store gateways to reject further series requests upon reaching the limit. #5553
[FEATURE] Store Gateway: Support filtered index cache. #5587
[ENHANCEMENT] Update go version to 1.21.3. #5630
[ENHANCEMENT] Store Gateway: Add cortex_bucket_store_block_load_duration_seconds histogram to track time to load blocks. #5580
[ENHANCEMENT] Querier: retry chunk pool exhaustion error in querier rather than query frontend. #5569
[ENHANCEMENT] Alertmanager: Added flag -alertmanager.alerts-gc-interval to configure alerts Garbage collection interval. #5550
[ENHANCEMENT] Query Frontend: Enable vertical sharding on binary expr. #5507
[ENHANCEMENT] Query Frontend: Include user agent as part of query frontend log. #5450
[ENHANCEMENT] Query: Set CORS Origin headers for Query API #5388
[ENHANCEMENT] Query Frontend: Add cortex_rejected_queries_total metric for throttled queries. #5356
[ENHANCEMENT] Query Frontend: Optimize the decoding of SampleStream. #5349
[ENHANCEMENT] Compactor: Check ctx done when uploading visit marker. #5333
[ENHANCEMENT] AlertManager: Add cortex_alertmanager_dispatcher_aggregation_groups and cortex_alertmanager_dispatcher_alert_processing_duration_seconds metrics for dispatcher. #5592
[ENHANCEMENT] Store Gateway: Added new flag blocks-storage.bucket-store.series-batch-size to control how many series to fetch per batch in Store Gateway. #5582
[ENHANCEMENT] Querier: Log query stats when querying store gateway. #5376
[ENHANCEMENT] Ruler: Add cortex_ruler_rule_group_load_duration_seconds and cortex_ruler_rule_group_sync_duration_seconds metrics. #5609
[ENHANCEMENT] Ruler: Add contextual info and query statistics to log #5604
[ENHANCEMENT] Distributor/Ingester: Add span on push path #5319
[ENHANCEMENT] Query Frontend: Reject subquery with too small step size. #5323
[ENHANCEMENT] Compactor: Exposing Thanos accept-malformed-index to Cortex compactor. #5334
[ENHANCEMENT] Log: Avoid expensive log.Valuer evaluation for disallowed levels. #5297
[ENHANCEMENT] Improving Performance on the API Gzip Handler. #5347
[ENHANCEMENT] Dynamodb: Add puller-sync-time to allow different pull time for ring. #5357
[ENHANCEMENT] Emit querier max_concurrent as a metric. #5362
[ENHANCEMENT] Avoid sorting tokens on lifecycler autoJoin. #5394
[ENHANCEMENT] Do not resync blocks in running store gateways during rollout deployment and container restart. #5363
[ENHANCEMENT] Store Gateway: Add new metrics cortex_bucket_store_sent_chunk_size_bytes, cortex_bucket_store_postings_size_bytes and cortex_bucket_store_empty_postings_total. #5397
[ENHANCEMENT] Add jitter to lifecycler heartbeat. #5404
[ENHANCEMENT] Store Gateway: Add config estimated_max_series_size_bytes and estimated_max_chunk_size_bytes to address data overfetch. #5401
[ENHANCEMENT] Distributor/Ingester: Add experimental -distributor.sign_write_requests flag to sign the write requests. #5430
[ENHANCEMENT] Store Gateway/Querier/Compactor: Handling CMK Access Denied errors. #5420 #5442 #5446
[ENHANCEMENT] Alertmanager: Add the alert name in error log when it gets throttled. #5456
[ENHANCEMENT] Querier: Retry store gateway on different zones when zone awareness is enabled. #5476
[ENHANCEMENT] Compactor: allow unregister_on_shutdown to be configurable. #5503
[ENHANCEMENT] Querier: Batch adding series to query limiter to optimize locking. #5505
[ENHANCEMENT] Store Gateway: add metric cortex_bucket_store_chunk_refetches_total for number of chunk refetches. #5532
[ENHANCEMENT] BasicLifeCycler: allow final-sleep during shutdown #5517
[ENHANCEMENT] All: Handling CMK Access Denied errors. #5420 #5542
[ENHANCEMENT] Querier: Retry store gateway client connection closing gRPC error. #5558
[ENHANCEMENT] QueryFrontend: Add generic retry for all APIs. #5561
[ENHANCEMENT] Querier: Check context before notifying scheduler and frontend. #5565
[ENHANCEMENT] QueryFrontend: Add metric for number of series requests. #5373
[ENHANCEMENT] Store Gateway: Add histogram metrics for total time spent fetching series and chunks per request. #5573
[ENHANCEMENT] Store Gateway: Check context in multi level cache. Add cortex_store_multilevel_index_cache_fetch_duration_seconds and cortex_store_multilevel_index_cache_backfill_duration_seconds to measure fetch and backfill latency. #5596
[ENHANCEMENT] Ingester: Added new ingester TSDB metrics cortex_ingester_tsdb_head_samples_appended_total, cortex_ingester_tsdb_head_out_of_order_samples_appended_total, cortex_ingester_tsdb_snapshot_replay_error_total, cortex_ingester_tsdb_sample_ooo_delta and cortex_ingester_tsdb_mmap_chunks_total. #5624
[ENHANCEMENT] Query Frontend: Handle context error before decoding and merging responses. #5499
[ENHANCEMENT] Store-Gateway and AlertManager: Add a wait_instance_time_out to context to avoid waiting forever. #5581
[BUGFIX] Compactor: Fix possible division by zero during compactor config validation. #5535
[BUGFIX] Ruler: Validate if rule group can be safely converted back to rule group yaml from protobuf message #5265
[BUGFIX] Querier: Convert gRPC ResourceExhausted status code from store gateway to 422 limit error. #5286
[BUGFIX] Alertmanager: Route web-ui requests to the alertmanager distributor when sharding is enabled. #5293
[BUGFIX] Storage: Bucket index updater should ignore meta not found for partial blocks. #5343
[BUGFIX] Ring: Add JOINING state to read operation. #5346
[BUGFIX] Compactor: Partial block with only visit marker should be deleted even if there is no deletion marker. #5342
[BUGFIX] KV: Etcd calls will no longer block indefinitely and will now time out after the DialTimeout period. #5392
[BUGFIX] Ring: Allow RF greater than number of zones to select more than one instance per zone #5411
[BUGFIX] Store Gateway: Fix bug in store gateway ring comparison logic. #5426
[BUGFIX] Ring: Fix bug in consistency of Get func in a scaling zone-aware ring. #5429
[BUGFIX] Compactor: Fix retry on markers. #5441
[BUGFIX] Query Frontend: Fix bug of failing to cancel downstream request context in query frontend v2 mode (query scheduler enabled). #5447
[BUGFIX] Alertmanager: Remove the user id from state replication key metric label value. #5453
[BUGFIX] Compactor: Avoid cleaner concurrency issues checking global markers before all blocks. #5457
[BUGFIX] DDBKV: Disallow instance with older timestamp to update instance with newer timestamp. #5480
[BUGFIX] DDBKV: When no change detected in ring, retry the CAS until there is change. #5502
[BUGFIX] Fix bug on objstore when configured to use S3 fips endpoints. #5540
[BUGFIX] Ruler: Fix bug on ruler where a failure to load a single RuleGroup would prevent rulers from syncing all RuleGroups. #5563
[BUGFIX] Query Frontend: Fix query string being omitted in query stats log. #5655
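
The feature that lets tenant shard sizes be a percent of total instances can be illustrated with a small helper. This sketch assumes that a value strictly between 0 and 1 is read as a fraction of the available instances and rounded up, while any other value is taken as an absolute count; the function name and the rounding rule are assumptions made for illustration.

```python
import math


def dynamic_shard_size(value, num_instances):
    """Resolve a configured shard size to a number of instances."""
    if 0 < value < 1:
        # Fractional values scale with the number of instances.
        return math.ceil(num_instances * value)
    # Whole numbers are used as an absolute shard size.
    return int(value)
```

A fractional setting lets the shard grow with the fleet, so the limit does not need to be retuned every time instances are added.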

v1.16.0-rc.1

6 months ago

This release candidate builds on v1.16.0-rc.0 and includes one bug fix and one change.

[CHANGE] Query Frontend: Expose -querier.max-subquery-steps to configure subquery max steps check. By default, the limit is set to 0, which is disabled. #5656
[BUGFIX] Query Frontend: Fix query string being omitted in query stats log. #5655
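
The subquery max steps check can be sketched as follows. The formula is an assumption for illustration (steps taken as the subquery range divided by its step), and the function name is hypothetical; the only behavior taken from the entry above is that a limit of 0 disables the check.

```python
def check_subquery_steps(range_seconds, step_seconds, max_steps):
    """Return the step count, or raise if it exceeds a non-zero limit."""
    if step_seconds <= 0:
        raise ValueError("step must be positive")
    steps = range_seconds // step_seconds
    # A limit of 0 means the check is disabled.
    if max_steps > 0 and steps > max_steps:
        raise ValueError(f"subquery has {steps} steps, limit is {max_steps}")
    return steps
```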

v1.16.0-rc.0

6 months ago

This release contains 227 contributions from 27 contributors. We also have 10 new contributors. Thank you all for the contributions!

Some notable changes in this release are:

Store Gateway multilevel index cache
Object storage backend for runtime config
Disable specific rule groups in Ruler
List rules supports filtering by rule name, rule group and file
Allow tenant shard size to be a percent of total instances for Querier and Store Gateway
Various improvements to metrics

Cortex

[CHANGE] AlertManager: include reason label in cortex_alertmanager_notifications_failed_total. #5409
[CHANGE] Ruler: Added user label to cortex_ruler_write_requests_total, cortex_ruler_write_requests_failed_total, cortex_ruler_queries_total, and cortex_ruler_queries_failed_total metrics. #5312
[CHANGE] Alertmanager: Validating new fields on the PagerDuty AM config. #5290
[CHANGE] Ingester: Creating label native-histogram-sample on the cortex_discarded_samples_total to keep track of discarded native histogram samples. #5289
[CHANGE] Store Gateway: Rename cortex_bucket_store_cached_postings_compression_time_seconds to cortex_bucket_store_cached_postings_compression_time_seconds_total. #5431
[CHANGE] Store Gateway: Rename cortex_bucket_store_cached_series_fetch_duration_seconds to cortex_bucket_store_series_fetch_duration_seconds and cortex_bucket_store_cached_postings_fetch_duration_seconds to cortex_bucket_store_postings_fetch_duration_seconds. Add new metric cortex_bucket_store_chunks_fetch_duration_seconds. #5448
[CHANGE] Store Gateway: Remove idle_timeout, max_conn_age, pool_size, min_idle_conns fields for Redis index cache and caching bucket. #5448
[CHANGE] Store Gateway: Add flag -store-gateway.sharding-ring.zone-stable-shuffle-sharding to enable store gateway to use zone stable shuffle sharding. #5489
[CHANGE] Bucket Index: Add series_max_size and chunk_max_size to bucket index. #5489
[CHANGE] StoreGateway: Rename cortex_bucket_store_chunk_pool_returned_bytes_total and cortex_bucket_store_chunk_pool_requested_bytes_total to cortex_bucket_store_chunk_pool_operation_bytes_total. #5552
[CHANGE] Query Frontend/Querier: Make build info API disabled by default and add feature flag api.build-info-enabled to enable it. #5533
[CHANGE] Purger: Do not use S3 tenant KMS key when uploading deletion marker. #5575
[CHANGE] Ingester: Shipper always allows uploading compacted blocks to ship OOO compacted blocks. #5625
[CHANGE] DDBKV: Change metric name from dynamodb_kv_read_capacity_total to dynamodb_kv_consumed_capacity_total and include Delete, Put, Batch dimension. #5487
[CHANGE] Compactor: Adding the userId on the compact dir path. #5524
[CHANGE] Ingester: Remove deprecated ingester metrics. #5472
[FEATURE] Store Gateway: Implementing multi level index cache. #5451
[FEATURE] Ruler: Add support for disabling rule groups. #5521
[FEATURE] Support object storage backends for runtime configuration file. #5292
[FEATURE] Ruler: Add support for Limit field on RuleGroup. #5528
[FEATURE] AlertManager: Add support for Webex, Discord and Telegram Receiver. #5493
[FEATURE] Ingester: Added -admin-limit-message to customize the message contained in limit errors. #5460
[FEATURE] AlertManager: Update version to v0.26.0 and bring in Microsoft Teams receiver. #5543
[FEATURE] Store Gateway: Support lazy expanded posting optimization. Added new flag blocks-storage.bucket-store.lazy-expanded-postings-enabled and new metrics cortex_bucket_store_lazy_expanded_postings_total, cortex_bucket_store_lazy_expanded_posting_size_bytes_total and cortex_bucket_store_lazy_expanded_posting_series_overfetched_size_bytes_total. #5556
[FEATURE] Store Gateway: Add max_downloaded_bytes_per_request to limit max bytes to download per store gateway request. #5179
[FEATURE] Added 2 flags -alertmanager.alertmanager-client.grpc-max-send-msg-size and -alertmanager.alertmanager-client.grpc-max-recv-msg-size to configure alert manager grpc client message size limits. #5338
[FEATURE] Querier/StoreGateway: Allow the tenant shard sizes to be a percent of total instances. #5393
[FEATURE] Added the flag -alertmanager.api-concurrency to configure alert manager api concurrency limit. #5412
[FEATURE] Store Gateway: Add -store-gateway.sharding-ring.keep-instance-in-the-ring-on-shutdown to skip unregistering instance from the ring in shutdown. #5421
[FEATURE] Ruler: Support for filtering rules in the API. #5417
[FEATURE] Compactor: Add -compactor.ring.tokens-file-path to store generated tokens locally. #5432
[FEATURE] Query Frontend: Add -frontend.retry-on-too-many-outstanding-requests to re-enqueue 429 requests if there are multiple query-schedulers available. #5496
[FEATURE] Store Gateway: Add -blocks-storage.bucket-store.max-inflight-requests for store gateways to reject further series requests upon reaching the limit. #5553
[FEATURE] Store Gateway: Support filtered index cache. #5587
[ENHANCEMENT] Update go version to 1.21.3. #5630
[ENHANCEMENT] Store Gateway: Add cortex_bucket_store_block_load_duration_seconds histogram to track time to load blocks. #5580
[ENHANCEMENT] Querier: retry chunk pool exhaustion error in querier rather than query frontend. #5569
[ENHANCEMENT] Alertmanager: Added flag -alertmanager.alerts-gc-interval to configure alerts Garbage collection interval. #5550
[ENHANCEMENT] Query Frontend: Enable vertical sharding on binary expr. #5507
[ENHANCEMENT] Query Frontend: Include user agent as part of query frontend log. #5450
[ENHANCEMENT] Query: Set CORS Origin headers for Query API. #5388
[ENHANCEMENT] Query Frontend: Add cortex_rejected_queries_total metric for throttled queries. #5356
[ENHANCEMENT] Query Frontend: Optimize the decoding of SampleStream. #5349
[ENHANCEMENT] Compactor: Check ctx done when uploading visit marker. #5333
[ENHANCEMENT] AlertManager: Add cortex_alertmanager_dispatcher_aggregation_groups and cortex_alertmanager_dispatcher_alert_processing_duration_seconds metrics for dispatcher. #5592
[ENHANCEMENT] Store Gateway: Added new flag blocks-storage.bucket-store.series-batch-size to control how many series to fetch per batch in Store Gateway. #5582
[ENHANCEMENT] Querier: Log query stats when querying store gateway. #5376
[ENHANCEMENT] Ruler: Add cortex_ruler_rule_group_load_duration_seconds and cortex_ruler_rule_group_sync_duration_seconds metrics. #5609
[ENHANCEMENT] Ruler: Add contextual info and query statistics to log. #5604
[ENHANCEMENT] Distributor/Ingester: Add span on push path. #5319
[ENHANCEMENT] Query Frontend: Reject subquery with too small step size. #5323
[ENHANCEMENT] Compactor: Exposing Thanos accept-malformed-index to Cortex compactor. #5334
[ENHANCEMENT] Log: Avoid expensive log.Valuer evaluation for disallowed levels. #5297
[ENHANCEMENT] Improving Performance on the API Gzip Handler. #5347
[ENHANCEMENT] Dynamodb: Add puller-sync-time to allow different pull time for ring. #5357
[ENHANCEMENT] Emit querier max_concurrent as a metric. #5362
[ENHANCEMENT] Avoid sort tokens on lifecycler autoJoin. #5394
[ENHANCEMENT] Do not resync blocks in running store gateways during rollout deployment and container restart. #5363
[ENHANCEMENT] Store Gateway: Add new metrics cortex_bucket_store_sent_chunk_size_bytes, cortex_bucket_store_postings_size_bytes and cortex_bucket_store_empty_postings_total. #5397
[ENHANCEMENT] Add jitter to lifecycler heartbeat. #5404
[ENHANCEMENT] Store Gateway: Add config estimated_max_series_size_bytes and estimated_max_chunk_size_bytes to address data overfetch. #5401
[ENHANCEMENT] Distributor/Ingester: Add experimental -distributor.sign_write_requests flag to sign the write requests. #5430
[ENHANCEMENT] Store Gateway/Querier/Compactor: Handling CMK Access Denied errors. #5420 #5442 #5446
[ENHANCEMENT] Alertmanager: Add the alert name in error log when it gets throttled. #5456
[ENHANCEMENT] Querier: Retry store gateway on different zones when zone awareness is enabled. #5476
[ENHANCEMENT] Compactor: allow unregister_on_shutdown to be configurable. #5503
[ENHANCEMENT] Querier: Batch adding series to query limiter to optimize locking. #5505
[ENHANCEMENT] Store Gateway: add metric cortex_bucket_store_chunk_refetches_total for number of chunk refetches. #5532
[ENHANCEMENT] BasicLifeCycler: Allow final-sleep during shutdown. #5517
[ENHANCEMENT] All: Handling CMK Access Denied errors. #5420 #5542
[ENHANCEMENT] Querier: Retry store gateway client connection closing gRPC error. #5558
[ENHANCEMENT] QueryFrontend: Add generic retry for all APIs. #5561
[ENHANCEMENT] Querier: Check context before notifying scheduler and frontend. #5565
[ENHANCEMENT] QueryFrontend: Add metric for number of series requests. #5373
[ENHANCEMENT] Store Gateway: Add histogram metrics for total time spent fetching series and chunks per request. #5573
[ENHANCEMENT] Store Gateway: Check context in multi level cache. Add cortex_store_multilevel_index_cache_fetch_duration_seconds and cortex_store_multilevel_index_cache_backfill_duration_seconds to measure fetch and backfill latency. #5596
[ENHANCEMENT] Ingester: Added new ingester TSDB metrics cortex_ingester_tsdb_head_samples_appended_total, cortex_ingester_tsdb_head_out_of_order_samples_appended_total, cortex_ingester_tsdb_snapshot_replay_error_total, cortex_ingester_tsdb_sample_ooo_delta and cortex_ingester_tsdb_mmap_chunks_total. #5624
[ENHANCEMENT] Query Frontend: Handle context error before decoding and merging responses. #5499
[ENHANCEMENT] Store-Gateway and AlertManager: Add a wait_instance_time_out to context to avoid waiting forever. #5581
[BUGFIX] Compactor: Fix possible division by zero during compactor config validation. #5535
[BUGFIX] Ruler: Validate if rule group can be safely converted back to rule group yaml from protobuf message. #5265
[BUGFIX] Querier: Convert gRPC ResourceExhausted status code from store gateway to 422 limit error. #5286
[BUGFIX] Alertmanager: Route web-ui requests to the alertmanager distributor when sharding is enabled. #5293
[BUGFIX] Storage: Bucket index updater should ignore meta not found for partial blocks. #5343
[BUGFIX] Ring: Add JOINING state to read operation. #5346
[BUGFIX] Compactor: Partial block with only visit marker should be deleted even if there is no deletion marker. #5342
[BUGFIX] KV: Etcd calls will no longer block indefinitely and will now time out after the DialTimeout period. #5392
[BUGFIX] Ring: Allow RF greater than number of zones to select more than one instance per zone. #5411
[BUGFIX] Store Gateway: Fix bug in store gateway ring comparison logic. #5426
[BUGFIX] Ring: Fix bug in consistency of Get func in a scaling zone-aware ring. #5429
[BUGFIX] Compactor: Fix retry on markers. #5441
[BUGFIX] Query Frontend: Fix bug of failing to cancel downstream request context in query frontend v2 mode (query scheduler enabled). #5447
[BUGFIX] Alertmanager: Remove the user id from state replication key metric label value. #5453
[BUGFIX] Compactor: Avoid cleaner concurrency issues checking global markers before all blocks. #5457
[BUGFIX] DDBKV: Disallow instance with older timestamp to update instance with newer timestamp. #5480
[BUGFIX] DDBKV: When no change detected in ring, retry the CAS until there is change. #5502
[BUGFIX] Fix bug on objstore when configured to use S3 fips endpoints. #5540
[BUGFIX] Ruler: Fix bug in ruler where a failure to load a single RuleGroup would prevent rulers from syncing all RuleGroups. #5563
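
The Querier/StoreGateway feature in this release (#5393) allows a tenant shard size to be given as a percent of total instances. The following is a minimal sketch of that idea, not Cortex's actual code: the function name, the convention that values between 0 and 1 are read as a fraction, and the choice to round up are all assumptions made for illustration.

```python
import math


def effective_shard_size(configured, total_instances):
    """Resolve a configured shard size to a concrete instance count.

    Hypothetical helper. Assumptions: 0 (or less) leaves the value unset,
    a value in (0, 1) is a fraction of all instances rounded up, and
    anything else is an absolute instance count.
    """
    if configured <= 0:
        return 0
    if configured < 1:
        # Fractional value: scale by the current number of instances.
        return max(1, math.ceil(configured * total_instances))
    return math.ceil(configured)


# A tenant configured with 0.25 on a 10-instance ring.
print(effective_shard_size(0.25, 10))
```

Under these assumptions a fractional setting grows with the ring, while an absolute value such as 3 stays fixed as instances are added.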

v1.15.3

11 months ago

This release includes:

Distributor: Fix potential data corruption in cases of timeout between distributors and ingesters. #5422

v1.15.2

1 year ago

This release includes a Go runtime upgrade to 1.20.4 to address a critical CVE.

[ENHANCEMENT] Update Go version to 1.20.4. #5299

v1.15.1

1 year ago

This release includes:

[CHANGE] Alertmanager: Validating new fields on the PagerDuty AM config. #5290
[BUGFIX] Querier: Convert gRPC ResourceExhausted status code from store gateway to 422 limit error. #5286