metrics2.0 based, multi-tenant timeseries store for Graphite and friends.
as of v0.13.1-38-gb88c3b84, data points with a timestamp too far in the future are rejected by default. The cutoff is at 10% of the raw retention's TTL; for example, with the default storage schema `1s:35d:10min:7` the cutoff is at 35d * 0.1 = 3.5d. The limit can be configured via the `retention.future-tolerance-ratio` parameter, or the enforcement can be disabled entirely via the `retention.enforce-future-tolerance` parameter. To predict whether metrictank would drop incoming data points once enforcement is turned on, the metric `tank.sample-too-far-ahead` can be used: it counts the data points which would be dropped if the enforcement were turned on while it is off. #1572
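The cutoff arithmetic above can be sketched as follows. This is an illustrative helper, not metrictank source; the function names (`parse_duration`, `future_cutoff_seconds`) are made up for the example, and it assumes the default tolerance ratio of 10% applied to the raw (first) retention's TTL.

```python
# Hypothetical sketch: compute the future-tolerance cutoff from a
# storage-schemas retention string such as "1s:35d:10min:7".
UNITS = {"s": 1, "min": 60, "h": 3600, "d": 86400, "y": 31536000}

def parse_duration(s):
    # split the number from the unit suffix, e.g. "35d" -> 35 * 86400
    i = 0
    while i < len(s) and s[i].isdigit():
        i += 1
    return int(s[:i]) * UNITS[s[i:]]

def future_cutoff_seconds(retentions, ratio=0.1):
    # the raw retention is the first one; its TTL is the second field
    raw = retentions.split(",")[0]           # "1s:35d:10min:7"
    ttl = parse_duration(raw.split(":")[1])  # "35d" -> 3024000 seconds
    return ttl * ratio

print(future_cutoff_seconds("1s:35d:10min:7"))  # 302400.0 seconds == 3.5 days
```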
Prometheus integration removal. As of v0.13.1-97-gd77c5a31, it is no longer possible to use metrictank to scrape Prometheus data, or to query data via PromQL. There was not enough usage (or customer interest) to keep maintaining this functionality. #1613
as of v0.13.1-110-g6b6f475a, tag support is enabled by default (it can still be disabled). If metrics with tags were previously ingested while tag support was disabled, their tags were treated as a normal part of the metric name. When tag support now gets enabled due to this change, the tags are treated as tags and are no longer part of the metric name. As a result, there is a very unlikely scenario in which some queries don't return the same results as before: those that query for tags as part of the metric name. (Note: meta tags are still disabled by default.) #1619
as of v0.13.1-186-gc75005d, the `/tags/delSeries` endpoint no longer accepts a `propagate` parameter. It is no longer possible to send the request to only a single node; it now always propagates to all nodes, bringing this method in line with `/metrics/delete`.
as of v0.13.1-250-g21d1dcd1 (#951), metrictank no longer excessively aligns all data to the same lowest common multiple resolution, but rather keeps data at its native resolution when possible.
Thus, to upgrade an existing cluster, you have 2 options: A) disable pre-normalization, do an in-place upgrade, enable it, then do another in-place upgrade. This works regardless of whether you have separate query peers, and regardless of whether you first upgrade query or shard nodes. B) do a colored deployment: create a new gossip cluster that has the optimization enabled from the get-go, then delete the older deployment.
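To illustrate what the old alignment did, here is a minimal sketch (not metrictank code) of normalizing a set of series intervals to their least common multiple, which is the behavior #951 avoids when native resolution can be kept:

```python
# Sketch of the pre-#951 behavior: when combining series with different
# step intervals, all of them were consolidated to the least common
# multiple of the intervals, losing resolution on the finer series.
from math import gcd
from functools import reduce

def lcm_interval(intervals):
    # LCM of all series intervals, in seconds
    return reduce(lambda a, b: a * b // gcd(a, b), intervals)

print(lcm_interval([10, 60]))  # 60: the 10s series gets consolidated to 60s
print(lcm_interval([10, 15]))  # 30: BOTH series get consolidated to 30s
```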
as of v0.13.1-384-g82dedf95, the meta record index configuration parameters have been moved out of the `cassandra-idx` section; they now have their own section, `cassandra-meta-record-idx`.
as of v0.13.1-433-g4c801819, metrictank proxies bad requests to graphite. As of v0.13.1-577-g07eed80f this is configurable via the `http.proxy-bad-requests` flag. Leave it enabled if your queries are in the grey zone (rejected by metrictank, tolerated by graphite); disable it if you don't like the additional latency. The aspiration is to remove this entire feature once we work out any remaining kinks in metrictank's request validation.
as of v0.13.1-788-g79e4709 (see #1831), the option `reject-invalid-tags` was removed. A new option named `reject-invalid-input` was added in its place, with a default value of `true`. It rejects invalid tags as well as invalid UTF-8 data found in the metric name, tag keys, or tag values. The exported stat `input.xx.metricdata.discarded.invalid_tag` was also changed to `input.xx.metricdata.discarded.invalid_input`, so dashboards will need to be updated accordingly.
`/tags/terms` query to get counts of tag values. #1582

The `schema_table` and `schema_archive_table` template names should have 2 `%s` sections, which will be expanded to the `keyspace` and `table`, or the `keyspace` and `archive-table` settings respectively, as configured under `cassandra-idx` in the metrictank config file.

`input.reject-invalid-tags` flag. If you're unsure whether you're currently sending invalid tags, it's a good idea to first disable invalid tag rejection and watch the new counter `input.<input name>.metricdata.discarded.invalid_tag`: if invalid tags get ingested, this counter will increase without rejecting them. Once you're sure that you don't ingest invalid tags, you can enable rejection to enforce the validation.
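As a rough illustration of what "invalid tag" means here, the following is a loose approximation of graphite-style tag validation. The exact rules metrictank enforces may differ in detail; this sketch only checks the basics that are safe to assume (a `key=value` shape, non-empty key and value, and no `;`, which delimits tags in the carbon line protocol):

```python
# Illustrative approximation only -- NOT metrictank's actual validation code.
def tag_is_valid(tag: str) -> bool:
    if "=" not in tag:
        return False                      # tags must have key=value form
    key, _, value = tag.partition("=")
    if not key or not value:
        return False                      # neither side may be empty
    if ";" in key or ";" in value:
        return False                      # ';' separates tags on the wire
    return True

print(tag_is_valid("datacenter=us-east"))  # True
print(tag_is_valid("bad;key=x"))           # False
print(tag_is_valid("novalue="))            # False
```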
More information on #1348.

Tracing config options renamed:
tracing-enabled -> jaeger.enabled
tracing-addr -> jaeger.agent-addr
tracing-add-tags -> jaeger.add-tags (now also key=value instead of key:value)

This release includes the "query layer" functionality. Versions prior to v0.11.0-184-g293b55b9 cannot handle query nodes joining the cluster and will crash. To deploy the new query nodes and introduce them into the cluster, you must first upgrade all other nodes to this version (or later). Also, regarding cluster.mode:
since v0.11.0-169-g59ebb227, `kafka-version` now defaults to 2.0.0 instead of 0.10.0.0. Make sure to set this to the proper version if your brokers are not at least at version 2.0.0. See #1221
since v0.11.0-233-gcf24c43a, if queries need rollup data but asked for a `consolidateBy()` without a matching rollup aggregation, we pick the most appropriate rollup from what is available.
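The selection described above can be sketched roughly as below. This is a hypothetical illustration: the fallback preference order is assumed for the example and is not necessarily metrictank's actual order.

```python
# Hypothetical sketch of rollup selection: prefer the consolidator the user
# asked for via consolidateBy(); otherwise fall back to an available rollup.
def pick_rollup(requested, available):
    if requested in available:
        return requested
    # assumed preference order, for illustration only
    for candidate in ("avg", "max", "min", "sum", "lst"):
        if candidate in available:
            return candidate
    raise ValueError("no rollup aggregation available")

print(pick_rollup("max", ["avg", "max"]))  # max (exact match)
print(pick_rollup("sum", ["avg", "max"]))  # avg (fallback)
```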
since v0.11.0-252-ga1e41192, the `log-min-dur` flag was removed; it was no longer used. #1275
Since v0.11.0-285-g4c862d8c, duplicate points are now always rejected, even with the reorder buffer enabled. Note from the future: this was undone in v0.13.0-188-g6cd12d6, see future notes about reorderBufferAllowUpdate
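A minimal sketch of the duplicate-rejection rule may help: a reorder buffer accepts out-of-order points within its window, but a point whose timestamp already exists is rejected regardless. This is an assumed toy model, not metrictank's implementation, and it reflects the v0.11.0-285 behavior (before the later `reorderBufferAllowUpdate` change):

```python
# Toy reorder buffer (assumed behavior, not metrictank source):
# out-of-order points within `window` seconds are accepted,
# duplicate timestamps are always rejected.
class ReorderBuffer:
    def __init__(self, window):
        self.window = window  # how far back out-of-order points are accepted
        self.points = {}      # ts -> value
        self.newest = 0

    def add(self, ts, value):
        if ts in self.points:
            return False      # duplicate timestamp: always rejected
        if self.newest and ts < self.newest - self.window:
            return False      # too old to reorder
        self.points[ts] = value
        self.newest = max(self.newest, ts)
        return True

buf = ReorderBuffer(window=60)
print(buf.add(100, 1.0))  # True
print(buf.add(90, 2.0))   # True: out of order, but within the window
print(buf.add(100, 3.0))  # False: duplicate, rejected despite reordering
```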
with our previous chunk format, when certain conditions coincided, the encoded delta became corrupted and reading the chunk resulted in incorrect data. This release brings a remediation to recover the data at read time, as well as a new chunk format that does not suffer from the issue. The new chunks are also about 9 bytes shorter in the typical case. While metrictank now writes to the store exclusively in the new format, it can read from the store in any of the formats. This means readers should be upgraded before writers, to avoid the situation where an old reader cannot parse a chunk written by a newer writer during an upgrade. See #1126, #1129
we now use logrus for logging. #1056, #1083. Log levels are now strings, not integers; see the updated config file.
index pruning is now configurable via index-rules.conf #924, #1120
We no longer use a `max-stale` setting in the `cassandra-idx` section; instead we gained an `index-rules-conf` setting.
The NSQ cluster notifier has been removed. NSQ is a delight to work with, but we could only use it for a small portion of our clustering needs, requiring Kafka anyway for data ingestion and distribution. We've been using Kafka for years and neglected the NSQ notifier code, so it's time to rip it out. See #1161
the offset manager for the kafka input / notifier plugin has been removed since there was no need for it. `offset=last` is thus no longer valid. See #1110
`metrics_active` for scraping by prometheus. #1160

`metric-max-stale` #1175, #1176

`min-stale` option; rename `max-age` to `max-stale`. #1064

`NameWithTags()` callable from template format. #1157

`GO*` variables as `MT_GO*`. #1044

`$(go list ./... | grep -v /vendor/)` #1050

There was a bug in 0.9 which caused instances to incorrectly encode IDs for tracking saved rollup chunks, which in some cases could cause data loss when write nodes restarted and would overwrite rollup chunks with partial chunks. Because of this, we strongly recommend upgrading to this version.
support for the new MetricPoint optimized data format in the kafka mdm topic, resulting in less kafka IO, disk usage, GC workload, metrictank and kafka CPU usage, and faster backfills. #876, #885, #890, #891, #894, #911. This also comes with a `public-org` config setting. #880, #883