A library that provides an embeddable, persistent key-value store for fast storage.
SstFileMetaData to prevent throwing java.lang.NoSuchMethodError
ColumnFamilyOptions::max_successive_merges > 0 where the CPU overhead for deciding whether to merge could have increased unless the user had set the option ColumnFamilyOptions::strict_max_successive_merges
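The max_successive_merges rule in these notes can be pictured as a small decision function. When a key already has max_successive_merges operands, its chain is collapsed into a fully merged value. Unless strict_max_successive_merges is set, this only happens when every operand is found in memory. The sketch below is a simplified, hypothetical model: MergeCollapsePolicy and shouldCollapse are illustrative names, not RocksDB's implementation or API.

```java
/**
 * Simplified, hypothetical model of the max_successive_merges rule described
 * in these notes. It is not RocksDB's implementation and calls no RocksDB API.
 */
final class MergeCollapsePolicy {
    private MergeCollapsePolicy() {}

    /**
     * Decides whether an incoming merge write should be replaced by a fully
     * merged value instead of being appended as one more operand.
     *
     * @param maxSuccessiveMerges      models max_successive_merges (0 disables the limit)
     * @param strict                   models strict_max_successive_merges
     * @param operandsInMemtable       successive merge operands already buffered for the key
     * @param allOperandsFoundInMemory true if computing the merged value needs no I/O
     */
    static boolean shouldCollapse(int maxSuccessiveMerges, boolean strict,
                                  int operandsInMemtable, boolean allOperandsFoundInMemory) {
        if (maxSuccessiveMerges <= 0) {
            return false; // limit disabled
        }
        if (operandsInMemtable < maxSuccessiveMerges) {
            return false; // limit not reached yet
        }
        // Limit reached: collapse when no I/O is needed, or when the user opted into strictness.
        return allOperandsFoundInMemory || strict;
    }
}
```

Take a limit of 3 with the base value only on disk: shouldCollapse(3, false, 5, false) returns false, so the operand chain keeps growing. Setting strict makes it return true, which keeps the count bounded at the cost of a read on the write path.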
GetMergeOperandsOptions::continue_cb, to give users the ability to end GetMergeOperands()'s lookup process before all merge operands were found.
default_write_temperature CF option and opening an SstFileWriter with a temperature.
WriteBatchWithIndex now supports wide-column point lookups via the GetEntityFromBatch API. See the API comments for more details.
Iterator::GetProperty("rocksdb.iterator.write-time") to allow users to get data's approximate write Unix time, and to write data with a specific write time via the WriteBatch::TimedPut API.
Best-effort recovery (best_efforts_recovery == true) may now be used together with atomic flush (atomic_flush == true). The all-or-nothing recovery guarantee for atomically flushed data will be upheld.
bottommost_temperature, already replaced by last_level_temperature
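The GetMergeOperandsOptions::continue_cb entry describes ending a merge-operand lookup early. The sketch below models that idea without calling RocksDB. OperandLookup and collect are hypothetical names, and a java.util.function.Predicate stands in for the callback. Keeping the operand that triggered the stop in the result is an assumption of this model.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/**
 * Hypothetical model of early termination via a continuation callback, in the
 * spirit of GetMergeOperandsOptions::continue_cb. Not RocksDB's API.
 */
final class OperandLookup {
    private OperandLookup() {}

    /**
     * Visits operands in lookup order and stops as soon as continueCb returns false.
     * In this model the operand that triggered the stop is kept in the result.
     */
    static List<String> collect(List<String> operandsInLookupOrder, Predicate<String> continueCb) {
        List<String> found = new ArrayList<>();
        for (String operand : operandsInLookupOrder) {
            found.add(operand);
            if (!continueCb.test(operand)) {
                break; // caller has seen enough; remaining operands are never visited
            }
        }
        return found;
    }
}
```

For example, collect(List.of("a", "b", "STOP", "c"), op -> !op.equals("STOP")) returns [a, b, STOP] and never visits "c". That skipped work is the saving the option is meant to provide.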
For WriteCommittedTransaction::GetForUpdate, if the column family enables user-defined timestamps, it was mandated that the argument do_validate could not be false, and UDT-based validation had to be done with a user-set read timestamp. This is updated to make UDT-based validation optional if the user sets do_validate to false and does not set a read timestamp. In that case, GetForUpdate skips UDT-based validation and it is the user's responsibility to enforce the UDT invariant. So DO NOT skip this UDT-based validation if you have no way to enforce the UDT invariant. Ways to enforce the invariant on the user's side include managing a monotonically increasing timestamp, committing transactions in a single thread, etc.
kEnableWait to measure time spent by user threads blocked in RocksDB other than on mutexes, such as a write thread waiting to be added to a write group, or a write thread being delayed or stalled.
RateLimiter's API no longer requires the burst size to equal the refill size. Users of NewGenericRateLimiter() can now provide a burst size in single_burst_bytes. Implementors of RateLimiter::SetSingleBurstBytes() need to adapt their implementations to match the changed API doc.
write_memtable_time to the newly introduced PerfLevel kEnableWait.
RateLimiters created by NewGenericRateLimiter() no longer modify the refill period when SetSingleBurstBytes() is called.
ColumnFamilyOptions::max_successive_merges when the key's merge operands are all found in memory, unless strict_max_successive_merges is explicitly set.
kBlockCacheTier reads to return Status::Incomplete when I/O is needed to fetch a merge chain's base value from a blob file.
kBlockCacheTier reads to return Status::Incomplete on a table cache miss rather than incorrectly returning an empty value.
multiGet() variants now take advantage of the underlying batched multiGet() performance improvements.
Before
Benchmark                          (columnFamilyTestType)  (keyCount)  (keySize)  (multiGetSize)  (valueSize)   Mode  Cnt     Score    Error  Units
MultiGetBenchmarks.multiGetList10        no_column_family       10000         16             100           64  thrpt   25  6315.541 ±  8.106  ops/s
MultiGetBenchmarks.multiGetList10        no_column_family       10000         16             100         1024  thrpt   25  6975.468 ± 68.964  ops/s

After
Benchmark                          (columnFamilyTestType)  (keyCount)  (keySize)  (multiGetSize)  (valueSize)   Mode  Cnt     Score    Error  Units
MultiGetBenchmarks.multiGetList10        no_column_family       10000         16             100           64  thrpt   25  7046.739 ± 13.299  ops/s
MultiGetBenchmarks.multiGetList10        no_column_family       10000         16             100         1024  thrpt   25  7654.521 ± 60.121  ops/s
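The relative gain in the Before and After multiGetList10 tables is a one-line calculation. The helper below is only an illustration: ThroughputDelta is a hypothetical name, and the scores are copied verbatim from the tables.

```java
/** Relative throughput change between two JMH scores (ops/s, higher is better). */
final class ThroughputDelta {
    private ThroughputDelta() {}

    /** Percentage change from before to after; positive means more ops/s. */
    static double percentChange(double before, double after) {
        return (after - before) / before * 100.0;
    }

    /** Scores copied from the tables: {before, after} for valueSize 64 and 1024. */
    static final double[][] MULTI_GET_LIST10 = {
        {6315.541, 7046.739}, // valueSize = 64
        {6975.468, 7654.521}, // valueSize = 1024
    };
}
```

This gives roughly +11.6% for valueSize 64 and +9.7% for valueSize 1024. In both cases the ± error intervals of the Before and After scores do not overlap.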
SstFileWriter create SST files without persisting user-defined timestamps when the Option.persist_user_defined_timestamps flag is set to false.
DeleteFilesInRanges and GetPropertiesOfTablesInRange.
access_hint_on_compaction_start
ColumnFamilyOptions::check_flush_compaction_key_order
WritableFile::GetFileSize and FSWritableFile::GetFileSize implementation that returns 0 and make it pure virtual, so that subclasses are forced to provide an explicit implementation.
ColumnFamilyOptions::level_compaction_dynamic_file_size
EnableFileDeletions API because it is unsafe with no known legitimate use.
ColumnFamilyOptions::ignore_max_compaction_bytes_for_input
sst_dump --command=check now compares the number of records in a table with the num_entries table property, and reports corruption if there is a mismatch. API SstFileDumper::ReadSequential() is updated to optionally do this verification. (#12322)
DBImpl::RenameTempFileToOptionsFile.
rocksdb.sst.write.micros measures the time of each write to an SST file; rocksdb.file.write.{flush|compaction|db.open}.micros measure the time of each write to an SST table (currently only block-based table format) and blob file for flush, compaction and DB open.
kVerify to enum class FileOperationType in listener.h. Update your switch statements as needed.
level_compaction_dynamic_file_size, ignore_max_compaction_bytes_for_input, check_flush_compaction_key_order, flush_verify_memtable_count, compaction_verify_record_count, fail_if_options_file_error, and enforce_single_del_contracts
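The sst_dump --command=check entry describes a consistency check. It counts the records a scan actually yields and compares that count with the num_entries table property. Below is a minimal, hypothetical sketch of that comparison. RecordCountCheck and verify are illustrative names, not SstFileDumper's API, and the iterator stands in for a real table scan.

```java
import java.util.Iterator;
import java.util.Optional;

/**
 * Hypothetical sketch of a record-count consistency check in the spirit of
 * sst_dump --command=check. Not SstFileDumper's API.
 */
final class RecordCountCheck {
    private RecordCountCheck() {}

    /**
     * Consumes every record from the scan and compares the count with the
     * value of the table's num_entries property.
     *
     * @return empty if the counts agree, otherwise a corruption message
     */
    static Optional<String> verify(Iterator<?> records, long numEntriesProperty) {
        long scanned = 0;
        while (records.hasNext()) {
            records.next();
            scanned++;
        }
        if (scanned != numEntriesProperty) {
            return Optional.of("Corruption: scanned " + scanned
                    + " records but num_entries is " + numEntriesProperty);
        }
        return Optional.empty();
    }
}
```

The check is a full scan, so its cost is linear in the table size. That may be one reason the notes describe the verification as optional in SstFileDumper::ReadSequential().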
rocksdb.blobdb.blob.file.write.micros expands to also measure the time spent writing the header and footer. Therefore the COUNT may be higher and values may be smaller than before. For stacked BlobDB, it no longer measures the time of explicitly flushing the blob file.
rocksdb.blobdb.blob.file.synced includes blob files that failed to be synced, and rocksdb.blobdb.blob.file.bytes.written includes blob bytes that failed to be written.
BackupEngine, sst_dump, or ldb.
preclude_last_level_data_seconds option that could interfere with expected data tiering.
WriteBatchWithIndex. This includes the PutEntity API and support for wide columns in the existing read APIs (GetFromBatch, GetFromBatchAndDB, MultiGetFromBatchAndDB, and BaseDeltaIterator).
TablePropertiesCollectorFactory may now return a nullptr collector to decline processing a file, reducing callback overhead in such cases.
HyperClockCacheOptions::eviction_effort_cap controls the space-time trade-off of the response. The default should be generally well-balanced, with no measurable effect on normal operation.
RocksDB.get([ColumnFamilyHandle columnFamilyHandle,] ReadOptions opt, ByteBuffer key, ByteBuffer value), which now accepts indirect buffer parameters as well as direct buffer parameters
RocksDB.put([ColumnFamilyHandle columnFamilyHandle,] WriteOptions writeOpts, final ByteBuffer key, final ByteBuffer value), which now accepts indirect buffer parameters as well as direct buffer parameters
RocksDB.merge([ColumnFamilyHandle columnFamilyHandle,] WriteOptions writeOptions, ByteBuffer key, ByteBuffer value) methods with the same parameter options as put(...): direct and indirect buffers are supported
RocksIterator.key(byte[] key [, int offset, int len]) methods, which retrieve the iterator key into the supplied buffer
RocksIterator.value(byte[] value [, int offset, int len]) methods, which retrieve the iterator value into the supplied buffer
get(final ColumnFamilyHandle columnFamilyHandle, final ReadOptions readOptions, byte[]) in favour of get(final ReadOptions readOptions, final ColumnFamilyHandle columnFamilyHandle, byte[]), which has consistent parameter ordering with other methods in the same class
Transaction.get(ReadOptions opt, [ColumnFamilyHandle columnFamilyHandle,] byte[] key, byte[] value) methods, which retrieve the requested value into the supplied buffer
Transaction.get(ReadOptions opt, [ColumnFamilyHandle columnFamilyHandle,] ByteBuffer key, ByteBuffer value) methods, which retrieve the requested value into the supplied buffer
Transaction.getForUpdate(ReadOptions readOptions, [ColumnFamilyHandle columnFamilyHandle,] byte[] key, byte[] value, boolean exclusive [, boolean doValidate]) methods, which retrieve the requested value into the supplied buffer
Transaction.getForUpdate(ReadOptions readOptions, [ColumnFamilyHandle columnFamilyHandle,] ByteBuffer key, ByteBuffer value, boolean exclusive [, boolean doValidate]) methods, which retrieve the requested value into the supplied buffer
Transaction.getIterator() method as a convenience, which defaults the ReadOptions value supplied to the existing Transaction.iterator() methods. This mirrors the existing RocksDB.iterator() method.
Transaction.put([ColumnFamilyHandle columnFamilyHandle,] ByteBuffer key, ByteBuffer value [, boolean assumeTracked]) methods, which supply the key and the value to be written in a ByteBuffer parameter
Transaction.merge([ColumnFamilyHandle columnFamilyHandle,] ByteBuffer key, ByteBuffer value [, boolean assumeTracked]) methods, which supply the key and the value to be written/merged in a ByteBuffer parameter
Transaction.mergeUntracked([ColumnFamilyHandle columnFamilyHandle,] ByteBuffer key, ByteBuffer value) methods, which supply the key and the value to be written/merged in a ByteBuffer parameter
EnableFileDeletion API not default to force enabling. Users who rely on this default behavior and want to keep force enabling need to explicitly pass true to EnableFileDeletion.
daily_offpeak_time_utc, the compaction picker will select a larger number of files for periodic compaction. This selection will include files that are projected to expire by the next off-peak start time, ensuring that these files are not chosen for periodic compaction outside of off-peak hours.
DB::StartTrace(), the subsequent trace writes are skipped to avoid writing to a file that has previously seen an error. In this case, DB::EndTrace() will also return a non-OK status whose message includes information about the previously occurred error.
TablePropertiesCollector::Finish() once.
WAL_ttl_seconds > 0, we now process archived WALs for deletion at least every WAL_ttl_seconds / 2 seconds. Previously, this could happen less frequently for small WAL_ttl_seconds values when size-based expiration (WAL_size_limit_MB > 0) was simultaneously enabled.
rocksdb.fifo.{max.size|ttl}.compactions to count FIFO compactions that drop files for different reasons
DBOptions::daily_offpeak_time_utc in "HH:mm-HH:mm" format. This information will be used for resource optimization in the future
SetSingleBurstBytes() for RocksDB rate limiter
DBOptions::fail_if_options_file_error changed from false to true. Operations that set in-memory options (e.g., DB::Open*(), DB::SetOptions(), DB::CreateColumnFamily*(), and DB::DropColumnFamily()) but fail to persist the change will now return a non-OK Status by default.
Options::compaction_readahead_size is 0
Status::NotSupported()
max_successive_merges logic.
create_missing_column_families=true and many column families.
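Two entries above involve DBOptions::daily_offpeak_time_utc. One gives its "HH:mm-HH:mm" format. The other describes the compaction picker's rule of also selecting files projected to expire by the next off-peak start. The sketch below models both ideas and is not RocksDB's implementation. OffpeakWindow, contains, secondsUntilNextStart and filesToPick are hypothetical names. It also assumes the window may wrap past midnight, and that a file "expires" once its age reaches the periodic compaction period.

```java
import java.time.LocalTime;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of a daily "HH:mm-HH:mm" UTC off-peak window and of
 * projecting periodic-compaction expiry to the next window start.
 * Not RocksDB's implementation.
 */
final class OffpeakWindow {
    private static final int SECONDS_PER_DAY = 24 * 60 * 60;

    final LocalTime start;
    final LocalTime end;

    private OffpeakWindow(LocalTime start, LocalTime end) {
        this.start = start;
        this.end = end;
    }

    /** Parses a spec such as "23:30-05:30" (times interpreted as UTC). */
    static OffpeakWindow parse(String spec) {
        String[] parts = spec.split("-");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected HH:mm-HH:mm, got: " + spec);
        }
        return new OffpeakWindow(LocalTime.parse(parts[0]), LocalTime.parse(parts[1]));
    }

    /** True if the time of day is inside [start, end), wrapping past midnight if needed. */
    boolean contains(LocalTime t) {
        if (!start.isAfter(end)) {
            return !t.isBefore(start) && t.isBefore(end); // same-day window, e.g. 01:00-05:00
        }
        return !t.isBefore(start) || t.isBefore(end); // wraps midnight, e.g. 23:30-05:30
    }

    /** Seconds from the given time of day until the next window start (0 if exactly at start). */
    long secondsUntilNextStart(LocalTime now) {
        int diff = start.toSecondOfDay() - now.toSecondOfDay();
        return diff >= 0 ? diff : diff + SECONDS_PER_DAY;
    }

    /**
     * Indices of files that have expired, or will expire (creation + periodicSeconds),
     * by the next window start. Assumption: "expire" means reaching periodicSeconds of age.
     */
    static List<Integer> filesToPick(long[] creationEpochSeconds, long periodicSeconds,
                                     long nowEpochSeconds, long secondsUntilNextStart) {
        long horizon = nowEpochSeconds + secondsUntilNextStart;
        List<Integer> picked = new ArrayList<>();
        for (int i = 0; i < creationEpochSeconds.length; i++) {
            if (creationEpochSeconds[i] + periodicSeconds <= horizon) {
                picked.add(i);
            }
        }
        return picked;
    }
}
```

Extending the horizon to the next window start is what makes the selection "larger". A file that would only expire a few hours after the window closes gets compacted now, during off-peak time, instead of during peak hours.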