A fully typed LMDB wrapper with minimum overhead 🐦
Heed is a fully typed LMDB wrapper with minimum overhead. It is also the most maintained Rust wrapper on top of LMDB and is used by meilisearch/meilisearch. LMDB is a memory-mapped key-value store battle-tested for a long time.
sync-read-txn feature
We removed the unsound sync-read-txn feature, which was making RoTxn: Sync when it must not be, as this is not safe. We replaced this feature with read-txn-no-tls, which makes RoTxn: Send, so it is usable from different threads using a Mutex.
We apologize for this and are discussing with the RustSec advisory team how best to advise people not to use this unsound feature.
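To make the Send versus Sync distinction concrete, here is a minimal std-only sketch. FakeRoTxn and read_from_threads are hypothetical names invented for this illustration and are not part of heed; the point is only that a type which is Send but not Sync can still be used from several threads once it sits behind a Mutex.

```rust
use std::cell::Cell;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for a read transaction (not a heed type).
/// The `Cell` field makes it `Send` but not `Sync`.
struct FakeRoTxn {
    reads: Cell<u32>,
}

impl FakeRoTxn {
    /// Pretends to read something and counts the access.
    fn get(&self) -> u32 {
        self.reads.set(self.reads.get() + 1);
        self.reads.get()
    }
}

/// Shares one `FakeRoTxn` between `workers` threads behind a `Mutex`
/// and returns how many reads were performed in total.
fn read_from_threads(workers: u32) -> u32 {
    // `Mutex<T>` is `Sync` whenever `T: Send`, so the `Arc` can cross threads.
    let txn = Arc::new(Mutex::new(FakeRoTxn { reads: Cell::new(0) }));
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let txn = Arc::clone(&txn);
            thread::spawn(move || {
                txn.lock().unwrap().get();
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let total = txn.lock().unwrap().reads.get();
    total
}
```

Sharing a plain &FakeRoTxn across threads would not compile, which is the guarantee the unsound feature bypassed.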
We exposed nearly every LMDB feature: DUPSORT, INTEGER_KEY/DUP, REVERSE_KEY/DUP... You'll also be able to iterate over duplicate items or skip them. That's your choice.
You'll also be able to use the new Database::put_with_flags and <Iterator>::put_current_with_flags methods that support the NO_DUP_DATA, NO_OVERWRITE, APPEND, and APPEND_DUP flags, allowing you to append data faster on keys or duplicate data.
Thanks to @xiaoyawei, you can use LMDB's custom key comparison functions instead of relying only on the default lexicographic comparison. You can read more about this key-value feature in the LMDB source code.
use std::cmp::Ordering;
use std::str;
use heed::types::{Str, Unit};
use heed_traits::Comparator;
// Orders keys by their integer value instead of their raw bytes.
enum StringAsIntCmp {}
impl Comparator for StringAsIntCmp {
    fn compare(a: &[u8], b: &[u8]) -> Ordering {
        let a: i32 = str::from_utf8(a).unwrap().parse().unwrap();
        let b: i32 = str::from_utf8(b).unwrap().parse().unwrap();
        a.cmp(&b)
    }
}
let mut wtxn = env.write_txn()?;
let db = env
    .database_options()
    .types::<Str, Unit>()
    .key_comparator::<StringAsIntCmp>()
    .create(&mut wtxn)?;
db.put(&mut wtxn, "-1000", &())?;
db.put(&mut wtxn, "-100", &())?;
db.put(&mut wtxn, "100", &())?;
// Entries come back in numeric order, not in byte order.
let mut iter = db.iter(&wtxn)?;
assert_eq!(iter.next().transpose()?, Some(("-1000", ())));
assert_eq!(iter.next().transpose()?, Some(("-100", ())));
assert_eq!(iter.next().transpose()?, Some(("100", ())));
assert_eq!(iter.next().transpose()?, None);
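To see why the custom comparator matters, the following std-only sketch sorts the same keys twice, once by raw bytes and once as integers. It does not touch heed or LMDB; the function names are hypothetical and only mirror the comparison logic of the example above.

```rust
use std::cmp::Ordering;
use std::str;

/// Compares two keys as decimal integers instead of raw bytes.
/// Panics on keys that are not valid UTF-8 integers.
fn string_as_int_cmp(a: &[u8], b: &[u8]) -> Ordering {
    let a: i32 = str::from_utf8(a).unwrap().parse().unwrap();
    let b: i32 = str::from_utf8(b).unwrap().parse().unwrap();
    a.cmp(&b)
}

/// Sorts keys the way a plain byte-wise (lexicographic) comparison would.
fn sorted_lexicographically(mut keys: Vec<&str>) -> Vec<&str> {
    keys.sort_by(|a, b| a.as_bytes().cmp(b.as_bytes()));
    keys
}

/// Sorts keys with the integer-aware comparison.
fn sorted_as_integers(mut keys: Vec<&str>) -> Vec<&str> {
    keys.sort_by(|a, b| string_as_int_cmp(a.as_bytes(), b.as_bytes()));
    keys
}
```

With the keys "100", "-1000", and "-100", the byte-wise sort puts "-100" before "-1000", while the integer sort yields the numeric order.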
We now have our own up-to-date lmdb-master-sys crate. It contains the bindgen-generated bindings to the LMDB library, and heed is directly plugged into it.
It will be easier for Meilisearch to bump the engine's LMDB version now. We previously used a fork of Mozilla's outdated lmdb-rkv-sys crate, but it was cumbersome to bump three repositories, i.e., our fork, meilisearch/lmdb-rs and finally heed.
Now we can make all the changes in the heed repository to bump the LMDB version :tada:
Thanks to @GregoryConrad, we now have a posix-sem feature. This change allows iOS and macOS builds to comply with Apple's App Sandbox (necessary for distribution in the App Store) and may bring speed improvements thanks to the POSIX semaphores.
You can now declare a heed Database with a number as the key or the value in a straightforward way. Just specify its endianness, and that's it.
use heed::byteorder::BE;
use heed::types::*;
use heed::Database;
// Keys are 64-bit integers stored in big-endian order.
type BEI64 = I64<BE>;
let mut wtxn = env.write_txn()?;
let db: Database<BEI64, Unit> = env.create_database(&mut wtxn, Some("big-endian-iter"))?;
db.put(&mut wtxn, &0, &())?;
db.put(&mut wtxn, &68, &())?;
db.put(&mut wtxn, &35, &())?;
db.put(&mut wtxn, &42, &())?;
wtxn.commit()?;
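The choice of endianness affects iteration order when keys are compared byte by byte. The sketch below uses only the standard library and unsigned integers to keep things simple; the helper names are hypothetical and nothing here calls heed.

```rust
/// Sorts numbers by their encoded bytes, the way a byte-wise key comparison would.
fn sort_by_encoded_bytes(mut numbers: Vec<u64>, encode: fn(u64) -> [u8; 8]) -> Vec<u64> {
    numbers.sort_by_key(|n| encode(*n));
    numbers
}

/// Most significant byte first.
fn big_endian(n: u64) -> [u8; 8] {
    n.to_be_bytes()
}

/// Least significant byte first.
fn little_endian(n: u64) -> [u8; 8] {
    n.to_le_bytes()
}
```

Sorting 256, 1, and 2 by their big-endian bytes gives the numeric order, whereas the little-endian encoding places 256 first because its lowest byte is zero.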
@irevoire added some new Env methods to get the size of a database:
Env::map_size returns the size of the original memory map.
Env::real_disk_size returns the size on the disk as seen by the file system.
Env::non_free_pages_size returns the size of the non-free pages of the current transaction.
Env::resize is an unsafe method to resize the environment.
You'll also be able to get the number of entries in a database in a snap. We no longer call .iter().count() internally and directly ask LMDB about this count.
Sometimes, it is possible to directly write into your database without first serializing your data into an intermediary buffer. For example, it can be true for many data structures like RoaringBitmaps.
use heed::byteorder::BE;
use heed::types::*;
use roaring::RoaringBitmap;
// The database below is keyed by 32-bit big-endian integers.
type BEI32 = I32<BE>;
let mut wtxn = env.write_txn()?;
let db = env.create_database::<BEI32, ByteSlice>(&mut wtxn, Some("number-string"))?;
let bitmap = RoaringBitmap::from_iter([1, 2, 3, 4]);
// Instead of serializing the data into a buffer, as you know the length of it,
// you can directly write the data into the LMDB value reserved space.
db.put_reserved(&mut wtxn, &42, bitmap.serialize_size(), |reserved| {
    bitmap.serialize_into(reserved)
})?;
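The idea of writing into reserved space can be sketched with the standard library alone. put_reserved_sketch below is a hypothetical stand-in that allocates a Vec instead of asking LMDB for a value slot; it only illustrates the pattern of knowing the size up front and filling the bytes in place.

```rust
use std::io::{self, Write};

/// Hypothetical stand-in for a reserved value slot (not the heed API):
/// allocates `len` bytes up front and lets the caller fill them in place.
fn put_reserved_sketch<F>(len: usize, write: F) -> io::Result<Vec<u8>>
where
    F: FnOnce(&mut [u8]) -> io::Result<()>,
{
    let mut reserved = vec![0u8; len];
    write(&mut reserved)?;
    Ok(reserved)
}

/// Writes each number as 4 big-endian bytes straight into the reserved space,
/// without building an intermediary buffer first.
fn store_numbers(numbers: &[u32]) -> io::Result<Vec<u8>> {
    put_reserved_sketch(numbers.len() * 4, |mut reserved| {
        for n in numbers {
            // `&mut [u8]` implements `Write` and advances as bytes are written.
            reserved.write_all(&n.to_be_bytes())?;
        }
        Ok(())
    })
}
```

The closure returns an io::Result so that a writer running out of reserved space surfaces as an error rather than a silent truncation.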
Replace zerocopy with the more popular bytemuck
The new version of heed now uses bytemuck to replace zerocopy. The bytemuck library seems much easier to contribute to, and it seems much more popular than the former (710k downloads per month compared to 109k). It brings a better API, at least for heed, as it can report which kind of problem happened when a cast fails. This should simplify some codecs.
Support for custom encoding/decoding errors has been added. Weren't you frustrated when heed triggered an error in one of the encoding/decoding traits, and you could not understand why? It is no longer an issue, as the BytesEncode/BytesDecode traits can return a BoxedError that can be displayed.
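As a rough illustration of what a displayable decoding error looks like, here is a std-only sketch. The BoxedError alias is assumed here to be a boxed std::error::Error trait object, and decode_be_u32 is a hypothetical free function, not an implementation of the heed traits.

```rust
use std::convert::TryFrom;
use std::error::Error;

/// Assumed shape of a boxed, displayable error; defined locally for this sketch.
type BoxedError = Box<dyn Error + Send + Sync + 'static>;

/// Hypothetical decoder: reads a big-endian u32 and explains why it failed.
fn decode_be_u32(bytes: &[u8]) -> Result<u32, BoxedError> {
    let array = <[u8; 4]>::try_from(bytes)
        .map_err(|_| format!("expected 4 bytes, got {}", bytes.len()))?;
    Ok(u32::from_be_bytes(array))
}
```

A caller can print the returned error directly, so a failed decoding says what went wrong instead of only signalling that something did.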
We introduced the BadOpenOptions heed error when a database is already opened in the same program, but you tried to open it with different options. This behavior will also be improved in v0.20.0 to simplify the usage of the lib and make it more correct towards LMDB behaviors around the map size.
Debug for most structs
A lot more types implement the Debug trait. It will be easier to embed an Env, a Database, or even an iterator in a struct that already implements Debug.
Thanks to @AureliaDolo and @darnuria, we have much better documentation coverage and added examples to nearly everything that could look complex.
The principle of least astonishment applies to user interface and software design. It proposes that a system component should behave how most users expect it to behave. The behavior should not astonish or surprise users.
Since its early days, heed would automatically link to the liblmdb library already installed on the system. We saw a lot of strange issues, non-reproducible on our side, and later discovered that the system LMDB of Arch Linux was used by heed instead of the vendored one!
It is no longer an issue, as we removed this behavior from the build.rs. The vendored version is always used. We no longer use an unknown version of LMDB.
Thanks to @darnuria again: read-only transactions sometimes need to be committed to make databases globally usable in the program. We now have tests to ensure we can open and commit databases in read-only environments. However, this change is subtle. We must commit the transaction to make a just-opened database global and not just local.
let rtxn = env.read_txn()?;
let db = env.open_poly_database(&rtxn, Some("my-database"))?;
rtxn.commit()?;
// We can store and use `db` here as long as the database is alive.
This detail raised an issue in heed. It is currently not safe to use a Database. We must redefine how we open and create databases to make them safe. The new API should be released for v0.20.0.
In this release, the RwTxn::abort method no longer returns a heed::Result, as aborting can't fail in LMDB. The result was introduced when we were supporting MDBX.
We simplified the signature of the RoTxn and RwTxn types by removing one lifetime and only keeping a single one. The new signature only has a single 'p lifetime, which is either the environment lifetime or the parent transaction lifetime. The simplification was possible as the environment must already live longer than the parent transaction.
// Previous signature
struct RwTxn<'env, 'parent, T = ()>;
// New signature
struct RwTxn<'p>;
We also removed the types of transactions. Those types were first introduced to avoid using a transaction opened with one environment with another one. Unfortunately, as the T type was optional, it wasn't used much. We decided that a runtime check would be better and added a bunch of assert_eq! to be sure that transactions and environments weren't mixed.
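The runtime check can be pictured with a small std-only model. EnvId, Txn, and Database below are hypothetical toy types written for this sketch; they do not reflect how heed identifies environments internally.

```rust
/// Hypothetical environment identity, reduced to a plain number.
#[derive(Debug, PartialEq, Clone, Copy)]
struct EnvId(u32);

/// Toy transaction that remembers which environment opened it.
struct Txn {
    env: EnvId,
}

/// Toy database that remembers which environment it belongs to.
struct Database {
    env: EnvId,
}

impl Database {
    /// Panics when the transaction was opened on a different environment.
    fn get(&self, txn: &Txn) -> &'static str {
        assert_eq!(
            self.env, txn.env,
            "transaction and database come from different environments"
        );
        "value"
    }
}
```

The trade-off is that a mix-up is caught when the code runs rather than when it compiles, in exchange for simpler type signatures.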
The previous version of heed used nested transactions when opening or creating databases. It did so to simplify internal methods. Unfortunately, LMDB has some limitations: using nested transactions with the MDB_WRITEMAP option is impossible.
It is now possible to use LMDB with MDB_WRITEMAP and open databases freely :blush:
Heed is a fully typed LMDB wrapper with minimum overhead. It is also the most maintained Rust wrapper on top of LMDB and is used by meilisearch/meilisearch. LMDB is a memory-mapped key-value store battle-tested for a long time.
We now have our own up-to-date lmdb-master-sys crate. It contains the bindgen-generated bindings to the LMDB library, and heed is directly plugged into it.
It will be easier for Meilisearch to bump the engine's LMDB version now. We previously used a fork of Mozilla's outdated lmdb-rkv-sys crate, but it was cumbersome to bump three repositories, i.e., our fork, meilisearch/lmdb-rs and finally heed.
Now we can make all the changes in the heed repository to bump the LMDB version :tada:
Thanks to @GregoryConrad, we now have a posix-sem feature. This change allows iOS and macOS builds to comply with Apple's App Sandbox (necessary for distribution in the App Store), in addition to possible speed improvements brought by the POSIX semaphores.
You will now be able to declare a heed Database with a number as the key or the value in a straightforward way. Just specify its endianness, and that's it.
use heed::byteorder::BE;
use heed::types::*;
use heed::Database;
// Keys are 64-bit integers stored in big-endian order.
type BEI64 = I64<BE>;
let mut wtxn = env.write_txn()?;
let db: Database<BEI64, Unit> = env.create_database(&mut wtxn, Some("big-endian-iter"))?;
db.put(&mut wtxn, &0, &())?;
db.put(&mut wtxn, &68, &())?;
db.put(&mut wtxn, &35, &())?;
db.put(&mut wtxn, &42, &())?;
wtxn.commit()?;
@irevoire added some new Env methods to get the size of a database:
Env::map_size returns the size of the original memory map.
Env::real_disk_size returns the size on the disk as seen by the file system.
Env::non_free_pages_size returns the size of the non-free pages of the current transaction.
You'll also be able to get the number of entries in a database in a snap. We no longer call .iter().count() internally and directly ask LMDB about this count.
Sometimes it is possible to directly write into your database without first serializing your data into an intermediary buffer. It can be the case for many data structures like RoaringBitmaps, for example.
use heed::byteorder::BE;
use heed::types::*;
use roaring::RoaringBitmap;
// The database below is keyed by 32-bit big-endian integers.
type BEI32 = I32<BE>;
let mut wtxn = env.write_txn()?;
let db = env.create_database::<BEI32, ByteSlice>(&mut wtxn, Some("number-string"))?;
let bitmap = RoaringBitmap::from_iter([1, 2, 3, 4]);
// Instead of serializing the data into a buffer, as you know the length of it,
// you can directly write the data into the LMDB value reserved space.
db.put_reserved(&mut wtxn, &42, bitmap.serialize_size(), |reserved| {
    bitmap.serialize_into(reserved)
})?;
Replace zerocopy with the more popular bytemuck
The new version of heed now uses bytemuck to replace zerocopy. The bytemuck library seems much easier to contribute to, and it seems much more popular than the former (710k downloads per month compared to 109k). It brings a better API, at least for heed, as it can report which kind of problem happened when a cast fails. This should simplify some codecs.
Support for custom encoding/decoding errors has been added. Weren't you frustrated when heed triggered an error in one of the encoding/decoding traits, and you were unable to understand the reason? It is no longer an issue, as the BytesEncode/BytesDecode traits can return a BoxedError that can be displayed.
We introduced the BadOpenOptions heed error for when a database is already opened in the same program, but you tried to open it with different options. This behavior will also be improved in v0.20.0 to simplify the usage of the lib and make it more correct towards LMDB behaviors around the map size.
Debug for most structs
A lot more types implement the Debug trait. It will be easier to embed an Env or a Database in a struct that already implements Debug.
The principle of least astonishment applies to user interface and software design. It proposes that a system component should behave how most users expect it to behave. The behavior should not astonish or surprise users.
Since its early days, heed would automatically link to the liblmdb library already installed on the system. We saw a lot of strange issues, non-reproducible on our side, and later discovered that the system LMDB of Arch Linux was used by heed instead of the vendored one!
It is no longer an issue, as we removed this behavior from the build.rs. The vendored version is always used. We no longer use an unknown version of LMDB.
The read-only transaction is immutable and therefore doesn't require the commit or abort methods, as it cannot make changes. We removed those two methods from the RoTxn type.
However, this change is subtle. Making opened (not created) databases global is no longer possible without committing the transaction and is therefore only possible with a write transaction. We must commit the transaction to make a just-opened database global and not just local.
// A write transaction is used here because RoTxn no longer has a commit method.
let wtxn = env.write_txn()?;
let db = env.open_poly_database(&wtxn, Some("my-database"))?;
wtxn.commit()?;
// We can store and use `db` here as long as the database is alive.
This detail raised an issue in heed. It is currently not safe to use a Database. We must redefine how we open and create databases to make them safe. The new API should be released for v0.20.0.
Note that in this release, the RwTxn::abort method no longer returns a heed::Result, as aborting can't fail in LMDB.
We simplified the signature of the RoTxn and RwTxn types by removing one lifetime and only keeping a single one. The new signature only has a single 'p lifetime, which is either the environment lifetime or the parent transaction lifetime. The simplification was possible as the environment must already live longer than the parent transaction.
// Previous signature
struct RwTxn<'env, 'parent, T = ()>;
// New signature
struct RwTxn<'p>;
We also removed the types of transactions. Those types were first introduced to avoid using a transaction opened with one environment with another one. Unfortunately, as the T type was optional, it wasn't used much. We decided that a runtime check would be better and added a bunch of assert_eq! to be sure that transactions and environments weren't mixed.
The previous version of heed used nested transactions when opening or creating databases. It did so to simplify internal methods. Unfortunately, LMDB has some limitations: it is impossible to use nested transactions with the MDB_WRITEMAP option.
It is now possible to use LMDB with MDB_WRITEMAP and open databases freely :blush:
In this PR we are:
The del_current, put_current, and append iterator methods are now unsafe because you must know what you are doing when you use them: you must not retain any reference from inside the database when calling them. You can read more on the pull request.
This release is breaking. Here is what has been updated compared to the v0.8 releases.
PolyDatabase
The PolyDatabase was a non-typed heed database that forced library users to always specify the en/decoder to store/read entries from the database. It was sometimes confusing and made the codebase harder to maintain, because the Database type mirrored the PolyDatabase methods but with already known en/decoders.
The PolyDatabase can now be replaced by the UntypedDatabase combined with the newly introduced remap_key/data_type or remap_types methods.
BytesDecode and BytesEncode traits
The BytesDecode and BytesEncode traits have been reworked to allow users to return error messages. It was a big source of frustration for users to only know that an en/decoding was failing without knowing anything about the error.
Replace zerocopy by bytemuck
We have used the zerocopy crate since the first version of heed. The crate was even named zerocopy-lmdb at first!
This is now in the past, as we switched away from it. We found that proposing changes to the zerocopy crate was more complex than proposing a change to the bytemuck crate, which is hosted on GitHub. I am really impressed by the work done by the maintainers of both crates and grateful to the people working on the zerocopy crate; nothing political. You can read more about that on the related issue.
Database::len method
Thanks to @Keats, getting the number of entries in a database is now O(1). The previous heed version was O(N), as it iterated through the whole tree to count the entries.
We now compile on Windows 🎉
Thanks to @maroider and @gentoid for your help!
We support typed transactions. This is useful when you need to work with multiple environments and you don't want to shuffle the transactions between them.
We now support infinite nested write transactions.
It is a feature that LMDB provides: it lets you create a write transaction inside another one. When a nested transaction is committed, nothing is durably saved until the main write transaction (the parent) is committed.
Nested transactions can be aborted without aborting the main one. This can be useful to create nested jobs that can fail.
You can see an example usage of nested transactions in the examples folder.
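The commit and abort semantics described above can be modelled with a toy in-memory structure. This is only a sketch under the assumption that a child's changes become visible to the parent on commit and vanish on abort; ToyTxn and run_jobs are hypothetical names, and the model says nothing about how LMDB stores data.

```rust
use std::collections::HashMap;

/// Toy in-memory model of a write transaction (not an LMDB or heed type).
struct ToyTxn {
    pending: HashMap<String, String>,
}

impl ToyTxn {
    fn new() -> Self {
        ToyTxn { pending: HashMap::new() }
    }

    fn put(&mut self, key: &str, value: &str) {
        self.pending.insert(key.to_string(), value.to_string());
    }

    /// Opens a child transaction that starts with no changes of its own.
    fn nested(&self) -> ToyTxn {
        ToyTxn::new()
    }

    /// Committing a child only moves its changes into the parent.
    fn commit_into(self, parent: &mut ToyTxn) {
        parent.pending.extend(self.pending);
    }
}

/// Runs one nested job that succeeds and one that is aborted,
/// then returns the keys the main transaction would commit.
fn run_jobs() -> Vec<String> {
    let mut main = ToyTxn::new();
    main.put("a", "1");

    let mut ok_job = main.nested();
    ok_job.put("b", "2");
    ok_job.commit_into(&mut main);

    let mut failed_job = main.nested();
    failed_job.put("c", "3");
    drop(failed_job); // Aborting discards the child's changes only.

    let mut keys: Vec<String> = main.pending.into_keys().collect();
    keys.sort();
    keys
}
```

In this model the aborted job leaves no trace, while the committed job's key sits in the parent until the parent itself is committed.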