SSHash Versions

A compressed, associative, exact, and weighted dictionary for k-mers.

v3.0.0

1 year ago

This release of the library features a restructured public API for the dictionary and its supported queries.

  • "Advanced" lookup queries now return, besides the usual absolute kmer_id: contig information (contig_id and contig_size of the contig in which the k-mer lies), the relative (within the contig) identifier of the k-mer (named kmer_id_in_contig), and the orientation of the k-mer in the contig. For any positive query, 0 <= kmer_id_in_contig < contig_size holds.
  • Streaming queries are now general rather than limited to streaming membership, as they were in previous releases, and they return advanced lookup information by default.
  • Support for navigational queries has been added. Given a k-mer g[1..k], a navigational query determines if g[2..k]+x is present (forward neighbourhood) and if x+g[1..k-1] is present (backward neighbourhood) in the dictionary, for x = A, C, G, T ('+' here means string concatenation). If a contig identifier is specified for a navigational query (rather than a k-mer), then the backward neighbourhood of the first k-mer and the forward neighbourhood of the last k-mer in the contig are returned.
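
The semantics of the advanced lookup and navigational queries described above can be illustrated with a minimal Python sketch. This is not the library's API or implementation: the class ToyDictionary and its methods lookup_advanced, neighbours, and contig_neighbours are hypothetical names, the contigs are stored as plain strings in a hash map, and orientation and reverse complements are ignored. It also assumes that contig_size counts the k-mers in a contig, which is consistent with the invariant 0 <= kmer_id_in_contig < contig_size.

```python
from itertools import accumulate


class ToyDictionary:
    """Hypothetical, uncompressed stand-in used only to show query semantics."""

    def __init__(self, contigs, k):
        self.k = k
        self.contigs = contigs
        # Assumption: a contig of n bases contributes n - k + 1 k-mers.
        self.sizes = [len(c) - k + 1 for c in contigs]
        # offsets[i] is the absolute id of the first k-mer of contig i.
        self.offsets = [0] + list(accumulate(self.sizes))
        # Map each k-mer to (contig_id, kmer_id_in_contig).
        self.index = {}
        for cid, contig in enumerate(contigs):
            for pos in range(self.sizes[cid]):
                self.index[contig[pos:pos + k]] = (cid, pos)

    def lookup_advanced(self, kmer):
        """Return absolute and contig-relative information, or None if absent."""
        hit = self.index.get(kmer)
        if hit is None:
            return None
        cid, pos = hit
        return {
            "kmer_id": self.offsets[cid] + pos,
            "contig_id": cid,
            "contig_size": self.sizes[cid],
            "kmer_id_in_contig": pos,
        }

    def neighbours(self, kmer):
        """For x in ACGT, test g[2..k]+x (forward) and x+g[1..k-1] (backward)."""
        forward = {x: (kmer[1:] + x) in self.index for x in "ACGT"}
        backward = {x: (x + kmer[:-1]) in self.index for x in "ACGT"}
        return forward, backward

    def contig_neighbours(self, contig_id):
        """Backward neighbourhood of the first k-mer, forward of the last one."""
        contig = self.contigs[contig_id]
        first, last = contig[:self.k], contig[-self.k:]
        _, backward = self.neighbours(first)
        forward, _ = self.neighbours(last)
        return forward, backward


d = ToyDictionary(["ACGTA", "TACC"], k=3)
print(d.lookup_advanced("ACC"))
# {'kmer_id': 4, 'contig_id': 1, 'contig_size': 2, 'kmer_id_in_contig': 1}
print(d.neighbours("CGT"))       # forward: only A (GTA); backward: only A (ACG)
print(d.contig_neighbours(0))    # forward: only C (TAC); backward: only T (TAC)
```

In this toy example the two contigs hold the k-mers ACG, CGT, GTA and TAC, ACC, so the absolute identifier of ACC is 3 + 1 = 4 while its identifier within contig 1 is 1.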

v2.1.0

1 year ago

With this release, dictionary construction uses external memory to reduce RAM usage.

v2.0.0

1 year ago

No major changes compared to the previous version (other than renaming variables for consistency with the papers). However, we removed a (useless) serialised 4-byte integer from skew_index, so index binary files built with previous releases are not compatible with this library release.

v1.2.0

2 years ago

This release adds a new tool called permute that re-orders (and possibly reverse-complements) the strings in an input (weighted) collection to minimize the number of runs in the abundances and, hence, optimize their encoding.

The abundances are encoded in O(r) space on top of the space for a SSHash dictionary, where r is the number of runs (i.e., maximal substrings formed by a single abundance value) in the abundances. The i-th abundance in the sequence, corresponding to the k-mer of identifier i, is retrieved in O(log r) time.
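
To make the O(r) space and O(log r) lookup bounds concrete, here is a minimal Python sketch of a run-length representation with binary search. It is only an illustration under stated assumptions: the function names encode_runs, abundance_at, and count_runs are hypothetical, and the plain lists used here are not the compressed encoding the library actually uses.

```python
from bisect import bisect_right


def encode_runs(abundances):
    """Store one (start position, value) pair per run: O(r) space."""
    starts, values = [], []
    for i, a in enumerate(abundances):
        if not values or values[-1] != a:
            starts.append(i)   # a new run begins at k-mer identifier i
            values.append(a)
    return starts, values


def abundance_at(starts, values, i):
    """Binary search for the run containing identifier i: O(log r) time."""
    return values[bisect_right(starts, i) - 1]


def count_runs(abundances):
    """Number of runs r in a sequence of abundances."""
    return len(encode_runs(abundances)[0])


abundances = [1, 1, 1, 5, 5, 2, 2, 2, 2]
starts, values = encode_runs(abundances)
print(starts, values)                   # [0, 3, 5] [1, 5, 2]
print(abundance_at(starts, values, 4))  # 5

# Re-ordering the collection can reduce r, which is what permute aims for.
print(count_runs([1, 5, 1, 5]), count_runs([1, 1, 5, 5]))  # 4 2
```

The last line shows why the order of the strings matters: the same multiset of abundances takes 4 runs in one order and 2 in another.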

v1.1.0

2 years ago

This release adds a new feature: compressed abundances. The SSHash dictionary can now also store the abundances in highly compressed space.

v1.0.0

2 years ago

First release.