Will Rowe / HULK / Versions

Histosketching Using Little Kmers

1.0.0

4 years ago

This is a complete re-implementation of HULK. The changelog lists the main changes:

  • fully re-written codebase
    • I've aimed to keep it largely backwards compatible with previous releases
  • fully open-sourced!
  • algorithm changes
    • the underlying histogram is now based on minimizer frequencies
    • the count-min sketch for k-mer frequencies has been replaced with a fixed-size array, using jump hash for minimizer placement
  • changes to the sketch subcommand:
    • sketches are saved to JSON by default (à la sourmash)
    • the histosketch count-min sketch is no longer user-configurable (previously set via epsilon and delta)
    • spectrum size is now determined by the k-mer size
    • minCount for k-mer frequencies has been removed
  • changes to the smash subcommand:
    • operates on JSON input
    • outputs the matrix as CSV
  • replaced some unnecessary features
    • the functionality of the print and distance subcommands is available in the smash subcommand
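The minimizer-based histogram described above can be illustrated with a minimal sketch. It assumes a window of `w` consecutive k-mers, takes the k-mer with the smallest hash in each window as the minimizer, and uses Lamping and Veach's jump consistent hash to choose a bin in a fixed-size array. The function names, the FNV-1a k-mer hash, and the choice to ignore reverse complements are all hypothetical simplifications, not HULK's actual code.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// jumpHash is Lamping and Veach's jump consistent hash: it maps a 64-bit
// key to a bucket in [0, numBuckets). numBuckets must be at least 1.
func jumpHash(key uint64, numBuckets int) int {
	var b, j int64 = -1, 0
	for j < int64(numBuckets) {
		b = j
		key = key*2862933555777941757 + 1
		j = int64(float64(b+1) * (float64(int64(1)<<31) / float64((key>>33)+1)))
	}
	return int(b)
}

// hashKmer hashes a k-mer with FNV-1a. This is a hypothetical stand-in;
// HULK's own k-mer hashing may differ.
func hashKmer(kmer string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(kmer))
	return h.Sum64()
}

// minimizerHistogram slides a window of w consecutive k-mers along seq,
// picks the k-mer with the smallest hash in each window (its minimizer),
// and increments the bin chosen by jump-hashing that minimizer. A minimizer
// shared by consecutive windows is counted once. Reverse complements are
// ignored (assumption made for brevity).
func minimizerHistogram(seq string, k, w, bins int) []float64 {
	hist := make([]float64, bins)
	n := len(seq) - k + 1 // number of k-mers in seq
	if k < 1 || w < 1 || n < w {
		return hist
	}
	hashes := make([]uint64, n)
	for i := 0; i < n; i++ {
		hashes[i] = hashKmer(seq[i : i+k])
	}
	lastPos := -1
	for start := 0; start+w <= n; start++ {
		minPos := start
		for i := start + 1; i < start+w; i++ {
			if hashes[i] < hashes[minPos] {
				minPos = i
			}
		}
		if minPos != lastPos {
			hist[jumpHash(hashes[minPos], bins)]++
			lastPos = minPos
		}
	}
	return hist
}

func main() {
	hist := minimizerHistogram("ACGTACGTTGCAACGTAGGCTAGCATCG", 5, 4, 16)
	fmt.Println(hist)
}
```

Jump hash needs no lookup table and spreads keys evenly over the buckets, which suits a fixed-size array whose length is known up front.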

0.1.2

5 years ago

Minor bug fixes and improvements:

  • adding buffered channels for read processing and hashing
  • swapping count-min sketch parameters (back to epsilon and delta to offer more tuning)
  • tweaking jump hash for counter placement in the CMS
  • updating default settings to improve performance after the above changes
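Sizing a count-min sketch from epsilon and delta, and feeding it through a buffered channel, might look roughly like the sketch below. The dimension formulas (width = ⌈e/ε⌉, depth = ⌈ln(1/δ)⌉) are the standard count-min bounds; everything else (type and function names, buffer size, single consumer) is a hypothetical illustration. For brevity it swaps in Kirsch and Mitzenmacher double hashing for column selection instead of the jump hash these notes mention.

```go
package main

import (
	"fmt"
	"math"
)

// CountMinSketch holds depth rows of width counters (hypothetical type).
type CountMinSketch struct {
	width, depth int
	counters     [][]uint32
}

// NewCountMinSketch uses the standard bounds width = ceil(e/epsilon) and
// depth = ceil(ln(1/delta)): an estimate then exceeds the true count by at
// most epsilon*N (N = total insertions) with probability at least 1-delta.
// epsilon and delta are expected to lie in (0, 1).
func NewCountMinSketch(epsilon, delta float64) *CountMinSketch {
	width := int(math.Ceil(math.E / epsilon))
	depth := int(math.Ceil(math.Log(1 / delta)))
	counters := make([][]uint32, depth)
	for i := range counters {
		counters[i] = make([]uint32, width)
	}
	return &CountMinSketch{width: width, depth: depth, counters: counters}
}

// index derives a column for a row by double hashing (h1 + row*h2) on the
// two 32-bit halves of one 64-bit hash. This replaces jump hash here.
func (c *CountMinSketch) index(h uint64, row int) int {
	h1, h2 := h&0xffffffff, h>>32
	return int((h1 + uint64(row)*h2) % uint64(c.width))
}

// Add increments one counter in every row.
func (c *CountMinSketch) Add(h uint64) {
	for row := 0; row < c.depth; row++ {
		c.counters[row][c.index(h, row)]++
	}
}

// Count returns the smallest counter across rows; it never underestimates.
func (c *CountMinSketch) Count(h uint64) uint32 {
	est := c.counters[0][c.index(h, 0)]
	for row := 1; row < c.depth; row++ {
		if v := c.counters[row][c.index(h, row)]; v < est {
			est = v
		}
	}
	return est
}

// countHashes sends hashes from a producer goroutine through a buffered
// channel to a single consumer that updates the sketch; the buffer lets the
// producer run ahead instead of blocking on every send.
func countHashes(c *CountMinSketch, hashes []uint64) {
	ch := make(chan uint64, 64)
	go func() {
		for _, h := range hashes {
			ch <- h
		}
		close(ch)
	}()
	for h := range ch {
		c.Add(h)
	}
}

func main() {
	cms := NewCountMinSketch(0.01, 0.01)
	countHashes(cms, []uint64{42, 42, 7, 42})
	fmt.Println(cms.width, cms.depth, cms.Count(42), cms.Count(7)) // prints: 272 5 3 1
}
```

Tightening epsilon widens every row, while tightening delta adds rows, which is the extra tuning that going back to these two parameters allows.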

0.1.0

5 years ago

This release bumps HULK to a more stable version. Here is a summary of the main changes:

  • replace the uint64 encoding of k-mers with ntHash (Go implementation)

  • replace the delta and epsilon values in the CMS with a soft memory limit for the CMS structure

  • use jump hash for adjusting/querying CMS counters

    • also change the XORing of the hash function so that the uint64 is split into two uint32s, one of which is altered using the CMS depth iterator.
  • allow FASTA input

  • bug fixes (histosketch metadata, weighted Jaccard similarity)
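Two items in this release can be sketched briefly: deriving a per-row counter position by splitting a 64-bit hash into two 32-bit halves and altering one with the CMS depth index, and the weighted Jaccard similarity between two histograms. This is a minimal illustration under stated assumptions: that the low half is the one altered, that the alteration is an XOR with the row index, and that weighted Jaccard is the usual sum-of-minima over sum-of-maxima. All function names are hypothetical.

```go
package main

import "fmt"

// jumpHash is Lamping and Veach's jump consistent hash: it maps a 64-bit
// key to a bucket in [0, numBuckets). numBuckets must be at least 1.
func jumpHash(key uint64, numBuckets int) int {
	var b, j int64 = -1, 0
	for j < int64(numBuckets) {
		b = j
		key = key*2862933555777941757 + 1
		j = int64(float64(b+1) * (float64(int64(1)<<31) / float64((key>>33)+1)))
	}
	return int(b)
}

// rowKey splits h into two uint32 halves, XORs the low half with the CMS
// depth index (assumed: which half, and that the change is an XOR), and
// recombines them into a row-specific key.
func rowKey(h uint64, depth int) uint64 {
	hi, lo := uint32(h>>32), uint32(h)
	lo ^= uint32(depth)
	return uint64(hi)<<32 | uint64(lo)
}

// counterColumn picks the counter for h in the given row by jump-hashing
// the row-specific key into [0, width).
func counterColumn(h uint64, depth, width int) int {
	return jumpHash(rowKey(h, depth), width)
}

// weightedJaccard returns sum(min(a_i, b_i)) / sum(max(a_i, b_i)) for two
// histograms of equal length, or 0 if both are all zeros.
func weightedJaccard(a, b []float64) float64 {
	var num, den float64
	for i := range a {
		if a[i] < b[i] {
			num += a[i]
			den += b[i]
		} else {
			num += b[i]
			den += a[i]
		}
	}
	if den == 0 {
		return 0
	}
	return num / den
}

func main() {
	h := uint64(0x9e3779b97f4a7c15)
	for d := 0; d < 4; d++ {
		fmt.Println("row", d, "column", counterColumn(h, d, 1024))
	}
	// min sums to 1+1+0 = 2 and max to 2+1+3 = 6, so this is 2/6
	fmt.Println(weightedJaccard([]float64{2, 1, 0}, []float64{1, 1, 3}))
}
```

Because jump hash scrambles its key before choosing a bucket, even the small change that the row index makes to one half is enough to send each row to an independent-looking column.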

0.0.2

5 years ago

First full release of HULK

0.0.1

5 years ago