Fast, high-quality texture compression library for many formats
These are the stand-alone texture compression kernels for Convection Texture Tools (CVTT), which you can embed in other applications. https://github.com/elasota/cvtt
The CVTT codecs are designed to get very high quality at good speed by leveraging effective heuristics and an SPMD-style design that makes heavy use of SIMD ops and 16-bit math.
Compressed texture format support:
Include "ConvectionKernels.h"
Depending on the input format, blocks should be pre-packed into one of the PixelBlock structures: PixelBlockU8 for unsigned LDR formats (BC1, BC2, BC3, BC7, BC4U, BC5U), PixelBlockS8 for signed LDR formats (BC4S, BC5S), and PixelBlockF16 for HDR formats (BC6H). The block pixel order is left-to-right, top-to-bottom, and the channel order is red, green, blue, alpha.
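The packing order described above can be sketched as follows. This is a minimal illustration, not library code: DemoPixelBlockU8 is a hypothetical stand-in for cvtt::PixelBlockU8 (assumed here to hold 16 pixels of 4 channels each), PackBlockRGBA8 is a made-up helper name, and clamping at the image edge is just one reasonable choice for images whose size is not a multiple of 4.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for cvtt::PixelBlockU8: 16 pixels, 4 channels each.
// Check ConvectionKernels.h for the real member names.
struct DemoPixelBlockU8
{
    uint8_t pixels[16][4];
};

// Copies the 4x4 block whose top-left corner is (blockX * 4, blockY * 4) out of a
// tightly packed RGBA8 image. Coordinates past the image edge are clamped to the
// last valid row or column so that partial edge blocks still hold sensible data.
DemoPixelBlockU8 PackBlockRGBA8(const std::vector<uint8_t> &image, size_t width, size_t height,
                                size_t blockX, size_t blockY)
{
    assert(width > 0 && height > 0);
    assert(image.size() == width * height * 4);

    DemoPixelBlockU8 block;
    for (size_t row = 0; row < 4; row++)
    {
        size_t y = blockY * 4 + row;
        if (y >= height)
            y = height - 1;

        for (size_t col = 0; col < 4; col++)
        {
            size_t x = blockX * 4 + col;
            if (x >= width)
                x = width - 1;

            // Pixel index inside the block runs left-to-right, then top-to-bottom.
            const uint8_t *src = &image[(y * width + x) * 4];
            uint8_t *dst = block.pixels[row * 4 + col];

            // Channel order is red, green, blue, alpha.
            for (size_t ch = 0; ch < 4; ch++)
                dst[ch] = src[ch];
        }
    }
    return block;
}
```

If your source image uses a different channel order (BGRA, for example), swap the channels in the inner loop before handing the block to an encoder.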
BC6H values are stored as int16_t in the pixel block structure. Each value should be the bit pattern of the 16-bit float input, bit-cast to int16_t. Converting other float precisions to 16-bit is outside the scope of the kernels.
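A bit-cast of this kind might look like the sketch below. It assumes the input is already available as raw IEEE 754 binary16 bit patterns in uint16_t, and that the HDR pixel block is laid out as 16 pixels of 4 int16_t channels; HalfBitsToInt16 and StoreHalfPixel are hypothetical helper names, not part of the library.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterprets the raw bits of a 16-bit float as int16_t without changing them.
// memcpy is the portable way to bit-cast before C++20's std::bit_cast.
int16_t HalfBitsToInt16(uint16_t halfBits)
{
    int16_t result;
    static_assert(sizeof(result) == sizeof(halfBits), "size mismatch");
    std::memcpy(&result, &halfBits, sizeof(result));
    return result;
}

// Stores one RGBA pixel into a 16x4 int16_t array (the layout this sketch assumes
// for the HDR pixel block) from four half-float bit patterns.
void StoreHalfPixel(int16_t (&pixels)[16][4], int pixelIndex, const uint16_t (&rgbaHalfBits)[4])
{
    assert(pixelIndex >= 0 && pixelIndex < 16);
    for (int ch = 0; ch < 4; ch++)
        pixels[pixelIndex][ch] = HalfBitsToInt16(rgbaHalfBits[ch]);
}
```

For example, the half-float 1.0 has the bit pattern 0x3C00 and is stored as 15360, while -2.0 (0xC000) is stored as -16384, because the sign bit lands in the sign bit of the int16_t.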
Create an Options structure and fill it out:
For some modes, you must pass an encoding plan, which controls how the encoder behaves. Do NOT attempt to initialize the encoding plan yourself. Either use a default-initialized encoding plan (which runs at maximum quality), or use ConfigureBC7EncodingPlanFromQuality or ConfigureBC7EncodingPlanFromFineTuningParams to configure a lower-quality encoding plan. Configuring an encoding plan is somewhat slow, so you should only do it once per encode job.
Once you've done both of those things, call the corresponding encode function to digest the input blocks and emit output blocks.
VERY IMPORTANT: The encode functions must be given a list of cvtt::NumParallelBlocks blocks, and will emit cvtt::NumParallelBlocks output blocks. If you want to encode fewer blocks, you must pad the input with unused block data, and the output buffer must still have room for cvtt::NumParallelBlocks output blocks.
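One way to handle an arbitrary number of blocks is to always encode full batches and discard the padded outputs. The sketch below is an illustration under assumptions: kDemoParallelBlocks is a placeholder (real code should use cvtt::NumParallelBlocks rather than a hard-coded 8), DemoBlock stands in for a pixel block structure, and DemoEncodeFunc is a hypothetical callback in place of a real encode function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Assumption for this sketch only; real code should use cvtt::NumParallelBlocks.
const size_t kDemoParallelBlocks = 8;

// Hypothetical stand-in for an input pixel block.
struct DemoBlock
{
    uint8_t pixels[16][4];
};

// Hypothetical encoder callback: reads exactly kDemoParallelBlocks input blocks and
// writes exactly kDemoParallelBlocks * bytesPerOutputBlock bytes.
typedef void (*DemoEncodeFunc)(uint8_t *output, const DemoBlock *input);

// Encodes any number of blocks by always handing the encoder a full batch.
void EncodeAllBlocks(std::vector<uint8_t> &output, const std::vector<DemoBlock> &input,
                     size_t bytesPerOutputBlock, DemoEncodeFunc encode)
{
    assert(bytesPerOutputBlock > 0);
    output.assign(input.size() * bytesPerOutputBlock, 0);

    DemoBlock batch[kDemoParallelBlocks];
    std::vector<uint8_t> batchOutput(kDemoParallelBlocks * bytesPerOutputBlock);

    for (size_t first = 0; first < input.size(); first += kDemoParallelBlocks)
    {
        const size_t valid = std::min(kDemoParallelBlocks, input.size() - first);

        // Copy the real blocks, then pad the tail by repeating the last real block.
        for (size_t i = 0; i < kDemoParallelBlocks; i++)
            batch[i] = input[first + std::min(i, valid - 1)];

        // The scratch output always has room for a full batch.
        encode(batchOutput.data(), batch);

        // Keep only the outputs that correspond to real input blocks.
        std::memcpy(output.data() + first * bytesPerOutputBlock, batchOutput.data(),
                    valid * bytesPerOutputBlock);
    }
}
```

Repeating the last real block is only one padding choice; any valid block data works, since the padded outputs are thrown away. Encoding into a full-size scratch buffer avoids writing past the end of the final output when the block count is not a multiple of the batch size.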
The ETC encoders require significantly more temporary data storage than the other encoders, so the storage must be allocated before using the encoders.
To allocate the temporary data:
To release the temporary data:
Once allocated, the compression data can be reused over multiple calls to the encode functions, and depending on architecture, can usually be used by a different thread than the one that allocated it, as long as multiple encode functions are not using it at once.
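A simple way to respect the "not at once" rule is to give each worker its own temporary data and reuse it for every batch that worker encodes. The sketch below shows that ownership pattern only. DemoScratch, DemoAllocScratch, and DemoReleaseScratch are hypothetical stand-ins for the library's compression data type and its allocate and release functions; the real names and signatures are in ConvectionKernels.h.

```cpp
#include <cassert>
#include <cstdlib>
#include <memory>

// Hypothetical stand-in for the encoder's temporary compression data.
struct DemoScratch
{
    int encodeCalls;
};

// Hypothetical stand-in for the library's allocate function.
DemoScratch *DemoAllocScratch()
{
    DemoScratch *scratch = static_cast<DemoScratch *>(std::malloc(sizeof(DemoScratch)));
    assert(scratch != nullptr);
    scratch->encodeCalls = 0;
    return scratch;
}

// Hypothetical stand-in for the library's release function.
void DemoReleaseScratch(DemoScratch *scratch)
{
    std::free(scratch);
}

// Deleter so std::unique_ptr releases the scratch data through the matching function.
struct DemoScratchDeleter
{
    void operator()(DemoScratch *scratch) const { DemoReleaseScratch(scratch); }
};

typedef std::unique_ptr<DemoScratch, DemoScratchDeleter> DemoScratchPtr;

// Each worker owns one scratch object and reuses it for every batch it encodes,
// so no two encode calls share scratch data at the same time.
struct DemoWorker
{
    DemoScratchPtr scratch;

    DemoWorker() : scratch(DemoAllocScratch()) {}

    void EncodeBatch()
    {
        // A real worker would pass scratch.get() to the encode function here.
        scratch->encodeCalls++;
    }
};
```

With one DemoWorker per thread, the temporary data is allocated once, reused across calls, and released automatically when the worker is destroyed.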