Kvazaar Versions

An open-source HEVC encoder

v2.3.1

1 month ago

Features

New AVX2 optimizations for intra angular prediction, which are faster and work around a code generation issue with MSVC > 19.36

Fixes

Several fixes to CMakeLists.txt
Fix unaligned access in array_checksum_generic8 c6f2ba4711d42285636da97b133a7b5aa49c9533
Fix preset logging when enable-logging is set to false 7424474577cbda309beac2c4d6a071feb5f2f4d4
Include CMakeLists.txt in dist 3b1bcb62755adfd4c3c818838b5d09535198407e
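The commit for the unaligned-access fix isn't reproduced here. A common source of such bugs in C is casting a byte pointer to a wider integer pointer and dereferencing it; the portable alternative is to copy the bytes with memcpy, which compilers typically turn into a single load on targets that allow it. A minimal sketch under that assumption; `load_u64_unaligned` and `checksum_u64_words` are hypothetical names, not Kvazaar's `array_checksum_generic8`.

```c
#include <stdint.h>
#include <string.h>

/* Load 8 bytes from a possibly unaligned address without undefined behaviour.
 * memcpy is used instead of dereferencing a cast (uint64_t *) pointer. */
static uint64_t load_u64_unaligned(const uint8_t *p)
{
  uint64_t v;
  memcpy(&v, p, sizeof v);
  return v;
}

/* Hypothetical checksum: sums the buffer as 8-byte words (wrapping modulo
 * 2^64), then adds any remaining tail bytes one at a time. */
static uint64_t checksum_u64_words(const uint8_t *data, size_t len)
{
  uint64_t sum = 0;
  size_t i = 0;
  for (; i + 8 <= len; i += 8) {
    sum += load_u64_unaligned(data + i);
  }
  for (; i < len; ++i) {
    sum += data[i];
  }
  return sum;
}
```

Because the load no longer depends on alignment, the result for a buffer starting at an odd address matches the result for an aligned copy of the same bytes.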

External contributions

Fix typo in error message. by @fancycode in https://github.com/ultravideo/kvazaar/pull/386
Minor cmake fixes by @lgbaldoni and @kmilos in https://github.com/ultravideo/kvazaar/pull/385
Ignore "*get_pc_thunk.*" symbols in exported symbols test by @matoro in https://github.com/ultravideo/kvazaar/pull/392
Fixed issue #394 - missing _mm256_storeu2_m128i macro by @supersjgk in https://github.com/ultravideo/kvazaar/pull/395

New Contributors

@fancycode made their first contribution in https://github.com/ultravideo/kvazaar/pull/386
@lgbaldoni made their first contribution in https://github.com/ultravideo/kvazaar/pull/385
@supersjgk made their first contribution in https://github.com/ultravideo/kvazaar/pull/395

Performance

With the new AVX2 intra prediction, the encoder is 1-5% faster, depending on the settings

Full Changelog: https://github.com/ultravideo/kvazaar/compare/v2.3.0...v2.3.1

v2.3.0

4 months ago

Features

--(no-)enable-logging to enable/disable logging of normal encoder performance information to stderr; errors are still written to stderr
AVX2 optimizations for finding the last non-zero coefficient in RDOQ
Remove YASM to make compilation with Visual Studio easier
Experimental support for CMake. In the future we would like to get rid of automake and the Visual Studio project files, so if there are any issues with CMakeLists.txt, please report them
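As a reference for what the RDOQ helper computes, here is a scalar sketch of finding the last non-zero coefficient in scan order. It assumes coefficients stored in raster order plus a table mapping scan position to raster index; `find_last_nonzero` is a hypothetical name. An AVX2 version would typically compare 16 coefficients against zero at once and use a movemask plus a bit scan instead of testing one coefficient per iteration.

```c
#include <stdint.h>

/* Returns the scan position of the last non-zero coefficient, or -1 if the
 * block is all zero.
 * coeffs: coefficients in raster order
 * scan:   scan[pos] is the raster index of scan position pos */
static int find_last_nonzero(const int16_t *coeffs, const uint32_t *scan,
                             int num_coeffs)
{
  for (int pos = num_coeffs - 1; pos >= 0; --pos) {
    if (coeffs[scan[pos]] != 0) {
      return pos;
    }
  }
  return -1;
}
```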

Fixes

Fix a bug when requesting encoder_headers before any frame has been pushed in aaae5b0f4926065136f287876c6bc41631bae692
Fix a problem with GCC on Windows (MinGW, MSYS, Cygwin) causing some optimized functions to segfault 589ed477d55560250dcc1dd2ea6a31b517545ebf
Fix GCC detection for automake 49dc5fcaf4b3aab2e9edb74b7b820b41a18b6b10 8523c968bff6eb3f020232282eb05cdb6764fd82

External contributions

add config option to turn off logging output (#356) by @farindk in https://github.com/ultravideo/kvazaar/pull/357
Update Dockerfile base image to Ubuntu 20.04 by @PeterDaveHello in https://github.com/ultravideo/kvazaar/pull/358
threads.h: fix for older macOS builds by @barracuda156 in https://github.com/ultravideo/kvazaar/pull/373
Don't export MD5 byteReverse symbol on big-endian by @matoro in https://github.com/ultravideo/kvazaar/pull/377

New Contributors

@farindk made their first contribution in https://github.com/ultravideo/kvazaar/pull/357
@barracuda156 made their first contribution in https://github.com/ultravideo/kvazaar/pull/373
@matoro made their first contribution in https://github.com/ultravideo/kvazaar/pull/377

Performance

The RD performance should be exactly the same as in v2.2.0, and configurations using RDOQ should be around 1-3% faster with the AVX2 optimizations

Full Changelog: https://github.com/ultravideo/kvazaar/compare/v2.2.0...v2.3.0

v2.2.0

1 year ago

uvg266

We now also have a VVC encoder called uvg266, which is available at github/uvg266. The majority of future development will go towards uvg266 but do not consider Kvazaar to be abandoned!

Features

Updated Region of Interest (ROI) functionality to allow separate ROI map for each frame
Improve inter search
Update CABAC contexts during search to improve the accuracy of bit cost estimation
Move the intra chroma search option from --rd 3 to its own option, --(no-)intra-chroma-search, and fast bipred to --(no-)fast-bipred
Change the maximum RD level to 4, where 3 performs more RD search for inter and 4 performs full intra search
Add --(no-)combine-intra-cus for controlling whether larger intra blocks are tried even when search at the current depth is disabled
Add --force-inter for debugging purposes to force all PUs in inter slices to use best inter mode

Optimizations

AVX2 implementations of bidirectional blending
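For context, the default (unweighted) bi-prediction in HEVC averages the two intermediate predictions, which the interpolation filters keep at 14-bit precision, by adding a rounding offset and shifting right by 15 - bit_depth, then clipping to the sample range. A scalar sketch of that formula, assuming the caller supplies the 14-bit intermediates; `bipred_average` is a hypothetical name, and this is not Kvazaar's AVX2 kernel, which presumably vectorizes this kind of loop.

```c
#include <stdint.h>

/* Default HEVC bi-prediction average of two 14-bit intermediate predictions
 * (sample << (14 - bit_depth) for full-pel positions). */
static void bipred_average(const int16_t *pred0, const int16_t *pred1,
                           uint16_t *dst, int n, int bit_depth)
{
  const int shift = 15 - bit_depth;     /* 14-bit precision plus 1 for the sum */
  const int offset = 1 << (shift - 1);  /* round to nearest */
  const int max_val = (1 << bit_depth) - 1;
  for (int i = 0; i < n; ++i) {
    int v = (pred0[i] + pred1[i] + offset) >> shift;
    /* clip to the valid sample range */
    dst[i] = (uint16_t)(v < 0 ? 0 : (v > max_val ? max_val : v));
  }
}
```

With 8-bit samples 100 and 101 the result is 101, and 0 and 255 give 128, i.e. the average rounded half up.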

Fixes

Make sure the DPB is larger than max_num_reorder_pics 899c672ed15a522fe2391ff81eef682e0a9d89a9
Compute the proper count of buffered frames for the VPS and SPS using a common function d4880be6f256bb7ccbcab844c2488162b9580f15
Fix some strategy function pointer signatures a4005046ae2ebb3c88e92ff06736ce57b60fdcc7

External contributions

build: fix automake warning by @bradh in https://github.com/ultravideo/kvazaar/pull/335
cli: add missing newlines in usage by @bradh in https://github.com/ultravideo/kvazaar/pull/336
refactor SEI by @bradh in https://github.com/ultravideo/kvazaar/pull/341
cli: minor api doc fix by @bradh in https://github.com/ultravideo/kvazaar/pull/342
add sudo ldconfig by @binbinzhm in https://github.com/ultravideo/kvazaar/pull/345
Enable -mpopcnt and -mlzcnt on AVX2 by @klondi in https://github.com/ultravideo/kvazaar/pull/301

New Contributors

@bradh made their first contribution in https://github.com/ultravideo/kvazaar/pull/335
@binbinzhm made their first contribution in https://github.com/ultravideo/kvazaar/pull/345
@klondi made their first contribution in https://github.com/ultravideo/kvazaar/pull/301

Performance

BD-Bitrate

Average BD-Bitrate compared with v2.1:

class	0-uf	1-sf	2-vf	3-fr	4-f	5-m	6-s	7-sr	8-vs
hevc-A	0.4 %	0.7 %	0.0 %	-0.1 %	0.0 %	-1.1 %	-0.8 %	0.2 %	-5.1 %
hevc-B	0.3 %	0.4 %	-0.3 %	-0.6 %	-0.4 %	-1.7 %	-1.2 %	0.1 %	-7.1 %
hevc-C	0.5 %	1.1 %	0.7 %	0.5 %	0.2 %	-0.8 %	-0.4 %	0.4 %	-5.5 %
hevc-D	0.4 %	1.5 %	1.0 %	0.6 %	0.7 %	-0.3 %	-0.1 %	0.4 %	-6.6 %
hevc-E	-2.1 %	-1.7 %	-2.2 %	-2.5 %	-2.5 %	-1.4 %	-1.1 %	-1.1 %	-5.4 %
hevc-F	-0.1 %	0.2 %	-0.2 %	-0.2 %	-0.3 %	-0.6 %	-0.4 %	0.1 %	-3.1 %

total	0.0 %	0.4 %	-0.1 %	-0.3 %	-0.3 %	-1.0 %	-0.7 %	0.0 %	-5.6 %

Speedup

Average speedup compared with v2.1 on an Intel Xeon W-2145 (8-core) machine:

class	0-uf	1-sf	2-vf	3-fr	4-f	5-m	6-s	7-sr	8-vs
hevc-A	1.09x	1.10x	1.12x	1.14x	1.15x	1.08x	1.04x	1.10x	0.68x
hevc-B	1.10x	1.09x	1.15x	1.15x	1.15x	1.09x	1.07x	1.14x	0.66x
hevc-C	1.11x	1.06x	1.11x	1.14x	1.14x	1.07x	1.08x	1.03x	0.75x
hevc-D	1.08x	1.08x	1.13x	1.13x	1.14x	1.08x	1.08x	1.06x	0.71x
hevc-E	1.07x	1.08x	1.11x	1.14x	1.14x	1.08x	1.10x	1.12x	0.62x
hevc-F	1.10x	1.05x	1.07x	1.09x	1.08x	1.05x	1.05x	1.02x	0.67x

total	1.09x	1.08x	1.12x	1.13x	1.13x	1.08x	1.07x	1.08x	0.68x

Full Changelog: https://github.com/ultravideo/kvazaar/compare/v2.1.0...v2.2.0

v2.1.0

2 years ago

Kvazaar 2.1 has been released!

Important update on the license

With this release, the license was changed from LGPL2.1 to 3-Clause BSD, allowing more liberties in using Kvazaar.

Features

Option to use a custom table for fast bitrate estimation weights (--fast-coeff-table <file>)
Tools to create new weight sets for fast bitrate estimation (--fast-rd-sampling, --fastrd-accuracy-check, --fastrd-outdir), documented in rdcost-weight-tool/README.txt
Added support for Y4M input in addition to YUV (autodetected; can be enabled manually with --input-file-format=y4m)
Allow writing out block statistics for debugging (--stats-file-prefix)
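Y4M autodetection can work by checking for the `YUV4MPEG2` signature at the start of the file; the stream header then carries space-separated parameters such as `W` (width), `H` (height) and `F` (frame rate as num:den). A minimal sketch of parsing that header line, assuming it has already been read up to the newline; `y4m_info` and `parse_y4m_header` are hypothetical, not Kvazaar's input code, and the interlacing, aspect ratio and chroma parameters (`I`, `A`, `C`) are ignored.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
  int width;
  int height;
  int fps_num;
  int fps_den;
} y4m_info;  /* hypothetical, not part of Kvazaar's API */

/* Parse one Y4M stream header line (without the trailing '\n').
 * Returns 0 on success, -1 if the signature or a size field is missing. */
static int parse_y4m_header(const char *line, y4m_info *info)
{
  static const char magic[] = "YUV4MPEG2";
  if (strncmp(line, magic, sizeof magic - 1) != 0) return -1;

  info->width = info->height = 0;
  info->fps_num = info->fps_den = 0;
  const char *p = line + sizeof magic - 1;
  while (*p) {
    while (*p == ' ') ++p;  /* skip separators */
    if (!*p) break;
    char tag = *p++;
    switch (tag) {
      case 'W': info->width = (int)strtol(p, NULL, 10); break;
      case 'H': info->height = (int)strtol(p, NULL, 10); break;
      case 'F': sscanf(p, "%d:%d", &info->fps_num, &info->fps_den); break;
      default: break;  /* I, A, C, X parameters are ignored in this sketch */
    }
    while (*p && *p != ' ') ++p;  /* skip the rest of this parameter */
  }
  return (info->width > 0 && info->height > 0) ? 0 : -1;
}
```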

Optimizations

New AVX2 implementations for interpolation filters

Fixes

DTS offset was incorrect at the beginning of the sequence when using GOP (7918628b8e447b7c410ab65fe4418197dbdb6b56)
Added VPS information about reordering and buffering, fixes VLC playback (b68625b8699abde4b0293c1fc302c52782bf4464)
Removed pthread_exit() on mingw to fix media-autobuild suite (9a65617a34eccc46b80869895d9f30590f6e726e)
Renamed truncate function to not overlap with POSIX one, fixed FreeBSD build (1fa69c705dd80d40bd55e63bbfce85e3388412a0)

v2.0.0

4 years ago

Kvazaar 2.0 has been released!

Since 1.3, the encoder has received so many improvements and fixes that this is one of our most significant releases so far. It is only fitting to call this version 2.0.

Here are some of the more interesting changes in this release:

Important for MinGW users

AVX2 optimizations are no longer enabled for MinGW GCC due to stack alignment issues. Use Clang for AVX2 support and therefore better performance.

Highlights

Updated presets
Updated GOP definitions using a QP offset model
- There is now an even longer hierarchical GOP, --gop=16
Much faster and improved bipred
Alternative and better rate control algorithm, optimal bit allocation (--rc-algorithm oba)
Variance adaptive quantization (--vaq)

Features

Option to set QP offset for intra frames (--intra-qp-offset, automatic by default)
Zero-coeff-rdo is now configurable (--zero-coeff-rdo)
Optional intra frame analysis for rate control (--intra-bits)
Optional machine learning based depth constraints for intra search (--ml-pu-depth-intra)
PU depths are now separately configurable for each GOP layer

User Interface

Report the bitrate and an approximate (cumulative) average QP

Optimizations

More AVX2 optimizations for SAO
More AVX2 optimizations for transforms
More AVX2 optimizations for intra prediction
AVX2 strategy for variance calculation
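Variance adaptive quantization (--vaq) and the new AVX2 variance strategy both revolve around block variance. A scalar sketch of that quantity, assuming 8-bit samples and a row stride; `block_variance` is a hypothetical name. How Kvazaar maps variance to a QP offset isn't described in these notes, so that step is left out.

```c
#include <stdint.h>

/* Population variance of a width x height block of 8-bit samples,
 * computed as E[x^2] - E[x]^2 from a single pass of sums. */
static double block_variance(const uint8_t *src, int stride, int width, int height)
{
  uint64_t sum = 0, sum_sq = 0;
  for (int y = 0; y < height; ++y) {
    for (int x = 0; x < width; ++x) {
      uint32_t s = src[y * stride + x];
      sum += s;
      sum_sq += s * s;
    }
  }
  const double n = (double)width * height;
  const double mean = sum / n;
  return sum_sq / n - mean * mean;
}
```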

...and several other improvements and fixes

BD-Bitrate

Average BD-Bitrate compared with v1.3:

class	0-uf	1-sf	2-vf	3-fr	4-f	5-m	6-s	7-sr	8-vs
hevc-A	-20.2 %	-21.4 %	-28.8 %	-24.8 %	-21.4 %	-23.5 %	-19.1 %	-11.3 %	-15.0 %
hevc-B	-35.1 %	-34.5 %	-34.7 %	-31.8 %	-28.6 %	-29.5 %	-21.8 %	-11.4 %	-16.2 %
hevc-C	-21.0 %	-23.6 %	-31.3 %	-27.7 %	-22.4 %	-23.5 %	-19.8 %	-12.9 %	-18.0 %
hevc-D	-22.5 %	-26.6 %	-37.4 %	-33.2 %	-26.6 %	-25.3 %	-18.3 %	-10.8 %	-15.1 %
hevc-E	-35.9 %	-34.9 %	-30.9 %	-29.8 %	-28.0 %	-30.4 %	-25.7 %	-19.0 %	-20.2 %
hevc-F	-18.6 %	-17.7 %	-21.3 %	-19.9 %	-17.5 %	-15.2 %	-17.6 %	-26.8 %	-37.0 %

Total	-26.0 %	-26.9 %	-31.1 %	-28.2 %	-24.4 %	-24.6 %	-20.3 %	-15.4 %	-20.6 %

Speedup

Average speedup compared with v1.3 on an AMD Ryzen 7 1700X machine:

class	0-uf	1-sf	2-vf	3-fr	4-f	5-m	6-s	7-sr	8-vs
hevc-A	0.90x	0.75x	0.80x	0.78x	0.93x	1.11x	1.21x	1.23x	0.95x
hevc-B	0.96x	0.81x	0.83x	0.82x	0.97x	1.15x	1.26x	1.29x	0.95x
hevc-C	0.94x	0.78x	0.73x	0.70x	0.83x	1.12x	1.22x	1.26x	0.86x
hevc-D	1.02x	0.83x	0.78x	0.76x	0.90x	1.16x	1.24x	1.28x	0.93x
hevc-E	0.95x	0.84x	0.91x	0.89x	1.06x	1.45x	1.56x	1.42x	1.16x
hevc-F	0.96x	0.85x	0.84x	0.84x	0.94x	1.17x	1.29x	1.25x	1.00x

Total	0.96x	0.81x	0.81x	0.80x	0.93x	1.18x	1.29x	1.29x	0.97x

v1.3.0

4 years ago

Time has finally come for the long overdue release v1.3 of Kvazaar!

Since it has been such a long time since our previous release, many new features, fixes, and optimizations have been introduced, and the coding efficiency is on a whole new level.

Windows binaries will be provided starting with this release.

Pthreads is no longer needed to build and run Kvazaar on Windows. It has been replaced by our custom ThreadWrapper, which lets Kvazaar use the new C++ standard threads. ThreadWrapper is also available separately at https://github.com/ultravideo/ThreadWrapper and is released under the ISC license. It is not complete yet, so contributions are welcome.

Performance figures and more up-to-date release notes will be added in the following days.

Some CI tests are failing at the moment, but there should not be any critical issues.

Edit (2.5.2020): Here are some of the previously missing release notes:

Features

Option --fast-residual-cost to gain a speedup at the cost of less accurate mode decisions
Option --(no-)open-gop to choose between open and closed GOP
Option --set-qp-in-cu to move QP signalling into CU level for very specific use cases
Option --scaling-list to enable and choose scaling lists
Option --max-merge to set the maximum number of merge candidates
Option --early-skip to speed up coding at the cost of worse results

Optimizations

Another version of interpolation filters and fractional pixel motion estimation
AVX2 optimized sign bit hiding
AVX2/BMI2 optimized coefficient coding
AVX2 blending for biprediction
Improvements and AVX2 code for certain SAD functions
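Sign bit hiding, as specified in HEVC, omits the sign of the first non-zero coefficient (in scan order) of a 4x4 coefficient group when the first and last non-zero coefficients are more than 3 scan positions apart; the decoder infers it from the parity of the sum of absolute levels, odd meaning negative. When the parity doesn't match the real sign, the encoder adjusts one level by ±1. A sketch of the two checks with hypothetical names, ignoring the extra conditions the specification adds (such as transquant bypass):

```c
#include <stdbool.h>
#include <stdlib.h>

/* levels: the 16 quantized levels of one 4x4 group, indexed by scan position.
 * Returns true if the sign of the first non-zero level is hidden. */
static bool sign_hidden(const int *levels)
{
  int first = -1, last = -1;
  for (int i = 0; i < 16; ++i) {
    if (levels[i] != 0) {
      if (first < 0) first = i;
      last = i;
    }
  }
  return first >= 0 && (last - first) > 3;
}

/* Sign implied by the parity of the sum of absolute levels: odd -> negative. */
static int implied_sign(const int *levels)
{
  int sum = 0;
  for (int i = 0; i < 16; ++i) sum += abs(levels[i]);
  return (sum & 1) ? -1 : 1;
}
```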

Building

ASM: marked stack non-executable

The old notes:

Features

Changed --rd=2 to use SSD metric for CU mode decision (662430d441c09a85d2de0106c3adbdd5c5f196f0)
Changed inter search to check the cost of flushing residual to zero (75a87006301ed892b3e3d74862ff5f297b5a45aa)
Changed rectangular and asymmetric blocks to use a transform split (774c6665286d11b7299a0d76a013a21c93645c28)
Added diamond search ME algorithm (4e13608b01ac18d543d500c582e2b5e72f9c92a1)
Enabled low delay B GOP structure with --bipred --gop=lp-g4d3t1 (7155dd0db72b0ecabe08ecd20db5ebaaefcaabc6)
Added termination of intra search at zero residual with --intra-rdo-et (4fb1c16c6198b2c1774743349c0a03a786314c4d)
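The diamond search option isn't described further here. As an illustration of the general idea, a small diamond search evaluates the four neighbours of the current best integer motion vector with SAD, moves to the cheapest, and stops when the centre wins or a step limit (similar in spirit to --me-steps) is reached. The names `mv_t`, `block_sad` and `small_diamond_search` are hypothetical, and Kvazaar's actual pattern and cost function (which also accounts for motion vector bits) may differ.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct { int x, y; } mv_t;  /* hypothetical integer-pel motion vector */

/* SAD between the bw x bh block at (bx, by) in cur and the block displaced by
 * mv in ref. Both frames use the same stride. */
static unsigned block_sad(const uint8_t *cur, const uint8_t *ref, int stride,
                          int bx, int by, int bw, int bh, mv_t mv)
{
  unsigned sad = 0;
  for (int y = 0; y < bh; ++y)
    for (int x = 0; x < bw; ++x)
      sad += (unsigned)abs(cur[(by + y) * stride + bx + x] -
                           ref[(by + y + mv.y) * stride + bx + x + mv.x]);
  return sad;
}

/* Small diamond search. start must keep the block inside the frame;
 * width is also used as the stride. */
static mv_t small_diamond_search(const uint8_t *cur, const uint8_t *ref,
                                 int width, int height, int bx, int by,
                                 int bw, int bh, mv_t start, int max_steps)
{
  static const mv_t pattern[4] = {{0, -1}, {-1, 0}, {1, 0}, {0, 1}};
  mv_t best = start;
  unsigned best_cost = block_sad(cur, ref, width, bx, by, bw, bh, best);

  for (int step = 0; step < max_steps; ++step) {
    const mv_t center = best;
    for (int i = 0; i < 4; ++i) {
      const mv_t cand = {center.x + pattern[i].x, center.y + pattern[i].y};
      /* skip candidates whose reference block would leave the frame */
      if (bx + cand.x < 0 || by + cand.y < 0 ||
          bx + cand.x + bw > width || by + cand.y + bh > height)
        continue;
      const unsigned cost = block_sad(cur, ref, width, bx, by, bw, bh, cand);
      if (cost < best_cost) {
        best_cost = cost;
        best = cand;
      }
    }
    if (best.x == center.x && best.y == center.y)
      break;  /* no neighbour improved on the centre: converged */
  }
  return best;
}
```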

Optimization

Made TZ search faster and slightly better (c13604468d95693c9cc26b749836221e13d4c339)
Optimized bi-prediction (69756e24914000f5a1660b9a3f8ce0ea341a1a6c)

Fixes

Fixed transform skip with rectangular inter blocks (fb462b25ef8faa6234ad89b7155d2842172f57f6)
Fixed accidental inter search for 4x4 blocks (649113a8218bd9564b90edeac3c2a8d0ebabf354)

User Interface

Changed options for all preset levels (f033ad0ad0817ee247180b7d4b1d3541d251dd96)
Added an option for limiting the number of steps in motion estimation with --me-steps (39ed36830ef3411a50b56a06a73dad9a56e8390b)
Added --me=dia (4e13608b01ac18d543d500c582e2b5e72f9c92a1)
Added --level, --force-level and --high-tier for setting bitstream level and tier (bac07457ea078bcca582e9fefb327031a50574af)

Building

Fixed issue with struct timespec redefinition with Visual Studio 2015 and later (713e694d82da77bb96fd86fd41da3f04f0f956b2)
Fixed building .asm files in Visual Studio 2017 (6be81959d5381010f3c0bc4bf008ac4830d641fe)
Fixed compatibility with crypto++ 6.0 (4b24cd03a24ee0290b39f10f49dad92c7a846652)
Added support for crypto++ with the name libcryptopp (411276d6f2b236c6ebbe0482918c0bb158e8e21e)
Dockerfile base image was updated to Ubuntu 18.04 (8380b6c0f3379a6d2132337426987e9a8c59d813)
Enabled -Wextra by default (ff17e0ba179fe06c9bd86795c709c1a513f1012b)

Refactoring

Inter motion vector cost functions (c73cce386a3b10c481b03bf5b51dc97311a528d8)
Dockerfile (0164291688c8065f6e403fc725681a59192a24e4)

BD-Bitrate

Average BD-Bitrate compared with v1.2:

Class	0-uf	1-sf	2-vf	3-fr	4-f	5-m	6-s	7-sr	8-vs
hevc-A	-18.8 %	-17.5 %	-1.7 %	-6.2 %	-2.1 %	-6.1 %	-8.7 %	-19.1 %	-19.2 %
hevc-B	-21.3 %	-20.9 %	-4.6 %	-5.7 %	-2.9 %	-6.6 %	-11.8 %	-26.4 %	-23.9 %
hevc-C	-26.6 %	-24.9 %	-2.4 %	-8.3 %	-3.7 %	-11.9 %	-11.0 %	-26.1 %	-21.0 %
hevc-D	-33.3 %	-31.1 %	-1.8 %	-11.5 %	-7.5 %	-16.5 %	-16.2 %	-28.9 %	-23.6 %
hevc-E	-26.3 %	-25.3 %	-20.4 %	-15.4 %	-13.8 %	-15.7 %	-21.3 %	-26.3 %	-25.9 %
hevc-F	-9.2 %	-8.4 %	-5.2 %	-4.9 %	-3.0 %	-20.8 %	-17.3 %	0.2 %	18.8 %

Total	-22.7 %	-21.5 %	-5.7 %	-8.4 %	-5.3 %	-13.2 %	-14.4 %	-21.3 %	-15.4 %

Speedup

Average speedup compared with v1.2 on an Intel Xeon E5-2620 v4 machine:

class	0-uf	1-sf	2-vf	3-fr	4-f	5-m	6-s	7-sr	8-vs
hevc-A	x0.88	x0.91	x1.02	x1.19	x1.37	x0.76	x0.93	x0.58	x0.36
hevc-B	x0.90	x0.92	x1.08	x1.21	x1.40	x0.81	x0.99	x0.62	x0.38
hevc-C	x0.87	x0.88	x0.96	x1.11	x1.24	x0.59	x0.72	x0.40	x0.26
hevc-D	x0.92	x0.97	x1.03	x1.23	x1.34	x0.65	x0.74	x0.38	x0.27
hevc-E	x0.97	x0.94	x1.08	x1.15	x1.34	x0.86	x1.05	x0.74	x0.54
hevc-F	x0.81	x0.91	x1.01	x1.09	x1.30	x0.78	x0.88	x0.49	x0.35

Total	x0.89	x0.92	x1.03	x1.17	x1.33	x0.74	x0.88	x0.53	x0.35

v1.2.0

6 years ago

We are happy to release Kvazaar version 1.2. Since the last version, Kvazaar has obtained significant speedups at all presets and the compression efficiency has improved for the fastest presets. Please find the complete list of changes below.

Features

Intra prediction mode encryption with --crypto=intra_pred_modes (2b8ce5e47c2baa489466b31c0bc8aeab4ad07563)
Adaptive QP for 360° video with --erp-aqp (26adef44929e5722a9dbec537b662a3d683d1f03)
New selection algorithm for --owf=auto and --threads=auto (8c4a3473a85152d692d43eaa5e5f0b9fda73738e)
Added an option to set the encryption key using --key (2e130912cbdf8ce19b612c2f67bbcb551ec8a415)
Added an option to limit SAO to band offset or edge offset only with --sao=band and --sao=edge (8674c0f5eeee36f758e8838c898190e74c96d362)
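For reference, the two SAO types that --sao=band and --sao=edge select between classify samples differently in HEVC. Band offset splits the sample range into 32 equal bands; edge offset compares a sample with its two neighbours along one direction and assigns one of four categories. A sketch of both classifications; the function names are hypothetical.

```c
/* Band offset: 32 equal bands, so the band index is the five most
 * significant bits of the sample. */
static int sao_band_index(int sample, int bit_depth)
{
  return sample >> (bit_depth - 5);
}

static int sign3(int v) { return (v > 0) - (v < 0); }

/* Edge offset category of sample c with neighbours a and b along the chosen
 * direction: 1 = local minimum, 2 = below one neighbour and equal to the
 * other, 3 = above one neighbour and equal to the other, 4 = local maximum,
 * 0 = none of these (no offset applied). */
static int sao_edge_category(int a, int c, int b)
{
  static const int category[5] = {1, 2, 0, 3, 4};
  return category[2 + sign3(c - a) + sign3(c - b)];
}
```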

Optimization

Reduced number of intra modes checked when using --rd=2 (2cad3173ecc7b8562bca5a138a560c45e1621a6f)
Reduced inter-frame CTU dependencies caused by SAO (050e90dc0506e020bb7591544d3bb7341d2abe40)
Changed to a faster calculation for coefficient costs when using --rd=0 (1ead9c0c39da6ec4868f2c034355378d932732c3)

Fixes

Fixed long motion vectors not getting clipped (#158, 85e2a40da3b8bee7d8fb6563a935e7117062b90b)
Fixed order of pictures in reconstruction debug output when --gop=8 is used (#101, aae141f2d385dafe314b4143430158314ad74170)
Fixed a use-after-free when encoding very few frames with --gop=8 (#161, 2991962033900806b5ba8f59feabf6fe58a54d58)
Fixed a crash when video size is not a multiple of the smallest CU size (2f2405dfe66969d6e411a6c0c4b5b7626d5a9232)
Fixed invalid bitstream when QP is too large (382636de55d36a34fd05475e506325235fe55ba6)
Fixed a race condition causing a deadlock (5f8e17d4bac10731069b6904c7ed4f7e590ec8af)
Fixed a memory leak in encryption (8654b48186d04011011adf03a1f7aa222f49f891)
Fixed I-frames not being IRAP frames when using GOP (00c9f52bd4f6c7ccbf20e62bfd1e027d277da20d, 841597e12328656e2579b573f2e6c1f2e8e18809)
Fixed computing inter and intra costs with different metrics (afc13f1974d86a6e85196eabcde1980957b6c6a3)
Fixed reliance on undefined behavior (b41f0fa8e875876baa234471b780b25fc22fc64d, 924cf857ec29ecdb262fd415d1c77796dc5c73a3)
Fixed --mv-constraint=frametilemargin constraining motion vectors too much (409d2114f0c92e7d174f83dc96cac1c14d94e528)
Fixed using --bipred with --tmvp (#160, 9974380cdd7398af885f4f0b599cc58cdd643b0e)

User Interface

Changed the type of kvz_config.roi.dqps from uint8_t* to int8_t*. Delta QP values for --roi may now be negative. (79cb3a2fd32b56d5358c10dba8bd75b42c9d05fd)
Changed PSNR display format (20d6444f07d1c2fa0cd95f3f3a1751b2877f5870)

Building

Default to no -Werror. Run configure with --enable-werror to enable it. (033bc6bc45b0c6b200a4227208d2fa6263e09166)
make check now runs valgrind tests that used to only run on Travis. The programs ffmpeg, valgrind and TAppDecoderStatic must be available in $PATH (6bbe5e10cdaa71fdec4530852ef99c5203100520)

Refactoring

Removed duplicate code in inter MVP and merge candidate selection (4fb0783cb07914404a93c8d10bfc8f22a1a819d7)
Removed duplicate code in intra reconstruction for luma and chroma (e944416ae61c0b54824c444357579ae3cebd9500)
Changed functions for writing the CU tree bitstream to use luma pixel coordinates (610c91b0c56d652bd7c39c9c78d4706a8ccba882, f5eef7f33c4084309143c33f84901d7aaad0c4d0)
Removed duplicate code in functions for writing intra CU bitstream with and without encryption (525a5180ff8a95991b0d6c905699fd28c198a8ea)
Removed duplicate code in helper functions in search.c (2c734760bc9f7f751775878c6250aa73ca2dde59)
Gathered function parameters for inter search functions into a single struct (2fa3d829462bed21631f278e45de6fac01f88353)

BD-Bitrate

Average BD-Bitrate compared with v1.1:

Class	0-uf	1-sf	2-vf	3-fr	4-f	5-m	6-s	7-sr	8-vs
hevc-A	-15.71 %	-6.68 %	-4.66 %	-0.89 %	-1.11 %	-0.54 %	+0.04 %	-0.02 %	+0.32 %
hevc-B	-19.04 %	-8.15 %	-6.92 %	-1.26 %	-1.48 %	-0.65 %	-0.33 %	-0.33 %	-0.07 %
hevc-C	-20.39 %	-8.54 %	-5.01 %	-0.55 %	-0.72 %	-0.44 %	+0.03 %	-0.00 %	+0.23 %
hevc-D	-13.24 %	-5.15 %	-2.54 %	-0.33 %	-0.51 %	-0.32 %	-0.10 %	-0.04 %	+0.13 %
hevc-E	-4.37 %	-3.31 %	-1.90 %	-0.52 %	-1.10 %	-0.68 %	-0.74 %	-0.86 %	-0.78 %
hevc-F	-12.42 %	-6.15 %	-5.25 %	+0.04 %	+0.24 %	+0.25 %	+0.32 %	+0.70 %	+0.90 %

Total	-14.80 %	-6.59 %	-4.68 %	-0.60 %	-0.78 %	-0.39 %	-0.13 %	-0.08 %	+0.14 %

Speedup

Average speedup compared with v1.1 on an Intel Core i7-4770 machine:

Class	0-uf	1-sf	2-vf	3-fr	4-f	5-m	6-s	7-sr	8-vs
hevc-A	x1.07	x1.06	x1.10	x1.10	x1.09	x1.11	x1.10	x1.11	x1.10
hevc-B	x1.06	x1.07	x1.09	x1.11	x1.09	x1.13	x1.12	x1.13	x1.14
hevc-C	x1.22	x1.27	x1.32	x1.35	x1.33	x1.37	x1.39	x1.42	x1.41
hevc-D	x1.34	x1.58	x1.64	x1.60	x1.58	x1.54	x1.55	x1.57	x1.54
hevc-E	x1.20	x1.17	x1.16	x1.18	x1.16	x1.17	x1.16	x1.20	x1.19
hevc-F	x1.25	x1.20	x1.20	x1.23	x1.20	x1.24	x1.24	x1.27	x1.27

Total	x1.19	x1.22	x1.24	x1.26	x1.24	x1.26	x1.26	x1.28	x1.28

v1.1.0

7 years ago

I think there are enough new features to call this v1.1.0.

Average BD bitrate (QP 17, 22, 27, 32) v1.1.0 vs v1.0.0

Class	0-uf	1-sf	2-vf	3-fr	4-f	5-m	6-s	7-sr	8-vs
A	-1.9%	-1.5%	-1.2%	-1.0%	-0.7%	-0.7%	-1.1%	-1.1%	-1.3%
B	-2.2%	-1.3%	-1.4%	-0.8%	-0.5%	-0.6%	-0.9%	-0.8%	-1.0%
C	-1.3%	-1.1%	-1.0%	-0.9%	-0.6%	-0.6%	-0.6%	-0.6%	-0.8%
D	-1.1%	-0.9%	-0.7%	-0.7%	-0.6%	-0.5%	-0.1%	-0.2%	-0.4%
E	-2.5%	-1.5%	-0.9%	-0.7%	-0.3%	-0.3%	-0.5%	-0.4%	-0.4%
F	-1.6%	-0.7%	-0.9%	-0.8%	-0.5%	-0.4%	-0.7%	-0.7%	-0.7%

All	-1.7%	-1.2%	-1.0%	-0.8%	-0.5%	-0.5%	-0.6%	-0.6%	-0.8%

Average speedup (QP 17, 22, 27, 32) v1.1.0 vs v1.0.0

Class	0-uf	1-sf	2-vf	3-fr	4-f	5-m	6-s	7-sr	8-vs
A	1.03x	1.02x	1.01x	1.01x	1.01x	1.02x	1.07x	1.09x	1.16x
B	1.03x	1.01x	1.02x	1.01x	1.01x	1.01x	1.06x	1.06x	1.13x
C	1.03x	1.02x	1.01x	1.01x	1.01x	1.01x	1.07x	1.07x	1.16x
D	1.07x	1.05x	1.03x	1.03x	1.02x	1.02x	1.07x	1.09x	1.17x
E	0.99x	0.97x	0.98x	0.99x	0.99x	0.99x	1.00x	1.02x	1.05x
F	1.02x	1.01x	1.01x	1.01x	1.01x	1.01x	1.08x	1.09x	1.17x

All	1.03x	1.02x	1.01x	1.01x	1.01x	1.01x	1.06x	1.07x	1.14x

Parameters: --threads=4 --owf=1 -p64

Features

Bitrate control now works at LCU level, giving more consistent results. (2318bd77ed01dc8960028b31ee9bd610c84cd72f)
Added --roi parameter for LCU level delta-QP control. (4a0121ac42c37f62d9bd2e5eda5fe268f8fed9bb)
Added --slices parameter for encapsulating tiles and WPP-rows into slice NALs instead of using bitstream offsets. (1e6463c08bef7f72a9c6629f4cea549c806ddf9f)
Temporal motion vector prediction now works with B-frames. (d892be51ec483b5470b85db6ffa0f4122e8677cf)

Optimization

Added AVX2 version of SSD. (778e46dfd85a23ee3a76785af289fc10e26df750)
Optimized intra reference building. (c31207ea7deb25617ed60591dcda6c101a629d91)
Optimized bitstream writes. (a9e45efcfc82a0ee704357170d60df6fedc38d91)
Optimized CU-split decision. (2c069a3e5fcb665a54e6c55490522526f6587453)
Fix main-thread busy-looping on Linux. (a5a925fc283d01e844f4e8a4a6c4105f878f0504)
Avoid initializing memory needlessly during RDOQ. (acd12cba1e1aaac6e15090552fe9e1b22bbb6eb7, b021d2244e18a45aca0a47747e999fcebf059c85)

Fixes

Pass DTS and PTS timestamps correctly through the API. (d18de19d8a1bb49aa63349a57191d3d8cf013407)
Fixed bug with subpixel motion estimation within tiles. (2c005cda25d6cf3030462b1c7d6ea5fbb87037b1)
Improved 10-bit RD-performance. (70a52f0e4867c713c68005d65ce63c7b8e1d04b4)
Fixed stupendously large bitstreams when --mv-constraint was used with --subme. (937a7649873cece08bb42d723dd637d8c7221551)
Fixed bug with --smp and --amp. (46c9a483c377c555a5b1890a75f2a3e858ca36c0)
Fix problem with --bipred. (1e6463c08bef7f72a9c6629f4cea549c806ddf9f)
Fixed hang with threading on OSX. (d893474babfa12494bdefeb357f495304ff78443)
Fix crash when frame is less than 65 pixels high and WPP is used. (b8e3513a231678eaeeca5ca41e7d9db0e487a4bf)

User Interface

Disabled WPP with tiles enabled. (cb6672b4526738625fd1c780d5df7b4c0899bf4d)
Improved --help. (5bf745460da2ba9c75e2309cff0c3d104569f6f0, 78a28e033818700f7c7f62f81c4e261ff3180669)
Made it possible to disable the GOP structure that was enabled by default in v1.0.0. (deb63f735f9aa2e0597d52993b250e24a8f6bcb4)
Have --threads=auto enable threading instead of disabling it. (db5e750c7f943dff436a172e9d3d1f81c6d5e0ee)
Give errors on failures and handle them better. (97863cdaa26c1373767dd74d460cf1a46f248a22, 6a178dee963f4a05dcfdd3eb76f90e9935b074b0)
Use the number of reference pictures from the medium preset by default. (7ff33e1bf2c44817f9b39748e769abd1f05b3a30)

Building

Include optimizations on 32-bit. (1dcc9937439422c65ca3bf46d229f52ce3fe1bb9)
Added appveyor CI tests for MSYS2. (e269b8653934d39ed9584d16843952ae73395770)
Add pkg-config macros, so pkg-config doesn't need to be installed anymore. (2d7daa1da7ce1db129ed7e6f12de6bb6d83d25ac)
Travis CI OSX tests work again. (c32f5fafc9d3b4cd0486e89b0cc9e7e955715e79)

Refactoring

Refactored deblocking and sign hiding. (7ec5f787920781d9c67e16923d63496ca28a5878)
Removed Exp-Golomb lookup table. (ed3bd898fd2eeb3b34c7445275cfb39ab708f737)
Copy kvz_config to encoder_control_t and remove duplicate fields. (e78a8dfcf521346cfe88ee56dc6c87af4d529a11)
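Without a lookup table, an unsigned Exp-Golomb code ue(v) can be computed directly: for value k, the code word is the binary representation of k + 1, prefixed by one fewer zero bits than it has significant bits. Signed values se(v) are mapped to unsigned first (positive v to 2v - 1, others to -2v). A sketch with hypothetical names, not necessarily how Kvazaar's bitstream writer does it:

```c
#include <stdint.h>

/* ue(v) code for k. The code word equals k + 1 written in (2 * len - 1) bits,
 * where len is the bit length of k + 1; the leading zeros are implied by the
 * returned total length. */
static int exp_golomb_ue(uint32_t k, uint64_t *code)
{
  const uint64_t v = (uint64_t)k + 1;
  int len = 0;
  for (uint64_t t = v; t; t >>= 1) ++len;  /* significant bits of k + 1 */
  *code = v;
  return 2 * len - 1;
}

/* se(v) to ue(v) mapping: 0, 1, -1, 2, -2, ... -> 0, 1, 2, 3, 4, ... */
static uint32_t se_to_ue(int32_t v)
{
  return v > 0 ? 2u * (uint32_t)v - 1u : 2u * (uint32_t)(-(int64_t)v);
}
```

For example, k = 3 gives the 5-bit code word 00100.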

v1.0.0

7 years ago

It's been 9 months since the last release. Now that the encoder just got 10x faster (on veryslow), and quite a bit faster and better on every other preset as well, I think it's time for a major version bump.

Average BD bitrate (QP 17, 22, 27, 32) v1.0.0 vs v0.8.3

Class	0-uf	1-sf	2-vf	3-fr	4-f	5-m	6-s	7-sr	8-vs
A	-16.4%	-26.9%	-27.5%	-31.0%	-11.2%	-11.9%	-11.3%	-6.7%	-4.8%
B	-16.2%	-33.7%	-31.7%	-37.6%	-11.6%	-14.8%	-15.7%	-9.1%	-6.3%
C	-7.0%	-17.6%	-28.0%	-31.2%	-8.3%	-9.0%	-11.3%	-7.1%	-8.1%
D	-3.7%	-12.3%	-29.2%	-30.3%	-5.4%	-5.9%	-11.5%	-8.3%	-9.9%
E	-28.4%	-42.6%	-33.5%	-39.4%	-22.6%	-28.5%	-20.3%	-7.0%	-0.7%
F	-6.1%	-11.3%	-12.8%	-16.5%	-10.1%	-2.1%	2.3%	10.8%	6.4%

All	-13.0%	-24.1%	-27.1%	-31.0%	-11.5%	-12.0%	-11.3%	-4.6%	-3.9%

Average speedup (QP 17, 22, 27, 32) v1.0.0 vs v0.8.3

Class	0-uf	1-sf	2-vf	3-fr	4-f	5-m	6-s	7-sr	8-vs
A	1.61x	1.91x	1.89x	1.37x	2.69x	3.33x	4.79x	7.32x	11.06x
B	1.65x	1.98x	1.96x	1.46x	2.67x	3.36x	4.79x	8.15x	13.89x
C	1.76x	1.97x	1.98x	1.45x	2.52x	2.97x	4.87x	9.32x	15.77x
D	2.09x	1.87x	1.81x	1.32x	1.97x	2.36x	5.13x	8.78x	12.65x
E	1.91x	1.96x	1.75x	1.40x	3.00x	3.70x	4.87x	6.06x	7.56x
F	1.84x	1.83x	1.74x	1.41x	2.86x	2.98x	4.60x	8.18x	13.58x

All	1.81x	1.92x	1.86x	1.40x	2.62x	3.12x	4.84x	7.97x	12.42x

Parameters: --threads=4 --owf=1 --wpp -p64

New Features

--version
--help
--loop-input
--mv-constraint to constrain motion vectors
--tiles=2x2 as an alternative syntax for uniform tiles
--hash=md5
Print information about what SIMD optimizations are in use
--mv=full8 --mv=full16 --mv=full32 --mv=full64
--cu-split-termination=zero/off
--crypto for selective encryption of bitstream (for OpenHEVC)
--me-early-termination=sensitive/on/off for early termination of motion vector search
Added 4x8 SMP and 4x12 AMP motion partitions
--subme=0/1/2/3/4 for control over complexity of fractional pixel motion prediction
--lossless for lossless coding
Monochrome coding
--input-format=420/400
--input-bitdepth=8/10
--tmvp for the temporal motion vector predictor
--rdoq-skip to skip RDOQ in situations where it's unlikely to improve BDRate
Modified the --gop=lp-g4d3r1t1 syntax to no longer take the reference frames as a parameter; it's now --gop=lp-g4d3t1.
Enable WPP and multithreading by default, with detection of the number of cores
Update all presets to rate-distortion-complexity optimized versions. These are based on a search of (almost) all possible encoding parameters and bring a huge boost to both speed and BDRate when encoding with the presets (10x speed for veryslow, ~1.1x-4x for others, up to 30% improved BDRate for some presets).
Set default options to match medium with intra period of 64, QP 22 and --gop=lp-g4d3t1
--implicit-rdpcm RExt feature
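--tiles=2x2 presumably corresponds to uniformly spaced tiles. In HEVC, uniform spacing derives each tile column width (and row height) in CTUs by integer division of the picture size, so leftover CTUs go to later tiles. A sketch of that derivation with hypothetical names:

```c
/* Number of CTUs needed to cover 'pixels' samples with CTUs of size 'ctu'. */
static int size_in_ctus(int pixels, int ctu)
{
  return (pixels + ctu - 1) / ctu;
}

/* Tile column widths (or row heights) in CTUs for uniform spacing:
 * size[i] = ((i + 1) * N) / T - (i * N) / T. */
static void uniform_tile_sizes(int pic_size_in_ctus, int num_tiles, int *sizes)
{
  for (int i = 0; i < num_tiles; ++i)
    sizes[i] = ((i + 1) * pic_size_in_ctus) / num_tiles -
               (i * pic_size_in_ctus) / num_tiles;
}
```

For 1080p with 64x64 CTUs the picture is 17 CTUs tall, so two tile rows get 8 and 9 CTU rows.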

Optimizations

AVX2 version for Sample Adaptive Offset (SAO)
Optimized memory copying
AVX2 versions of filters for fractional pixel motion estimation
AVX2 version for half pixel chroma sampling for SMP/AMP
AVX2 versions for calculating two or four SATD values at once for small blocks
Rewrote AVX2 version of fractional pixel motion compensation
Rewrote motion vector cost calculation. It only got slightly faster, but BDRate improved a bunch due to the new implementation being more correct.
Made AVX2 SAD use SSE4.1 for cases where there isn't an AVX2 implementation, speeding up SMP/AMP.
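SATD (sum of absolute transformed differences) applies a Hadamard transform to the residual before summing absolute values, which usually tracks the coded cost better than plain SAD. A scalar sketch for one 4x4 block; `satd_4x4` is a hypothetical name, the AVX2 versions mentioned above compute several such blocks at once, and some encoders additionally scale the result, which is omitted here.

```c
#include <stdint.h>
#include <stdlib.h>

/* Unnormalized 4x4 SATD: 2-D Hadamard transform of the residual a - b,
 * done as horizontal then vertical 4-point butterflies. */
static unsigned satd_4x4(const uint8_t *a, int stride_a,
                         const uint8_t *b, int stride_b)
{
  int d[4][4], m[4][4];
  for (int y = 0; y < 4; ++y)
    for (int x = 0; x < 4; ++x)
      d[y][x] = a[y * stride_a + x] - b[y * stride_b + x];

  /* horizontal transform of each row */
  for (int y = 0; y < 4; ++y) {
    int s01 = d[y][0] + d[y][1], d01 = d[y][0] - d[y][1];
    int s23 = d[y][2] + d[y][3], d23 = d[y][2] - d[y][3];
    m[y][0] = s01 + s23; m[y][1] = d01 + d23;
    m[y][2] = s01 - s23; m[y][3] = d01 - d23;
  }

  /* vertical transform of each column, accumulating absolute values */
  unsigned sum = 0;
  for (int x = 0; x < 4; ++x) {
    int s01 = m[0][x] + m[1][x], d01 = m[0][x] - m[1][x];
    int s23 = m[2][x] + m[3][x], d23 = m[2][x] - m[3][x];
    sum += (unsigned)(abs(s01 + s23) + abs(d01 + d23) +
                      abs(s01 - s23) + abs(d01 - d23));
  }
  return sum;
}
```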

Bugfixes

Fixed a bug in rate control where an int overflowed after coding 2^31 bits (2Gb)
Fixed non-determinism in tiles
Fixed chroma reconstruction bug in tiles
Fixed a bug with calculating the number of bits used for intra mode on 4x4 CUs
Stopped checking zero motion vector multiple times in motion compensation
Fixed possible segfault in motion compensation
Fixed a race condition with OWF and SMP/AMP
Passed the time to pthread_cond_timedwait correctly, so that the main thread now sleeps instead of busy-looping when it has nothing to do
Fixed rate control with lp-gop
Fixed full search not taking temporal motion vector into account
Allow non-gop-length intra period for lp-gop
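pthread_cond_timedwait takes an absolute deadline (measured against CLOCK_REALTIME unless the condition variable was configured with another clock), not a relative timeout. The notes don't say exactly what was wrong in Kvazaar; this is a minimal sketch of computing such a deadline with nanosecond normalization, using a hypothetical helper name.

```c
#include <time.h>

/* Absolute deadline 'ms' milliseconds (ms >= 0) after 'now'. tv_nsec must
 * stay below one second, otherwise pthread_cond_timedwait rejects it. */
static struct timespec deadline_after_ms(struct timespec now, long ms)
{
  struct timespec t = now;
  t.tv_sec += ms / 1000;
  t.tv_nsec += (ms % 1000) * 1000000L;
  if (t.tv_nsec >= 1000000000L) {  /* carry overflowing nanoseconds */
    t.tv_sec += 1;
    t.tv_nsec -= 1000000000L;
  }
  return t;
}
```

In use, `now` would come from `clock_gettime(CLOCK_REALTIME, &now)` and the result would be passed as the third argument of `pthread_cond_timedwait`.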

Code / Building / Testing

Moved SAO to its own file
Removed a ton of unnecessary includes
Updated autotools ax_pthread
Added build test for OS-X for Travis
Made tests check for bitstream correctness
Refactored some of the copypasta in motion vector search starting point selection
Refactored the cu_info_t data structures to hold information at the 4x4 resolution needed for AMP and SMP
Changed cu_info_t to use bitfields to negate the effect of increasing the cu_info_t array by a factor of 4
Moved bitstream generation from encoderstate.c to encode_coding_tree.c
Renamed encoder_state_t.global to frame, which makes sense since it holds frame-level data, not global data
Rewrote integer vector inter prediction, because it was so bad
Refactored init_lcu_t
Added more tests for inter SAD
Added speed tests for dual intra SAD functions
Added more realistic speed tests for inter SAD

Other

Added a manpage
Added scripts for updating manpage and README based on --usage.
Added a Dockerfile. Just because.
Added commit date to --version

v0.8.3

7 years ago