Truly universal encoding detector in pure Python
`md.py` can be compiled using Mypyc to provide an extra speedup, up to 4x faster than v2.1
`normalize`
`chaos_secondary_pass`, `coherence_non_latin` and `w_counter` from CharsetMatch
`unicodedata2`
`--version` (PR #194)
`unicodedata2` as Python is quickly catching up, scheduled for removal in 3.0 (PR #194)
`explain` to True (PR #146)
`NullHandler` by default from @nmaynes (PR #135)
`explain` to True will add provisionally (bounded to function lifespan) a specific stream handler (PR #135)
`set_logging_handler` to configure a specific StreamHandler from @nmaynes (PR #135)
`CHANGELOG.md` entries, format is based on Keep a Changelog (PR #141)
We arrived in a pretty stable state.
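The logging entries above (PR #135) describe a common library pattern: a silent `NullHandler` by default, plus a stream handler that exists only for the duration of a call when `explain` is True. The sketch below illustrates that pattern with the standard `logging` module only. The logger name `my_detector` and the function `detect` are hypothetical stand-ins, not the library's actual code.

```python
import logging

# Hypothetical logger name, used for illustration only.
logger = logging.getLogger("my_detector")
logger.addHandler(logging.NullHandler())  # silent unless the application configures logging


def detect(payload: bytes, explain: bool = False) -> str:
    """Toy stand-in for a detection routine (an assumption, not the real API)."""
    handler = None
    previous_level = logger.level
    if explain:
        # Attach a stream handler only for the lifespan of this call.
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s | %(levelname)s | %(message)s")
        )
        logger.addHandler(handler)
        logger.setLevel(logging.DEBUG)
    try:
        logger.debug("inspecting %d bytes", len(payload))
        return "ascii" if payload.isascii() else "unknown"
    finally:
        # Always restore the logger, even if detection raises.
        if handler is not None:
            logger.removeHandler(handler)
            logger.setLevel(previous_level)
```

Calling `detect(data, explain=True)` prints debug records to stderr for that call only; afterwards the logger is back to its silent default.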
Changes:
`SyntaxError` (not about ASCII decoding error) for those trying to install this package using a non-supported Python version
This version pushes the detection coverage to 98%! https://github.com/Ousret/charset_normalizer/runs/3863881150 With the current dataset, the ceiling (it cannot do better than this) should be 99%, in future releases.
Changes:
Internal: :art: The project now complies with flake8, mypy, isort and black to ensure a better overall quality #81
Internal: :art: The MANIFEST.in was not exhaustive #78
Improvement: :sparkles: The BC-support with v1.x was improved; the old staticmethods are restored #82
Remove: :fire: The project no longer raises a warning on tiny content given for detection; it is simply logged as a warning instead #92
Improvement: :sparkles: The Unicode detection is slightly improved, see #93
Bugfix: :bug: In some rare cases, the chunk extractor could cut in the middle of a multi-byte character and mislead the mess detection #95
Bugfix: :bug: Some rare 'space' characters could trip up the `UnprintablePlugin`/Mess detection #96
Improvement: :art: Add syntax sugar `__bool__` for the results `CharsetMatches` list-container, see #91
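To illustrate what a `__bool__` on a results container buys the caller, here is a minimal sketch. The `Matches` class is a hypothetical stand-in written for this example; it is not the library's `CharsetMatches` implementation.

```python
from typing import Iterator, List, Optional


class Matches:
    """Hypothetical list-like container of detection results."""

    def __init__(self, results: Optional[List[str]] = None) -> None:
        self._results: List[str] = list(results or [])

    def __len__(self) -> int:
        return len(self._results)

    def __iter__(self) -> Iterator[str]:
        return iter(self._results)

    def __bool__(self) -> bool:
        # Truthy only when at least one result is present.
        return len(self._results) > 0

    def best(self) -> Optional[str]:
        """Return the first result, or None when the container is empty."""
        return self._results[0] if self._results else None


results = Matches(["utf_8", "cp1252"])
if results:  # reads naturally thanks to __bool__
    chosen = results.best()
```

With this in place, `if results:` replaces an explicit `len(results) > 0` check.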
This release pushes the detection coverage further, to 97%!