Truly universal encoding detector in pure Python
python -m charset_normalizer.cli
or python -m charset_normalizer
encoding.aliases
as they have no alias (#323)from_path
no longer enforce PathLike
as its first argumentis_binary
that relies on main capabilities, and is optimized to detect binariesenable_fallback
argument throughout from_bytes
, from_path
, and from_fp
that allow a deeper control over the detection (default True)should_rename_legacy
for legacy function detect
and disregard any new arguments without errors (PR #262)language_threshold
in from_bytes
, from_path
and from_fp
to adjust the minimum expected coherence rationormalizer --version
now specify if the current version provides extra speedup (meaning mypyc compilation whl)md.py
can be compiled using Mypyc to provide an extra speedup up to 4x faster than v2.1first()
and best()
from CharsetMatchnormalize
chaos_secondary_pass
, coherence_non_latin
and w_counter
from CharsetMatchunicodedata2
This is the last version (3.0.x) to support Python 3.6 We plan to drop it for 3.1.x
This is the last pre-release. If everything goes well, I will publish the stable tag.
language_threshold
in from_bytes
, from_path
and from_fp
to adjust the minimum expected coherence rationormalizer --version
now specify if current version provide extra speedup (meaning mypyc compilation whl)first()
and best()
from CharsetMatchnormalize
scheduled for removal in 3.0