Detect character encoding using ICU
Tip: If you don’t need ICU in particular, consider using ced, which is based on Google’s lighter compact_enc_det library.
$ npm install detect-character-encoding
detect-character-encoding is a C++ addon. Therefore, you may need to install various build tools. Check node-gyp’s readme for more information.
const fs = require('fs');
const detectCharacterEncoding = require('detect-character-encoding');

// Read the file as a raw Buffer (no encoding argument), then run detection on it.
const fileBuffer = fs.readFileSync('file.txt');
const charsetMatch = detectCharacterEncoding(fileBuffer);

console.log(charsetMatch);
// {
//   encoding: 'UTF-8',
//   confidence: 60
// }
detect-character-encoding may return null if no charset matches.
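Because the result can be null, callers may want to guard against it before reading encoding or confidence. The following is a minimal sketch, not part of this package: describeEncoding and its minConfidence threshold are hypothetical names chosen for illustration, and the detector is passed in as an argument so the helper can be tried without the native addon installed.

```javascript
// Hypothetical helper (not part of detect-character-encoding).
// `detect` is assumed to behave like the function exported by this package:
// it takes a Buffer and returns { encoding, confidence } or null.
// `minConfidence` is an arbitrary threshold picked for this sketch.
function describeEncoding(detect, buffer, minConfidence = 50) {
  const match = detect(buffer);

  // No charset matched at all.
  if (match === null) {
    return 'unknown';
  }

  // A charset matched, but with low confidence.
  if (match.confidence < minConfidence) {
    return `uncertain (${match.encoding})`;
  }

  return match.encoding;
}

// Intended usage with the real detector:
//   const fs = require('fs');
//   const detectCharacterEncoding = require('detect-character-encoding');
//   console.log(describeEncoding(detectCharacterEncoding, fs.readFileSync('file.txt')));
```

What counts as a sufficient confidence depends on your input; short or mostly-ASCII files tend to be harder to classify, so treat the threshold as something to tune rather than a recommended value.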
detect-character-encoding does not support 32-bit operating systems.
As listed in ICU’s user guide:
detect-character-encoding is licensed under the BSD 2-clause license but includes third-party software under different licenses. See LICENSE.md for the full license text.