Tinyld Versions

Simple and Performant Language detection library for NodeJS

1.3.4

1 year ago

Description

Fix a typing issue with flavors: https://github.com/komodojp/tinyld/issues/22
Update a few dev dependencies
Update the dataset and regenerate profiles

Screenshots: typing in JavaScript with require; typing in TypeScript with import

1.3.3

1 year ago

Description

Fix an issue with the missing bin script file: #21
Update a few dependencies

1.3.2

1 year ago

Description

Maintenance version

Update dependencies
Update the Tatoeba dataset
Add the heavy flavor

1.3.1

1 year ago

Description

Maintenance version with only small modifications

Update package.json: https://github.com/komodojp/tinyld/pull/16
Update a few dependencies (esbuild, typescript)

1.3.0

1 year ago

Description

A few chores
- Update the Tatoeba dataset
- Update Node to 18.x
- Update dependencies (typescript, esbuild, ...)
Tuning
- Increase the number of chunks analyzed for long texts #14
- Make the verbose log a bit more readable

import { detect } from 'tinyld'
detect('これは日本語です.', { verbose: true })

A few fixes
- Fix a compatibility issue between Deno and esbuild #12
- Fix an issue with ESM: the library is now exported in two flavors, the Node ESM and the browser ESM. This is managed in package.json #13

"exports": {
    ".": {
      "require": "./dist/tinyld.normal.node.js",
      "import": "./dist/tinyld.normal.node.mjs",
      "browser": "./dist/tinyld.normal.browser.js"
    },
    "./light": {
      "require": "./dist/tinyld.light.node.js",
      "import": "./dist/tinyld.light.node.mjs",
      "browser": "./dist/tinyld.light.browser.js"
    }
},
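
To illustrate how a conditional exports map like the one above selects a file, here is a simplified sketch. It is not the real Node.js or bundler resolver: resolveExport and the ExportsMap type are hypothetical names, and the sketch assumes only the documented rule that the first key matching an active condition wins, in key order.

```typescript
// Simplified sketch of conditional "exports" resolution (hypothetical helper,
// not the real Node.js or bundler implementation).
type ConditionMap = Record<string, string>;
type ExportsMap = Record<string, ConditionMap>;

// The map copied from the package.json fragment above.
const exportsMap: ExportsMap = {
  ".": {
    require: "./dist/tinyld.normal.node.js",
    import: "./dist/tinyld.normal.node.mjs",
    browser: "./dist/tinyld.normal.browser.js",
  },
  "./light": {
    require: "./dist/tinyld.light.node.js",
    import: "./dist/tinyld.light.node.mjs",
    browser: "./dist/tinyld.light.browser.js",
  },
};

// Returns the first target whose condition key is active, in key order.
function resolveExport(
  map: ExportsMap,
  subpath: string,
  activeConditions: string[],
): string | undefined {
  const entry = map[subpath];
  if (!entry) return undefined;
  for (const [condition, target] of Object.entries(entry)) {
    if (condition === "default" || activeConditions.includes(condition)) {
      return target;
    }
  }
  return undefined;
}

console.log(resolveExport(exportsMap, ".", ["node", "require"])); // ./dist/tinyld.normal.node.js
console.log(resolveExport(exportsMap, "./light", ["node", "import"])); // ./dist/tinyld.light.node.mjs
console.log(resolveExport(exportsMap, ".", ["browser"])); // ./dist/tinyld.normal.browser.js
```

Because key order matters in this sketch, a resolver with both import and browser active would pick the import target here.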

1.2.3

2 years ago

Description

Small maintenance version

Update a few dependencies
Fix an issue related to TS types (https://github.com/komodojp/tinyld/issues/9)
Update documentation

Type Declaration

The npm package no longer contains the src/ folder; type definitions now ship directly in the dist folder.

1.2.2

2 years ago

Description

Fix an issue with tinyld-light, which was returning the wrong supportedLanguages list
Update documentation (autogenerated graphs)
Change charset setup of esbuild
Optimize profile files to take less space (replace JSON objects with short base36 strings)
- Reduce tinyld 930KB -> 590KB
- Reduce tinyld-light 110KB -> 68KB
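
The general idea behind the base36 change can be sketched as follows. These notes do not describe the actual profile layout, so this is only an illustration: packCounts, unpackCounts, and the fixed width of 2 characters are assumptions, not tinyld's real format.

```typescript
// Sketch: store small integers as fixed-width base36 text instead of JSON.
// The names and the 2-character width are illustrative assumptions,
// not tinyld's actual profile format.
const WIDTH = 2; // two base36 digits cover 0..1295

function packCounts(values: number[]): string {
  return values
    .map((v) => {
      if (!Number.isInteger(v) || v < 0 || v >= 36 ** WIDTH) {
        throw new RangeError(`value out of range: ${v}`);
      }
      // toString(36) uses digits 0-9 then a-z; pad to a fixed width.
      return v.toString(36).padStart(WIDTH, "0");
    })
    .join("");
}

function unpackCounts(packed: string): number[] {
  const out: number[] = [];
  for (let i = 0; i < packed.length; i += WIDTH) {
    out.push(parseInt(packed.slice(i, i + WIDTH), 36));
  }
  return out;
}

const counts = [0, 35, 36, 1295];
const packed = packCounts(counts); // "000z10zz"
console.log(packed, unpackCounts(packed));
// Compare the JSON text length with the packed length.
console.log(JSON.stringify(counts).length, "->", packed.length);
```

A fixed width removes the need for separators and brackets, which is where most of the saving comes from in this sketch.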

Full Changelog: https://github.com/komodojp/tinyld/compare/1.2.0...1.2.2

1.2.0

2 years ago

Description

After a lot of unsuccessful experiments, I'm glad to have found a way to improve the accuracy and release it. I decided to focus on accuracy over quantity for the moment, making sure the algorithm works properly before trying to scale it up.

With this version 1.2.0:

Both tinyld and tinyld-light are over 97% accurate on the 16 most common languages
tinyld's global accuracy across all 64 languages is over 95%, and each language has an accuracy > 80%
This change causes a small increase in disk size

Change

Change to the algorithm

Remove the word ranking step
Improve the n-gram ranking (based on a variable number of grams)
Add a per-language coefficient to specify more accurately how many n-grams to store per language (optimizes storage space)
Use 4-grams and 5-grams more often (as a replacement for words)
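
For readers unfamiliar with n-gram ranking, the following is a minimal sketch of the generic technique with variable n-gram sizes. It is not tinyld's implementation: the function names, the length-based weighting, and the two tiny hand-made training samples are all assumptions chosen for illustration.

```typescript
// Sketch of variable-length n-gram scoring (generic technique, not tinyld's code).
// Profiles are built from tiny hand-made samples for illustration only.
type Profile = Map<string, number>;

// Collect all n-grams of the given sizes from a lower-cased, space-padded text.
function extractNgrams(text: string, sizes: number[]): string[] {
  const padded = ` ${text.toLowerCase().trim()} `;
  const grams: string[] = [];
  for (const n of sizes) {
    for (let i = 0; i + n <= padded.length; i++) {
      grams.push(padded.slice(i, i + n));
    }
  }
  return grams;
}

// Build a profile by counting n-grams in a training sample.
function buildProfile(sample: string, sizes: number[]): Profile {
  const profile: Profile = new Map();
  for (const g of extractNgrams(sample, sizes)) {
    profile.set(g, (profile.get(g) ?? 0) + 1);
  }
  return profile;
}

// Score a text against a profile. Longer n-grams get more weight
// (an assumed heuristic) because they are more specific.
function score(text: string, profile: Profile, sizes: number[]): number {
  let total = 0;
  for (const g of extractNgrams(text, sizes)) {
    total += (profile.get(g) ?? 0) * g.length;
  }
  return total;
}

const sizes = [3, 4, 5];
const profiles: Record<string, Profile> = {
  eng: buildProfile("the quick brown fox jumps over the lazy dog", sizes),
  fra: buildProfile("le renard brun saute par dessus le chien paresseux", sizes),
};

// Return the language code whose profile gives the highest score.
function guess(text: string): string {
  let best = "";
  let bestScore = -1;
  for (const [lang, profile] of Object.entries(profiles)) {
    const s = score(text, profile, sizes);
    if (s > bestScore) {
      best = lang;
      bestScore = s;
    }
  }
  return best;
}

console.log(guess("the dog")); // eng
console.log(guess("le chien")); // fra
```

A real detector would train on a large corpus and keep only the top-ranked n-grams per language, which is where a per-language coefficient would come in.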

New API

A few new APIs to get the list of supported languages and their names

import { supportedLanguages, langName, langRegion } from 'tinyld'

// all supported languages (ISO3 format)
supportedLanguages // ['jpn', 'cmn', ...]

// and a few utilities about languages
langName('jpn') // Japanese
langRegion('jpn') // east-asia

Language support

A few languages were disabled
A few languages were added
The total number of languages is now 64. The ones removed were dropped mostly because of poor accuracy (often caused by a training dataset that was not good enough). I will try to bring them back as soon as possible, once their accuracy passes the 80% threshold.

Per language Detection Accuracy

 - Greek (ell) - 100%
 - Hindi (hin) - 100%
 - Bengali (ben) - 100%
 - Thai (tha) - 100%
 - Telugu (tel) - 100%
 - Gujarati (guj) - 100%
 - Tamil (tam) - 100%
 - Amharic (amh) - 100%
 - Kannada (kan) - 100%
 - Burmese (mya) - 100%
 - Armenian (hye) - 99.9555%
 - Japanese (jpn) - 99.9333%
 - Vietnamese (vie) - 99.9067%
 - Korean (kor) - 99.8134%
 - Khmer (khm) - 99.7354%
 - Urdu (urd) - 99.2537%
 - Hebrew (heb) - 99.1068%
 - Berber (ber) - 99.0135%
 - German (deu) - 98.9601%
 - Toki Pona (toki) - 98.8801%
 - Russian (rus) - 98.8268%
 - Persian (pes) - 98.8135%
 - Polish (pol) - 98.8002%
 - Chinese (cmn) - 98.7602%
 - French (fra) - 98.7068%
 - Arabic (ara) - 98.4669%
 - Finnish (fin) - 98.0936%
 - English (eng) - 98.0136%
 - Yiddish (yid) - 97.9869%
 - Romanian (ron) - 97.9336%
 - Mongolian (mon) - 97.8058%
 - Lithuanian (lit) - 97.8003%
 - Icelandic (isl) - 97.7203%
 - Klingon (tlh) - 97.6803%
 - Hungarian (hun) - 97.5603%
 - Kazakh (kaz) - 97.4214%
 - Indonesian (ind) - 97.267%
 - Dutch (nld) - 96.8937%
 - Tatar (tat) - 96.8271%
 - Latvian (lvs) - 96.4734%
 - Tagalog (tgl) - 95.8539%
 - Ukrainian (ukr) - 95.4673%
 - Turkish (tur) - 95.214%
 - Portuguese (por) - 95.054%
 - Kirundi (run) - 94.6058%
 - Turkmen (tuk) - 94.5193%
 - Italian (ita) - 94.4541%
 - Belarusian (bel) - 94.2808%
 - Esperanto (epo) - 93.9475%
 - Spanish (spa) - 93.4009%
 - Volapuk (vol) - 92.6978%
 - Swedish (swe) - 91.9344%
 - Irish (gle) - 89.6735%
 - Latin (lat) - 89.0948%
 - Estonian (est) - 88.6921%
 - Czech (ces) - 88.5749%
 - Catalan (cat) - 88.0949%
 - Danish (dan) - 87.375%
 - Afrikaans (afr) - 86.578%
 - Bulgarian (bul) - 84.5754%
 - Slovak (slk) - 83.4555%
 - Serbian (srp) - 83.0823%
 - Macedonian (mkd) - 82.709%
 - Norwegian (nob) - 81.5358%