Tinyld Versions

Simple and Performant Language detection library for NodeJS

1.3.4

1 year ago

Description

Screenshots

  • Typing in JavaScript with require (image)
  • Typing in TypeScript with import (image)

1.3.3

1 year ago

Description

  • Fix an issue with the missing bin script file: #21
  • Update a few dependencies

1.3.2

1 year ago

Description

Maintenance version

  • Update deps
  • Update tatoeba
  • Add heavy flavor

1.3.1

1 year ago

Description

Maintenance version with only small modifications

1.3.0

1 year ago

Description

  • A few chores
    • Update the Tatoeba dataset
    • Update Node to 18.x
    • Update dependencies (typescript, esbuild, ...)
  • Tuning
    • Increase the number of chunks analyzed for long texts #14
    • Make the verbose log a bit more readable
      detect('これは日本語です.', { verbose: true })
  • A few fixes
    • Fix a compatibility issue between Deno and esbuild #12
    • Fix an issue with ESM: the library is now exported in two flavors, Node ESM and browser ESM. This is managed in package.json #13
"exports": {
    ".": {
      "require": "./dist/tinyld.normal.node.js",
      "import": "./dist/tinyld.normal.node.mjs",
      "browser": "./dist/tinyld.normal.browser.js"
    },
    "./light": {
      "require": "./dist/tinyld.light.node.js",
      "import": "./dist/tinyld.light.node.mjs",
      "browser": "./dist/tinyld.light.browser.js"
    }
},
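
To illustrate how the export conditions above are selected, here is a minimal TypeScript sketch of conditional-exports matching. It is a simplified model, not the actual resolver used by Node or any bundler: `resolveExport` and the `ExportsMap` type are hypothetical names introduced for illustration, and the sketch assumes the usual rule that the first key (in object order) matching an active condition wins.

```typescript
// Simplified model of conditional exports resolution (illustration only).
type ConditionMap = Record<string, string>
type ExportsMap = Record<string, ConditionMap>

// The "exports" field from package.json, copied from the notes above.
const tinyldExports: ExportsMap = {
  '.': {
    require: './dist/tinyld.normal.node.js',
    import: './dist/tinyld.normal.node.mjs',
    browser: './dist/tinyld.normal.browser.js',
  },
  './light': {
    require: './dist/tinyld.light.node.js',
    import: './dist/tinyld.light.node.mjs',
    browser: './dist/tinyld.light.browser.js',
  },
}

// Hypothetical helper: return the first target whose condition key is active.
function resolveExport(
  exportsMap: ExportsMap,
  subpath: string,
  activeConditions: string[],
): string | undefined {
  const conditions = exportsMap[subpath]
  if (!conditions) return undefined
  // Keys are checked in the order they appear in package.json.
  for (const condition of Object.keys(conditions)) {
    if (activeConditions.indexOf(condition) !== -1) return conditions[condition]
  }
  return undefined
}

console.log(resolveExport(tinyldExports, '.', ['require']))      // ./dist/tinyld.normal.node.js
console.log(resolveExport(tinyldExports, './light', ['import'])) // ./dist/tinyld.light.node.mjs
console.log(resolveExport(tinyldExports, '.', ['browser']))      // ./dist/tinyld.normal.browser.js
```

Because matching follows key order, a tool that activates several conditions at once gets the first matching entry, so the order of keys in package.json matters.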

1.2.3

2 years ago

Description

Small maintenance version

Type Declaration

The npm package no longer contains the src/ folder; type definitions now ship directly in the dist folder.

1.2.2

2 years ago

Description

  • Fix an issue with tinyld-light which was returning the wrong supportedLanguage list
  • Update documentation (autogenerated graphs)
  • Change charset setup of esbuild
  • Optimize profile files to take less space (replace JSON objects with short base36 strings)
    • Reduce tinyld 930KB -> 590KB
    • Reduce tinyld-light 110KB -> 68KB
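
The space saving from base36 can be illustrated with a small sketch. The actual profile format is not described in these notes, so everything below is an assumption: the `encodeRanks` and `decodeRanks` helpers and the fixed two-character width are hypothetical, chosen only to show how a JSON array of numbers shrinks once packed into a base36 string.

```typescript
// Hypothetical packing of small integers into a fixed-width base36 string.
// Two base36 digits cover the values 0..1295 (36 * 36 - 1).
const DIGIT_WIDTH = 2
const MAX_RANK = Math.pow(36, DIGIT_WIDTH) - 1

function encodeRanks(values: number[]): string {
  return values
    .map((value) => {
      if (Math.floor(value) !== value || value < 0 || value > MAX_RANK) {
        throw new RangeError('rank out of range: ' + value)
      }
      // toString(36) uses the digits 0-9 and a-z; left-pad to a fixed width.
      let digits = value.toString(36)
      while (digits.length < DIGIT_WIDTH) digits = '0' + digits
      return digits
    })
    .join('')
}

function decodeRanks(packedRanks: string): number[] {
  const values: number[] = []
  for (let i = 0; i < packedRanks.length; i += DIGIT_WIDTH) {
    values.push(parseInt(packedRanks.slice(i, i + DIGIT_WIDTH), 36))
  }
  return values
}

const sampleRanks = [0, 35, 36, 1295]
const packedSample = encodeRanks(sampleRanks)
console.log(packedSample)                       // 000z10zz
console.log(JSON.stringify(sampleRanks).length) // 14 characters as JSON
console.log(packedSample.length)                // 8 characters packed
```

The fixed width avoids separators entirely, which is where most of the saving in this toy example comes from.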

Full Changelog: https://github.com/komodojp/tinyld/compare/1.2.0...1.2.2

1.2.0

2 years ago

Description

After a lot of unsuccessful experiments, I'm glad to have found a way to improve the accuracy and to release it. For the moment I decided to focus on accuracy over quantity, making sure the algorithm works properly before trying to scale it up.

With this version 1.2.0:

  • Both tinyld and tinyld-light are over 97% accuracy on the 16 most common languages
  • tinyld global accuracy across all 64 languages is over 95%, and each language has an accuracy > 80%
  • This change causes a small increase in disk size

Changes

Changes to the algorithm

  • Remove the word ranking step
  • Improve the n-gram ranking (based on a variable number of grams)
  • Add a per-language coefficient to specify more accurately how many n-grams to store per language (optimizes storage space)
  • Use 4-grams and 5-grams more often (as a replacement for words)
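
As a rough illustration of variable-length n-gram ranking, the sketch below counts character n-grams of several sizes and keeps the most frequent ones. It is not tinyld's actual training code: `extractNgrams`, `topNgrams`, and the `limit` parameter (standing in for the per-language coefficient) are hypothetical names, and the real ranking criteria are not described in these notes.

```typescript
// Count character n-grams of every size from minSize to maxSize.
function extractNgrams(text: string, minSize = 1, maxSize = 5): Map<string, number> {
  const counts = new Map<string, number>()
  // Array.from splits by code point, so surrogate pairs stay intact.
  const chars = Array.from(text.toLowerCase())
  for (let size = minSize; size <= maxSize; size++) {
    for (let i = 0; i + size <= chars.length; i++) {
      const gram = chars.slice(i, i + size).join('')
      counts.set(gram, (counts.get(gram) || 0) + 1)
    }
  }
  return counts
}

// Keep the `limit` most frequent n-grams; ties are broken alphabetically
// so the result is deterministic.
function topNgrams(counts: Map<string, number>, limit: number): string[] {
  return Array.from(counts.entries())
    .sort((a, b) => b[1] - a[1] || (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0))
    .slice(0, limit)
    .map((entry) => entry[0])
}

const ngramCounts = extractNgrams('banana', 1, 3)
console.log(ngramCounts.get('a'))      // 3
console.log(ngramCounts.get('an'))     // 2
console.log(topNgrams(ngramCounts, 3)) // [ 'a', 'an', 'ana' ]
```

A smaller `limit` for a given language trades accuracy for disk space, which is the kind of tuning a per-language coefficient would allow.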

New API

A few new APIs to get the list of supported languages and their names

import { supportedLanguages, langName, langRegion } from 'tinyld'

// all supported languages (ISO3 format)
supportedLanguages // ['jpn', 'cmn', ...]

// and a few utilities about languages
langName('jpn') // Japanese
langRegion('jpn') // east-asia

Language support

  • A few languages were disabled
  • A few languages were added
  • The total number of languages is now 64. The removed ones were mostly dropped because of poor accuracy (often due to an insufficient training dataset). I will try to bring them back as soon as possible, once their accuracy passes the 80% threshold.
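
The 80% threshold described above can be sketched as a simple filter. This is only an illustration: `splitByThreshold` is a hypothetical helper, and the `xxx` entry is a made-up language code with a made-up score, included to show a language that would be disabled. The other scores are taken from the per-language accuracy list in these notes.

```typescript
interface LanguageAccuracy {
  code: string     // language code, e.g. 'jpn'
  accuracy: number // detection accuracy in percent
}

const ACCURACY_THRESHOLD = 80

// Hypothetical helper: languages above the threshold stay enabled.
function splitByThreshold(results: LanguageAccuracy[], threshold = ACCURACY_THRESHOLD) {
  const enabled = results.filter((r) => r.accuracy > threshold).map((r) => r.code)
  const disabled = results.filter((r) => r.accuracy <= threshold).map((r) => r.code)
  return { enabled, disabled }
}

const accuracySample: LanguageAccuracy[] = [
  { code: 'jpn', accuracy: 99.9333 },
  { code: 'spa', accuracy: 93.4009 },
  { code: 'nob', accuracy: 81.5358 },
  { code: 'xxx', accuracy: 72.5 }, // hypothetical language below the threshold
]

const gated = splitByThreshold(accuracySample)
console.log(gated.enabled)  // [ 'jpn', 'spa', 'nob' ]
console.log(gated.disabled) // [ 'xxx' ]
```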

Per language Detection Accuracy

 - Greek (ell) - 100%
 - Hindi (hin) - 100%
 - Bengali (ben) - 100%
 - Thai (tha) - 100%
 - Telugu (tel) - 100%
 - Gujarati (guj) - 100%
 - Tamil (tam) - 100%
 - Amharic (amh) - 100%
 - Kannada (kan) - 100%
 - Burmese (mya) - 100%
 - Armenian (hye) - 99.9555%
 - Japanese (jpn) - 99.9333%
 - Vietnamese (vie) - 99.9067%
 - Korean (kor) - 99.8134%
 - Khmer (khm) - 99.7354%
 - Urdu (urd) - 99.2537%
 - Hebrew (heb) - 99.1068%
 - Berber (ber) - 99.0135%
 - German (deu) - 98.9601%
 - Toki Pona (toki) - 98.8801%
 - Russian (rus) - 98.8268%
 - Persian (pes) - 98.8135%
 - Polish (pol) - 98.8002%
 - Chinese (cmn) - 98.7602%
 - French (fra) - 98.7068%
 - Arabic (ara) - 98.4669%
 - Finnish (fin) - 98.0936%
 - English (eng) - 98.0136%
 - Yiddish (yid) - 97.9869%
 - Romanian (ron) - 97.9336%
 - Mongolian (mon) - 97.8058%
 - Lithuanian (lit) - 97.8003%
 - Icelandic (isl) - 97.7203%
 - Klingon (tlh) - 97.6803%
 - Hungarian (hun) - 97.5603%
 - Kazakh (kaz) - 97.4214%
 - Indonesian (ind) - 97.267%
 - Dutch (nld) - 96.8937%
 - Tatar (tat) - 96.8271%
 - Latvian (lvs) - 96.4734%
 - Tagalog (tgl) - 95.8539%
 - Ukrainian (ukr) - 95.4673%
 - Turkish (tur) - 95.214%
 - Portuguese (por) - 95.054%
 - Kirundi (run) - 94.6058%
 - Turkmen (tuk) - 94.5193%
 - Italian (ita) - 94.4541%
 - Belarusian (bel) - 94.2808%
 - Esperanto (epo) - 93.9475%
 - Spanish (spa) - 93.4009%
 - Volapuk (vol) - 92.6978%
 - Swedish (swe) - 91.9344%
 - Irish (gle) - 89.6735%
 - Latin (lat) - 89.0948%
 - Estonian (est) - 88.6921%
 - Czech (ces) - 88.5749%
 - Catalan (cat) - 88.0949%
 - Danish (dan) - 87.375%
 - Afrikaans (afr) - 86.578%
 - Bulgarian (bul) - 84.5754%
 - Slovak (slk) - 83.4555%
 - Serbian (srp) - 83.0823%
 - Macedonian (mkd) - 82.709%
 - Norwegian (nob) - 81.5358%