HanBaoBao Save

Mandarin Chinese text segmentation and mobile dictionary Android app (中文分词)

Project README

HànBǎoBāo

Mandarin Chinese text segmentation and mobile dictionary Android app (中文分词)

I wrote this app to assist myself in learning Mandarin.

This repository consists of two parts:

  • A floating dictionary Android app which segments, transliterates, and provides dictionary definitions for Chinese text (simplified & traditional)
  • A program for building the database used by that app

Features:

  • Text Segmentation - split sentences into individual words. Tap a word multiple times to re-split.
  • Transliteration - transliterate words into their Pinyin representation.
  • Dictionary Definitions - tapping a word opens a list of dictionary definitions (CCEDict, NTI Buddhist Dictionary, ADSO, others).
  • Tone Markings - words are marked with their tone using both glyphs over the pinyin and colorization.
  • Tap to Read - tap on text in your chat app to load it into HanBaoBao.
  • Hide by HSK Level - optionally hide transliteration for all words below a given HSK level.
  • Part of Speech Tags - many words have part-of-speech and ontology tags.
  • Translation Tool - drag the icon into the translation tool to translate the sentence using Microsoft Translator or Google Translate (if installed)

The database building program compiles data from many sources and outputs a SQLite db which is read by the Android app. The database is likely useful for creating other apps and services.

The text segmentation algorithm used in the app is a custom one, but it works fairly well for my purposes, particularly since segments (words) can be resegmented by tapping on them.

Here's an older version of the app in action: https://www.youtube.com/watch?v=a9x9MBoLfxs

The app needs work to support Android 8 and some of the dictionary data is out-dated.

The dictionary data contained within is presented without license: obtain usage permission as needed.

Open Source Agenda is not affiliated with "HanBaoBao" Project. README Source: ReubenBond/HanBaoBao

Open Source Agenda Badge

Open Source Agenda Rating