Vietnamese Accent Prediction

A simple, fast, and accurate accent predictor for unaccented Vietnamese text, based on an n-gram language model with a Markov chain.
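
To make the approach concrete, here is a minimal sketch of how a bigram model with Viterbi decoding could restore accents. It is not the project's implementation: the class `BigramAccentSketch`, the tiny candidate table, and the hand-picked log-scores are all hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration, not the project's code: picks the accented
// candidate sequence with the highest total bigram log-score (Viterbi).
class BigramAccentSketch {
    // Toy table: unaccented syllable -> possible accented forms (assumed data).
    static final Map<String, List<String>> CANDIDATES = new HashMap<>();
    // Toy bigram log-scores keyed by "previous current" (assumed values).
    static final Map<String, Double> BIGRAM_LOG_SCORE = new HashMap<>();
    // Score assigned to any bigram missing from the toy table.
    static final double UNSEEN = -10.0;

    static {
        CANDIDATES.put("anh", Arrays.asList("anh", "ánh", "ảnh"));
        CANDIDATES.put("yeu", Arrays.asList("yêu", "yếu"));
        CANDIDATES.put("em", Arrays.asList("em", "ém"));
        BIGRAM_LOG_SCORE.put("anh yêu", -1.0);
        BIGRAM_LOG_SCORE.put("ánh yếu", -4.0);
        BIGRAM_LOG_SCORE.put("yêu em", -1.0);
        BIGRAM_LOG_SCORE.put("yếu em", -5.0);
    }

    static String predict(String input) {
        String[] syllables = input.toLowerCase(Locale.ROOT).trim().split("\\s+");
        List<List<String>> options = new ArrayList<>();
        for (String s : syllables) {
            // Syllables without candidates are passed through unchanged.
            List<String> known = CANDIDATES.get(s);
            options.add(known != null ? known : Collections.singletonList(s));
        }
        int n = options.size();
        double[][] best = new double[n][]; // best[i][c]: best score ending in candidate c
        int[][] back = new int[n][];       // back[i][c]: index of the best predecessor
        best[0] = new double[options.get(0).size()]; // zeros: no unigram prior in this sketch
        back[0] = new int[options.get(0).size()];
        for (int i = 1; i < n; i++) {
            List<String> prev = options.get(i - 1);
            List<String> cur = options.get(i);
            best[i] = new double[cur.size()];
            back[i] = new int[cur.size()];
            for (int c = 0; c < cur.size(); c++) {
                best[i][c] = Double.NEGATIVE_INFINITY;
                for (int p = 0; p < prev.size(); p++) {
                    String key = prev.get(p) + " " + cur.get(c);
                    Double seen = BIGRAM_LOG_SCORE.get(key);
                    double score = best[i - 1][p] + (seen != null ? seen : UNSEEN);
                    if (score > best[i][c]) {
                        best[i][c] = score;
                        back[i][c] = p;
                    }
                }
            }
        }
        // Pick the best final candidate, then follow the back-pointers.
        int arg = 0;
        for (int c = 1; c < best[n - 1].length; c++) {
            if (best[n - 1][c] > best[n - 1][arg]) arg = c;
        }
        String[] out = new String[n];
        for (int i = n - 1; i >= 0; i--) {
            out[i] = options.get(i).get(arg);
            arg = back[i][arg];
        }
        return String.join(" ", out);
    }
}
```

With the toy tables above, `BigramAccentSketch.predict("Anh yeu em")` returns `anh yêu em`; the output is lowercase because this sketch does not restore the original casing. A real model would load its candidates and scores from n-gram count files rather than hard-coding them.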

Performance

All tests were run on a MacBook with a 2.5 GHz Intel Core i7 and 16 GB of RAM.

Speed: about 350 sentences per second (roughly 3,500 words/syllables per second)
Accuracy: 96.52% on test.txt, provided in the datasets folder

// Measure accuracy on the bundled test set
AccuracyCalculator ac = new AccuracyCalculator();
System.out.println("Accuracy: " + ac.getAccuracy("datasets/test.txt") + "%");
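
The README does not say how the accuracy figure is defined. One plausible reading is the share of syllables whose predicted form matches the reference exactly; the sketch below computes that metric. The class `SyllableAccuracySketch` is hypothetical and is not the project's `AccuracyCalculator`.

```java
// Hypothetical metric, not the project's AccuracyCalculator: assumes
// accuracy means the percentage of syllables predicted exactly right.
class SyllableAccuracySketch {
    static double accuracy(String predicted, String reference) {
        String[] p = predicted.trim().split("\\s+");
        String[] r = reference.trim().split("\\s+");
        if (p.length != r.length) {
            throw new IllegalArgumentException("Syllable counts differ");
        }
        int correct = 0;
        for (int i = 0; i < r.length; i++) {
            // A syllable counts as correct only if every accent matches.
            if (p[i].equals(r[i])) correct++;
        }
        return 100.0 * correct / r.length;
    }
}
```

Under this definition, a four-syllable sentence with one wrong syllable scores 75%.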

Examples

Anh yeu em --> Anh yêu em (I love you)
Toi dang di du lich o ha long --> Tôi đang đi du lịch ở hạ long (I am visiting Ha Long)
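
Going the other way, unaccented input like the left-hand side of these examples can be produced from accented text, which is handy for building test pairs. The sketch below relies on the standard `java.text.Normalizer`; the class name `AccentStripper` is hypothetical and not part of this project.

```java
import java.text.Normalizer;

// Hypothetical helper, not part of the project: removes Vietnamese diacritics.
class AccentStripper {
    static String strip(String text) {
        // Decompose each character into a base letter plus combining marks.
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        // Drop the combining marks (tone marks, circumflex, breve, horn).
        String withoutMarks = decomposed.replaceAll("\\p{M}+", "");
        // The d-with-stroke letter has no decomposition, so map it explicitly.
        return withoutMarks.replace('\u0111', 'd').replace('\u0110', 'D');
    }
}
```

For example, `AccentStripper.strip("Tôi đang đi du lịch")` returns `Toi dang di du lich`.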

API

Using the provided n-grams data

AccentPredictor ap = new AccentPredictor();
String str = "Toi thich di du lich Ha Noi";
String predictedStr = ap.predictAccents(str);

You can also get the top N predictions as follows:

AccentPredictor ap = new AccentPredictor();
String str = "Toi thich di du lich Ha Noi";

// Map of (matched_str, matched_score)
LinkedHashMap<String, Double> matches = ap.predictAccentsWithMultiMatches(str, 5); // Returns the 5 best matches

Using your own n-gram data

AccentPredictor ap = new AccentPredictor("_Your1GramFile", "_Your2GramsFile");
String str = "Toi thich di du lich Ha Noi";
String predictedStr = ap.predictAccents(str);

To create your own n-gram data, you can use the following API:

String dataFolderPath = "path_to_your_data"; // Folder containing your text data
int numberOfProcessingFiles = -1; // Maximum number of files to process (-1 means use all the data)
boolean toLowercase = true; // If true, the n-grams are converted to lowercase
String _1GramFileOut =  "datasets/news1gram";
String _2GramsFileOut =  "datasets/news2grams";
new NGramer(dataFolderPath).statisticNGrams(numberOfProcessingFiles, toLowercase, _1GramFileOut, _2GramsFileOut);
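
As a rough picture of what building n-gram data involves, the sketch below counts 1-grams and 2-grams over whitespace-separated syllables. It is an illustration only: `NGramCountSketch` is a hypothetical class, and the tokenization and in-memory maps are assumptions, not the behavior or file format of the project's `NGramer`.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration, not the project's NGramer: counts 1-grams
// and 2-grams over whitespace-separated syllables.
class NGramCountSketch {
    final Map<String, Integer> unigrams = new HashMap<>();
    final Map<String, Integer> bigrams = new HashMap<>();

    void addSentence(String sentence, boolean toLowercase) {
        String text = toLowercase ? sentence.toLowerCase(Locale.ROOT) : sentence;
        String[] tokens = text.trim().split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].isEmpty()) continue; // blank input yields one empty token
            unigrams.merge(tokens[i], 1, Integer::sum);
            if (i + 1 < tokens.length) {
                // Bigrams never cross sentence boundaries in this sketch.
                bigrams.merge(tokens[i] + " " + tokens[i + 1], 1, Integer::sum);
            }
        }
    }
}
```

Feeding it "Toi di du lich" and "toi di hoc" with `toLowercase` set to `true` gives a count of 2 for the 1-gram `toi` and for the 2-gram `toi di`.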

README source: tienthanhdhcn/Vietnamese-Accent-Prediction
