
Limelight

A php Japanese language analyzer and parser.

Split Japanese text into individual, full words
Find parts of speech for words
Find dictionary entries (lemmas) for conjugated words
Get readings and pronunciations for words
Build furigana for words
Convert Japanese to romaji (Latin-alphabet spelling)

Version Notes

April 25, 2016: The Limelight API changed in Version 1.6.0. The new API uses collection methods to give developers better control of Limelight parse results. Please see the wiki for the updated documentation.
April 11, 2016: php-mecab, the MeCab bindings Limelight uses, were updated to version 0.6.0 in Dec. 2015 for php 7 support. The pre-0.6.0 bindings no longer work with the master branch of Limelight. If you are using an older version of php-mecab, please update your bindings or use the php-mecab_pre_0.6.0 version.

Install Limelight

Using Docker

From the project root, build the image:

docker build -f docker/Dockerfile -t limelight .

Once it is built, run the container:

docker run --name limelight -v /host/path/to/limelight:/usr/limelight -d --rm limelight

Access the project in the container:

docker exec -it limelight bash

Install composer dependencies from within the container:

composer install

Without Docker

Requirements

php > 5.6

Dependencies

Before installing Limelight, you must install both MeCab and its PHP extension, php-mecab, on your system.

Linux Ubuntu Users

Use the install script included in this repository. The script only works on Ubuntu with php7. Download the script:

curl -O https://raw.githubusercontent.com/nihongodera/limelight/master/install_mecab_php-mecab.sh

Make the file executable:

chmod +x install_mecab_php-mecab.sh

Execute the script:

./install_mecab_php-mecab.sh

You may need to restart your server to complete the process.

For information about what the script does, see here.

Other Systems

Please see this page to learn more about installing on your system.
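Whichever route you take, check that the PHP binary you will run Limelight with can actually see the MeCab bindings. Here is a minimal sketch. It assumes php-mecab registers itself under the extension name "mecab". If the name differs on your system, look it up in the list that php -m prints.

```php
<?php
// Reports whether the MeCab bindings are visible to this PHP binary.
// Assumption: php-mecab registers under the extension name "mecab".
$loaded = extension_loaded('mecab');

if ($loaded) {
    echo "php-mecab is loaded\n";
} else {
    // The CLI and the web server can read different php.ini files, so an
    // extension enabled for one may be missing from the other.
    echo "php-mecab is not loaded; check the php.ini used by this binary\n";
}
```

Run it with the same PHP binary, or through the same web server, that will run Limelight.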

Install Limelight

Install Limelight through composer.

composer require nihongodera/limelight

Parse Text

Make a new instance of Limelight\Limelight. Limelight takes no arguments.

use Limelight\Limelight;
$limelight = new Limelight();

Use the parse() method on the Limelight object to parse Japanese text.

$results = $limelight->parse('庭でライムを育てています。');

The returned object is an instance of Limelight\Classes\LimelightResults.

Get Results

Get results for the entire text using methods available on LimelightResults.

$results = $limelight->parse('庭でライムを育てています。');

echo 'Words: ' . $results->string('word') . "\n";
echo 'Readings: ' . $results->string('reading') . "\n";
echo 'Pronunciations: ' . $results->string('pronunciation') . "\n";
echo 'Lemmas: ' . $results->string('lemma') . "\n";
echo 'Parts of speech: ' . $results->string('partOfSpeech') . "\n";
echo 'Hiragana: ' . $results->toHiragana()->string('word') . "\n";
echo 'Katakana: ' . $results->toKatakana()->string('word') . "\n";
echo 'Romaji: ' . $results->string('romaji', ' ') . "\n";
echo 'Furigana: ' . $results->string('furigana') . "\n";

Output:
Words: 庭でライムを育てています。
Readings: ニワデライムヲソダテテイマス。
Pronunciations: ニワデライムヲソダテテイマス。
Lemmas: 庭でライムを育てる。
Parts of speech: noun postposition noun postposition verb symbol
Hiragana: にわでらいむをそだてています。
Katakana: ニワデライムヲソダテテイマス。
Romaji: niwa de raimu o sodateteimasu.
Furigana: 庭(にわ)でライムを育(そだ)てています。
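Readings and pronunciations come back in katakana, and toHiragana() turns them into hiragana. The same kind of kana mapping can be done in plain PHP with the mbstring function mb_convert_kana() and its "c" option, which converts full-width katakana to full-width hiragana. The sketch below only illustrates that conversion. It is not Limelight's own implementation, and the helper name katakanaToHiragana is hypothetical.

```php
<?php
// Hypothetical helper, not part of Limelight: converts full-width katakana
// to full-width hiragana using mbstring's "c" conversion option.
// Characters that are not katakana (kanji, punctuation, Latin) pass through.
function katakanaToHiragana(string $text): string
{
    return mb_convert_kana($text, 'c', 'UTF-8');
}

// Reading taken from the example output above.
$hiragana = katakanaToHiragana('ニワデライムヲソダテテイマス。');
echo $hiragana . "\n";
```

This requires the mbstring extension. The "c" option leaves half-width katakana untouched.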

Alter the collection of words however you like using the library of collection methods.

Get individual words off the LimelightResults object by using one of several applicable collection methods. Use methods available on the returned LimelightWord object.

$results = $limelight->parse('庭でライムを育てています。');

$word1 = $results->pull(2);

$word2 = $results->where('word', '庭');

echo $word1->string('romaji') . "\n";

echo $word2->string('furigana') . "\n";

Output:
raimu
庭(にわ)
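In the furigana output, each kanji stem is followed by its hiragana reading, and trailing kana (okurigana) are left bare, as in 育(そだ)てています. The sketch below shows one simple way to produce that text format from a word and its hiragana reading: it trims the kana the two strings share at the end. It is a hypothetical helper, not Limelight's furigana builder. It also does not separate kana that come before or between kanji, as in お茶.

```php
<?php
// Hypothetical helper, not part of Limelight: builds "kanji(reading)okurigana"
// text for a single word, given the word and its reading in hiragana.
// Limitation: kana that precede or sit between kanji are not separated.
function buildFurigana(string $word, string $reading): string
{
    // Words with no kanji (kana, punctuation) need no furigana.
    if (!preg_match('/\p{Han}/u', $word)) {
        return $word;
    }

    // Split both strings into individual UTF-8 characters.
    $wordChars = preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY);
    $readingChars = preg_split('//u', $reading, -1, PREG_SPLIT_NO_EMPTY);
    $w = count($wordChars);
    $r = count($readingChars);

    // Count trailing characters the word and reading share (the okurigana).
    $shared = 0;
    while ($shared < $w && $shared < $r
        && $wordChars[$w - 1 - $shared] === $readingChars[$r - 1 - $shared]) {
        $shared++;
    }

    $stem = implode('', array_slice($wordChars, 0, $w - $shared));
    $stemReading = implode('', array_slice($readingChars, 0, $r - $shared));
    $okurigana = implode('', array_slice($wordChars, $w - $shared));

    return $stem . '(' . $stemReading . ')' . $okurigana;
}

echo buildFurigana('庭', 'にわ') . "\n";              // 庭(にわ)
echo buildFurigana('育てています', 'そだてています') . "\n"; // 育(そだ)てています
echo buildFurigana('ライム', 'らいむ') . "\n";          // ライム
```

Output is generated word by word, so the reading of one word never spills into the next.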

Methods on the LimelightResults object and the LimelightWord object follow the same conventions, but LimelightResults methods are plural (words()) while LimelightWord methods are singular (word()).

Alternatively, loop through all the words on the LimelightResults object.

$results = $limelight->parse('庭でライムを育てています。');

foreach ($results as $word) {
    echo $word->word() . ' is a ' . $word->partOfSpeech() . ' read like ' . $word->reading() . "\n";
}

Output:
庭 is a noun read like ニワ
で is a postposition read like デ
ライム is a noun read like ライム
を is a postposition read like ヲ
育てています is a verb read like ソダテテイマス
。 is a symbol read like 。
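Because LimelightResults can be looped over and filtered, it behaves like a collection of word objects. The sketch below models that idea in plain PHP (7.4 or later) in a heavily simplified form. IteratorAggregate makes foreach work, and a Laravel-style where() returns a filtered copy. The classes SimpleWord and SimpleResults are invented for illustration and are not Limelight's classes.

```php
<?php
// Hypothetical, simplified stand-ins for a parse result; not Limelight's classes.
final class SimpleWord
{
    private string $word;
    private string $partOfSpeech;
    private string $reading;

    public function __construct(string $word, string $partOfSpeech, string $reading)
    {
        $this->word = $word;
        $this->partOfSpeech = $partOfSpeech;
        $this->reading = $reading;
    }

    public function word(): string { return $this->word; }
    public function partOfSpeech(): string { return $this->partOfSpeech; }
    public function reading(): string { return $this->reading; }
}

final class SimpleResults implements IteratorAggregate, Countable
{
    /** @var SimpleWord[] */
    private array $words;

    public function __construct(array $words)
    {
        $this->words = array_values($words);
    }

    // Lets foreach walk the words in order.
    public function getIterator(): ArrayIterator
    {
        return new ArrayIterator($this->words);
    }

    public function count(): int
    {
        return count($this->words);
    }

    // Returns a new collection of the words whose $method() equals $value,
    // leaving the original collection unchanged.
    public function where(string $method, string $value): self
    {
        return new self(array_filter(
            $this->words,
            fn (SimpleWord $w) => $w->$method() === $value
        ));
    }
}

$results = new SimpleResults([
    new SimpleWord('庭', 'noun', 'ニワ'),
    new SimpleWord('で', 'postposition', 'デ'),
    new SimpleWord('ライム', 'noun', 'ライム'),
]);

foreach ($results->where('partOfSpeech', 'noun') as $word) {
    echo $word->word() . ' is read like ' . $word->reading() . "\n";
}
```

Here where() returns a new collection instead of filtering in place. That leaves the original results available for further queries.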

Full Documentation

Full documentation for Limelight can be found on the Limelight Wiki page.

Sources, Contributions, and Contributing

The Japanese parsing logic used in Limelight was adapted from Kimtaro's excellent Ruby program Ve. A big thank you to him and all the others who contributed on that project.

Limelight relies heavily on both MeCab and php-mecab.

Collection methods and methods in the Arr class were derived from Laravel's collection methods.

Contributions are more than welcome.


Open Source Agenda is not affiliated with "Nihongodera Limelight" Project. README Source: nihongodera/limelight


Repository

nihongodera/limelight

License

MIT
