A PHP text analyzer and parser for the Japanese language.
From the project root, build the image:
docker build -f docker/Dockerfile -t limelight .
Once it is built, run the container:
docker run --name limelight -v /host/path/to/limelight:/usr/limelight -d --rm limelight
Access the project in the container:
docker exec -it limelight bash
Install Composer dependencies from within the container:
composer install
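The Docker steps above can also be chained into one helper, sketched below. The function name `limelight_docker_up` is hypothetical and not part of the repository. The sketch assumes that `/usr/limelight` (the mount target used above) is where `composer.json` lives inside the container.

```shell
#!/usr/bin/env bash
# Sketch only: limelight_docker_up is a hypothetical helper, not shipped with Limelight.
limelight_docker_up() {
    # Host path of the checked-out project; defaults to the current directory.
    local host_path="${1:-$PWD}"

    # Build the image from the project root (same command as above).
    docker build -f docker/Dockerfile -t limelight . || return 1

    # Start the container in the background with the project mounted.
    docker run --name limelight -v "${host_path}:/usr/limelight" -d --rm limelight || return 1

    # Install Composer dependencies without opening an interactive shell.
    # Assumption: /usr/limelight is the project directory inside the container.
    docker exec -w /usr/limelight limelight composer install
}
```

Run `limelight_docker_up` from the project root, or pass the host path explicitly, for example `limelight_docker_up /host/path/to/limelight`.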
Before installing Limelight, you must install both MeCab and the PHP extension php-mecab on your system.
Use the install script included in this repository. The script only works with PHP 7. Download the script:
curl -O https://raw.githubusercontent.com/nihongodera/limelight/master/install_mecab_php-mecab.sh
Make the file executable:
chmod +x install_mecab_php-mecab.sh
Execute the script:
./install_mecab_php-mecab.sh
You may need to restart your server to complete the process.
For information about what the script does, see here.
Please see this page to learn more about installing on your system.
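After installing, a quick check along the following lines may help confirm that both prerequisites are visible from the command line. This is a minimal sketch. The function name `check_limelight_prereqs` is hypothetical, and the sketch assumes that the php-mecab extension appears as `mecab` in the output of `php -m`.

```shell
# Sketch only: check_limelight_prereqs is a hypothetical helper name.
check_limelight_prereqs() {
    # The mecab command-line tool should be on PATH after installation.
    if ! command -v mecab >/dev/null 2>&1; then
        echo "mecab not found on PATH" >&2
        return 1
    fi

    # `php -m` lists the extensions loaded by the PHP CLI.
    # Assumption: php-mecab registers itself under the name "mecab".
    if ! php -m 2>/dev/null | grep -qix 'mecab'; then
        echo "php-mecab extension not loaded for the PHP CLI" >&2
        return 1
    fi

    echo "mecab and php-mecab look installed"
}
```

Call `check_limelight_prereqs` after running the install script. The PHP CLI and a web server can load different configuration files, so a passing check here does not guarantee that the extension is enabled for the server.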
Install Limelight through Composer.
composer require nihongodera/limelight
Make a new instance of Limelight\Limelight. Limelight takes no arguments.
$limelight = new Limelight();
Use the parse() method on the Limelight object to parse Japanese text.
$results = $limelight->parse('庭でライムを育てています。');
The returned object is an instance of Limelight\Classes\LimelightResults.
Get results for the entire text using methods available on LimelightResults.
$results = $limelight->parse('庭でライムを育てています。');
echo 'Words: ' . $results->string('word') . "\n";
echo 'Readings: ' . $results->string('reading') . "\n";
echo 'Pronunciations: ' . $results->string('pronunciation') . "\n";
echo 'Lemmas: ' . $results->string('lemma') . "\n";
echo 'Parts of speech: ' . $results->string('partOfSpeech') . "\n";
echo 'Hiragana: ' . $results->toHiragana()->string('word') . "\n";
echo 'Katakana: ' . $results->toKatakana()->string('word') . "\n";
echo 'Romaji: ' . $results->string('romaji', ' ') . "\n";
echo 'Furigana: ' . $results->string('furigana') . "\n";
Output:
Words: 庭でライムを育てています。
Readings: ニワデライムヲソダテテイマス。
Pronunciations: ニワデライムヲソダテテイマス。
Lemmas: 庭でライムを育てる。
Parts of speech: noun postposition noun postposition verb symbol
Hiragana: にわでらいむをそだてています。
Katakana: ニワデライムヲソダテテイマス。
Romaji: niwa de raimu o sodateteimasu.
Furigana: 庭 でライムを育 てています。
Alter the collection of words however you like using the library of collection methods.
Get individual words off the LimelightResults object by using one of several applicable collection methods. Use methods available on the returned LimelightWord object.
$results = $limelight->parse('庭でライムを育てています。');
$word1 = $results->pull(2);
$word2 = $results->where('word', '庭');
echo $word1->string('romaji') . "\n";
echo $word2->string('furigana') . "\n";
Output:
raimu
庭
Methods on the LimelightResults object and the LimelightWord object follow the same conventions, but LimelightResults methods are plural (words()) while LimelightWord methods are singular (word()).
Alternatively, loop through all the words on the LimelightResults object.
$results = $limelight->parse('庭でライムを育てています。');
foreach ($results as $word) {
echo $word->word() . ' is a ' . $word->partOfSpeech() . ' read like ' . $word->reading() . "\n";
}
Output:
庭 is a noun read like ニワ
で is a postposition read like デ
ライム is a noun read like ライム
を is a postposition read like ヲ
育てています is a verb read like ソダテテイマス
。 is a symbol read like 。
Full documentation for Limelight can be found on the Limelight Wiki page.
The Japanese parsing logic used in Limelight was adapted from Kimtaro's excellent Ruby program Ve. A big thank you to him and all the others who contributed on that project.
Limelight relies heavily on both MeCab and php-mecab.
Collection methods and methods in the Arr class were derived from Laravel's collection methods.
Contributors are more than welcome.