A collection of language stemmers and stopwords for the Lunr JavaScript library.
Lunr Languages is a Lunr addon that helps you search in documents written in the following languages:
Lunr Languages is compatible with Lunr versions 0.6, 0.7, 1.0 and 2.X.
Lunr Languages works well with script loaders (Webpack, RequireJS) and can be used both in the browser and on the server.
The following example is for the German language (de).
Add the following JS files to the page:
<script src="lunr.js"></script> <!-- lunr.js library -->
<script src="lunr.stemmer.support.js"></script>
<script src="lunr.de.js"></script> <!-- or any other language you want -->
Then use the language when initializing lunr:
var idx = lunr(function () {
  // use the language (de)
  this.use(lunr.de);
  // then, the normal lunr index initialization
  this.field('title', { boost: 10 });
  this.field('body');
  // now you can call this.add(...) to add documents written in German
});
That's it. Just add the documents and you're done. When searching, the stemmer and stopword list of the language you selected are applied as well.
Add require.js to the page:
<script src="lib/require.js"></script>
Then use the language when initializing lunr:
require(['lib/lunr.js', '../lunr.stemmer.support.js', '../lunr.de.js'], function (lunr, stemmerSupport, de) {
  // stemmerSupport and de add keys to the lunr object, so we pass lunr to each of them;
  // in the end, we only need lunr itself.
  stemmerSupport(lunr); // adds lunr.stemmerSupport
  de(lunr); // adds the lunr.de key
  // at this point, lunr can be used
  var idx = lunr(function () {
    // use the language (de)
    this.use(lunr.de);
    // then, the normal lunr index initialization
    this.field('title', { boost: 10 });
    this.field('body');
    // now you can call this.add(...) to add documents written in German
  });
});
In a Node.js project, require the files and pass lunr to each of them:
var lunr = require('./lib/lunr.js');
require('./lunr.stemmer.support.js')(lunr);
require('./lunr.de.js')(lunr); // or any other language you want
var idx = lunr(function () {
  // use the language (de)
  this.use(lunr.de);
  // then, the normal lunr index initialization
  this.field('title', { boost: 10 });
  this.field('body');
  // now you can call this.add(...) to add documents written in German
});
If your documents are written in more than one language, you can enable multi-language indexing. This ensures every word is properly trimmed and stemmed, every stopword is removed, and no words are lost (indexing in just one language would remove words from every other one).
var lunr = require('./lib/lunr.js');
require('./lunr.stemmer.support.js')(lunr);
require('./lunr.ru.js')(lunr);
require('./lunr.multi.js')(lunr);
var idx = lunr(function () {
  // "en" does not appear in the requires above because it is built into lunr.js
  this.use(lunr.multiLanguage('en', 'ru'));
  // then, the normal lunr index initialization
  // ...
});
You can combine any number of supported languages this way. The corresponding lunr language scripts must be loaded (English is built in).
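To see why indexing in just one language can lose words, here is a small standalone sketch that does not use lunr at all. `englishTrim` uses the same kind of `\W`-based regular expressions as an English-only trimmer, while `makeTrimmer` is a hypothetical helper (not part of the Lunr Languages API) that builds a trimmer from a combined set of word characters. The character ranges are assumptions chosen for illustration.

```javascript
// English-style trimmer: strips leading and trailing non-word characters.
// In JavaScript, \W treats every non-ASCII letter as a non-word character.
function englishTrim(word) {
  return word.replace(/^\W+/, '').replace(/\W+$/, '');
}

// Hypothetical helper: build a trimmer from a set of allowed word characters.
function makeTrimmer(wordChars) {
  var start = new RegExp('^[^' + wordChars + ']+');
  var end = new RegExp('[^' + wordChars + ']+$');
  return function (word) {
    return word.replace(start, '').replace(end, '');
  };
}

// Assumed character ranges, for illustration only.
var LATIN = 'A-Za-z0-9_';
var CYRILLIC = '\u0400-\u04FF';
var multiTrim = makeTrimmer(LATIN + CYRILLIC);

var words = ['Hello,', 'привет!'];
var englishOnly = words.map(englishTrim);
var combined = words.map(multiTrim);

console.log(englishOnly); // [ 'Hello', '' ]
console.log(combined);    // [ 'Hello', 'привет' ]
```

With the English-only trimmer the Russian word is reduced to an empty string, which is the kind of loss that combining the languages is meant to avoid.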
If you serialize the index and load it in another script, you'll have to initialize the multi-language support in that script, too, like this:
lunr.multiLanguage('en', 'ru');
var idx = lunr.Index.load(serializedIndex);
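The likely reason for this requirement is that a serialized index refers to its pipeline functions by name, so those names must be known again before loading. The following toy sketch illustrates the idea without lunr. The `registry`, `register` and `loadPipeline` names and the `'trimmer-multi'` label are made up for the example and are not lunr's real implementation.

```javascript
// Toy registry of named pipeline functions (illustration only, not lunr's code).
var registry = {};

function register(name, fn) {
  registry[name] = fn;
}

// Rebuild a pipeline from the names stored in a serialized index.
function loadPipeline(names) {
  return names.map(function (name) {
    if (!registry[name]) {
      throw new Error('Cannot load unregistered function: ' + name);
    }
    return registry[name];
  });
}

// A serialized "index" only remembers the name of each pipeline step.
var serialized = JSON.stringify({ pipeline: ['trimmer-multi'] });
var saved = JSON.parse(serialized);

// Loading before the function is registered fails.
var failedBeforeRegistration = false;
try {
  loadPipeline(saved.pipeline);
} catch (e) {
  failedBeforeRegistration = true;
}

// After registering the function (the role played by the multi-language setup), loading works.
register('trimmer-multi', function (token) { return token; });
var loaded = loadPipeline(saved.pipeline);

console.log(failedBeforeRegistration); // true
console.log(loaded.length);            // 1
```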
Check the Contributing section
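Searching inside documents is not as straightforward as using indexOf(), since there are many things to consider in order to get quality search results:
Tokenization: given a string, split it into words: ['Hope', 'you', 'like', 'using', 'Lunr', 'Languages!']
Trimming: after tokenization, remove punctuation from each word, turning Languages! into Languages
Stemming: what happens if the text contains the word consignment but we want to search for consigned? It should find it, since its meaning is the same, only the form is different.
Stop words: many words are very common and appear in almost any text, such as the, it, so, etc. These words are called stop words.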
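The steps above (tokenization, trimming, stemming and stop word removal) can be sketched as a tiny standalone pipeline. This is only an illustration under simplifying assumptions: the stop word list is limited to the three words mentioned above, and `stem` is a naive suffix stripper written for this example, not the stemmers shipped with Lunr Languages.

```javascript
// Toy stop word list, for illustration only.
var STOP_WORDS = ['the', 'it', 'so'];

// Tokenization: split a string into words on whitespace.
function tokenize(text) {
  return text.split(/\s+/).filter(function (t) { return t.length > 0; });
}

// Trimming: strip leading and trailing punctuation.
function trim(token) {
  return token.replace(/^\W+/, '').replace(/\W+$/, '');
}

// Naive stemming: strip a few English suffixes. Real stemmers are far more careful.
function stem(token) {
  return token.replace(/(ment|ed|ing|s)$/, '');
}

// Full pipeline: tokenize, trim, lowercase, drop stop words, stem.
function analyze(text) {
  return tokenize(text)
    .map(trim)
    .map(function (t) { return t.toLowerCase(); })
    .filter(function (t) { return t.length > 0 && STOP_WORDS.indexOf(t) === -1; })
    .map(stem);
}

console.log(tokenize('Hope you like using Lunr Languages!'));
// [ 'Hope', 'you', 'like', 'using', 'Lunr', 'Languages!' ]
console.log(trim('Languages!')); // Languages
console.log(stem('consignment') === stem('consigned')); // true
console.log(analyze('So the consignment arrived!')); // [ 'consign', 'arriv' ]
```

Because both the indexed text and the query would go through the same steps, a search for consigned can match a document containing consignment.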
I've created this project by compiling and wrapping stemmers together with stop words from various sources (including user contributions) so they can be used directly with all the current versions of Lunr.
I am providing code in the repository to you under an open source license. Because this is my personal repository, the license you receive to my code is from me and not my employer (Facebook).