spark-lucenerdd Versions

Spark RDD with Lucene's query and entity linkage capabilities

v0.4.0

7 months ago

Changelog:

  • Version bumps to support Spark 3.5.x and Scala 2.12 (dependency snippet below)
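
A minimal build.sbt snippet for picking up this release; the "org.zouzias" %% "spark-lucenerdd" coordinates are assumed from the project's group id and should be double-checked against the published artifacts:

    // build.sbt (sketch): add spark-lucenerdd as a dependency
    libraryDependencies += "org.zouzias" %% "spark-lucenerdd" % "0.4.0"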

v0.3.10

2 years ago

Changelog:

  • Update sbt to 1.4.7

  • Create codacy-analysis.yml (#302)

  • [sbt] patch updates (#307); closes #306, #304, #300, #301, #291, #275

  • Update scalatest to 3.2.8 (#317)

  • Update scalactic to 3.2.8 (#316)

  • Update sbt-release to 1.0.15 (#308)

  • Update sbt-pgp to 2.1.2 (#309)

  • Update scalactic to 3.2.9 (#325)

  • Update scalatest to 3.2.9 (#326)

  • Update spark-core, spark-mllib, spark-sql to 2.4.8 (#327)

  • Update algebird-core to 0.13.8 (#330)

  • Update jts-core to 1.18.1 (#312)

  • Update sbt-scoverage to 1.8.2 (#331)

  • Update lucene to 8.8.2 (#311)

  • [hotfix] sbt migration (#332)

  • Update spatial4j to 0.8 (#313)

  • Update sbt to 1.5.3 (#333)

v0.3.8

4 years ago

See the project milestone 0.3.8.

v0.3.7

5 years ago

Breaking change:

LuceneRDDResponse now extends RDD[Row], for both search and linkage results, and offers a toDF() method for conversion to a DataFrame.
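
Since the response is itself an RDD[Row], results can be processed with the usual Spark APIs or turned into a DataFrame. A minimal sketch, assuming a LuceneRDD built from a DataFrame, the termQuery search method, and that toDF() picks up the Spark session from implicit scope; field names and values are made up for illustration:

    import org.apache.spark.sql.SparkSession
    import org.zouzias.spark.lucenerdd._

    // Keep the session in implicit scope in case toDF() needs it.
    implicit val spark: SparkSession =
      SparkSession.builder().appName("lucenerdd-sketch").getOrCreate()
    import spark.implicits._

    // Each DataFrame row is indexed as a Lucene document.
    val df = Seq(("Odysseus", "Ithaca"), ("Achilles", "Phthia")).toDF("name", "kingdom")
    val luceneRDD = LuceneRDD(df)

    // The response extends RDD[Row]; toDF() converts it to a DataFrame.
    val response = luceneRDD.termQuery("name", "odysseus", 10)
    response.toDF().show()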

Changelog:

  • LuceneRDD's DataFrame constructor handles ArrayType columns, i.e., multi-valued fields (sketch after this list)
  • LuceneRDDResponse extends RDD[Row] (breaking change)
  • [sbt] update to spark 2.4.2
  • Allow different Lucene Analyzers per field (#164)
  • Update to Lucene 8 (#161)
  • Allow the user to set a custom analyzer by specifying the class name.
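
A hedged sketch of the multi-valued case: a DataFrame column of ArrayType is indexed so that each array element becomes another value of the same Lucene field. The column names below are hypothetical:

    import org.apache.spark.sql.SparkSession
    import org.zouzias.spark.lucenerdd._

    val spark = SparkSession.builder().appName("lucenerdd-arrays").getOrCreate()
    import spark.implicits._

    // "tags" is an ArrayType(StringType) column; every element is indexed
    // as a separate value of the multi-valued "tags" field.
    val df = Seq(
      ("doc1", Seq("spark", "lucene")),
      ("doc2", Seq("search", "entity-linkage"))
    ).toDF("id", "tags")

    val luceneRDD = LuceneRDD(df)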

v0.3.6

5 years ago

Breaking changes:

  • Method LuceneRDD.blockEntityLinkage now takes a linker of type Row => org.apache.lucene.search.Query as argument (see the sketch after this list)
  • Method LuceneRDD.blockDedup now takes a linker of type Row => org.apache.lucene.search.Query as argument
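
A minimal sketch of such a linker, assuming person-like records; the column and field names are hypothetical, and the exact blockEntityLinkage parameter list is not reproduced here. The same function type is passed to blockDedup for self-deduplication:

    import org.apache.lucene.index.Term
    import org.apache.lucene.search.{BooleanClause, BooleanQuery, Query, TermQuery}
    import org.apache.spark.sql.Row

    // Turn each query-side Row into a Lucene Query that is executed
    // against the indexed (entity) side of the blocked linkage.
    val linker: Row => Query = { row =>
      val first = row.getAs[String]("firstName").toLowerCase
      val last  = row.getAs[String]("lastName").toLowerCase
      new BooleanQuery.Builder()
        .add(new TermQuery(new Term("lastName", last)), BooleanClause.Occur.MUST)
        .add(new TermQuery(new Term("firstName", first)), BooleanClause.Occur.SHOULD)
        .build()
    }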

Changelog:

  • Lucene version 7.7.1
  • Fix issue #150 in blockEntityLinkage when both sides share identically named columns

v0.3.5

5 years ago

Changelog:

  • Persist the underlying RDD to disk to reduce the memory footprint of LuceneRDD
  • Allow users to select which fields are not analyzed, either via a configuration file or by naming a field with the suffix _notanalyzed (sketch after this list)
  • Upgrade to Lucene version 7.6.0
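
A small sketch of the naming convention, assuming a LuceneRDD built from a DataFrame; the column name code_notanalyzed is made up for illustration:

    import org.apache.spark.sql.SparkSession
    import org.zouzias.spark.lucenerdd._

    val spark = SparkSession.builder().appName("lucenerdd-notanalyzed").getOrCreate()
    import spark.implicits._

    // A column whose name ends with "_notanalyzed" is indexed verbatim,
    // i.e., its values are not passed through a Lucene analyzer.
    val df = Seq(
      ("A-100", "first product"),
      ("B-200", "second product")
    ).toDF("code_notanalyzed", "description")

    val luceneRDD = LuceneRDD(df)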

v0.3.4

5 years ago

Changelog:

  • Update to spark 2.4.0
  • Remove unused parameter, thanks to @tuleism
  • Allow users to mark fields as not analyzed, via configuration or by naming convention

v0.3.1

6 years ago

Changelog:

  • LuceneRDD and ShapeLuceneRDD support linkage using RDD.cartesian
  • Complete Kryo class registration for ShapeLuceneRDD
  • Update to Lucene 7.1.0
  • Minor fixes in Javadoc

v0.2.7

7 years ago

Changelog:

  • Add .toDF() method in LuceneRDDResponse
  • Update Lucene to v5.5.4
  • Fix issue where numeric fields were also stored as string fields in SparkDoc (62be817)
  • Use English analyzers as the default analyzers
  • Add field(fieldName: String) method in SparkDoc

v0.2.6

7 years ago

Changelog:

  • Provide more configuration options for StringFields
  • Add Ngram custom analyzer
  • Add termvector() method in LuceneRDD
  • Add indexStats() method in LuceneRDD
  • Lazily evaluate fields() method in LuceneRDD
  • Add TermDocMatrix class
  • Minor version updates