spark-lucenerdd Versions

Spark RDD with Lucene's query and entity linkage capabilities

v0.4.0

7 months ago

Changelog:

  • Version bumps to support Spark 3.5.x and Scala 2.12 (dependency snippet below)
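
A minimal build.sbt snippet for picking up this release; the "org.zouzias" %% "spark-lucenerdd" coordinates are assumed from the project's group id and should be double-checked against the published artifacts:

    // build.sbt (sketch): add spark-lucenerdd as a dependency
    libraryDependencies += "org.zouzias" %% "spark-lucenerdd" % "0.4.0"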

v0.3.10

2 years ago

Changelog:

  • Update sbt to 1.4.7

  • Create codacy-analysis.yml (#302)

  • [sbt] patch updates (#307); closes #306, #304, #300, #301, #291, #275

  • Update scalatest to 3.2.8 (#317)

  • Update scalactic to 3.2.8 (#316)

  • Update sbt-release to 1.0.15 (#308)

  • Update sbt-pgp to 2.1.2 (#309)

  • Update scalactic to 3.2.9 (#325)

  • Update scalatest to 3.2.9 (#326)

  • Update spark-core, spark-mllib, spark-sql to 2.4.8 (#327)

  • Update algebird-core to 0.13.8 (#330)

  • Update jts-core to 1.18.1 (#312)

  • Update sbt-scoverage to 1.8.2 (#331)

  • Update lucene to 8.8.2 (#311)

  • [hotfix] sbt migration (#332)

  • Update spatial4j to 0.8 (#313)

  • Update sbt to 1.5.3 (#333)

v0.3.8

4 years ago

See the project milestone 0.3.8.

v0.3.7

5 years ago

Breaking change:

LuceneRDDResponse now extends RDD[Row], for both search and linkage results, and offers a toDF() method for conversion to a DataFrame.
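
Since the response is itself an RDD[Row], results can be processed with the usual Spark APIs or turned into a DataFrame. A minimal sketch, assuming a LuceneRDD built from a DataFrame, the termQuery search method, and that toDF() picks up the Spark session from implicit scope; field names and values are made up for illustration:

    import org.apache.spark.sql.SparkSession
    import org.zouzias.spark.lucenerdd._

    // Keep the session in implicit scope in case toDF() needs it.
    implicit val spark: SparkSession =
      SparkSession.builder().appName("lucenerdd-sketch").getOrCreate()
    import spark.implicits._

    // Each DataFrame row is indexed as a Lucene document.
    val df = Seq(("Odysseus", "Ithaca"), ("Achilles", "Phthia")).toDF("name", "kingdom")
    val luceneRDD = LuceneRDD(df)

    // The response extends RDD[Row]; toDF() converts it to a DataFrame.
    val response = luceneRDD.termQuery("name", "odysseus", 10)
    response.toDF().show()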

Changelog:

  • LuceneRDD's DataFrame constructor handles ArrayType columns, i.e., multi-valued fields (sketch after this list)
  • LuceneRDDResponse extends RDD[Row] (breaking change)
  • [sbt] update to spark 2.4.2
  • Allow different Lucene Analyzers per field (#164)
  • Update to Lucene 8 (#161)
  • Allow the user to set a custom analyzer by specifying the class name.
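
A hedged sketch of the multi-valued case: a DataFrame column of ArrayType is indexed so that each array element becomes another value of the same Lucene field. The column names below are hypothetical:

    import org.apache.spark.sql.SparkSession
    import org.zouzias.spark.lucenerdd._

    val spark = SparkSession.builder().appName("lucenerdd-arrays").getOrCreate()
    import spark.implicits._

    // "tags" is an ArrayType(StringType) column; every element is indexed
    // as a separate value of the multi-valued "tags" field.
    val df = Seq(
      ("doc1", Seq("spark", "lucene")),
      ("doc2", Seq("search", "entity-linkage"))
    ).toDF("id", "tags")

    val luceneRDD = LuceneRDD(df)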

v0.3.6

5 years ago

Breaking changes:

  • Method LuceneRDD.blockEntityLinkage now takes a linker of type Row => org.apache.lucene.search.Query as argument (see the sketch after this list)
  • Method LuceneRDD.blockDedup now takes a linker of type Row => org.apache.lucene.search.Query as argument
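
A minimal sketch of such a linker, assuming person-like records; the column and field names are hypothetical, and the exact blockEntityLinkage parameter list is not reproduced here. The same function type is passed to blockDedup for self-deduplication:

    import org.apache.lucene.index.Term
    import org.apache.lucene.search.{BooleanClause, BooleanQuery, Query, TermQuery}
    import org.apache.spark.sql.Row

    // Turn each query-side Row into a Lucene Query that is executed
    // against the indexed (entity) side of the blocked linkage.
    val linker: Row => Query = { row =>
      val first = row.getAs[String]("firstName").toLowerCase
      val last  = row.getAs[String]("lastName").toLowerCase
      new BooleanQuery.Builder()
        .add(new TermQuery(new Term("lastName", last)), BooleanClause.Occur.MUST)
        .add(new TermQuery(new Term("firstName", first)), BooleanClause.Occur.SHOULD)
        .build()
    }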

Changelog:

  • Lucene version 7.7.1
  • Fix issue #150 in blockEntityLinkage when both sides share identically named columns

v0.3.5

5 years ago

Changelog:

  • Persist the underlying RDD to disk to reduce the memory footprint of LuceneRDD
  • Allow users to select which fields are not analyzed, either via a configuration file or by naming a field with the suffix _notanalyzed (sketch after this list)
  • Upgrade to Lucene version 7.6.0
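
A small sketch of the naming convention, assuming a LuceneRDD built from a DataFrame; the column name code_notanalyzed is made up for illustration:

    import org.apache.spark.sql.SparkSession
    import org.zouzias.spark.lucenerdd._

    val spark = SparkSession.builder().appName("lucenerdd-notanalyzed").getOrCreate()
    import spark.implicits._

    // A column whose name ends with "_notanalyzed" is indexed verbatim,
    // i.e., its values are not passed through a Lucene analyzer.
    val df = Seq(
      ("A-100", "first product"),
      ("B-200", "second product")
    ).toDF("code_notanalyzed", "description")

    val luceneRDD = LuceneRDD(df)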

v0.3.4

5 years ago

Changelog:

  • Update to spark 2.4.0
  • Remove unused parameter, thanks to @tuleism
  • Allow users to mark fields as not analyzed, via configuration or by naming convention

v0.3.1

6 years ago

Changelog:

  • LuceneRDD and ShapeLuceneRDD support linkage using RDD.cartesian
  • Complete Kryo class registration for ShapeLuceneRDD
  • Update to Lucene 7.1.0
  • Minor fixes in Javadoc

v0.2.7

7 years ago

Changelog:

  • Add .toDF() method in LuceneRDDResponse
  • Update Lucene to v5.5.4
  • Fix issue where numeric fields were also stored as string fields in SparkDoc (62be817)
  • Use English analyzers as the default analyzers
  • Add field(fieldName: String) method in SparkDoc

v0.2.6

7 years ago

Changelog:

  • Provide more configuration options for StringFields
  • Add Ngram custom analyzer
  • Add termvector() method in LuceneRDD
  • Add indexStats() method in LuceneRDD
  • Lazily evaluate fields() method in LuceneRDD
  • Add TermDocMatrix class
  • Minor version updates