Text Classification Engine
Updated the Spark version and incorporated new learning algorithms (thanks to EmergentOrder).
Added a minor change to avoid NaN values in the Naive Bayes confidence score computation (the values are shifted over).
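The NaN issue usually arises when very negative log scores are exponentiated: every term underflows to zero and the normalization becomes 0/0. The following is a minimal sketch of one common way to shift values before normalizing, assuming the confidence is a normalized exponential of per-class log scores. The function name `confidence_scores` and its input format are hypothetical, and this is not necessarily the exact change made in the engine.

```python
import math

def confidence_scores(log_scores):
    """Convert per-class log scores into confidences that sum to 1.

    Hypothetical helper for illustration; not the engine's actual code.
    """
    # Shift every score by the maximum so the largest exponent is exp(0) = 1.
    # The shift cancels out in the ratio, so the result is unchanged
    # mathematically, but the denominator can no longer underflow to zero.
    shift = max(log_scores)
    exps = [math.exp(s - shift) for s in log_scores]
    total = sum(exps)
    return [e / total for e in exps]

# Log scores this negative would all underflow to 0.0 without the shift.
scores = confidence_scores([-1000.0, -1001.0, -1005.0])
```

Because the same constant is subtracted from every score, the ranking of the classes and their relative confidences are preserved.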
Made a minor change in PreparedData and added numFeatures to the NBAlgorithmParams class for the HashingTF instance.
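To illustrate what a numFeatures setting controls in a hashing-based term-frequency transform, here is a rough sketch of the hashing trick in plain Python. It assumes tokens are mapped to vector indices by a hash taken modulo the number of features. The function `hashing_tf` is hypothetical, and `zlib.crc32` is only a stand-in for whatever hash function Spark's HashingTF actually uses.

```python
import zlib

def hashing_tf(tokens, num_features):
    """Build a fixed-length term-frequency vector via the hashing trick.

    Illustrative only: crc32 is a stand-in hash, not Spark's.
    """
    vec = [0.0] * num_features
    for tok in tokens:
        # Map each token to a bucket; distinct tokens may collide.
        idx = zlib.crc32(tok.encode("utf-8")) % num_features
        vec[idx] += 1.0
    return vec

doc = ["spark", "text", "spark", "classification"]
features = hashing_tf(doc, num_features=16)
```

A larger feature count reduces collisions between distinct tokens at the cost of longer vectors, which is why exposing it as an algorithm parameter can be useful.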
Includes text vectorization, tf-idf-based feature preparation, and implementations of multinomial Naive Bayes and regularized logistic regression for classification.
Includes text vectorization, tf-idf-based feature preparation, and a multinomial Naive Bayes implementation for classification.
Modified the sample data file and the import process.
Includes text vectorization, tf-idf-based feature preparation, and a multinomial Naive Bayes implementation for classification.