TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning.
Bug fixes:
ModelInsight tests #407
Days.daysBetween int overflow #471
New features / updates:
maxTrainingSample for regression #413 and multi-class classification #414
InsightLOCOTest #412
OpLinearRegression #421
TextTokenizerTest #442
SmartTextVectorizer #448, #455
MinVarianceFilter which checks that computed features have a minimum variance #463, #465
TextStats length distribution to be token-based and refactor for testability #464
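The minimum-variance check mentioned above can be pictured with a small, library-independent sketch. This is not the MinVarianceFilter implementation: the object name `MinVarianceSketch`, its methods, and the choice of population variance are all assumptions made for illustration.

```scala
// Hypothetical sketch of a minimum-variance check; not TransmogrifAI code.
object MinVarianceSketch {

  /** Population variance of a column; an empty column is treated as 0.0. */
  def variance(xs: Seq[Double]): Double =
    if (xs.isEmpty) 0.0
    else {
      val mean = xs.sum / xs.size
      xs.map(x => (x - mean) * (x - mean)).sum / xs.size
    }

  /** Names of the columns whose variance is at least `minVariance`. */
  def keep(columns: Map[String, Seq[Double]], minVariance: Double): Set[String] =
    columns.collect {
      case (name, values) if variance(values) >= minVariance => name
    }.toSet
}
```

With a constant column and a varying one, `MinVarianceSketch.keep(columns, minVariance = 1e-5)` would retain only the varying column, since a constant column has zero variance.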
Dependency updates (#402, #466):
Miscellaneous:
Bug fixes:
ModelInsight tests #395
SparseVectors for LOCO #377
New features / updates:
FeatureDistribution to SerializationFormats #383
OpStandardScaler to allow for descaling #378
evalMetFromJson #380
Dependency updates:
Bug fixes:
New features / updates:
key ctor field in all RawFeatureFilter results #348
Dependency updates:
Bug fixes:
New features / updates:
Dependency updates: N/A
Bug fixes:
DataCutter and DataBalancer #256
OpXGBoostClassificationModel #229
New features / updates:
ModelInsights #237, #252, #258, #276
OpBinScoreEvaluator to allow for lift analysis #233
RawFeatureFilter #250
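As a rough picture of what a lift analysis over score bins involves, the sketch below computes lift per bin as the positive rate inside the bin divided by the overall positive rate. It is not the OpBinScoreEvaluator implementation: the names `LiftSketch`, `BinLift`, and `liftByBin`, the equal-width binning, and the assumption that scores lie in [0, 1] are all illustrative.

```scala
// Hypothetical sketch of per-bin lift; not TransmogrifAI code.
object LiftSketch {

  /** One score bin: its lower bound, the number of rows in it, and its lift. */
  final case class BinLift(lowerBound: Double, count: Int, lift: Double)

  /** Lift per equal-width score bin, for (score, isPositive) pairs with scores in [0, 1]. */
  def liftByBin(scored: Seq[(Double, Boolean)], numBins: Int): Seq[BinLift] = {
    require(numBins > 0, "numBins must be positive")
    val overallRate =
      if (scored.isEmpty) 0.0 else scored.count(_._2).toDouble / scored.size
    scored
      // A score of exactly 1.0 is clamped into the last bin.
      .groupBy { case (score, _) => math.min((score * numBins).toInt, numBins - 1) }
      .toSeq
      .sortBy(_._1)
      .map { case (bin, rows) =>
        val binRate = rows.count(_._2).toDouble / rows.size
        // Lift is reported as 0.0 when there are no positives at all.
        val lift = if (overallRate == 0.0) 0.0 else binRate / overallRate
        BinLift(bin.toDouble / numBins, rows.size, lift)
      }
  }
}
```

A lift above 1.0 in the high-score bins indicates that the model concentrates positives there more than random selection would.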
Dependency updates:
Bug fixes:
New features / updates:
Dependency updates:
New features and bug fixes:
Dependency upgrades & misc:
New features and bug fixes:
RawFeatureFilter (see RawFeatureFilter.textBinsFormula argument) #99
Geolocation and GeolocationMap so that the name of the column is kept in descriptorValue #100
RawFeatureFilter to ModelInsights #103
BinarySequenceTransformer and BinarySequenceEstimator plus the associated base traits #84
StringIndexerHandleInvalid.Keep option into OpStringIndexer (same as in underlying Spark estimator) #93
Text, TextArea, TextMap and TextAreaMap #63
Date, DateTime, DateMap and DateTimeMap #100
Breaking changes:
FileOutputCommitter a default and got rid of DirectMapreduceOutputCommitter and DirectOutputCommitter #86
OpVectorColumnMetadata to allow numeric column descriptors #89
JaccardDistance to JaccardSimilarity #80
Prediction (instead of a variable number of features - (pred, raw, prob)). Example:
val (pred, raw, prob) = MultiClassificationModelSelector() // won't compile anymore
val prediction = MultiClassificationModelSelector() // ok!
Another change is the way parameters are passed into model selectors. Example:
BinaryClassificationModelSelector
.withCrossValidation()
.setLogisticRegressionRegParam(0.05, 0.1) // won't compile anymore
Instead one should do:
val lr = new OpLogisticRegression()
val models = Seq(lr -> new ParamGridBuilder().addGrid(lr.regParam, Array(0.05, 0.1)).build())
BinaryClassificationModelSelector
.withCrossValidation(modelsAndParameters = models)
For more examples of how to use the new model selectors, please refer to our documentation and helloworld examples.
Dependency upgrades & misc:
scala-graph to 1.12.5
scalafmt to 1.5.1
transmogrifai-local subproject #41 introduces aardpfark and hadrian dependencies.
Performance improvements:
New features and bug fixes:
Dependency upgrades:
Released to Bintray - https://bintray.com/salesforce/maven/TransmogrifAI