A modular active learning framework for Python
modAL 0.4.0 is finally here! This new release is made possible by the contributions of @BoyanH, @damienlancry, and @OskarLiew, many thanks to them!
- `pandas.DataFrame` support, thanks to @BoyanH! This was a frequently requested feature which I was unable to implement properly, but @BoyanH found a solution for it in #105.
- [...] `on_transformed=True` upon initialization.
- `Committee` sets classes when fitting. This solves the error which occurred when no training data was provided during initialization. This fix was contributed in #100 by @OskarLiew, thanks for that!
- `Committee.teach()` (#63)
- `ActiveLearner` now supports `np.nan` and `np.inf` in the data by setting `force_all_finite=False` upon initialization. #58
- `check_X_y` no longer converts between datatypes. #49
- `modAL.utils.data_vstack` now falls back to `numpy.concatenate` if possible.

Fixes by @zhangyu94:
- `modAL.selection.shuffled_argmax` #32
- `modAL.batch.ranked_batch` fixed. #30
- `modAL.batch.select_instance` fixed. #29

- Passing `random_tie_break=True` to the query strategies first shuffles the pool, then uses a stable sort to find the instances to query. When the maximum utility score is not unique, this is equivalent to sampling randomly from the top-scoring instances.
- `modAL.expected_error.expected_error_reduction` runtime improved by omitting unnecessary cloning of the estimator for every instance in the pool.

In this small release, the expected error and log loss reduction algorithms (Roy and McCallum, 2001) were added.
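
The `random_tie_break=True` behaviour described above (shuffle the pool, then apply a stable sort) can be illustrated in plain Python. This is only a minimal sketch of the idea, not modAL's implementation: the function name `random_tie_break_argmax` and its parameters are hypothetical, and it works on a plain list of utility scores rather than on NumPy arrays.

```python
import random


def random_tie_break_argmax(values, n_instances=1, rng=None):
    """Return the indices of the n_instances largest values.

    Ties are broken at random. Hypothetical helper for illustration only.
    """
    if rng is None:
        rng = random.Random()

    # Step 1: shuffle the pool indices.
    indices = list(range(len(values)))
    rng.shuffle(indices)

    # Step 2: stable sort of the shuffled indices by descending score.
    # sorted() is stable (also with reverse=True), so indices with equal
    # scores keep their shuffled relative order.
    ranked = sorted(indices, key=lambda i: values[i], reverse=True)
    return ranked[:n_instances]


# Two instances share the maximum utility score (indices 1 and 2).
scores = [0.1, 0.9, 0.9, 0.3]
print(random_tie_break_argmax(scores, n_instances=1))
```

Because the sort is stable, which of the tied instances comes first depends only on the shuffle, so repeated calls pick among the top-scoring instances at random.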
In this release, the focus was on multilabel active learning strategies. The following algorithms were added:
The new release of modAL is here! This is a milestone in its evolution, because it has just received its first contributions from the open source community! :) Thanks to @dataframing and @nikolay-bushkov for their work! Hoping to see many more contributions from the community, because modAL still has a long way to go! :)
- `learner.query()` can be used without training the model first.
- `.query()` methods changed for `BaseLearner` and `BaseCommittee` to allow more general arguments for query strategies. They now accept any argument, as long as the `query_strategy` function supports it.
- `.score()` method was added for `Committee`. Fixes #6.
- `modAL.density` module was refactored using functions from `sklearn.metrics.pairwise`. This resulted in a major increase in performance as well as a more maintainable codebase for the module.
- `numpy.vstack` calls replaced with `numpy.concatenate`. Fixes #15.
- `np.sum(generator)` calls were replaced with `np.sum(np.fromiter(generator))` because of the deprecation of the original form.
- [...] `ActiveLearner`. Values are sampled by strategies that estimate the possible gain at each point. Three such strategies are currently implemented: probability of improvement, expected improvement and upper confidence bound.
- `modAL.models.BaseLearner` abstract base class implemented. `ActiveLearner` and `BayesianOptimizer` both inherit from it.
- `modAL.models.ActiveLearner.query()` now passes the `ActiveLearner` object to the query function instead of just the estimator.
- `modAL.utils.selection.multi_argmax()` now works for arrays with shape `(-1, )` as well as `(-1, 1)`.
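
The three sampling strategies named above (probability of improvement, expected improvement and upper confidence bound) are commonly defined in terms of a Gaussian prediction with mean and standard deviation at each candidate point. The sketch below uses those textbook definitions and assumes that is what the notes refer to. It is not modAL's code: the function names and the `tradeoff` and `beta` parameters are chosen for illustration.

```python
import math


def _norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _norm_cdf(z):
    """Standard normal cumulative distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probability_of_improvement(mean, std, best, tradeoff=0.0):
    """P(f(x) > best + tradeoff) for a prediction f(x) ~ N(mean, std^2)."""
    if std == 0.0:
        return 1.0 if mean > best + tradeoff else 0.0
    return _norm_cdf((mean - best - tradeoff) / std)


def expected_improvement(mean, std, best, tradeoff=0.0):
    """E[max(f(x) - best - tradeoff, 0)] for the same Gaussian prediction."""
    gain = mean - best - tradeoff
    if std == 0.0:
        return max(gain, 0.0)
    z = gain / std
    return gain * _norm_cdf(z) + std * _norm_pdf(z)


def upper_confidence_bound(mean, std, beta=1.0):
    """Optimistic estimate: predicted mean plus beta standard deviations."""
    return mean + beta * std


# Hypothetical (mean, std) predictions for three candidate points.
candidates = [(0.9, 0.05), (0.7, 0.40), (1.0, 0.00)]
best_so_far = 1.0

for mean, std in candidates:
    print(
        probability_of_improvement(mean, std, best_so_far),
        expected_improvement(mean, std, best_so_far),
        upper_confidence_bound(mean, std),
    )
```

All three scores reward points that are either predicted to be good or highly uncertain, and the next point to evaluate is the one with the largest score under the chosen strategy.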