An efficient open-source AutoML system for automating machine learning lifecycle, including feature engineering, neural architecture search, and hyper-parameter tuning.
MindWare is an efficient open-source system to help users to automate the process of 1) data pre-processing, 2) feature engineering, 3) algorithm selection, 4) architecture design, 5) hyper-parameter tuning, and 6) model ensembling. It is capable of improving its AutoML power by decomposing the entire large AutoML search space into small ones, and solve each sub-problems jointly and efficiently. MindWare is designed and developed by the AutoML team from the DAIR Lab at Peking University. The goal is to make machine learning easier to apply both in industry and academia, and help facilitate data science.
MindWare requires python>=3.6. There are two ways to install MindWare:
MindWare is available on PyPI. You can install it by tying:
pip install mindware
If you want to try the latest version, please manually install MindWare from source code by:
git clone https://github.com/thomas-young-2013/mindware.git && cd mindware
cat requirements/main.txt | xargs -n 1 -L 1 pip install
python setup.py install --user
For more detailed installation instructions, please refer to our Installation Guide Document.
Here is a brief example that uses the package.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from mindware.utils.data_manager import DataManager
from mindware.estimators import Classifier
iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1, stratify=y)
dm = DataManager(X_train, y_train)
train_data = dm.get_data_node(X_train, y_train)
test_data = dm.get_data_node(X_test, y_test)
clf = Classifier(time_limit=3600)
clf.fit(train_data)
pred = clf.predict(test_data)
For more details and characteristics, please read the documents and examples.
MindWare has a frequent release cycle. Please let us know if you encounter a bug by filling an issue.
We appreciate all contributions. If you are planning to contribute any bug-fixes, please do so without further discussions.
If you plan to contribute new features, new modules, etc. please first open an issue or reuse an existing issue, and discuss the feature with us.
To learn more about making a contribution to MindWare, please refer to our How-to contribution page.
We appreciate all contributions and thank all the contributors!
Targeting at openness and advancing state-of-art technology, we have also released several open source projects.
We encourage researchers to leverage the project to accelerate the AI development and research.
VolcanoML: Speeding up End-to-End AutoML via Scalable Search Space Decomposition Yang Li, Yu Shen, Wentao Zhang, Jiawei Jiang, Bolin Ding, Yaliang Li, Jingren Zhou, Zhi Yang, Wentao Wu, Ce Zhang and Bin Cui International Conference on Very Large Data Bases (VLDB 2021). https://arxiv.org/abs/2107.08861
Efficient Automatic CASH via Rising Bandits
Yang Li, Jiawei Jiang, Jinyang Gao, Yingxia Shao, Ce Zhang and Bin Cui
Proceedings of the AAAI Conference on Artificial Intelligence (AAAI 2020).
https://ojs.aaai.org/index.php/AAAI/article/view/5910
MFES-HB: Efficient Hyperband with Multi-Fidelity Quality Measurements Yang Li, Yu Shen, Jiawei Jiang, Jinyang Gao, Ce Zhang and Bin Cui Proceedings of the AAAI Conference on Artificial Intelligence (AAAI 2021). https://arxiv.org/abs/2012.03011
OpenBox: A Generalized Black-box Optimization Service Yang Li, Yu Shen, Wentao Zhang, Yuanwei Chen, Huaijun Jiang, Mingchao Liu, Jiawei Jiang, Jinyang Gao, Wentao Wu, Zhi Yang, Ce Zhang and Bin Cui ACM SIGKDD Conference on Knowledge Discovery and Data Mining (SIGKDD 2021). https://arxiv.org/abs/2106.00421
The entire codebase is under MIT license.