Open source platform for the machine learning lifecycle
Simple and Distributed Machine Learning
lakeFS - Data version control for your data lake | Git for data
Coolplay Spark (酷玩 Spark): Spark source code analysis, Spark libraries, and more
Interactive and Reactive Data Science using Scala and Spark.
Kubernetes operator for managing the lifecycle of Apache Spark applicati...
Apache Spark docker image
.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET de...
Feathr – A scalable, unified data and AI engineering platform for enterp...
Oryx 2: Lambda architecture on Apache Spark, Apache Kafka for real-time ...
A curated list of awesome Apache Spark packages and resources.
Agile Data Preparation Workflows made easy with Pandas, Dask, cu...
The Internals of Apache Spark
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Wareh...
PySpark + Scikit-learn = Sparkit-learn