Simple and Distributed Machine Learning
The portable Python dataframe library
State of the Art Natural Language Processing
Apache Linkis builds a computation middleware layer to facilitate connec...
Feathr – A scalable, unified data and AI engineering platform for enterp...
Petastorm library enables single machine or distributed training and eva...
A curated list of awesome Apache Spark packages and resources.
Apache Spark & Python (PySpark) tutorials for Big Data Analysis and Mach...
Agile Data Preparation Workflows made easy with Pandas, Dask, cu...
Implementing best practices for PySpark ETL jobs and applications.
Jupyter magics and kernels for working with remote Spark clusters
PySpark-Tutorial provides basic algorithms using PySpark
Hopsworks - Data-Intensive AI platform with a Feature Store
SQL data analysis & visualization projects using MySQL, PostgreSQL, SQLi...
MapReduce, Spark, Java, and Scala for Data Algorithms Book