Sparkling Water provides H2O functionality inside a Spark cluster
Scriptis is for interactive data analysis with script development (SQL, P...
Kuwala is the no-code data platform for BI analysts and engineers enabli...
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log ...
PySpark methods to enhance developer productivity 📣 👯 🎉
A repo documenting best practices in PySpark.
A boilerplate for writing PySpark Jobs
Pandas and Spark DataFrame comparison for humans and more!
Process Common Crawl data with Python and Spark
t-Digest data structure in Python. Useful for percentiles and quantiles,...
PySpark Cheat Sheet - example code to help you learn PySpark and develop...
Gathers Python deployment, infrastructure and practices.
Spark Gotchas. A subjective compilation of Apache Spark tips and tricks
Fundamentals of Spark with Python (using PySpark), code examples
Delta Lake helper methods in PySpark