Spark PMoF

Spark Shuffle Optimization with RDMA+AEP

Project README

Spark-PMoF: RPMem extension for Spark Shuffle

Spark-PMoF (Persistent Memory over Fabric), the RPMem extension for Spark Shuffle, is a Spark shuffle plugin that uses persistent memory and high-performance fabric technology such as RDMA to improve Spark performance in shuffle-intensive scenarios.

IMPORTANT NOTE

Spark-PMoF has been migrated to and integrated into OAP (https://github.com/Intel-bigdata/OAP/tree/master/oap-shuffle/RPMem-shuffle). Please check OAP for the most recent updates.

Contents

Introduction

Installation

Make sure HPNL is installed before building.

git clone https://github.com/Intel-bigdata/Spark-PMoF.git
cd Spark-PMoF; mvn package -DskipTests -Pspark-2

If the persistent memory (pmem) hardware is ready, it is useful to run the tests as well by removing the -DskipTests option:

mvn package
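After the build finishes, it can help to confirm that the jar referenced in the Usage section was actually produced. The following is a minimal sketch, not part of the project: the function name check_pmof_jar is hypothetical, and the jar name is taken from the spark-defaults.conf lines in this README.

```shell
# Hypothetical helper (not shipped with Spark-PMoF): check that the jar
# produced by "mvn package" exists under the given checkout directory.
check_pmof_jar() {
    pmof_home="$1"
    # Jar name as it appears in the Usage section of this README.
    jar="$pmof_home/target/sso-0.1-jar-with-dependencies.jar"
    if [ -f "$jar" ]; then
        echo "found: $jar"
    else
        echo "missing: $jar (did the Maven build succeed?)" >&2
        return 1
    fi
}
```

From the root of the checkout, this could be called as check_pmof_jar "$PWD".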

Benchmark

Usage

This plugin currently supports Spark 2.3 and works well on various network fabrics, including Socket, RDMA, and Omni-Path. Before running a Spark workload, add the following contents to spark-defaults.conf, replacing Spark-PMoF-PATH with the path to your Spark-PMoF checkout, then have fun! :-)

spark.driver.extraClassPath Spark-PMoF-PATH/target/sso-0.1-jar-with-dependencies.jar
spark.executor.extraClassPath Spark-PMoF-PATH/target/sso-0.1-jar-with-dependencies.jar
spark.shuffle.manager org.apache.spark.shuffle.pmof.PmofShuffleManager
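If you prefer to script this step, the three settings above can be appended to a configuration file with a small shell function. This is only a sketch under stated assumptions: the function name write_pmof_conf and its two arguments (the checkout path and the target conf file) are hypothetical, while the property names and values are copied from the lines above.

```shell
# Hypothetical helper (not part of Spark-PMoF): append the three settings
# from this README to a spark-defaults.conf file.
#   $1: path to the Spark-PMoF checkout (stands in for Spark-PMoF-PATH)
#   $2: path to the spark-defaults.conf file to append to
write_pmof_conf() {
    pmof_home="$1"
    conf_file="$2"
    jar="$pmof_home/target/sso-0.1-jar-with-dependencies.jar"
    # Append rather than overwrite, so existing settings are kept.
    cat >> "$conf_file" <<EOF
spark.driver.extraClassPath $jar
spark.executor.extraClassPath $jar
spark.shuffle.manager org.apache.spark.shuffle.pmof.PmofShuffleManager
EOF
}
```

For example, write_pmof_conf /opt/Spark-PMoF "$SPARK_HOME/conf/spark-defaults.conf" would append the settings for a checkout assumed to live in /opt/Spark-PMoF. Running it twice appends the lines twice, so check the file before re-running.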

Contact

Chendi Xue, [email protected]
Jian Zhang, [email protected]

Open Source Agenda is not affiliated with "Spark PMoF" Project. README Source: Intel-bigdata/Spark-PMoF