Dpark Versions

Python clone of Spark, a MapReduce-like framework in Python

0.5.0

5 years ago

API change

  • Remove module-level APIs such as dpark.textFile.
  • Support Streaming shuffle and Disk shuffle (Experimental, compatible).
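
For code that relied on module-level helpers, the change amounts to going through a context object instead. The following is a minimal sketch, assuming the DparkContext entry point and its textFile method as shown in the DPark README; count_nonempty_lines is a hypothetical helper name used only for illustration, and the path in the comment is a placeholder.

```python
# Sketch only: assumes the context-based entry point from the DPark README
# (DparkContext and its textFile method). Not taken from the release itself.

def count_nonempty_lines(ctx, path):
    """Count non-empty lines of a text file through a context object.

    `ctx` is expected to be a dpark.DparkContext instance; the helper name
    is hypothetical and exists only for this example.
    """
    lines = ctx.textFile(path)  # replaces a module-level dpark.textFile(path) call
    return lines.filter(lambda line: line.strip()).count()

# Typical use (requires dpark to be installed):
#   from dpark import DparkContext
#   print(count_nonempty_lines(DparkContext(), "/tmp/words.txt"))
```

Passing the context in as an argument keeps the helper easy to exercise with a stand-in object, without a running cluster.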

Fixes

  • Bug when parsing mfs chunk info.

Improvement

  • Better broadcast implementation that uses shared memory for tasks on the same slave, reducing memory cost.
  • Better offer-matching logic for MesosScheduler, which remembers bad slaves.
  • Refactor: style and layout.

New Feature

  • Multi segment dump to save memory.
  • Gather statistics for each stage.
  • Support running tests/test_rdd on Mesos.
  • Add colorful progress bar for dpark.
  • Support Mesos roles.
  • Support multiple named Mesos masters in conf.
  • Loghub for admin.

0.4.2

6 years ago
  • Support Python3 & PyPy
  • Support MooseFS 3.x & refactor on file-system interface

0.4.1

7 years ago
  • Enhancement for the containerizer in DPark
  • Use broadcast when parallelizing a big dataset
  • Fix missing line bug for bzip2 files
  • Add TopByKey in RDD
  • Fix other minor bugs
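
As a rough illustration of what a top-by-key operation computes, here is a plain-Python sketch. It does not use DPark and is not its implementation. The function name top_by_key, its parameters, and the assumption that "top" means the n largest values per key are all hypothetical; the actual RDD method's signature is not given in these notes.

```python
import heapq
from collections import defaultdict

def top_by_key(pairs, n):
    """Return the n largest values for each key (hypothetical semantics).

    `pairs` is any iterable of (key, value) tuples. Values for each key are
    returned in descending order.
    """
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    # heapq.nlargest returns at most n items, sorted from largest to smallest.
    return {key: heapq.nlargest(n, values) for key, values in grouped.items()}

scores = [("a", 1), ("a", 3), ("a", 2), ("b", 5)]
print(top_by_key(scores, 2))
```

A distributed version would normally keep only n values per key on each partition before merging, rather than collecting every value as this sketch does.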

0.4.0

7 years ago
  • Bugfix: deserialization error for old-style classes.
  • Refactor beansdb RDD
  • Web UI support for dpark
  • Use pymesos >= 0.2.0
  • Eagerly serialize values of ParallelCollection