Pysparkling Versions

A pure Python implementation of Apache Spark's RDD and DStream interfaces.

v0.6.2

1 year ago
  • make dependencies optional: boto, requests
  • compatibility

v0.6.1

3 years ago

testing continuous deployment

v0.6.0

4 years ago
  • Broadcast, Accumulator and AccumulatorParam by @alexprengere
  • support for increasing the number of partitions in coalesce() and repartition() by @tools4origins

v0.5.0

5 years ago
  • fixes for HDFS thanks to @tools4origins
  • fix for empty partitions by @tools4origins
  • API fixes by @artem0 and @tools4origins
  • various updates for streaming submodule
  • various updates to lint and test system
  • logging: converted some info messages to debug
  • ... documentation for some point releases is missing

v0.4.1

7 years ago
  • retries for failed partitions
  • improve pysparkling.streaming.DStream
  • updates to docs

v0.4.0

7 years ago
  • major addition: pysparkling.streaming
  • updates to RDD.sample()
  • reorganized scripts and tests
  • added RDD.partitionBy()
  • minor updates to pysparkling.fileio
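RDD.partitionBy() redistributes key-value pairs so that all pairs sharing a key end up in the same partition. The plain-Python sketch below illustrates the general idea, assuming hash-based assignment (`partition_func(key) % num_partitions`) as in PySpark's default. `partition_by` is a hypothetical helper, not pysparkling's implementation.

```python
from typing import Any, Callable, Hashable, List, Tuple

Pair = Tuple[Hashable, Any]


def partition_by(
    pairs: List[Pair],
    num_partitions: int,
    partition_func: Callable[[Hashable], int] = hash,
) -> List[List[Pair]]:
    """Hypothetical helper: assign each (key, value) pair to a partition by its key."""
    partitions: List[List[Pair]] = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        # Python's % is non-negative for a positive divisor, so negative
        # hash values still map to a valid partition index.
        partitions[partition_func(key) % num_partitions].append((key, value))
    return partitions


parts = partition_by([(1, "a"), (2, "b"), (3, "c"), (1, "d")], 2)
# Small ints hash to themselves: key 2 -> partition 0; keys 1 and 3 -> partition 1.
print(parts)  # [[(2, 'b')], [(1, 'a'), (3, 'c'), (1, 'd')]]
```

Because the partition depends only on the key, both pairs with key 1 land in the same partition, which is what later per-key operations rely on.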

v0.3.23

7 years ago

small improvements to fileio and better documentation

v0.3.22

7 years ago
  • reimplement RDD.groupByKey()
  • cleanup of docstrings

v0.3.21

8 years ago
  • faster text file reading by using io.TextIOWrapper for decoding
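The general technique is to wrap a binary stream in io.TextIOWrapper so the standard io module handles decoding and line splitting, instead of decoding each chunk by hand. A minimal sketch, assuming UTF-8 content already held in memory. `read_lines` is a hypothetical helper, not pysparkling's actual fileio code.

```python
import io
from typing import List


def read_lines(raw: bytes, encoding: str = "utf-8") -> List[str]:
    """Hypothetical helper: decode a binary payload into lines of text."""
    binary_stream = io.BytesIO(raw)
    # TextIOWrapper decodes incrementally; with the default newline=None it
    # also translates "\r\n" and "\r" line endings to "\n".
    with io.TextIOWrapper(binary_stream, encoding=encoding) as text_stream:
        return [line.rstrip("\n") for line in text_stream]


print(read_lines("héllo\nwörld\n".encode("utf-8")))  # ['héllo', 'wörld']
```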

v0.3.20

8 years ago
  • Google Storage file system (using gs://)
  • dependencies: requests and boto are not optional anymore
  • aggregateByKey() and foldByKey() return RDDs
  • Python 3: use sys.maxsize instead of sys.maxint
  • flake8 linting
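For reference, foldByKey(zeroValue, func) in Spark's RDD API merges the values for each key with an associative function, starting from a neutral zero value. The plain-Python sketch below mimics those semantics over a list of partitions: it folds within each partition, then merges the per-partition results. `fold_by_key` is a hypothetical helper, and unlike the method in the changelog entry it returns a dict rather than an RDD.

```python
from typing import Any, Callable, Dict, Hashable, List, Tuple


def fold_by_key(
    partitions: List[List[Tuple[Hashable, Any]]],
    zero_value: Any,
    func: Callable[[Any, Any], Any],
) -> Dict[Hashable, Any]:
    """Hypothetical helper: per-key fold, first within and then across partitions."""
    per_partition = []
    for partition in partitions:
        local: Dict[Hashable, Any] = {}
        for key, value in partition:
            # The zero value seeds each key once per partition, as in Spark.
            local[key] = func(local.get(key, zero_value), value)
        per_partition.append(local)

    # Merge the partial results for each key across partitions with the same func.
    merged: Dict[Hashable, Any] = {}
    for local in per_partition:
        for key, partial in local.items():
            merged[key] = func(merged[key], partial) if key in merged else partial
    return merged


data = [[("a", 1), ("b", 2)], [("a", 3)]]
print(fold_by_key(data, 0, lambda x, y: x + y))  # {'a': 4, 'b': 2}
```

Because the zero value may be applied once per partition, it should be neutral for `func` (0 for addition, 1 for multiplication); otherwise the result depends on how the data is partitioned.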