Pysparkling Versions

A pure Python implementation of Apache Spark's RDD and DStream interfaces.

v0.6.2

1 year ago

make dependencies optional: boto, requests
compatibility
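
As an illustration of what an optional dependency usually looks like in Python, here is a minimal sketch of the guarded-import pattern. It is not taken from pysparkling's source; the names HAVE_REQUESTS and fetch_bytes are hypothetical and exist only for this example.

```python
# Guarded import: the module loads even when requests is not installed.
try:
    import requests
except ImportError:
    requests = None

# Hypothetical flag, not a pysparkling name.
HAVE_REQUESTS = requests is not None


def fetch_bytes(url):
    """Hypothetical helper: download a URL, failing only when actually called."""
    if requests is None:
        raise ImportError("install 'requests' to read http:// and https:// paths")
    return requests.get(url, timeout=10).content
```

With this layout the missing package is reported only when a feature that needs it is used, rather than at import time.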

v0.6.1

3 years ago

testing continuous deployment

v0.6.0

4 years ago

Broadcast, Accumulator and AccumulatorParam by @alexprengere
support for increasing partition numbers in coalesce and repartition by @tools4origins

v0.5.0

5 years ago

fixes for HDFS thanks to @tools4origins
fix for empty partitions by @tools4origins
API fixes by @artem0 and @tools4origins
various updates for streaming submodule
various updates to lint and test system
logging: converted some info messages to debug
... documentation for some point releases is missing

v0.4.1

7 years ago

retries for failed partitions
improve pysparkling.streaming.DStream
updates to docs

v0.4.0

7 years ago

major addition: pysparkling.streaming
updates to RDD.sample()
reorganized scripts and tests
added RDD.partitionBy()
minor updates to pysparkling.fileio

v0.3.23

7 years ago

small improvements to fileio and better documentation

v0.3.22

7 years ago

reimplement RDD.groupByKey()
clean up of docstrings

v0.3.21

8 years ago

faster text file reading by using io.TextIOWrapper for decoding
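
The idea behind this change can be sketched with the standard library alone. The snippet below is an assumption about the general technique, not pysparkling's actual fileio code: it wraps a binary stream in io.TextIOWrapper so that decoding happens incrementally while iterating over lines. The in-memory stream and variable names are made up for the example.

```python
import io

# In-memory byte stream standing in for a file opened in binary mode.
raw = io.BytesIO("first line\nsecond líne\n".encode("utf-8"))

# TextIOWrapper decodes the bytes lazily as lines are requested,
# instead of decoding the whole payload with bytes.decode() up front.
reader = io.TextIOWrapper(raw, encoding="utf-8")
lines = [line.rstrip("\n") for line in reader]
```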

v0.3.20

8 years ago

Google Storage file system (using gs://)
dependencies: requests and boto are not optional anymore
aggregateByKey() and foldByKey() return RDDs
Python 3: use sys.maxsize instead of sys.maxint
flake8 linting
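
For the Python 3 item above: sys.maxint was removed in Python 3, while sys.maxsize is available on both Python 2.6+ and Python 3. The sketch below shows the substitution in a typical "largest possible value" role. The constant and helper names are hypothetical and not taken from pysparkling.

```python
import sys

# sys.maxint no longer exists on Python 3; sys.maxsize is the portable choice.
UPPER_BOUND = sys.maxsize


def smallest(values):
    """Hypothetical helper that uses sys.maxsize as its starting sentinel."""
    result = UPPER_BOUND
    for value in values:
        if value < result:
            result = value
    return result
```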