A pure Python implementation of Apache Spark's RDD and DStream interfaces.
* testing continuous deployment
* ``pysparkling.streaming.DStream``
* ``pysparkling.streaming``
* ``RDD.sample()``
* scripts and tests
* ``RDD.partitionBy()``
* ``pysparkling.fileio``
* small improvements to ``fileio`` and better documentation
* ``RDD.groupByKey()``
* use ``io.TextIOWrapper`` for decoding
* Google Storage file system (using ``gs://``)
* dependencies: ``requests`` and ``boto`` are not optional anymore
* ``aggregateByKey()`` and ``foldByKey()`` return RDDs
* Python 3: use ``sys.maxsize`` instead of ``sys.maxint``
* flake8 linting
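The ``io.TextIOWrapper`` entry above refers to wrapping a binary stream so reads yield decoded text instead of raw bytes; a minimal standard-library sketch of that pattern (the sample data is illustrative, not from pysparkling):

```python
import io

# Wrap a binary stream so iteration yields decoded text lines.
raw = io.BytesIO("hello\nwörld\n".encode("utf-8"))
text = io.TextIOWrapper(raw, encoding="utf-8")
lines = [line.rstrip("\n") for line in text]
# lines == ["hello", "wörld"]
```

The wrapper handles the encoding and universal newlines, which is why it is preferable to decoding byte chunks by hand.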
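For readers unfamiliar with the ``RDD.groupByKey()`` semantics mentioned in the changelog, the core idea can be sketched in plain Python (hypothetical data; this is an illustration of the semantics, not pysparkling's implementation):

```python
from collections import defaultdict

# groupByKey collects all values sharing a key into one group.
pairs = [("a", 1), ("b", 2), ("a", 3)]
groups = defaultdict(list)
for key, value in pairs:
    groups[key].append(value)
# dict(groups) == {"a": [1, 3], "b": [2]}
```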