Skale Versions

High performance distributed data processing engine

1.2.0

6 years ago

This is a major feature release. Install it with npm.

New

  • Skale-engine is renamed to skale. The version is now 1.2.0, which is identical to 0.8.0.
  • Add a machine learning library with classification, regression, and clustering
  • Allow dependencies to be deployed to workers with the new routine sc.require(). This considerably eases the integration of connectors to data sources, databases, etc.
  • Major improvements to documentation website

Improvements

  • The test suite has been fully reworked, and now uses individual files that can be executed separately
  • Tests are considerably faster and easier to develop and debug
  • Both the standalone and distributed engines are now systematically tested
  • save(): now supports output to CSV format
  • save(), textFile(): automatically forward the AWS environment and credentials to workers
  • Workers: control garbage collection with a command-line option
  • Modernize JavaScript syntax
  • Continuous integration: add a Mac OS X target in addition to Linux and Windows

Fixes

  • Fix a problem in sample()
  • Fix support of undefined keys in aggregateByKey()
  • Fix debug traces

0.7.1

7 years ago

This is a stability and bug fix release.

  • Documentation has been improved.
  • A new skale hacker's guide has been added.
  • A worker crash when using sample() with replacement has been fixed.
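For readers unfamiliar with the term, sampling with replacement means that each draw picks from the full dataset, so the same element can appear several times. The sketch below is a plain JavaScript illustration under stated assumptions: the function names, the fixed number of draws, and the seeded generator are hypothetical and do not reproduce the algorithm or signature of Skale's sample().

```javascript
// Small deterministic PRNG (mulberry32) so that a given seed is repeatable.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6D2B79F5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // value in [0, 1)
  };
}

// Hypothetical helper: draw `count` items, each time from the whole array.
function sampleWithReplacement(items, count, seed) {
  const out = [];
  if (items.length === 0) return out; // nothing to draw from
  const random = mulberry32(seed);
  for (let i = 0; i < count; i++) {
    out.push(items[Math.floor(random() * items.length)]);
  }
  return out;
}

// More draws than items is valid, because items are never removed.
const drawn = sampleWithReplacement(['a', 'b', 'c'], 6, 42);
console.log(drawn);
```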

0.7.0

7 years ago

This is a major release. It brings new features:

  • Support for Azure storage for reading (textFile) and writing (save)
  • Support for the Apache Parquet file format, for reading and writing
  • Performance of wide transformations involving shuffling, such as aggregateByKey, reduceByKey, coGroup, and join, has increased considerably compared with the 0.6 branch.
  • Many bug fixes and stability improvements

Despite the new major version, this release remains backward compatible with the previous 0.6.x branch.

As always, it is also available through npm.

0.6.8

7 years ago

This is a stability and bug fix release. Documentation is improved, and distributed mode is more robust: handling of temporary files and of the environment has been fixed.

0.6.7

7 years ago

This is a performance and scalability improvement release.

In distributed mode, a direct peer-to-peer shuffle data transfer between workers has been implemented. It improves scalability on large clusters when running with hundreds of simultaneous workers.
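The idea of a direct worker-to-worker shuffle can be sketched as follows. This is a simplified in-process simulation, not Skale's networking code: the hashKey and shuffle functions are hypothetical, and routing by key hash is an assumption, since the release note does not say how target workers are chosen. Each worker places every record straight into the inbox of the worker that owns the key, with no central relay.

```javascript
// Hypothetical stable string hash; any deterministic hash would do.
function hashKey(key) {
  const s = String(key);
  let h = 0;
  for (let i = 0; i < s.length; i++) {
    h = (Math.imul(h, 31) + s.charCodeAt(i)) | 0;
  }
  return h >>> 0; // force a non-negative 32-bit integer
}

// workerOutputs[i] holds the [key, value] records produced by worker i.
// Returns one inbox per worker, filled directly by the sending workers.
function shuffle(workerOutputs) {
  const n = workerOutputs.length;
  const inboxes = Array.from({ length: n }, () => []);
  for (const records of workerOutputs) {
    for (const [key, value] of records) {
      // The sender computes the destination itself and delivers to it.
      inboxes[hashKey(key) % n].push([key, value]);
    }
  }
  return inboxes;
}

const inboxes = shuffle([
  [['x', 1], ['y', 2]],
  [['x', 3], ['z', 4]],
  [['y', 5]],
]);
console.log(inboxes);
```

After the exchange, all records sharing a key sit on the same worker, which is what key-based operations need before they can reduce locally.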

Standalone and distributed modes are now described. Debug traces are improved.

0.6.6

7 years ago

This is a stability and performance improvement release.

Memory efficiency has been improved for large datasets (thousands of partitions) and complex jobs (hundreds of stages/steps).

S3 support has been fixed, both for input and output.

Multi-machine communications and debugging traces have been improved.