High performance distributed data processing engine
Development activity is stopped, and this project is now archived.
High performance distributed data processing and machine learning.
Skale provides a high-level API in JavaScript and an optimized parallel execution engine on top of Node.js.
npm install skale
Word count example:
var sc = require('skale').context();
sc.textFile('/my/path/*.txt')
  .flatMap(line => line.split(' '))
  .map(word => [word, 1])
  .reduceByKey((a, b) => a + b, 0)
  .count(function (err, result) {
    console.log(result);
    sc.end();
  });
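To make the data flow of the example easier to follow, here is a minimal sketch of the same flatMap / map / reduceByKey steps on an in-memory array in plain JavaScript. It does not use skale at all; `wordCounts` is a hypothetical helper name, and the last line assumes that `count()` reports the number of elements left in the dataset after `reduceByKey`, that is, one per distinct word.

```javascript
// Plain JavaScript sketch of the pipeline above, on an in-memory array.
// No skale involved; wordCounts is a hypothetical helper for illustration.
function wordCounts(lines) {
  const pairs = lines
    .flatMap(line => line.split(' ')) // one entry per word
    .map(word => [word, 1]);          // emit [key, value] pairs

  // Equivalent of reduceByKey((a, b) => a + b, 0): sum the values per key.
  const totals = new Map();
  for (const [word, n] of pairs) {
    totals.set(word, (totals.get(word) || 0) + n);
  }
  return totals;
}

const totals = wordCounts(['a b a', 'b c']);
console.log([...totals.entries()]); // per-word totals
console.log(totals.size);           // number of distinct words
```

In the skale version, each of these steps would run in parallel on the workers rather than in a single loop.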
In local mode, worker processes are automatically forked and communicate with the app through a child process IPC channel. This is the simplest way to operate, and it lets the app use all available cores on the machine.
To run in local mode, just execute your app script:
node my_app.js
or with debug traces:
SKALE_DEBUG=2 node my_app.js
In distributed mode, a cluster server process and worker processes must be started before the app. Processes communicate with each other via raw TCP or via websockets.
To run in distributed cluster mode, first start a cluster server on server_host:
./bin/server.js
On each worker host, start a worker controller process that connects to the server:
./bin/worker.js -H server_host
Then run your app, setting the cluster server host in the environment:
SKALE_HOST=server_host node my_app.js
The same with debug traces:
SKALE_HOST=server_host SKALE_DEBUG=2 node my_app.js
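The commands above differ only in the environment variables they set. As an illustration, the sketch below summarizes which mode a given environment would select. `describeRun` is a hypothetical helper, not part of skale, and it rests on two assumptions: that an unset `SKALE_HOST` means local mode, and that an unset `SKALE_DEBUG` is equivalent to level 0.

```javascript
// Hypothetical helper, not part of skale: summarizes how an app would be run
// based only on the SKALE_HOST and SKALE_DEBUG environment variables.
function describeRun(env) {
  const host = env.SKALE_HOST;
  // Assumption: a missing or non-numeric SKALE_DEBUG is treated as level 0.
  const debug = Number.parseInt(env.SKALE_DEBUG || '0', 10) || 0;
  return {
    mode: host ? 'distributed' : 'local', // assumption: no host means local mode
    host: host || null,
    debug: debug,
  };
}

console.log(describeRun(process.env));
```

Calling this at the top of an app script can be a quick sanity check that the intended variables reached the process.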
The original authors of skale are Cedric Artigue and Marc Vertes.