A simplified library for decentralized, privacy-preserving machine learning
We are happy to announce the Swarm 2.2.0 community release.
In this release, we have delivered key UI/UX enhancements, including experiment tracking for an easier bird's-eye view of past training rounds, parallel Swarm installation on multiple hosts, and Podman support via SLM-UI. We have also added features to the Swarm manageability framework for better management of user ML workloads.
Customers can download product bits and documentation from My HPE Software Center.
Features
• Targeted SWOP command to run a task on a specific SWOP node.
-Dynamic addition of peers to an ongoing task execution.
-Retrying a failed task on a SWOP node.
• WITH ALL PEERS command to trigger task execution on all available peers.
• UI/UX Enhancements
-Experiment tracking support to display the training attributes for multiple training rounds.
-Parallel Swarm installation - Option to add multiple hosts simultaneously.
-View SWOP profile and task YAML.
-Podman support.
• Swarm support for SPIRE as certificate manager.
-Added CLI based SPIRE example (spire/cifar10).
• Real-world NIH example: added a new example to showcase a Swarm use case with a real-world NIH dataset.
• Documentation enhancements
Defect fixes
• Stale SL Admin node stuck waiting for quorum while a new Admin is selected.
• Enabled non-default APLS port support from SLM-UI.
• Issues during restart of the SLM-UI container while a training is running.
See the updated documentation for all new features and defect fixes. For help or clarifications, reach out on Slack: https://hpe-external.slack.com/archives/C02PWRJPWVD
In this release, we have delivered key enhancements to SLM-UI (model training metrics, easy browsing of ML logs, and a centralized Swarm log collector) that will significantly enhance the user experience. For advanced Swarm Learning users, we have provided a couple of additional options for merge algorithms. These will help optimize training convergence for different customer workloads. Further, we have enabled persistence for blockchain data, which gives customers offline analysis of training-related data and a faster restart of the Swarm Network (SN).
Product bits and documentation can be downloaded from My HPE Software Center (https://myenterpriselicense.hpe.com/cwp-ui/auth/login).
Features:
• Persistent data in SN
-Make the SN blockchain persist on disk
• UI/UX Features
-Model training metrics – Accuracy, Loss etc. at SL node and global Swarm level
-Browse through ML container logs
-Centralized Swarm log collector for faster diagnostic collection
-Seamless Product upgrade
• New merge methods for Swarm merge process
-Co-ordinate Median, Geometric Median
-Configurable merge through I/O or Memory optimized modes
• Swarm on Podman (alternative for Docker)
-Support Podman container runtime
-Run Swarm containers with rootless privileges
-Added support for SELinux with Podman on RHEL
• Enhanced diagnostics for SWOP and SN
• Containerized License Server (APLS)
• Documentation and example updates
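The coordinate median and geometric median merge methods listed above can be illustrated with a minimal sketch in plain Python. This is not Swarm Learning's implementation: the function names, the representation of each peer's weights as a flat list of floats, and the use of Weiszfeld's iteration for the geometric median are all assumptions made for illustration.

```python
import math
from statistics import median


def coordinate_median(peer_weights):
    """Merge by taking the median of each coordinate independently across peers."""
    return [median(column) for column in zip(*peer_weights)]


def geometric_median(peer_weights, iterations=200, tol=1e-9):
    """Approximate the vector that minimizes the sum of Euclidean distances
    to all peer weight vectors, using Weiszfeld's iteration (an assumed choice)."""
    count = len(peer_weights)
    dim = len(peer_weights[0])
    # Start from the coordinate-wise mean of all peers.
    estimate = [sum(column) / count for column in zip(*peer_weights)]
    for _ in range(iterations):
        numerator = [0.0] * dim
        denominator = 0.0
        for weights in peer_weights:
            # Clamp the distance so we never divide by zero when the
            # estimate lands exactly on one peer's vector.
            distance = max(math.dist(estimate, weights), tol)
            inverse = 1.0 / distance
            denominator += inverse
            for i, value in enumerate(weights):
                numerator[i] += value * inverse
        updated = [n / denominator for n in numerator]
        converged = math.dist(updated, estimate) < tol
        estimate = updated
        if converged:
            break
    return estimate


# Hypothetical weights from three peers; the third is an outlier.
peers = [[1.0, 2.0], [1.2, 1.9], [100.0, -50.0]]
print(coordinate_median(peers))  # [1.2, 1.9]
print(geometric_median(peers))
```

Both methods are less sensitive to a single outlying peer than a plain mean, which is one reason median-style merges are used when peers hold very different data.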
Defect fixes:
• Defect fixes in SN restart path
• Corrected ‘LIST NODES’ to display only active nodes
• Swarm components exit with proper diagnostics if certificates are expired
• Swarm Learning Topology updated to reflect active nodes
• Reverse proxy updates to consider the port number along with service name
Release Candidate 2 for 2.1.0 release - Not meant for production
We’re excited to announce Swarm Learning 2.0.0 community release!
This release contains the following updates:
Handling Sentinel node failure. Any SN node can act as sentinel while adding a new node. Supports mesh topology of the SN network.
Electing new merge leader when a leader failure is detected. Handles stale leader recovery.
Swarm product installation through SLM-UI. Deploy and Manage Swarm Learning through SLM-UI.
Extend Swarm Learning for new ML platforms.
We’re excited to announce Swarm Learning 1.2.0 community release!
This release contains new features and important bug fixes.
Thank you all for your support! Please reach out with any feedback or queries.
This release contains the following features:
• SWOP Docker logs provide more information if user, SL, or SWOP containers exit due to an error.
• Fixed navigation errors in the web GUI.
• Enhanced logging and descriptive error messages in the web GUI.
• Added configurable SWCI_TASK_MAX_WAIT_TIME, which specifies the wait time for the WAIT FOR TASKRUNNER command.
• SWCI is updated with a new command (sleep command).
• User ML containers run with non-root privileges.
• SWOP_KEEP_CONTAINERS environment variable is externalized.
• Enhanced Swarm Learning components to work with private Docker registry path patterns.
• Enhanced documentation.
First Community / Eval version of Swarm Learning, v0.3.0
This version is no longer supported. Users who already have this version can refer to the old documentation on GitHub if needed. We encourage all customers to move to the latest version.
All new customers are requested to use the latest version of the product.
Community release 1.0.0 of Swarm Learning.
This release has the following features:
• Swarm core functionality and user ML workload are not tied to each other in the same Docker image. This enables you to run workloads on any version of the ML platform of your choice (Keras, TensorFlow, or PyTorch).
• Swarm Command Interface (SWCI) to create and manage training environments.
• Programmatic interface to SWCI.
• Swarm Operator (SWOP) to build and execute ML workflows in a decentralized way.
• Support for Nvidia GPUs.
• Web UI Installer for Windows, Linux, and Mac platforms.