FATE Versions

An Industrial Grade Federated Learning Framework

v2.1.0

2 months ago

By downloading, installing or using the software, you accept and agree to be bound by all of the terms and conditions of the LICENSE and DISCLAIMER.

Major Features and Improvements

Arch

Fixed some bugs in the Spark computing engine

Component

Unified IO keys naming format for all components
Add LLMLoader to support running FATE-LLM v2.0 with pipeline

OSX

Compatible with eggroll-v2.x

EggRoll

Added 2.x API backport support
Bug fixes

FATE-Flow

Improved the display issue of output data.
Enhanced the PyPI package: configuration files have been relocated to the user's home directory, and the relative paths for uploading data are based on the user's home directory.
Added support for running FATE algorithms with Spark + Hadoop.
Fixed an issue where failed tasks could not be retried.
Fixed an issue where the system couldn't run when the task cores exceeded the system total cores.
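
The home-directory-relative upload paths mentioned above can be pictured with the small sketch below. The helper name `resolve_upload_path` is hypothetical and is not part of FATE-Flow; the real resolution logic may differ.

```python
from pathlib import Path
from typing import Optional


def resolve_upload_path(path: str, base: Optional[Path] = None) -> Path:
    """Hypothetical helper: keep absolute paths, join relative ones to the home directory."""
    # Assumption: the base directory is the current user's home directory.
    base = base if base is not None else Path.home()
    candidate = Path(path)
    return candidate if candidate.is_absolute() else base / candidate


# A relative path is interpreted against the (assumed) home directory.
example = resolve_upload_path("data/guest.csv", Path("/home/alice"))
```

With this convention, an absolute path such as `/tmp/x.csv` is returned unchanged.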

FATE-Client

Pipeline: added support for FATE-LLM 2.0
- added LLMModelLoader, LLMDatasetLoader, and LLMDataFuncLoader
- added configuration parsing for seq2seq_runner and ot_runner
Pipeline: unified the input interface of components

FATE-LLM

Adapted to the fate-v2.0 framework:
- Migrated parameter-efficient fine-tuning training methods and models.
- Migrated Standard Offsite-Tuning and Extended Offsite-Tuning (Federated Offsite-Tuning+).
- New trainer, dataset, and data_processing function design.
New FedKSeed federated tuning algorithm: trains large language models in a federated learning setting with extremely low communication cost

FATE-Test

Add Support for Job Runtime Configuration

v2.0.0

4 months ago

Major Features and Improvements

FATE 2.0

Arch 2.0: Building Unified and Standardized API for Heterogeneous Computing Engines Interconnection

Introduce Context to manage useful APIs for developers, such as Distributed Computing, Federation, Cipher, Tensor, Metrics, and IO.
Introduce Tensor data structure to handle local and distributed matrix operation, with built-in heterogeneous acceleration support.
- abstracted PHETensor, smooth switch between various underlying PHE implementations through standard interface
Introduce DataFrame, a 2D tabular data structure for data io and simple feature engineering
- add data block manager to support mixed-type columns & feature anonymization
- added 30+ operator interfaces for statistics, including comparison, indexing, data binning, and transformation, etc
Refactor Federation, a unified interface for federated communication. We provide a unified Serdes control and more user-friendly api.
Introduce Config, a unified configuration for FATE, including safety restrictions, system configuration, and algorithm configuration
Refactor logger, customizable logging for different use cases and flavors.
Introduce Launcher, a simple tool for federated program execution, especially useful for standalone and local debugging
Framework: PSI-ECDH protocol support, single entry for histogram statistical computation
Deepspeed integration: support distributed training using deepspeed with Eggroll.
Protocol: Support for SSHE (MPC and homomorphic encryption mixed protocol), ECDH, and Secure Aggregation protocols
Experimental: integrated CrypTen for SMPC support; more protocols and features will be added in the future
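
To give a feel for what a Secure Aggregation protocol achieves, here is a minimal pairwise-masking sketch. It is a toy under stated assumptions, not FATE's implementation: the masks come from a seeded `random.Random` instead of a real pairwise key agreement, and the function names are made up for illustration.

```python
import random

MODULUS = 2**32  # toy modulus; all arithmetic is done modulo this value


def pairwise_masks(n_parties, length, seed=0):
    """Return one mask vector per party; the masks sum to zero modulo MODULUS."""
    masks = [[0] * length for _ in range(n_parties)]
    for i in range(n_parties):
        for j in range(i + 1, n_parties):
            # Assumption: in a real protocol this shared randomness would be
            # derived from a key agreed between parties i and j.
            rng = random.Random(seed * 1_000_003 + i * 1_009 + j)
            shared = [rng.randrange(MODULUS) for _ in range(length)]
            for k in range(length):
                masks[i][k] = (masks[i][k] + shared[k]) % MODULUS  # party i adds
                masks[j][k] = (masks[j][k] - shared[k]) % MODULUS  # party j subtracts
    return masks


def mask_update(update, mask):
    """Hide a party's integer update behind its mask."""
    return [(u + m) % MODULUS for u, m in zip(update, mask)]


def aggregate(masked_updates):
    """Sum masked updates; the pairwise masks cancel, leaving the true sum."""
    length = len(masked_updates[0])
    return [sum(v[k] for v in masked_updates) % MODULUS for k in range(length)]


updates = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
masks = pairwise_masks(len(updates), 3)
masked = [mask_update(u, m) for u, m in zip(updates, masks)]
total = aggregate(masked)  # equals the element-wise sum of the raw updates
```

The aggregator only ever sees the masked vectors, yet recovers the exact sum because each shared mask is added once and subtracted once.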

Components 2.0: Building Standardized Algorithm Components for different Scheduling Engines

Introduce components toolbox to wrap ML modules as standard executable programs
spec and loader expose clear API for smooth internal extension and external system integration
Provide several cli tools to interact and execute components
Input-Output: Further decoupling of FATE-Flow, providing standardized black-box calling processes
Component Definition: Support for typing-based definition, automatic checking for component parameters, support for multiple types of data and model input and output, in addition to multiple inputs

ML 2.0: Major functionality migration from FATE-v1.x, decoupling call hierarchy

Data preprocessing: Added DataFrame Transformer; Reader, Union and DataSplit migration completed
Feature Engineering: Migrated HeteroFederatedBinning, HeteroFeatureSelection, DataStatistics, Sampling, FeatureScale and Pearson Correlation
Federated Training Migrated: HeteroSecureBoost, HomoNN, HeteroCoordinatedLogisticRegression, HeteroCoordinatedLinearRegression, SSHE-LogisticRegression and SSHE-LinearRegression
Federated Training Added:
- SSHE-HeteroNN: based on MPC and homomorphic encryption mixed protocol
- FedPASS-HeteroNN: based on fedpass protocol

Algorithm Performance Improvements (Comparison with FATE-v1.11.*)

PSI (Private Set Intersection): tested on a dataset of 100 million records with an intersection result of 100 million, 1.8+ times the performance of FATE-v1.11.4
Hetero-SSHE-LR: tested on guest data of 100,000 samples * 30 dimensions and host data of 100,000 samples * 300 dimensions, 4.3+ times the performance of FATE-v1.11.4
Hetero-NN (based on FedPass protocol): tested on guest data of 100,000 samples * 30 dimensions and host data of 100,000 samples * 300 dimensions, basically consistent with plaintext performance, 143+ times the performance of FATE-v1.11.4
Hetero-Coordinated-LR: tested on guest data of 100,000 samples * 30 dimensions and host data of 100,000 samples * 300 dimensions, 1.2+ times the performance of FATE-v1.11.4
Hetero-Feature-Binning: tested on guest data of 100,000 samples * 30 dimensions and host data of 100,000 samples * 300 dimensions, 1.5+ times the performance of FATE-v1.11.4

OSX(Open Site Exchange) 1.0: Building Open Platform for Cross-Site Communication Interconnection

Implements the transmission interface in accordance with the “Technical Specification for Financial Industry Privacy Computing Interconnection Platform”. The transmission interface is compatible with both FATE 1.X and FATE 2.X
Supports gRPC synchronous and streaming transmission, supports the TLS secure transmission protocol, and is compatible with FATE 1.X rollsite components
Supports HTTP/1.X protocol transmission and the TLS secure transmission protocol
Support message queue mode transmission, used to replace rabbitmq and pulsar components in FATE 1.X
Supports Eggroll and Spark computing engines
Supports networking as an Exchange component, with support for FATE 1.X and FATE 2.X access
Compared to the rollsite component, it improves the exception handling logic during transmission and provides more accurate log output for quickly locating exceptions.
The routing configuration is basically consistent with the original rollsite, reducing the difficulty of porting
Supports HTTP interface modification of routing tables and provides simple permission verification
Improved network connection management logic, reduced connection leakage risk, and improved transmission efficiency
Using different ports to handle access requests both inside and outside the cluster, facilitating the adoption of different security policies for different ports

FATE Flow 2.0: Building Open and Standardized Scheduling Platform for Scheduling Interconnection

Adapted to new scalable and standardized federated DSL IR
Built an interconnected scheduling layer framework, supported the BFIA protocol
Optimized process scheduling, with scheduling separated and customizable, and added priority scheduling
Optimized algorithm component scheduling, supporting container-level algorithm loading and enhancing support for cross-platform heterogeneous scenarios
Optimized multi-version algorithm component registration, supporting registration for mode of components
Federated DSL IR extension enhancement: supports multi-party asymmetric scheduling
Optimized client authentication logic, supporting permission management for multiple clients
Optimized RESTful interface, making parameter fields and types, return fields, and status codes clearer
Added OFX(Open Flow Exchange) module: encapsulated scheduling client to allow cross-platform scheduling
Supported the new communication engine OSX, while remaining compatible with all engines from FATE Flow 1.x
Decoupled the System Layer and the Algorithm Layer, with system configuration moved from the FATE repository to the Flow repository
Published FATE Flow package to PyPI and added service-level CLI for service management
Migrated major functionality from FATE Flow 1.x

FATE-Client 2.0: Building Scalable Federated DSL for Application Layer Interconnection And Providing Tools For Fast Federated Modeling

Introduce new scalable and standardized federated DSL IR(Intermediate Representation) for federated modeling job
Compile python client to DSL IR
Federated DSL IR extension enhancement: supports multi-party asymmetric scheduling.
Support mutual translation between Standardized Fate-2.0.0 DSL IR and UnionPay's BFIA protocol.
Support components with UnionPay's BFIA protocol through adapter mode
Flow CLI and PipeLine share configuration

FATE-Test: FATE Automated Testing Tool

Migrated automated testing for functionality, performance, and correctness

FATE-Board 2.0

Refactored DAG components, added support for stage status, and added display of dynamic ports.
Updated the cache structure to address issues such as user timeout handling and duplicate storage of configuration information.
Optimized some interactive functions.
Updated the style theme.

Eggroll 3.0

Enhancements in the JVM Part:

Core Component Reconstruction: The cluster-manager and node-manager components have been entirely rebuilt using Java, ensuring uniformity and enhanced performance.
Transport Component Modification: The rollsite transport component has been removed and replaced with the more efficient osx component.
Improved Process Management: Advanced logic has been implemented to manage processes more effectively, significantly reducing the risk of process leakage.
Enhanced Data Storage Logic: Data storage mechanisms have been refined for better performance and reliability.
Concurrency Control Improvements: We've upgraded the logic for concurrency control in the original components, leading to performance boosts.
Visualization Component: A new visualization component has been added for convenient monitoring of computational information.
Refined Logging: The logging system has been enhanced for more precise outputs, aiding in rapid anomaly detection.

Upgrades in the Python Part:

Reconstruction of roll_pair and egg_pair: These components now support serialization and partition methods controlled by the caller. Serialization safety is uniformly managed by the caller.
Automated Cleanup of Intermediate Tables: The issue of automatic cleaning for intermediate tables between federation and computing has been resolved, eliminating the need for extra operations by the caller.
Unified Configuration Control: A flexible configuration system is introduced, supporting direct pass-through, configuration files, and environment variables to cater to diverse requirements.
Client-Side PyPI Installation: Eggroll 3.0 supports easy installation via PyPI for clients.
Optimized Log Configuration: Callers can now customize log formats according to their needs.
Code Structure Refinement: The codebase has been streamlined for clarity, removing a substantial amount of redundant code.

Eggroll 3.0 brings comprehensive enhancements in system performance, usability, and reliability with these significant updates in both the JVM and Python parts.
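
The three configuration sources listed under Unified Configuration Control (direct pass-through, configuration files, environment variables) could be combined roughly as in the sketch below. The notes do not state a precedence order, so the order used here (direct value, then environment variable, then file, then default) is an assumption, as are the function name `resolve_option` and the `EGGROLL_` prefix.

```python
import os


def resolve_option(key, direct=None, file_config=None, env=None,
                   default=None, env_prefix="EGGROLL_"):
    """Hypothetical lookup across three configuration sources.

    Assumed precedence: direct pass-through > environment variable > file > default.
    """
    env = os.environ if env is None else env
    if direct is not None and key in direct:
        return direct[key]
    # Assumed naming rule: "log.level" becomes "EGGROLL_LOG_LEVEL".
    env_key = env_prefix + key.upper().replace(".", "_")
    if env_key in env:
        return env[env_key]
    if file_config is not None and key in file_config:
        return file_config[key]
    return default


file_cfg = {"log.level": "INFO", "session.timeout": "30"}
fake_env = {"EGGROLL_LOG_LEVEL": "DEBUG"}
level = resolve_option("log.level", file_config=file_cfg, env=fake_env)
```

Passing `env` explicitly keeps the example independent of the real process environment.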

Easy Deploy

Supports installation of FATE from PyPI

Commit Authors

dylan-fan @dylan-fan
mgqa34 @mgqa34
sagewe @sagewe
forgive_dengkai @forgivedengkai
xiongli @Xiong-Li-github
weijingchen @talkingwallace
Yu Wu @nemirorox
zhihuiwan @zhihuiwan
Arvin Huang @idwenwen

v1.11.4

5 months ago

Major Features and Improvements

FederatedML

Unified key length configuration of encryption algorithm, update default key length to 2048.

Bug-Fix

Modify hessian computation of softmax cross entropy in SecureBoost, to align with LightGBM.
Fix Model initialization error in Homo Neural Network predicting process.

v2.0.0-beta

7 months ago

Major Features and Improvements

Arch 2.0: Building Unified and Standardized API for Heterogeneous Computing Engines Interconnection

Framework: PSI-ECDH protocol support, single entry for histogram statistical computation
Protocol: Support for ECDH, Secure Aggregation protocols
Tensor: abstracted PHETensor, smooth switch between various underlying PHE implementations through standard interface
DataFrame: New data block manager supports mixed-type columns & feature anonymization; added 30+ operator interfaces for statistics, including comparison, indexing, data binning, and transformation, etc.
Enhanced workflow: Support for Cross Validation workflow

Components 2.0: Building Standardized Algorithm Components for different Scheduling Engines

Input-Output: Further decoupling of FATE-Flow, providing standardized black-box calling processes
Component Definition: Support for typing-based definition, automatic checking for component parameters, support for multiple types of data and model input and output, in addition to multiple inputs

ML 2.0: Major functionality migration from FATE-v1.x, decoupling call hierarchy

Data preprocessing: Added DataFrame Transformer, Union and DataSplit migration completed
Feature Engineering: Migrated HeteroFederatedBinning, HeteroFeatureSelection, DataStatistics, Sampling, FeatureScale

Federated Training: Migrated HeteroSecureBoost, HomoNN, vertical CoordinatedLogisticRegression, and CoordinatedLinearRegression
Evaluation: Migrated Evaluation

OSX(Open Site Exchange) 1.0: Building Open Platform for Cross-Site Communication Interconnection

Improved HTTP/1.X protocol support, support for GRPC-to-HTTP transmission
Support for TLS secure transmission protocol
Added routing table configuration interface
Added routing table connectivity automatic check
Improved transmission function in cluster mode
Enhanced flow control in cluster mode
Support for simple interface authentication

FATE Flow 2.0: Building Open and Standardized Scheduling Platform for Scheduling Interconnection

Migrated functions: data upload/download, process scheduling, component output data/model/metric management, multi-storage adaptation for models, authentication, authorization, feature anonymization, multi-computing/storage/communication engine adaptation, and system high availability
Optimized process scheduling, with scheduling separated and customizable, and added priority scheduling
Optimized algorithm component scheduling, dividing execution steps into preprocessing, running, and post-processing
Optimized multi-version algorithm component registration, supporting registration for mode of components
Optimized client authentication logic, supporting permission management for multiple clients
Optimized RESTful interface, making parameter fields and types, return fields, and status codes clearer
Decoupling the system layer from the algorithm layer, with system configuration moved from the FATE repository to the Flow repository
Published FATE Flow package to PyPI and added service-level CLI for service management

Fate-Client 2.0: Building Scalable Federated DSL for Application Layer Interconnection And Providing Tools For Fast Federated Modeling.

Migrated Flow CLI and Flow SDK
Updated federated DSL IR: enhance IR, add DataWarehouse and ModelWarehouse to load data and model from other sources
Update component definitions to support Fate-v2.0.0-beta
Flow CLI and PipeLine share configuration

Fate-Test: FATE Automated Testing Tool

Migrated automated testing for functionality, performance, and correctness

v1.11.3

8 months ago

Major Features and Improvements

FederatedML

FedAVGTrainer update code structure: support OffsiteTuningTrainer
FedAVGTrainer update log format: report batch progress instead of batch index

v1.11.2

11 months ago

Major Features and Improvements

FederatedML

Integrate DeepSpeed, support distributed training of FATE-LLM
Separated the upgraded FATE-LLM from FATE into a new “FATE-LLM” GitHub repo
HomoNN now supports data collator and distributed sampler
Hetero SecureBoost supports running multiple boosting rounds in complete secure mode with complete_secure option

Bug-Fix

Fix hessian computation of softmax cross entropy in SecureBoost

v1.11.1

1 year ago

Major Features and Improvements

FederatedML

Support Homo Graph Neural Network
PSI-DH protocol enhancement: use Oakley MODP modulus groups
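
As background for the PSI-DH item, the sketch below shows the commutative-blinding idea behind Diffie-Hellman-based private set intersection. It is a toy under stated assumptions: the modulus is a small Mersenne prime chosen for brevity, not an Oakley MODP group, both parties run in one process, and the function names are illustrative rather than FATE's.

```python
import hashlib
import secrets

# Toy modulus (a Mersenne prime). Assumption for illustration only: real
# deployments use much larger standardized groups such as the Oakley MODP groups.
P = 2**127 - 1


def hash_to_group(item):
    """Map an identifier to a non-zero residue modulo P."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    value = int.from_bytes(digest, "big") % P
    return value if value != 0 else 1


def blind(values, secret):
    """Raise every value to a private exponent modulo P."""
    return [pow(v, secret, P) for v in values]


def dh_psi(guest_ids, host_ids):
    """Return the guest identifiers that also appear on the host side."""
    a = secrets.randbelow(P - 3) + 2  # guest's private exponent
    b = secrets.randbelow(P - 3) + 2  # host's private exponent
    guest_once = blind([hash_to_group(x) for x in guest_ids], a)
    host_once = blind([hash_to_group(x) for x in host_ids], b)
    guest_twice = blind(guest_once, b)     # host re-blinds the guest's values
    host_twice = set(blind(host_once, a))  # guest re-blinds the host's values
    # H(x)^(a*b) is the same whichever exponent is applied first.
    return [x for x, t in zip(guest_ids, guest_twice) if t in host_twice]


common = dh_psi(["u1", "u2", "u3"], ["u2", "u3", "u4"])
```

Neither side sees the other's raw identifiers; only doubly blinded values are compared.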

v1.11.0

1 year ago

Major Features and Improvements

FederatedML

Support FATE-LLM (Federated Large Language Models)
- Integration of LLM for federated learning: BERT, ALBERT, RoBERTa, GPT-2, BART, DeBERTa, and DistilBERT. Please note that if using such pretrain-models, compliance with their licenses is needed.
- Integration of Parameter-efficient tuning methods for federated learning: Bottleneck Adapters (including Houlsby, Pfeiffer, Parallel schemes), Invertible Adapters, LoRA, IA3, and Compacter.
- Improved Homo Federated Trainer class, allowing CUDA device specification and DataParallel acceleration for multi-GPU devices.
- TokenizerDataset feature upgrade, better adaptation to HuggingFace Tokenizer.

Bug-Fix

Fix inconsistent bin_num display of Hetero Feature Binning for data containing missing values
Fix inconsistent data transformation for selected columns of Hetero Feature Binning when using ModelLoader
Fix exclusive_data_type not valid in DataTransform when meta for input data is missing
Fix weighted loss calculation and feature importance display issues in Tree-Based models
Fix sample id display of NN

v2.0.0-alpha

1 year ago

Feature Highlights

Arch 2.0: Building Unified and Standardized API for Heterogeneous Computing Engines Interconnection

Introduce Context to manage useful APIs for developers, such as Metrics, Cipher, Tensor and IO.
Introduce Tensor data structure to handle local and distributed matrix operation, with built-in heterogeneous acceleration support.
Introduce DataFrame, a 2D tabular data structure for data io and simple feature engineering.
Refactor logger, customizable logging for different use cases and flavors.
Introduce new high-level federation API suite: context.<role>.get(name)/context.<role>.put(name=value).
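
The call shape `context.<role>.get(name)` / `context.<role>.put(name=value)` can be mimicked with a single-process mock like the one below. This only mirrors the shape quoted above; the classes `ToyChannel`, `ToyPeer` and `ToyContext` are hypothetical stand-ins and say nothing about how FATE's Context is actually implemented.

```python
from collections import defaultdict, deque


class ToyChannel:
    """In-memory mailboxes keyed by (sender, receiver, name). Hypothetical."""

    def __init__(self):
        self.boxes = defaultdict(deque)


class ToyPeer:
    """Handle for one remote role, exposing put(name=value) and get(name)."""

    def __init__(self, channel, local, remote):
        self._channel = channel
        self._local = local
        self._remote = remote

    def put(self, **kwargs):
        # Each keyword argument is sent as a separately named message.
        for name, value in kwargs.items():
            self._channel.boxes[(self._local, self._remote, name)].append(value)

    def get(self, name):
        # Receive the oldest message with this name sent by the remote role.
        return self._channel.boxes[(self._remote, self._local, name)].popleft()


class ToyContext:
    """Exposes every other role as an attribute, e.g. ctx.host or ctx.guest."""

    def __init__(self, channel, local_role, roles=("guest", "host")):
        for role in roles:
            if role != local_role:
                setattr(self, role, ToyPeer(channel, local_role, role))


channel = ToyChannel()
guest_ctx = ToyContext(channel, "guest")
host_ctx = ToyContext(channel, "host")

guest_ctx.host.put(gradient=[0.1, 0.2])     # guest sends to host
received = host_ctx.guest.get("gradient")   # host receives from guest
```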

Components 2.0: Building Standardized Algorithm Components for different Scheduling Engines

Introduce components toolbox to wrap ML modules as standard executable programs.
spec and loader expose clear API for smooth internal extension and external system integration.
Provide several cli tools to interact and execute components.
Implement base demos components: reader, intersection, feature scale, lr and evaluation.

ML 2.0(demo)

Provide base demos for federated machine learning algorithms: intersection, feature scale, lr and evaluation.

Pipeline 2.0: Building Scalable Federated DSL for Application Layer Interconnection

Introduce new scalable and standardized federated DSL IR(Intermediate Representation) for federated modeling job
Compile python client to DSL IR
Support multiple scalable execution backends, including standalone and Fate-Flow.

OSX(Open Site Exchange) 1.0: Building Open Platform for Cross-Site Communication Interconnection

Standardized Cross-Site lower-level federation api
Support grpc synchronous transmission and streaming transmission; Compatible with eggroll interface and can replace FATE-1.x rollsite component
Support asynchronous message transmission, which can replace rabbitmq and pulsar components in FATE-1.x
Support HTTP-1.X protocol transmission
Support cluster deployment and inter-site traffic control
Support networking as an Exchange component

FATE Flow 2.0: Building Open and Standardized Scheduling Platform for Scheduling Interconnection

Adapted to new scalable and standardized federated DSL IR
Standardized API interface with param type checking
Decoupling Flow from FATE repository
Optimized scheduling logic, with configurable dispatcher decoupled from initiator
Support container-level algorithm loading and task scheduling, enhancing support for cross-platform heterogeneous scenarios
Independent maintenance for system configuration to enhance flexibility and ease of configuration
Support new communication engine OSX, while compatible with all engines from Flow 1.X
Introduce OFX(Open Flow Exchange) module: encapsulated scheduling client to allow cross-platform scheduling

Deploy

Support installing from PyPI

v1.10.0

1 year ago

Major Features and Improvements

FederatedML

Renewed Homo NN: PyTorch-based, support flexible model building:
- Support user access to complex self-defined PyTorch models or ready-to-use PyTorch models such as DeepFM, ResNet, BERT, Yolo
- Support various data set types, may build data set based on PyTorch Dataset
- User-defined training loss
- User-defined training process: user-defined aggregation algorithm for client and server
- Provide API for developing Aggregator
Upgraded Hetero NN: support flexible model building and various data set types:
- more flexible pytorch top/bottom model customization; provide access to industry approved PyTorch models
- User-defined training loss
- Support various data set types, may build data set based on PyTorch Dataset
Renewed Homo-federated framework with support for all current homo models, including Homo NN, Homo LR, Homo SecureBoost, Homo Feature Binning, and Hetero KMeans. This provides a smoother algorithm customization and development experience
Semi-Supervised Algorithm Positive Unlabeled Learning
Hetero LR & Hetero SecureBoost now support Intel IPCL
Intersection supports Multi-host Elliptic-curve-based PSI
Intersection may compute Multi-host Secure PSI Cardinality
Hetero Feature Optimal Binning now records & shows Gini/KS/Chi-Square metrics
Host may load Hetero Binning model with WOE score through Model Loader
Hetero Feature Binning supports binning by user-provided split points
Sampler supports weighted sampling by instance weight
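
Binning by user-provided split points, as listed above, amounts to locating each value among sorted boundaries. The sketch below uses the standard `bisect` module. The boundary convention (a value equal to a split point falls into the lower bin) and the function name are assumptions, not taken from Hetero Feature Binning.

```python
from bisect import bisect_left


def bin_by_split_points(values, split_points):
    """Assign each value a bin index from user-provided split points.

    Assumed convention: bin i covers (points[i-1], points[i]], and values
    above the last split point go to bin len(points).
    """
    points = sorted(split_points)
    return [bisect_left(points, v) for v in values]


bins = bin_by_split_points([1, 5, 10, 11, 25], [5, 10, 20])
```

Three split points therefore produce four bins, indexed 0 through 3.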

Fate-Client

Flow CLI adds min-test options
Pipeline adds data-bind API, useful for local development
Pipeline may reconfigure role/model_id/model_version, switching party_id for prediction task