FATE Versions

An Industrial Grade Federated Learning Framework

v2.1.0

2 months ago

By downloading, installing or using the software, you accept and agree to be bound by all of the terms and conditions of the LICENSE and DISCLAIMER.

Major Features and Improvements

Arch

  • Fixed several bugs in the Spark computing engine

Component

  • Unified IO keys naming format for all components
  • Add LLMLoader to support running FATE-LLM v2.0 with pipeline

OSX

  • Compatible with eggroll-v2.x

EggRoll

  • Added backport support for the 2.x API
  • Bug fixes

FATE-Flow

  • Fixed a display issue with output data.

  • Enhanced the PyPI package: configuration files have been relocated to the user's home directory, and the relative paths for uploading data are based on the user's home directory.

  • Added support for running FATE algorithms with Spark + Hadoop.

  • Fixed an issue where failed tasks could not be retried.

  • Fixed an issue where the system couldn't run when the task cores exceeded the system total cores.

FATE-Client

  • Pipeline: added support for FATE-LLM 2.0
    • Added LLMModelLoader, LLMDatasetLoader, LLMDataFuncLoader
    • Added configuration parsing of seq2seq_runner and ot_runner
  • Pipeline: unified the input interface of components

FATE-LLM

  • Adapt to fate-v2.0 framework:
    • Migrate parameter-efficient fine-tuning training methods and models.
    • Migrate Standard Offsite-Tuning and Extended Offsite-Tuning (Federated Offsite-Tuning+)
    • New design for trainer, dataset, and data_processing functions
  • New FedKSeed Federated Tuning Algorithm: train large language models in a federated learning setting with extremely low communication cost

FATE-Test

  • Add Support for Job Runtime Configuration

v2.0.0

4 months ago

Major Features and Improvements

FATE 2.0

Arch 2.0: Building Unified and Standardized API for Heterogeneous Computing Engines Interconnection

  • Introduce Context to manage useful APIs for developers, such as Distributed Computing, Federation, Cipher, Tensor, Metrics, and IO.
  • Introduce Tensor data structure to handle local and distributed matrix operation, with built-in heterogeneous acceleration support.
    • abstracted PHETensor, smooth switch between various underlying PHE implementations through standard interface
  • Introduce DataFrame, a 2D tabular data structure for data io and simple feature engineering
    • add data block manager to support mixed-type columns & feature anonymization
    • added 30+ operator interfaces for statistics, including comparison, indexing, data binning, and transformation, etc
  • Refactor Federation, a unified interface for federated communication. We provide a unified Serdes control and more user-friendly api.
  • Introduce Config, a unified configuration for FATE, including safety restrictions, system configuration, and algorithm configuration
  • Refactor logger, customizable logging for different use cases and flavors.
  • Introduce Launcher, a simple tool for federated program execution, especially useful for standalone and local debugging
  • Framework: PSI-ECDH protocol support, single entry for histogram statistical computation
  • Deepspeed integration: support distributed training using deepspeed with Eggroll.
  • Protocol: Support for SSHE (MPC and homomorphic encryption mixed protocol), ECDH, and Secure Aggregation protocols
  • Experimental: Integrated CrypTen for SMPC support; more protocols and features will be added in the future
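
The Secure Aggregation protocol listed above can be illustrated with pairwise additive masking. This is a minimal sketch of the general idea, not FATE's implementation: the function names, the modulus, and the seeded `random.Random` that stands in for a mask derived from a key agreement are all assumptions made for illustration.

```python
import random

MODULUS = 2**32  # illustrative ring size, an assumption of this sketch


def pairwise_mask(party, other, round_seed):
    """Mask shared by a pair of parties; both sides derive the same value.

    A real protocol would derive this from a key agreement; a seeded
    PRNG is used here only as a stand-in.
    """
    low, high = min(party, other), max(party, other)
    rng = random.Random(f"{round_seed}:{low}:{high}")
    return rng.randrange(MODULUS)


def mask_update(party, value, parties, round_seed):
    """Add masks toward higher-numbered peers, subtract toward lower ones."""
    masked = value
    for other in parties:
        if other == party:
            continue
        m = pairwise_mask(party, other, round_seed)
        masked += m if party < other else -m
    return masked % MODULUS


def aggregate(masked_values):
    """Server-side sum; every pairwise mask appears once with each sign."""
    return sum(masked_values) % MODULUS


parties = [0, 1, 2]
updates = {0: 5, 1: 11, 2: 26}
masked = [mask_update(p, updates[p], parties, round_seed=7) for p in parties]
total = aggregate(masked)  # equals 5 + 11 + 26 modulo MODULUS
```

The aggregator only ever sees the masked values, yet their sum matches the sum of the raw updates because each mask is added by one party and subtracted by its peer.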

Components 2.0: Building Standardized Algorithm Components for different Scheduling Engines

  • Introduce components toolbox to wrap ML modules as standard executable programs
  • spec and loader expose clear API for smooth internal extension and external system integration
  • Provide several cli tools to interact and execute components
  • Input-Output: Further decoupling of FATE-Flow, providing standardized black-box calling processes
  • Component Definition: Support for typing-based definition, automatic checking for component parameters, support for multiple types of data and model input and output, in addition to multiple inputs

ML 2.0: Major functionality migration from FATE-v1.x, decoupling call hierarchy

  • Data preprocessing: Added DataFrame Transformer; Reader, Union and DataSplit migration completed
  • Feature Engineering: Migrated HeteroFederatedBinning, HeteroFeatureSelection, DataStatistics, Sampling, FeatureScale and Pearson Correlation
  • Federated Training Migrated: HeteroSecureBoost, HomoNN, HeteroCoordinatedLogisticRegression, HeteroCoordinatedLinearRegression, SSHE-LogisticRegression and SSHE-LinearRegression
  • Federated Training Added:
    • SSHE-HeteroNN: based on MPC and homomorphic encryption mixed protocol
    • FedPASS-HeteroNN: based on fedpass protocol

Algorithm Performance Improvements (Comparison with FATE-v1.11.*)

  • PSI (Private Set Intersection): tested on a dataset of 100 million with an intersection result of 100 million, 1.8+ times the performance of FATE-v1.11.4
  • Hetero-SSHE-LR: tested on guest data of 100,000 * 30 dimensions and host data of 100,000 * 300 dimensions, 4.3+ times the performance of FATE-v1.11.4
  • Hetero-NN (based on FedPass protocol): tested on guest data of 100,000 * 30 dimensions and host data of 100,000 * 300 dimensions, basically consistent with plaintext performance, 143+ times the performance of FATE-v1.11.4
  • Hetero-Coordinated-LR: tested on guest data of 100,000 * 30 dimensions and host data of 100,000 * 300 dimensions, 1.2+ times the performance of FATE-v1.11.4
  • Hetero-Feature-Binning: tested on guest data of 100,000 * 30 dimensions and host data of 100,000 * 300 dimensions, 1.5+ times the performance of FATE-v1.11.4

OSX(Open Site Exchange) 1.0: Building Open Platform for Cross-Site Communication Interconnection

  • Implement the transmission interface in accordance with the “Technical Specification for Financial Industry Privacy Computing Interconnection Platform”; the transmission interface is compatible with both FATE 1.X and FATE 2.X
  • Supports gRPC synchronous and streaming transmission, supports the TLS secure transmission protocol, and is compatible with FATE 1.X rollsite components
  • Supports Http 1.X protocol transmission and TLS secure transmission protocol
  • Support message queue mode transmission, used to replace rabbitmq and pulsar components in FATE 1.X
  • Supports Eggroll and Spark computing engines
  • Supports networking as an Exchange component, with support for FATE 1.X and FATE 2.X access
  • Compared to the rollsite component, it improves the exception handling logic during transmission and provides more accurate log output for quickly locating exceptions.
  • The routing configuration is basically consistent with the original rollsite, reducing the difficulty of porting
  • Supports HTTP interface modification of routing tables and provides simple permission verification
  • Improved network connection management logic, reduced connection leakage risk, and improved transmission efficiency
  • Using different ports to handle access requests both inside and outside the cluster, facilitating the adoption of different security policies for different ports

FATE Flow 2.0: Building Open and Standardized Scheduling Platform for Scheduling Interconnection

  • Adapted to new scalable and standardized federated DSL IR
  • Built an interconnected scheduling layer framework, supported the BFIA protocol
  • Optimized process scheduling, with scheduling separated and customizable, and added priority scheduling
  • Optimized algorithm component scheduling, supporting container-level algorithm loading and enhancing support for cross-platform heterogeneous scenarios
  • Optimized multi-version algorithm component registration, supporting registration for mode of components
  • Federated DSL IR extension enhancement: supports multi-party asymmetric scheduling
  • Optimized client authentication logic, supporting permission management for multiple clients
  • Optimized RESTful interface, making parameter fields and types, return fields, and status codes clearer
  • Added OFX(Open Flow Exchange) module: encapsulated scheduling client to allow cross-platform scheduling
  • Supported the new communication engine OSX, while remaining compatible with all engines from FATE Flow 1.x
  • Decoupled the System Layer and the Algorithm Layer, with system configuration moved from the FATE repository to the Flow repository
  • Published FATE Flow package to PyPI and added service-level CLI for service management
  • Migrated major functionality from FATE Flow 1.x
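
The priority scheduling mentioned in the list above can be pictured with a small queue built on `heapq`. This is only a sketch of the general technique; the class name, method names, and the "higher number runs first" convention are hypothetical and are not taken from FATE Flow.

```python
import heapq
import itertools


class PriorityJobQueue:
    """Toy job queue: higher priority first, FIFO among equal priorities."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker preserving submit order

    def submit(self, job_id, priority=0):
        # heapq is a min-heap, so the priority is negated.
        heapq.heappush(self._heap, (-priority, next(self._counter), job_id))

    def next_job(self):
        """Pop the next job to run, or return None when the queue is empty."""
        if not self._heap:
            return None
        _, _, job_id = heapq.heappop(self._heap)
        return job_id


queue = PriorityJobQueue()
queue.submit("job_a", priority=1)
queue.submit("job_b", priority=5)
queue.submit("job_c", priority=1)
order = [queue.next_job() for _ in range(3)]  # ["job_b", "job_a", "job_c"]
```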

FATE-Client 2.0: Building Scalable Federated DSL for Application Layer Interconnection And Providing Tools For Fast Federated Modeling

  • Introduce new scalable and standardized federated DSL IR(Intermediate Representation) for federated modeling job
  • Compile python client to DSL IR
  • Federated DSL IR extension enhancement: supports multi-party asymmetric scheduling.
  • Support mutual translation between Standardized Fate-2.0.0 DSL IR and UnionPay's BFIA protocol.
  • Support components with UnionPay's BFIA protocol through adapter mode
  • Flow CLI and PipeLine share configuration

FATE-Test: FATE Automated Testing Tool

  • Migrated automated testing for functionality, performance, and correctness

FATE-Board 2.0

  • Refactoring DAG components, adding support for stage status, and displaying dynamic ports.
  • Update the cache structure to optimize issues such as user timeout handling and duplicate storage of configuration information.
  • Optimize some interactive functions.
  • Update the style theme.

Eggroll 3.0

Enhancements in the JVM Part:

  • Core Component Reconstruction: The cluster-manager and node-manager components have been entirely rebuilt using Java, ensuring uniformity and enhanced performance.
  • Transport Component Modification: The rollsite transport component has been removed and replaced with the more efficient osx component.
  • Improved Process Management: Advanced logic has been implemented to manage processes more effectively, significantly reducing the risk of process leakage.
  • Enhanced Data Storage Logic: Data storage mechanisms have been refined for better performance and reliability.
  • Concurrency Control Improvements: We've upgraded the logic for concurrency control in the original components, leading to performance boosts.
  • Visualization Component: A new visualization component has been added for convenient monitoring of computational information.
  • Refined Logging: The logging system has been enhanced for more precise outputs, aiding in rapid anomaly detection.

Upgrades in the Python Part:

  • Reconstruction of roll_pair and egg_pair: These components now support serialization and partition methods controlled by the caller. Serialization safety is uniformly managed by the caller.
  • Automated Cleanup of Intermediate Tables: The issue of automatic cleaning for intermediate tables between federation and computing has been resolved, eliminating the need for extra operations by the caller.
  • Unified Configuration Control: A flexible configuration system is introduced, supporting direct pass-through, configuration files, and environment variables to cater to diverse requirements.
  • Client-Side PyPI Installation: Eggroll 3.0 supports easy installation via PyPI for clients.
  • Optimized Log Configuration: Callers can now customize log formats according to their needs.
  • Code Structure Refinement: The codebase has been streamlined for clarity, removing a substantial amount of redundant code.

Eggroll 3.0 brings comprehensive enhancements in system performance, usability, and reliability with these significant updates in both the JVM and Python parts.
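
The Unified Configuration Control item above names three sources: direct pass-through, configuration files, and environment variables. The sketch below shows one way such sources could be layered. The precedence order, the `DEMO_` prefix, the option names, and the function itself are assumptions for illustration; the release notes do not specify how Eggroll resolves conflicts.

```python
import os


def resolve_option(name, direct=None, file_config=None, env_prefix="DEMO_", default=None):
    """Resolve one option from three sources.

    Assumed precedence (not confirmed by the release notes):
    direct argument > environment variable > configuration file > default.
    """
    if direct is not None:
        return direct
    env_key = env_prefix + name.upper().replace(".", "_")
    if env_key in os.environ:
        return os.environ[env_key]
    if file_config and name in file_config:
        return file_config[name]
    return default


# A dict stands in for a parsed configuration file.
file_config = {"log.level": "INFO", "port": "9000"}
os.environ["DEMO_LOG_LEVEL"] = "DEBUG"

from_direct = resolve_option("log.level", direct="WARNING", file_config=file_config)
from_env = resolve_option("log.level", file_config=file_config)
from_file = resolve_option("port", file_config=file_config)
```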

Easy Deploy

  • Supports installation of FATE from PyPI

v1.11.4

5 months ago

Major Features and Improvements

FederatedML

  • Unified key length configuration of encryption algorithm, update default key length to 2048.

Bug-Fix

  • Modify hessian computation of softmax cross entropy in SecureBoost, to align with LightGBM.
  • Fix Model initialization error in Homo Neural Network predicting process.
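
For reference, the per-class gradient and diagonal Hessian of softmax cross entropy that gradient-boosting implementations commonly use can be sketched as follows. This is a generic illustration, not the SecureBoost code: the release note does not state the exact formula that was adopted, and the `scale` parameter is an assumption standing in for any library-specific constant factor.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_ce_grad_hess(scores, label, scale=1.0):
    """Gradient p_k - y_k and diagonal Hessian scale * p_k * (1 - p_k)."""
    probs = softmax(scores)
    grads = [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]
    hess = [scale * p * (1.0 - p) for p in probs]
    return grads, hess


# Three classes with equal scores: every probability is 1/3.
grads, hess = softmax_ce_grad_hess([0.0, 0.0, 0.0], label=1)
```

With equal scores the gradient is 1/3 for the wrong classes and -2/3 for the true class, and each Hessian entry is 2/9 when `scale` is left at 1.0.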

v2.0.0-beta

7 months ago

Major Features and Improvements

Arch 2.0: Building Unified and Standardized API for Heterogeneous Computing Engines Interconnection

  • Framework: PSI-ECDH protocol support, single entry for histogram statistical computation
  • Protocol: Support for ECDH, Secure Aggregation protocols
  • Tensor: abstracted PHETensor, smooth switch between various underlying PHE implementations through standard interface
  • DataFrame: New data block manager supports mixed-type columns & feature anonymization; added 30+ operator interfaces for statistics, including comparison, indexing, data binning, and transformation, etc.
  • Enhanced workflow: Support for Cross Validation workflow

Components 2.0: Building Standardized Algorithm Components for different Scheduling Engines

  • Input-Output: Further decoupling of FATE-Flow, providing standardized black-box calling processes
  • Component Definition: Support for typing-based definition, automatic checking for component parameters, support for multiple types of data and model input and output, in addition to multiple inputs

ML 2.0: Major functionality migration from FATE-v1.x, decoupling call hierarchy

  • Data preprocessing: Added DataFrame Transformer, Union and DataSplit migration completed
  • Feature Engineering: Migrated HeteroFederatedBinning, HeteroFeatureSelection, DataStatistics, Sampling, FeatureScale
  • Federated Training: Migrated HeteroSecureBoost, HomoNN, vertical CoordinatedLogisticRegression, and CoordinatedLinearRegression
  • Evaluation: Migrated Evaluation

OSX(Open Site Exchange) 1.0: Building Open Platform for Cross-Site Communication Interconnection

  • Improved HTTP/1.X protocol support, support for GRPC-to-HTTP transmission
  • Support for TLS secure transmission protocol
  • Added routing table configuration interface
  • Added routing table connectivity automatic check
  • Improved transmission function in cluster mode
  • Enhanced flow control in cluster mode
  • Support for simple interface authentication

FATE Flow 2.0: Building Open and Standardized Scheduling Platform for Scheduling Interconnection

  • Migrated functions: data upload/download, process scheduling, component output data/model/metric management, multi-storage adaptation for models, authentication, authorization, feature anonymization, multi-computing/storage/communication engine adaptation, and system high availability
  • Optimized process scheduling, with scheduling separated and customizable, and added priority scheduling
  • Optimized algorithm component scheduling, dividing execution steps into preprocessing, running, and post-processing
  • Optimized multi-version algorithm component registration, supporting registration for mode of components
  • Optimized client authentication logic, supporting permission management for multiple clients
  • Optimized RESTful interface, making parameter fields and types, return fields, and status codes clearer
  • Decoupling the system layer from the algorithm layer, with system configuration moved from the FATE repository to the Flow repository
  • Published FATE Flow package to PyPI and added service-level CLI for service management

Fate-Client 2.0: Building Scalable Federated DSL for Application Layer Interconnection And Providing Tools For Fast Federated Modeling.

  • Migrated Flow CLI and Flow SDK
  • Updated federated DSL IR: enhance IR, add DataWarehouse and ModelWarehouse to load data and model from other sources
  • Update component definitions to support Fate-v2.0.0-beta
  • Flow CLI and PipeLine share configuration

Fate-Test: FATE Automated Testing Tool

  • Migrated automated testing for functionality, performance, and correctness

v1.11.3

8 months ago

Major Features and Improvements

FederatedML

  • FedAVGTrainer update code structure: support OffsiteTuningTrainer
  • FedAVGTrainer update log format: report batch progress instead of batch index

v1.11.2

11 months ago

Major Features and Improvements

FederatedML

  • Integrate DeepSpeed, support distributed training of FATE-LLM
  • Separated the upgraded FATE-LLM from FATE into a new “FATE-LLM” GitHub repo
  • HomoNN now supports data collator and distributed sampler
  • Hetero SecureBoost supports running multiple boosting rounds in complete secure mode with complete_secure option

Bug-Fix

  • Fix hessian computation of softmax cross entropy in SecureBoost

v1.11.1

1 year ago

Major Features and Improvements

FederatedML

  • Support Homo Graph Neural Network
  • PSI-DH protocol enhancement: use Oakley MODP modulus groups
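
The Diffie-Hellman style of private set intersection referred to above relies on modular exponentiation being commutative. The sketch below illustrates that idea only. The modulus is a toy prime chosen for brevity, not one of the Oakley MODP groups, and the function names and the in-process "exchange" are assumptions; this is not FATE's implementation and is not secure as written.

```python
import hashlib
import secrets

# Toy modulus (the Mersenne prime 2**127 - 1), used only to keep the sketch short.
# It is NOT an Oakley MODP group and is not a secure choice.
P = 2**127 - 1


def hash_to_group(item):
    """Map a string to an integer modulo P via SHA-256."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % P


def blind(values, secret):
    """Raise every value to the party's secret exponent modulo P."""
    return [pow(v, secret, P) for v in values]


def dh_psi(guest_items, host_items):
    guest_secret = secrets.randbelow(P - 3) + 2
    host_secret = secrets.randbelow(P - 3) + 2

    # Round 1: each party blinds its own hashed items and sends them over.
    guest_once = blind([hash_to_group(x) for x in guest_items], guest_secret)
    host_once = blind([hash_to_group(x) for x in host_items], host_secret)

    # Round 2: each party blinds the other's values with its own secret.
    guest_twice = blind(guest_once, host_secret)      # order preserved
    host_twice = set(blind(host_once, guest_secret))

    # Exponentiation commutes, so equal items yield equal doubly blinded values.
    return {x for x, v in zip(guest_items, guest_twice) if v in host_twice}


common = dh_psi(["alice", "bob", "carol"], ["carol", "dave", "alice"])
```

Neither side sees the other's raw identifiers in this flow; only blinded values cross the boundary, and the guest learns which of its own items matched.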

v1.11.0

1 year ago

Major Features and Improvements

FederatedML

  • Support FATE-LLM (Federated Large Language Models)
    • Integration of LLM for federated learning: BERT, ALBERT, RoBERTa, GPT-2, BART, DeBERTa, and DistilBERT. Please note that if using such pretrain-models, compliance with their licenses is needed.
    • Integration of Parameter-efficient tuning methods for federated learning: Bottleneck Adapters (including Houlsby, Pfeiffer, Parallel schemes), Invertible Adapters, LoRA, IA3, and Compacter.
    • Improved Homo Federated Trainer class, allowing CUDA device specification and DataParallel acceleration for multi-GPU devices.
    • TokenizerDataset feature upgrade, better adaptation to HuggingFace Tokenizer.

Bug-Fix

  • Fix inconsistent bin_num display of Hetero Feature Binning for data that contains missing values
  • Fix inconsistent data transformation for selected columns of Hetero Feature Binning when using ModelLoader
  • Fix exclusive_data_type not valid in DataTransform when meta for input data is missing
  • Fix weighted loss calculation and feature importance display issues in Tree-Based models
  • Fix sample id display of NN

v2.0.0-alpha

1 year ago

Feature Highlights

Arch 2.0: Building Unified and Standardized API for Heterogeneous Computing Engines Interconnection

  • Introduce Context to manage useful APIs for developers, such as Metrics, Cipher, Tensor and IO.
  • Introduce Tensor data structure to handle local and distributed matrix operation, with built-in heterogeneous acceleration support.
  • Introduce DataFrame, a 2D tabular data structure for data io and simple feature engineering.
  • Refactor logger, customizable logging for different use cases and flavors.
  • Introduce new high-level federation API suite: context.<role>.get(name)/context.<role>.put(name=value).
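
The `context.<role>.put(name=value)` / `context.<role>.get(name)` call shape above can be pictured with an in-memory stand-in. The classes below are hypothetical and do not import or reproduce FATE; the semantics shown (put sends to the named role, get receives from it) are an assumption of this sketch, and a shared dict stands in for the network.

```python
class RoleProxy:
    """Handle for a remote role, as seen from the local party."""

    def __init__(self, mailbox, local, remote):
        self._mailbox = mailbox
        self._local = local
        self._remote = remote

    def put(self, **named_values):
        # Send each named value from the local party to the remote role.
        for name, value in named_values.items():
            self._mailbox[(self._local, self._remote, name)] = value

    def get(self, name):
        # Receive a named value that the remote role sent to the local party.
        return self._mailbox.pop((self._remote, self._local, name))


class ToyContext:
    """Hypothetical context exposing one proxy per role."""

    def __init__(self, mailbox, local):
        self.guest = RoleProxy(mailbox, local, "guest")
        self.host = RoleProxy(mailbox, local, "host")


mailbox = {}  # shared dict standing in for the transport layer
guest_ctx = ToyContext(mailbox, local="guest")
host_ctx = ToyContext(mailbox, local="host")

guest_ctx.host.put(public_key=12345)             # guest sends to host
received = host_ctx.guest.get("public_key")      # host receives from guest
```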

Components 2.0: Building Standardized Algorithm Components for different Scheduling Engines

  • Introduce components toolbox to wrap ML modules as standard executable programs.
  • spec and loader expose clear API for smooth internal extension and external system integration.
  • Provide several cli tools to interact and execute components.
  • Implement base demos components: reader, intersection, feature scale, lr and evaluation.

ML 2.0(demo)

  • Provide base demos for federated machine learning algorithms: intersection, feature scale, lr and evaluation.

Pipeline 2.0: Building Scalable Federated DSL for Application Layer Interconnection

  • Introduce new scalable and standardized federated DSL IR(Intermediate Representation) for federated modeling job
  • Compile python client to DSL IR
  • Support multiple scalable execution backends, including standalone and Fate-Flow.

OSX(Open Site Exchange) 1.0: Building Open Platform for Cross-Site Communication Interconnection

  • Standardized Cross-Site lower-level federation api
  • Support grpc synchronous transmission and streaming transmission; Compatible with eggroll interface and can replace FATE-1.x rollsite component
  • Support asynchronous message transmission, which can replace rabbitmq and pulsar components in FATE-1.x
  • Support HTTP-1.X protocol transmission
  • Support cluster deployment and inter-site traffic control
  • Support networking as an Exchange component

FATE Flow 2.0: Building Open and Standardized Scheduling Platform for Scheduling Interconnection

  • Adapted to new scalable and standardized federated DSL IR
  • Standardized API interface with param type checking
  • Decoupling Flow from FATE repository
  • Optimized scheduling logic, with configurable dispatcher decoupled from initiator
  • Support container-level algorithm loading and task scheduling, enhancing support for cross-platform heterogeneous scenarios
  • Independent maintenance for system configuration to enhance flexibility and ease of configuration
  • Support new communication engine OSX, while compatible with all engines from Flow 1.X
  • Introduce OFX(Open Flow Exchange) module: encapsulated scheduling client to allow cross-platform scheduling

Deploy

  • Support installing from PyPI

v1.10.0

1 year ago

Major Features and Improvements

FederatedML

  • Renewed Homo NN: PyTorch-based, support flexible model building:
    • Support user access to complex self-defined PyTorch models or ready-to-use PyTorch models such as DeepFM, ResNet, BERT, Yolo
    • Support various data set types, may build data set based on PyTorch Dataset
    • User-defined training loss
    • User-defined training process: user-defined aggregation algorithm for client and server
    • Provide API for developing Aggregator
  • Upgraded Hetero NN: support flexible model building and various data set types:
    • more flexible pytorch top/bottom model customization; provide access to industry approved PyTorch models
    • User-defined training loss
    • Support various data set types, may build data set based on PyTorch Dataset
  • Renewed Homo-federated framework with support for all current homo models, including Homo NN, Homo LR, Homo SecureBoost, Homo Feature Binning, and Hetero KMeans. This provides a smoother algorithm customization and development experience
  • Added semi-supervised algorithm: Positive Unlabeled Learning
  • Hetero LR & Hetero SecureBoost now support Intel IPCL
  • Intersection supports Multi-host Elliptic-curve-based PSI
  • Intersection may compute Multi-host Secure PSI Cardinality
  • Hetero Feature Optimal Binning now records & shows Gini/KS/Chi-Square metrics
  • Host may load Hetero Binning model with WOE score through Model Loader
  • Hetero Feature Binning supports binning by user-provided split points
  • Sampler supports weighted sampling by instance weight
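
Binning by user-provided split points, as listed above, amounts to locating each value among sorted boundaries. The following is a minimal sketch using the standard `bisect` module; the function name and the boundary convention are assumptions and may differ from what Hetero Feature Binning actually does.

```python
import bisect


def bin_by_split_points(values, split_points):
    """Map each value to a bin index using caller-supplied split points.

    Assumed convention for this sketch: bin i holds values v with
    split_points[i-1] < v <= split_points[i]; values above the last
    split point fall into the final bin.
    """
    points = sorted(split_points)
    return [bisect.bisect_left(points, v) for v in values]


ages = [18, 25, 30, 47, 65, 80]
bins = bin_by_split_points(ages, split_points=[25, 45, 65])
# bins == [0, 0, 1, 2, 2, 3]
```

Three split points produce four bins, and a value equal to a split point lands in the lower bin under the convention assumed here.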

Fate-Client

  • Flow CLI adds min-test options
  • Pipeline adds data-bind API, useful for local development
  • Pipeline may reconfigure role/model_id/model_version, switching party_id for prediction task