Alibaba DeepRec Versions

DeepRec is a high-performance deep learning framework for recommendation, based on TensorFlow. It is an incubation project hosted by the LF AI & Data Foundation.

r1.15.5-deeprec2206

1 year ago

Major Features and Improvements

Embedding

  • Multi-tier storage for EmbeddingVariable, add SSD_HashKV, which performs better than LevelDB.
  • Support GPU EmbeddingVariable, whose gather/apply ops are placed on the GPU.
  • Add a user API to record frequency and version for EmbeddingVariable.
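
To illustrate the multi-tier idea in the first item above, here is a minimal Python sketch of a two-tier key-value store: a bounded in-memory hot tier that spills its least-recently-used entries to a slower cold tier. This is a conceptual toy, not DeepRec's implementation. `TwoTierStore`, its methods, and the plain dict standing in for an SSD-backed store are hypothetical names chosen for illustration.

```python
from collections import OrderedDict


class TwoTierStore:
    """Toy two-tier embedding store (hypothetical, for illustration only)."""

    def __init__(self, hot_capacity):
        self.hot_capacity = hot_capacity
        self.hot = OrderedDict()   # fast tier, kept in LRU order
        self.cold = {}             # stand-in for a slower SSD-backed tier

    def put(self, key, value):
        # Writes always land in the hot tier; drop any stale cold copy.
        self.cold.pop(key, None)
        self.hot[key] = value
        self.hot.move_to_end(key)
        self._spill()

    def get(self, key):
        if key in self.hot:
            self.hot.move_to_end(key)
            return self.hot[key]
        if key in self.cold:
            # Promote on access so frequently used keys stay in the hot tier.
            value = self.cold.pop(key)
            self.hot[key] = value
            self._spill()
            return value
        raise KeyError(key)

    def _spill(self):
        # Move least-recently-used entries to the cold tier until the hot tier fits.
        while len(self.hot) > self.hot_capacity:
            old_key, old_value = self.hot.popitem(last=False)
            self.cold[old_key] = old_value
```

With `hot_capacity=2`, inserting three keys pushes the oldest one into the cold tier, and reading it back promotes it again at the expense of the next-oldest key.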

Graph Optimization

  • Add Embedding Fusion ops for CPU/GPU.
  • Optimize SmartStage performance on GPU.

Runtime Optimization

  • Executor, support cost-based, critical-path-first op scheduling.
  • GPUAllocator, support the CUDA malloc async allocator (requires CUDA >= 11.2).
  • CPUAllocator, automatically generate the memory allocation policy.
  • PMEMAllocator, optimize the allocator and add statistics.
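
The Executor item above mentions cost-based, critical-path-first scheduling. As a rough illustration of the general technique (not DeepRec's executor), the sketch below gives each op a priority equal to its own cost plus the most expensive chain of downstream ops, then always picks the ready op with the highest priority. The graph representation, the function names, and the single-worker simplification are assumptions made for this example.

```python
import heapq


def critical_path_priorities(costs, successors):
    """Priority of an op = its cost + the costliest chain of ops after it."""
    memo = {}

    def longest(op):
        if op not in memo:
            tail = max((longest(s) for s in successors.get(op, [])), default=0)
            memo[op] = costs[op] + tail
        return memo[op]

    for op in costs:
        longest(op)
    return memo


def schedule(costs, successors):
    """Return a serial execution order that prefers critical-path ops."""
    priority = critical_path_priorities(costs, successors)
    pending = {op: 0 for op in costs}  # number of unfinished inputs per op
    for succs in successors.values():
        for s in succs:
            pending[s] += 1
    # Max-heap via negated priority; the op name breaks ties deterministically.
    ready = [(-priority[op], op) for op in costs if pending[op] == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        _, op = heapq.heappop(ready)
        order.append(op)
        for s in successors.get(op, []):
            pending[s] -= 1
            if pending[s] == 0:
                heapq.heappush(ready, (-priority[s], s))
    return order
```

For a diamond-shaped graph where `a` feeds `b` and `c`, and both feed `d`, a costly `c` is scheduled before a cheap `b` because it lies on the critical path.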

Ops & Hardware Acceleration

  • Implement SparseReshape, SparseApplyAdam, SparseApplyAdagrad, SparseApplyFtrl, ApplyAdamAsync, SparseApplyAdamAsync, KvSparseApplyAdamAsync GPU kernels.
  • Optimize UnSortedSegment on CPU.
  • Upgrade oneDNN to v2.6.

IO & Dataset

  • ParquetDataset, add a Parquet dataset, which can reduce storage and improve performance.

Model Save/Restore

  • Asynchronously restore EmbeddingVariable from checkpoint.

Serving

  • SessionGroup, greatly improves QPS and RT (response time) in inference.

ModelZoo

  • Add models SimpleMultiTask, ESMM, DBMTL, MMoE, BST.

Profiler

  • Support mapping operators to real thread IDs in the timeline.

BugFix

  • Fix an EmbeddingVariable core dump when the EmbeddingVariable has only a primary embedding value.
  • Fix abnormal behavior in L2-norm calculation.
  • Fix a checkpoint-saving issue when using LevelDB in EmbeddingVariable.
  • Fix failure to delete old checkpoints when using incremental checkpoints.
  • Fix build failure with CUDA 11.6.

More details of features: https://deeprec.readthedocs.io/zh/latest/

Release Images

CPU Image

alideeprec/deeprec-release:deeprec2206-cpu-py36-ubuntu18.04

GPU Image

alideeprec/deeprec-release:deeprec2206-gpu-py36-cu110-ubuntu18.04

r1.15.5-deeprec2204u1

2 years ago

Major Features and Improvements

BugFix

Release Images

CPU Image

alideeprec/deeprec-release:deeprec2204u1-cpu-py36-ubuntu18.04

GPU Image

alideeprec/deeprec-release:deeprec2204u1-gpu-py36-cu110-ubuntu18.04

r1.15.5-deeprec2204

2 years ago

Major Features and Improvements

Embedding

  • Support hybrid storage of EmbeddingVariable (DRAM, PMEM, LevelDB).
  • Support memory-contiguous storage of multi-slot EmbeddingVariable.
  • Optimize beta1_power and beta2_power slots of EmbeddingVariable.
  • Support restoring the frequency of features in EmbeddingVariable.

Distributed Training

  • Integrate SOK (SparseOperationKit) into DeepRec.

Graph Optimization

  • Auto Graph Fusion, support float32/int32/int64 types for select fusion.
  • SmartStage, fix a bug where the graph contains a cycle when SmartStage optimization is enabled.

Runtime Optimization

  • GPUTensorPoolAllocator, which reduces GPU memory usage and improves performance.
  • PMEMAllocator, support allocation in persistent memory.

Optimizer

  • Optimize AdamOptimizer performance.

Op & Hardware Acceleration

  • Change the fused MatMul layout type and thread count for small inputs.

IO & Dataset

  • KafkaGroupIODataset, support consumer rebalance.

Model Save/Restore

  • Support dump incremental graph info.

Serving

  • Add a serving module (ODL processor), which supports Online Deep Learning (ODL).

More details of features: https://deeprec.readthedocs.io/zh/latest/

Release Images

CPU Image

registry.cn-shanghai.aliyuncs.com/pai-dlc-share/deeprec-training:deeprec2204-cpu-py36-ubuntu18.04

GPU Image

registry.cn-shanghai.aliyuncs.com/pai-dlc-share/deeprec-training:deeprec2204-gpu-py36-cu110-ubuntu18.04

Known Issue

Some users have reported issues when using Embedding Variable, such as https://github.com/alibaba/DeepRec/issues/167. The bug is fixed in r1.15.5-deeprec2204u1.

r1.15.5-deeprec2201

2 years ago

This is the first release of DeepRec. DeepRec supports very large-scale distributed training, including model training on trillions of samples and embedding processing at the 100-billion scale. For sparse model scenarios, in-depth performance optimization has been carried out on both CPU and GPU platforms.

Major Features and Improvements

Embedding

  • Embedding Variable (including feature eviction and feature filter)
  • Dynamic Dimension Embedding Variable
  • Adaptive Embedding
  • Multi-Hash Variable
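
The first item above pairs Embedding Variable with a feature filter and feature eviction. As a hedged illustration of those two ideas in general (not DeepRec's API or implementation), the sketch below admits a key into the table only after it has been seen a minimum number of times, and evicts keys that have not been looked up for a given number of steps. The class name, parameter names, and zero-vector default are assumptions made for this example.

```python
class FilteredEmbeddingTable:
    """Toy table with frequency-based admission and staleness-based eviction."""

    def __init__(self, dim, min_freq, steps_to_live):
        self.dim = dim
        self.min_freq = min_freq            # admit a key after this many lookups
        self.steps_to_live = steps_to_live  # evict keys idle for longer than this
        self.freq = {}                      # key -> number of lookups so far
        self.last_step = {}                 # key -> step of the most recent lookup
        self.values = {}                    # key -> embedding (admitted keys only)

    def lookup(self, key, step):
        self.freq[key] = self.freq.get(key, 0) + 1
        self.last_step[key] = step
        if self.freq[key] < self.min_freq:
            # Not admitted yet: return a default vector and store nothing.
            return [0.0] * self.dim
        return self.values.setdefault(key, [0.0] * self.dim)

    def evict(self, step):
        # Drop every key whose last lookup is older than steps_to_live.
        stale = [k for k, s in self.last_step.items()
                 if step - s > self.steps_to_live]
        for k in stale:
            self.freq.pop(k, None)
            self.last_step.pop(k, None)
            self.values.pop(k, None)
        return stale
```

The admission threshold keeps rarely seen features from consuming memory, while eviction reclaims space from features that have gone quiet.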

Distributed Training

  • GRPC++
  • StarServer

Graph Optimization

  • Auto Micro Batch
  • Auto Graph Fusion
  • Embedding Fusion
  • Smart Stage

Runtime Optimization

  • CPU Memory Optimization
  • GPU Memory Optimization
  • GPU Virtual Memory

Optimizer

  • AdamAsync Optimizer
  • AdagradDecay Optimizer

Op & Hardware Acceleration

  • CPU/GPU optimization of tens of ops, including Unique, Gather, DynamicStitch, BiasAdd, Select, Transpose, SparseSegmentReduction, Where, DynamicPartition, and SparseConcat.
  • Support oneDNN 2.3.2 and bf16
  • Support TF32

IO & Dataset

  • WorkQueue
  • KafkaDataset

More details of features: https://deeprec.readthedocs.io/zh/latest/

Release Images

CPU Image

registry.cn-shanghai.aliyuncs.com/pai-dlc-share/deeprec-training:deeprec2201-cpu-py36-ubuntu18.04

GPU Image

registry.cn-shanghai.aliyuncs.com/pai-dlc-share/deeprec-training:deeprec2201-gpu-py36-cu110-ubuntu18.04