PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (the PaddlePaddle core framework: high-performance single-machine and distributed training, and cross-platform deployment, for deep learning & machine learning)
We are pleased to announce PaddlePaddle Framework 2.2.2. This version mainly fixes some function and performance issues in 2.2.1 and enhances several features.

- Add `paddle.nn.Mish` and `paddle.nn.functional.mish`, which compute the mish activation function element-wise. (#38803)
- `paddle.nn.PReLU`, `paddle.nn.functional.prelu`, and `paddle.nn.static.prelu` newly support the `data_format` argument, which specifies the data format of the input. (#38495)
- `paddle.index_select` adds support for the `float16` data type. (#38751)
- Improve the error message of `paddle.multiplex` when a tensor in `inputs` has size 0. (#38757)
- `paddle.fluid.contrib.slim.quantization.PostTrainingQuantization` adds the initialization argument `data_loader`, which accepts a `paddle.io.DataLoader` object or a Python generator. (#38729)
- Fix `paddle.max` raising an error when `x.ndim > 6 and axis < 0`. (#38070)
- Fix a bug in `paddle.max` and `paddle.min`: on CPU devices, the result is wrong when `axis` is a list and `len(axis) == x.ndim and axis[i] < 0`. (#38478)
- Fix `paddle.nn.functional.unfold` not distinguishing compile time from runtime in its InferShape computation. (#38925, #38834)
- Fix unnecessary GPU/CPU synchronization when `paddle.nn.functional.cross_entropy` checks `labels`. (#38849)
- Fix abnormal input-gradient results in the backward pass when `paddle.distributed.split` splits an FC layer along columns. (#38724)
- Fix `paddle.nn.Layer.to` not supporting the `paddle.dtype` type. (#38108)
- Fix `paddle.linalg.svd` with `full_matrices=True` producing output tensors whose shapes differ between dynamic and static graph mode. (#37744)
- Fix abnormal result dimensions when a `Tensor` is sliced with multiple `None` indices. (#37400)
- Fix a GPU memory leak of `Tensor` index assignment in some scenarios. (#38098)
- Fix a missing-attribute error on `conv2d` when a backward pass is added for training after the model is exported with `save_inference_model`. (#38832)

Dynamic graph to static graph
- Fix the issue that `paddle` is treated as a variable when dynamic-to-static code is transcribed. (#37999)
- Fix the transcription of the `for ... zip ...` statement in dynamic-to-static conversion. (#37846)

Model quantization
- Fix the `clip_extra` settings of quantized export models. (#38343)
- Fix an output configuration error of the `flatten_contiguous_range` operator in quantization. (#37741)

This release also includes fixes covering custom OPs, the dynamic-graph inplace strategy, the NHWC strategy, operators, framework functions, the TensorRT subgraph engine, and the MKLDNN engine.
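For reference, `paddle.nn.functional.mish` computes `mish(x) = x * tanh(softplus(x))` element-wise. A minimal pure-Python sketch of that formula (illustrative only, not Paddle's actual kernel):

```python
import math

def softplus(x: float) -> float:
    # softplus(x) = ln(1 + e^x), written to stay stable for large |x|
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def mish(x: float) -> float:
    # mish(x) = x * tanh(softplus(x)), the activation added in 2.2.2
    return x * math.tanh(softplus(x))
```

For large positive inputs mish approaches the identity, and mish(0) is exactly 0, which is easy to check against the formula above.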
We are pleased to announce PaddlePaddle Framework 2.2.1. This version mainly fixes some function and performance issues in 2.2.0 and enhances several features. Highlights:

- Add `paddle.linalg.triangular_solve`, which solves linear systems with triangular coefficient matrices.
- Add the `paddle.device.cuda.graphs.CUDAGraph` API, supporting NVIDIA's CUDA Graph feature. Note that this API is still experimental and not yet stable.

Details:
- Add the `paddle.linalg.triangular_solve` API, which solves linear systems with triangular coefficient matrices. (#36714)
- Add the `paddle.device.cuda.graphs.CUDAGraph` API, which supports NVIDIA's CUDA Graph feature: all GPU computation can be captured into a single CUDA Graph and replayed repeatedly later, removing framework overhead and improving runtime performance. Note that this API is still experimental and not yet stable. (#37109)
- Add the `paddle.incubate.graph_send_recv` API, mainly for graph learning, to reduce the memory or GPU-memory cost of intermediate variables during message passing. It contains four update modes: SUM, MEAN, MIN, and MAX. (#37205)
- Add the `paddle.incubate.operators.ResNetUnit` API, which fuses the convolution, batch normalization, and shortcut/bottleneck operations in ResNet networks. (#37109)
- `paddle.incubate.FusedTransformerEncoderLayer` adds support for `src_mask=None` and for pure fp16. (#37229)
- When `@paddle.jit.to_static` decorates a single function, `train()` and `eval()` functions are provided to switch between train and eval modes. (#37383)
- Fix the core dump caused by an out-of-range `index` of `paddle.scatter`; strengthen the bounds check and improve the corresponding error message. (#37431)
- Optimize `paddle.top_k` to choose between implementations according to `k` and `input_width`: the cub implementation when k >= 75% of input_width, otherwise a handwritten kernel. (#37325)
- Optimize `paddle.fluid.optimizer.LarsMomentumOptimizer`, improving OP performance via optimizer operator fusion plus CUDA Cooperative Groups. (#37109)
- Fix the formulas of `paddle.nn.ELU` and `paddle.nn.functional.elu`, which gave wrong results when alpha < 0. Note that the inplace version `paddle.nn.functional.elu_` does not support alpha < 0 and raises an error in that case. (#37437)
- Fix the `out_of_range` problem when `paddle.slice` runs backward. (#37584)
- Since `paddle.shape` has no backward, explicitly set `stop_gradient` to `True`. (#37412)
- Since `paddle.arange` has no backward, explicitly set `stop_gradient` to `True`. (#37486)
- `paddle.shard_index` now reports an error when the last dimension of the input data is not 1. (#37421)
- Fix wrong dimensions during dequantization when `paddle.matmul` uses int8 quantization. (#36982)
- Fix the issue that `paddle.nn.Dropout` does not compute gradients in `eval` mode. (#37305)
- Fix `paddle.nn.functional.dropout` reporting an error in static graph mode when the input `Tensor` shape contains -1 and that dimension is specified to be dropped. (#37223)
- Fix wrong backward computation of multi-layer RNNs (with dropout set to 0) in `paddle.nn.LSTM`, `paddle.nn.GRU`, and `paddle.nn.SimpleRNN` during CPU training. (#37086)
- Fix problems in `paddle.incubate.FusedTransformerEncoderLayer`: wrong backward gradients, incorrect pre_layer_norm handling, incorrect parameter handling, missing parameters, add_bias calculation errors, etc. (#37229)
- Fix `paddle.incubate.fused_multi_head_attention` not supporting `bias` set to `None`. (#37411, #37566)
- Fix the issue that `paddle.vision.datasets.Cifar10` and `paddle.vision.datasets.Cifar100` load data in no fixed order. (#37528)
- Fix an abnormal dimension-detection error when a `Tensor` is indexed with an ellipsis (`...`). (#37192)
- Fix the issue that gradient attributes cannot propagate through `Tensor` index assignment (`setitem`); see the linked issue for details. (#37028)
- `fleet.load_model`: fix the model-loading API being unavailable in parameter-server mode. (#37461)
- `fleet.save_inference_model`: fix the issue that dense parameters are not pulled from the server before the model is saved in parameter-server mode. (#37461)

This release mainly fixes some function and performance issues in 2.1.1. Highlights:
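`paddle.linalg.triangular_solve` solves `A x = b` where `A` is a triangular matrix. A hedged pure-Python sketch of forward substitution for the lower-triangular case (an illustrative model of the semantics, not Paddle's implementation):

```python
def solve_lower_triangular(A, b):
    # Forward substitution: A is an n x n lower-triangular matrix
    # (list of row lists), b is a length-n right-hand side.
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        # Subtract the already-solved terms, then divide by the diagonal.
        s = sum(A[i][j] * x[j] for j in range(i))
        x[i] = (b[i] - s) / A[i][i]
    return x
```

For example, with `A = [[2, 0], [1, 3]]` and `b = [2, 5]`, the solution is `x = [1, 4/3]`; the upper-triangular case is the mirror image (backward substitution).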
- Fix the issue that some APIs under `paddle.vision` are not accessible. (#34489)
- Fix `paddle.concat` overflowing when applied to multiple Tensors with a large `shape`. (#34396)
- `paddle.flip` supports an integer `axis` input and improves performance in dynamic graph mode. (#34477)
- Fix out-of-bounds access in `paddle.slice` when the input and output addresses are the same. (#34265)
- Fix the wrong input-parameter order of `paddle.nn.Unfold`. (#34251)
- Add several `Tensor` interfaces in static graph mode, such as `size()` and `detach()`. (#33330)
- Add a note about the incompatible upgrade to the warning message of `Tensor.grad`. (#34262)
- Fix the function of `paddle.save` for saving a `Layer`. (#34039)
- Fix the issue that models saved by `paddle.jit.save` on macOS cannot be retrained on Linux. (#34154)
- Fix wrong cuda kernel parameters of `layer_norm` for large-`size` input. (#33893)
- Fix `paddle.io.DataLoader` falsely reporting an incompatible-upgrade warning. (#34001)
- Fix a `paddle.io.DataLoader` memory leak. (#34301)
- [Dynamic-to-static] Support the syntax of nested `Sequential` containers. (#34246)
- [Dynamic-to-static] Support Python 3 type-hint syntax. (#33745)
- The `input_spec` argument of `@to_static` newly supports non-`Tensor` types such as `int`, `float`, `string`, and `bool`. (#33464)
- Fix wrong ERNIE model results when `batch_size > 1`. (#33784)
- Fix a crash caused by the TensorRT inference path being split by a right slash on Windows. (#33885)
- Fix the issue that X of the `elementwise` series of OPs does not support broadcast. (#33845)
- Fix the version requirement of the dependency `gast` (`gast>=0.3.3, <=0.4.0`). (#33850)
- Optimize `Avx/No-Avx` related installation error messages and reduce redundant warning messages. (#33885)
- Optimize the `cmake` file of Kunlun to unify and update its operator library. (#34000)

This release contains contributions from:
0x45f、Aurelius84、Chen Weihang、chentianyu03、HexToString、iducn、Jacek Czaja、Kaipeng Deng、Leo Chen、lzzyzlbb、Peihan、taixiurong、tianshuo78520a、WeiXin、wenbin、Wilber、wuhuachaocoding、xiongkun、Zhou Wei、 winter-wang .
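`paddle.flip` reverses a tensor along the given axes, and this release lets `axis` be a plain integer as well as a list. Its semantics can be sketched in pure Python on a nested list (an illustrative model only, not the actual kernel):

```python
def flip(data, axis):
    # Reverse `data` along one axis (0 = outermost, 1 = next level, ...),
    # mirroring the semantics of paddle.flip with an integer axis.
    if axis == 0:
        return data[::-1]
    return [flip(row, axis - 1) for row in data]
```

For a 2-D input, `axis=0` reverses the rows and `axis=1` reverses the elements within each row.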
This version mainly fixes some function and performance issues in 2.1.0 and enhances several features. Important updates:

- Optimize the visibility of APIs under the `paddle.distributed`, `paddle.device`, and `paddle.vision` directories.
- [Dynamic-to-static] Support conversion of user code that uses sublayers inside a `paddle.nn.Sequential` container.
- Add `SyncBatchNorm` support for AMP in dynamic graph mode, improving the performance of the dynamic-graph `SyncBatchNorm` layer in AMP mode.

Details:
- Add recommended usage for hierarchies such as `paddle.distributed`, `paddle.device`, and `paddle.vision`; see the 2.1.0 Release Note below for details. (#33420)
- Add `paddle.is_compiled_with_rocm`. (#33228)
- `paddle.strided_slice` adds support for bool-type input. (#33373)
- `paddle.equal_all`, `paddle.equal`, `paddle.greater_equal`, `paddle.greater_than`, `paddle.less_equal`, `paddle.less_than`, and `paddle.not_equal` add support for bool-type input. (#33551)
- Fix the issue that `paddle.utils.download` does not run its retry logic on ConnectionError. (#33454)
- Fix wrong InferShape of `paddle.gather` when `axis` is not 0. (#33553)
- Fix a segmentation fault of `paddle.io.DataLoader` when `num_workers=0` and the `Dataset` generates GPU `Tensor`s and sends them to the `DataLoader`. (#33487, #33249)
- Fix the issue that when a `slice` result is used as the lvalue of an inplace operation, the backward error message is unrelated to the actual error. (#32981)
- Fix the error in `paddle.concat` support for uint8 in dynamic graph mode. (#33667)
- Fix GPU memory overflow and abnormal output of `paddle.grid_sample`. (#33100, #33232)
- Fix the problem of `roi_align` when the input is 0 in `align=True` mode. (#33446)
- Fix the issue that `log_softmax` changes its input to nan. (#32937)
- [Dynamic-to-static] Support conversion of user code that uses sublayers inside a `paddle.nn.Sequential` container. (#33065)
- Refactor the `param_guard` logic to comprehensively solve `Tensor` type conversion between dynamic and static graph mode. (#32985)
- Fix the error of `paddle.distributed.spawn` when using the default `nprocs` argument. (#33249)
- Fix an issue with using `Program` directly. (#33511)
- Fix the precision problem of `TensorParallel`: change its parameter initialization to guarantee the randomness of the parameters after slicing. (#33087)
- Fix the precision problem of `PipeLineParallel`: fix the incorrect use of `microbatch`. (#33097)
- Fix the hang when the `new_group` API creates multiple communication groups. (#33553)
- Add `SyncBatchNorm` support for AMP in dynamic graph mode, improving the performance of the dynamic-graph `SyncBatchNorm` layer in AMP mode; the 8-card AMP speedup ratio improves by 19% on PaddleSeg's `DeepLabV3P` model. (#33709)
- Add the `PADDLE_WITH_MKLDNN` macro for custom OP compilation. (#32903)
- Add `GLIBCXX_USE_CXX11_ABI=1` to resolve compile-time errors caused by low GCC versions. (#33185)
- Enable the `-std=c++14` compile option by default. (#33227)
- Fix a random segmentation fault in training when `LoDTensorArray` is an Op input under multi-threading. (#32984)
- Fix the issue that parameter regularization runs twice when both the regularizer of `paddle.ParamAttr` and the `weight_decay` of `paddle.optimizer.Momentum` are specified as `L2Decay`. (#32881)
- Fix the issue that `layer_norm` does not save the `out_threshold` attribute when a quantized model is saved. (#33610)
- Add support for `gather_nd` and `reduce_sum` in Paddle-TRT. (#33365)
- Add support for `reshape` in Paddle-TRT. (#33372)
- Add a `layer_norm` dynamic-shape plugin to improve dynamic-shape inference performance. (#33448)
- Fix wrong results of `fused_fc_elementwise_layernorm` caused by an excessive thread count on Hygon DCU. (#33299)
- Fix a computation error of the `multihead_matmul` plugin when seq_len > 1024. (#33365)
- Fix the issue that `paddle.static.io.normalize_program` failed to export `paddle.static.normalize_program`. (#33408)
- Fix a dimension-setting error in the `conv2d_transpose` op converter; models with the `conv2d_transpose` op can now work normally on TRT. (#33242)
- Improve `layer_norm` computation precision and fix Nan output on large data input. (#33420)
- Optimize the `gather` op and add support for the `logsumexp` op. (#32931)

This release contains contributions from: Aurelius84, cc, ceci3, Chen Weihang, danleifeng, feng_shuai, houj04, jiangcheng, JZ-LIANG, Kaipeng Deng, lidanqing, LielinJiang, Lijunhui, lilong12, liuyuhui, liym27, Pei Yang, Peihan, Qi Li, Ren Wei (任卫), Roc, Shang Zhizhou, ShenLiang, Shibo Tao, TeslaZhao, tianshuo78520a, TTerror, wangguanzhong, Wangzheee, wawltor, WeiXin, wenbin, Wenyu, whs, Wilber, wuhuanzhou, Zhang Ting, zhiboniu, Zhou Wei, zhoujun, 李季, 王明冬
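The comparison APIs (`paddle.equal_all`, `paddle.less_than`, and so on) gained bool-type input support in this release; on bools they follow the usual ordering False < True. An element-wise sketch in plain Python (hypothetical helper names, illustrative only):

```python
def equal_all(xs, ys):
    # True only if the two sequences match element by element,
    # mirroring the all-elements-equal semantics of paddle.equal_all.
    return len(xs) == len(ys) and all(a == b for a, b in zip(xs, ys))

def less_than(xs, ys):
    # Element-wise a < b; on bools this means False < True.
    return [a < b for a, b in zip(xs, ys)]
```

The other comparison ops (`greater_than`, `not_equal`, etc.) follow the same element-wise pattern with a different operator.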
This version mainly fixes some function and performance issues in 2.0.1 and enhances several features. Important updates:

- `paddle.nn.functional.cross_entropy` adds the `use_softmax` parameter, which controls whether a softmax is applied before the cross entropy is computed; `paddle.nn.functional.softmax_with_cross_entropy` is marked deprecated and will be removed in a future version.

Details:
- Add `paddle.io.random_split` and `paddle.io.Subset`. (#32090)
- Fix the issue that `stride` and `padding` of `paddle.nn.MaxPool3D` and `paddle.nn.AvgPool3D` have no default values. (#32014)
- Fix the error reported when `soft_label` of `paddle.nn.functional.cross_entropy` is True and the `weight` parameter is specified; add the `use_softmax` parameter, which controls whether a softmax is applied before the cross entropy is computed; also mark `paddle.nn.functional.softmax_with_cross_entropy` deprecated, as it will be removed in a future version. (#31953, #32105, #32035)
- Fix `paddle.nn.ClipByNorm` producing NaN values when all gradients are zero, which caused non-convergence in mixed-precision training. (#32038)
- Fix out-of-bounds memory access in `paddle.stack`. (#32005)
- Fix the output format of `exe.train_from_dataset`. (#32009)

This version mainly fixes some function and performance issues in 2.0.0 and enhances several features. Important updates:
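With `use_softmax=True` (the default), `paddle.nn.functional.cross_entropy` applies a softmax to the logits before taking the negative log-likelihood; with `use_softmax=False` the input is treated as probabilities directly. A scalar pure-Python sketch of the two paths (illustrative only, not Paddle's kernel):

```python
import math

def cross_entropy(values, label, use_softmax=True):
    # If use_softmax is True, `values` are unnormalized logits;
    # otherwise they are assumed to already be probabilities.
    if use_softmax:
        m = max(values)  # subtract the max for numerical stability
        exps = [math.exp(v - m) for v in values]
        total = sum(exps)
        probs = [v / total for v in exps]
    else:
        probs = values
    return -math.log(probs[label])
```

With uniform logits `[0.0, 0.0]` the loss is `ln 2`, the same value obtained by passing the probabilities `[0.5, 0.5]` with `use_softmax=False`.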
- `paddle.save`/`paddle.static.save` support choosing the pickle version, improving model-saving efficiency under Python 3.
- `roi_align` adds the `aligned` parameter; `generate_proposals` and `distribute_fpn_proposals` add the `pixel_offset` parameter.
- `paddle.nn.functional.cross_entropy` supports float-type labels on Kunlun (XPU) devices.
- `paddle.nn.functional.softmax_with_cross_entropy` adds label error checking and improved error messages.
- `paddle.nn.LayerList` supports `paddle.nn.LayerList([None])`.
- [Dynamic-to-static] Support `tuple` as a loop variable in for-loops.
- Support indexing a `Tensor` with an unspecified start or stop, such as `x[:]` and `x[2:]`.
- Support `Tensor` slices as lvalues; dynamic-graph code using slices can now be correctly converted to static graph. `Tensor` data can be modified by indexing or slicing: the index type may be a Python `int`, a `Tensor`, or a Python `slice`; the stride may be 1, greater than 1, or negative; the assigned value may be a `Numpy.array` or a `Tensor`.
- Optimize `paddle.nn.LayerNorm` by reducing the number of `cast` operations to improve training efficiency.
- `paddle.distributed.fleet.DistributedStrategy` amp adds a pure fp16 strategy.
- Add `paddle.distributed.ProbabilityEntry` and `paddle.distributed.CountFilterEntry` for sparse-parameter training.
- `count/unseen_day` can be saved into the model.
- `paddle.save` and `paddle.static.save` let users select the pickle version; the default is 2. Under Python 3, choosing pickle version 4+ improves saving speed and breaks the 4 GB single-file limit, but note that models saved this way must also be loaded under Python 3.
- Add `paddle.static.normalize_program` to obtain the pruned computation graph.
- `paddle.abs` supports the Complex64 and Complex128 types.
- Add the `multi_precision` option of `paddle.optimizer.AdamW` to ensure that regularization acts on the FP32 master weights, preventing divergence.
- Fix the issue that when the input of `paddle.nn.ELU` is nan, the output is nan.
- Fix wrong gradient computation in dynamic-graph multi-card training when `Tensor.backward()` is used for gradient accumulation.
- Fix an integer overflow when `paddle.nn.functional.softmax_with_cross_entropy` processes a `Tensor` with more than 2^31 elements.
- Fix an overflow crash when iterating over `paddle.nn.Sequential` with a for loop.
- Fix the issue that `paddle.nn.functional.local_response_norm` cannot use batch_size=-1 in static graph mode or dynamic-to-static conversion.
- Fix a `paddle.nn.LayerNorm` computation error when the data type is float64.
- Fix the error of `metric_learning finetune` under PaddlePaddle/models.
- Add `paddle_infer::Config::EnableTensorRtDLA()`; at inference time, users can enable NVIDIA DLA while using TensorRT.
- Support the `group_norm` op and speed up `solov2_r50_fpn_1x`: compared with v2.0.0, on T4 with CUDA 11, cuDNN 8.1, and TRT 7.1.3, TRT FP32 performance improves by 13% (from 87.019 ms to 75.13 ms) and TRT FP16 performance improves by 65% (from 72.9253 ms to 44.149 ms).
- Make `auto_growth` the default memory allocation strategy, fixing the problem that some models could not run with limited memory.
- Fix the error of the `mask_rcnn_r50_1x_coco` static-graph model when it is converted from a dynamic-graph model.
- Support installation with `pip --pre`.
- Rename `libpaddle_fluid.so` to `libpaddle_inference.so`.
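The pickle-version choice exposed by `paddle.save` matters because protocol 4 (Python 3.4+) supports objects larger than 4 GB and is typically faster, while protocol 2 remains loadable under Python 2. The same trade-off, shown with the standard `pickle` module:

```python
import pickle

# A toy stand-in for a model state dict.
state = {"w": [0.0, 1.0, 2.0], "step": 10}

# Protocol 2: loadable under Python 2 and 3, 4 GB frame limit.
legacy_bytes = pickle.dumps(state, protocol=2)

# Protocol 4: Python 3.4+ only, supports objects over 4 GB and is
# typically faster, the same trade-off paddle.save exposes.
fast_bytes = pickle.dumps(state, protocol=4)

restored = pickle.loads(fast_bytes)
```

Both byte streams round-trip to the original object; the difference is only in compatibility and size limits of the serialized form.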