Update gradient clip English doc for new gradient clipping strategy

The gradient clipping strategy has been upgraded; the English API docs for the corresponding clipping APIs, minimize, and ParamAttr are fixed accordingly.

Doc for the corresponding API changes: #23224

Corresponding Chinese doc PR: PaddlePaddle/FluidDoc#1942
Zhou Wei 6 years ago committed by GitHub
parent 426912df5a
commit e8efaee92d


@@ -802,10 +802,11 @@ class Optimizer(object):
                will be updated.
            no_grad_set (set, optional): Set of ``Variable`` or ``Variable.name`` that don't need
                to be updated. The default value is None.
-            grad_clip (GradClipBase, optional): Gradient clipping strategy, static
-                graph mode does not need to use this argument. Currently, this argument
-                only supports gradient clipping in dygraph mode. In the future, this
-                argument may be adjusted. The default value is None.
+            grad_clip (GradientClipBase, optional): Gradient clipping strategy, an instance of
+                some derived class of ``GradientClipBase`` . There are three clipping strategies
+                ( :ref:`api_fluid_clip_GradientClipByGlobalNorm` , :ref:`api_fluid_clip_GradientClipByNorm` ,
+                :ref:`api_fluid_clip_GradientClipByValue` ). Default value: None, and there is no
+                gradient clipping.

        Returns:
            tuple: tuple (optimize_ops, params_grads), A list of operators appended
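For reference, a minimal sketch of the ``grad_clip`` usage documented above, assuming the fluid static-graph API of this release; the toy network and variable names are illustrative, not part of the change:

```python
import paddle.fluid as fluid

# Illustrative toy network (assumed for demonstration only).
x = fluid.data(name='x', shape=[None, 13], dtype='float32')
y = fluid.data(name='y', shape=[None, 1], dtype='float32')
pred = fluid.layers.fc(input=x, size=1)
loss = fluid.layers.mean(fluid.layers.square_error_cost(input=pred, label=y))

# Clip the global norm of all gradients to 1.0 before the parameter update.
clip = fluid.clip.GradientClipByGlobalNorm(clip_norm=1.0)
sgd = fluid.optimizer.SGD(learning_rate=0.01)
sgd.minimize(loss, grad_clip=clip)
```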

@@ -32,6 +32,12 @@ class ParamAttr(object):
        name, initializer, learning rate, regularizer, trainable, gradient clip,
        and model average.
+
+    Note:
+        ``gradient_clip`` of ``ParamAttr`` HAS BEEN DEPRECATED since 2.0.
+        It is recommended to use ``minimize(loss, grad_clip=clip)`` to clip gradients.
+        There are three clipping strategies: :ref:`api_fluid_clip_GradientClipByGlobalNorm` ,
+        :ref:`api_fluid_clip_GradientClipByNorm` , :ref:`api_fluid_clip_GradientClipByValue` .
+
    Parameters:
        name (str, optional): The parameter's name. Default None, meaning that the name
            would be created automatically.
@@ -44,8 +50,6 @@ class ParamAttr(object):
        regularizer (WeightDecayRegularizer, optional): Regularization factor. Default None, meaning
            there is no regularization.
        trainable (bool): Whether this parameter is trainable. Default True.
-        gradient_clip (BaseGradientClipAttr, optional): The method to clip this parameter's
-            gradient. Default None, meaning that there is no gradient clip.
        do_model_average (bool): Whether this parameter should do model average
            when model average is enabled. Default False.
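Since ``gradient_clip`` no longer lives on ``ParamAttr``, the clipping strategy is passed to ``minimize`` instead. A hedged sketch of that replacement pattern (the layer, names, and clip bounds are assumptions for illustration):

```python
import paddle.fluid as fluid

# ParamAttr no longer carries gradient_clip; only the remaining fields are set here.
w_attr = fluid.ParamAttr(
    name='fc_weight',
    learning_rate=0.5,
    regularizer=fluid.regularizer.L2Decay(regularization_coeff=0.1),
    trainable=True)

x = fluid.data(name='x', shape=[None, 10], dtype='float32')
out = fluid.layers.fc(input=x, size=2, param_attr=w_attr)
loss = fluid.layers.reduce_mean(out)

# Clipping is attached to the optimization step instead of the parameter attribute.
clip = fluid.clip.GradientClipByValue(min=-1.0, max=1.0)
fluid.optimizer.SGD(learning_rate=0.01).minimize(loss, grad_clip=clip)
```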
@@ -191,6 +195,12 @@ class WeightNormParamAttr(ParamAttr):
        Training of Deep Neural Networks
        <https://arxiv.org/pdf/1602.07868.pdf>`_.
+
+    Note:
+        ``gradient_clip`` of ``WeightNormParamAttr`` HAS BEEN DEPRECATED since 2.0.
+        It is recommended to use ``minimize(loss, grad_clip=clip)`` to clip gradients.
+        There are three clipping strategies: :ref:`api_fluid_clip_GradientClipByGlobalNorm` ,
+        :ref:`api_fluid_clip_GradientClipByNorm` , :ref:`api_fluid_clip_GradientClipByValue` .
+
    Args:
        dim(int): Dimension over which to compute the norm. Dim is a non-negative
            number which is less than the rank of weight Tensor. For Example, dim can
@@ -209,9 +219,6 @@ class WeightNormParamAttr(ParamAttr):
            ``regularizer = fluid.regularizer.L2DecayRegularizer(regularization_coeff=0.1)``.
            Default None, meaning that there is no regularization.
        trainable(bool, optional): Whether this parameter is trainable. Default True.
-        gradient_clip: The method to clip this parameter's gradient, such as
-            ``gradient_clip = fluid.clip.GradientClipByNorm(clip_norm=2.0))`` .
-            Default None, meaning that there is no gradient clip.
        do_model_average(bool, optional): Whether this parameter should do model average.
            Default False.
@@ -229,7 +236,6 @@ class WeightNormParamAttr(ParamAttr):
                        learning_rate=1.0,
                        regularizer=fluid.regularizer.L2DecayRegularizer(regularization_coeff=0.1),
                        trainable=True,
-                        gradient_clip=fluid.clip.GradientClipByNorm(clip_norm=2.0),
                        do_model_average=False))
    """
