Update gradient clip English doc for new gradient clipping strategy

The gradient clipping strategy has been upgraded; accordingly, this updates the English API documentation for the related clipping APIs, ``minimize``, and ``ParamAttr``.

Documentation for the corresponding API change: #23224

Corresponding Chinese documentation PR: PaddlePaddle/FluidDoc#1942
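
A minimal sketch of the usage change being documented: the per-parameter ``gradient_clip`` field of ``ParamAttr`` is deprecated, and a clipping strategy is instead passed to ``minimize``. The ``grad_clip`` keyword on ``minimize`` follows the docstring change in the diff below; the rest is ordinary ``fluid`` static-graph code and is illustrative only.

```python
import paddle.fluid as fluid

# Old, now deprecated: clipping attached per parameter through ParamAttr, e.g.
#   fluid.ParamAttr(gradient_clip=fluid.clip.GradientClipByNorm(clip_norm=2.0))

# New, recommended: build a clipping strategy and hand it to minimize().
clip = fluid.clip.GradientClipByGlobalNorm(clip_norm=1.0)

x = fluid.data(name='x', shape=[None, 13], dtype='float32')
y = fluid.data(name='y', shape=[None, 1], dtype='float32')
pred = fluid.layers.fc(input=x, size=1)
loss = fluid.layers.mean(fluid.layers.square_error_cost(input=pred, label=y))

sgd = fluid.optimizer.SGD(learning_rate=0.01)
sgd.minimize(loss, grad_clip=clip)  # grad_clip keyword as documented in this PR
```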
Zhou Wei committed 6 years ago (via GitHub)
commit e8efaee92d, parent 426912df5a

File diff suppressed because it is too large.

@@ -801,11 +801,12 @@ class Optimizer(object):
                 to minimize ``loss``. The default value is None, at this time all parameters
                 will be updated.
             no_grad_set (set, optional): Set of ``Variable`` or ``Variable.name`` that don't need
-                to be updated. The default value is None.
-            grad_clip (GradClipBase, optional) : Gradient clipping strategy, static
-                graph mode does not need to use this argument. Currently, this argument
-                only supports gradient clipping in dygraph mode. In the future, this
-                argument my be adjusted. The default value is None.
+                to be updated. The default value is None.
+            grad_clip (GradientClipBase, optional): Gradient clipping strategy, an instance of
+                some derived class of ``GradientClipBase``. There are three clipping strategies
+                ( :ref:`api_fluid_clip_GradientClipByGlobalNorm` , :ref:`api_fluid_clip_GradientClipByNorm` ,
+                :ref:`api_fluid_clip_GradientClipByValue` ). Default value: None, and there is no
+                gradient clipping.

         Returns:
             tuple: tuple (optimize_ops, params_grads), A list of operators appended
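
As an illustration of the ``grad_clip`` argument documented above, the sketch below constructs the three clipping strategies and passes one of them to ``minimize``. The ``grad_clip`` keyword and the ``(optimize_ops, params_grads)`` return value follow this docstring; everything else is standard ``fluid`` API, and the snippet is only a sketch.

```python
import paddle.fluid as fluid

# The three clipping strategies referenced in the docstring.
clip_by_global_norm = fluid.clip.GradientClipByGlobalNorm(clip_norm=1.0)
clip_by_norm = fluid.clip.GradientClipByNorm(clip_norm=1.0)
clip_by_value = fluid.clip.GradientClipByValue(max=1.0, min=-1.0)

img = fluid.data(name='img', shape=[None, 784], dtype='float32')
label = fluid.data(name='label', shape=[None, 1], dtype='int64')
prediction = fluid.layers.fc(input=img, size=10, act='softmax')
loss = fluid.layers.mean(fluid.layers.cross_entropy(input=prediction, label=label))

optimizer = fluid.optimizer.Adam(learning_rate=0.001)
# Any of the three strategies can be passed; here the global-norm variant.
optimize_ops, params_grads = optimizer.minimize(loss, grad_clip=clip_by_global_norm)
```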

@@ -31,6 +31,12 @@ class ParamAttr(object):
     Create a object to represent the attribute of parameter. The attributes are:
     name, initializer, learning rate, regularizer, trainable, gradient clip,
     and model average.
+
+    Note:
+        ``gradient_clip`` of ``ParamAttr`` HAS BEEN DEPRECATED since 2.0.
+        It is recommended to use ``minimize(loss, grad_clip=clip)`` to clip gradients.
+        There are three clipping strategies: :ref:`api_fluid_clip_GradientClipByGlobalNorm` ,
+        :ref:`api_fluid_clip_GradientClipByNorm` , :ref:`api_fluid_clip_GradientClipByValue` .

     Parameters:
         name (str, optional): The parameter's name. Default None, meaning that the name
@@ -44,8 +50,6 @@ class ParamAttr(object):
         regularizer (WeightDecayRegularizer, optional): Regularization factor. Default None, meaning
             there is no regularization.
         trainable (bool): Whether this parameter is trainable. Default True.
-        gradient_clip (BaseGradientClipAttr, optional): The method to clip this parameter's
-            gradient. Default None, meaning that there is no gradient clip.
         do_model_average (bool): Whether this parameter should do model average
             when model average is enabled. Default False.
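
To make the deprecation concrete, here is a hedged sketch of ``ParamAttr`` after this change: the parameter attribute no longer carries a clipping rule, and clipping is configured once at optimization time. The ``grad_clip`` keyword on ``minimize`` is assumed from the ``Optimizer.minimize`` hunk above.

```python
import paddle.fluid as fluid

# gradient_clip is gone from ParamAttr; only name, initializer, learning_rate,
# regularizer, trainable and do_model_average remain.
w_attr = fluid.ParamAttr(
    name='fc_weight',
    learning_rate=0.5,
    regularizer=fluid.regularizer.L2DecayRegularizer(regularization_coeff=0.1),
    trainable=True)

x = fluid.data(name='x', shape=[None, 32], dtype='float32')
y_pred = fluid.layers.fc(input=x, size=10, param_attr=w_attr)
loss = fluid.layers.mean(y_pred)

# Clipping is now set up once when minimizing, not per parameter.
clip = fluid.clip.GradientClipByNorm(clip_norm=2.0)
fluid.optimizer.SGD(learning_rate=0.1).minimize(loss, grad_clip=clip)
```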
@@ -190,6 +194,12 @@ class WeightNormParamAttr(ParamAttr):
     paper: `Weight Normalization: A Simple Reparameterization to Accelerate
     Training of Deep Neural Networks
     <https://arxiv.org/pdf/1602.07868.pdf>`_.
+
+    Note:
+        ``gradient_clip`` of ``WeightNormParamAttr`` HAS BEEN DEPRECATED since 2.0.
+        It is recommended to use ``minimize(loss, grad_clip=clip)`` to clip gradients.
+        There are three clipping strategies: :ref:`api_fluid_clip_GradientClipByGlobalNorm` ,
+        :ref:`api_fluid_clip_GradientClipByNorm` , :ref:`api_fluid_clip_GradientClipByValue` .

     Args:
         dim(int): Dimension over which to compute the norm. Dim is a non-negative
@@ -209,9 +219,6 @@ class WeightNormParamAttr(ParamAttr):
             ``regularizer = fluid.regularizer.L2DecayRegularizer(regularization_coeff=0.1)``.
             Default None, meaning that there is no regularization.
         trainable(bool, optional): Whether this parameter is trainable. Default True.
-        gradient_clip: The method to clip this parameter's gradient, such as
-            ``gradient_clip = fluid.clip.GradientClipByNorm(clip_norm=2.0))`` .
-            Default None, meaning that there is no gradient clip.
         do_model_average(bool, optional): Whether this parameter should do model average.
             Default False.
@@ -229,7 +236,6 @@ class WeightNormParamAttr(ParamAttr):
                             learning_rate=1.0,
                             regularizer=fluid.regularizer.L2DecayRegularizer(regularization_coeff=0.1),
                             trainable=True,
-                            gradient_clip=fluid.clip.GradientClipByNorm(clip_norm=2.0),
                             do_model_average=False))
     """
