!10519 update documentation of warmup_lr, F1, RMSProp and BatchNorm2d, and add links to pictures of the activation functions.

From: @wangshuide2020
Reviewed-by: @liangchenghui,@wuxuejian
Signed-off-by: @liangchenghui
pull/10519/MERGE
mindspore-ci-bot 4 years ago committed by Gitee
commit 4dfd143483

@ -343,13 +343,12 @@ def warmup_lr(learning_rate, total_step, step_per_epoch, warmup_epoch):
Args:
learning_rate (float): The initial value of learning rate.
warmup_steps (int): The warm up steps of learning rate.
Inputs:
Tensor. The current step number.
total_step (int): The total number of steps.
step_per_epoch (int): The number of steps in each epoch.
warmup_epoch (int): The number of epochs over which the learning rate is warmed up.
Returns:
Tensor. The learning rate value for the current step.
list[float]. The size of the list is `total_step`.
Examples:
>>> learning_rate = 0.1
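For orientation, here is a rough sketch of how a `total_step`-long warmup list could be produced from these arguments. The linear ramp below is an assumption for illustration, not necessarily the exact rule used by `warmup_lr`:

def warmup_lr_sketch(learning_rate, total_step, step_per_epoch, warmup_epoch):
    # Ramp the learning rate linearly up to `learning_rate` over the warmup
    # epochs, then hold it constant for the remaining steps.
    warmup_steps = warmup_epoch * step_per_epoch
    lr_each_step = []
    for i in range(total_step):
        ratio = min(i + 1, warmup_steps) / warmup_steps
        lr_each_step.append(learning_rate * ratio)
    return lr_each_step  # list[float], len == total_step

Under this assumed ramp, warmup_lr_sketch(0.1, total_step=6, step_per_epoch=2, warmup_epoch=2) would rise 0.025, 0.05, 0.075, 0.1 and then stay at 0.1.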

@ -142,6 +142,9 @@ class ELU(Cell):
\text{alpha} * (\exp(x_i) - 1), &\text{otherwise.}
\end{cases}
A picture of ELU can be found at `ELU <https://en.wikipedia.org/wiki/
Activation_function#/media/File:Activation_elu.svg>`_.
Args:
alpha (float): The coefficient of negative factor whose type is float. Default: 1.0.
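As a quick plain-NumPy illustration of the piecewise definition above (a sketch for reference only, not the `nn.ELU` cell itself):

import numpy as np

def elu(x, alpha=1.0):
    # x_i for x_i >= 0, alpha * (exp(x_i) - 1) otherwise
    return np.where(x >= 0, x, alpha * (np.exp(x) - 1))

elu(np.array([-1.0, 0.0, 2.0]))  # ~[-0.632, 0.0, 2.0]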
@ -178,6 +181,9 @@ class ReLU(Cell):
element-wise :math:`\max(0, x)`. Specifically, the neurons with negative output
will be suppressed and the active neurons will stay the same.
A picture of ReLU can be found at `ReLU <https://en.wikipedia.org/wiki/
Activation_function#/media/File:Activation_rectified_linear.svg>`_.
Inputs:
- **input_data** (Tensor) - The input of ReLU.
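A minimal NumPy sketch of the element-wise max(0, x) behaviour (illustration only, not the `nn.ReLU` cell):

import numpy as np

def relu(x):
    # Negative entries are suppressed, non-negative entries pass through.
    return np.maximum(0, x)

relu(np.array([-2.0, 0.0, 3.0]))  # [0.0, 0.0, 3.0]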
@ -335,6 +341,9 @@ class GELU(Cell):
:math:`GELU(x_i) = x_i*P(X < x_i)`, where :math:`P` is the cumulative distribution function
of the standard Gaussian distribution and :math:`x_i` is the element of the input.
A picture of GELU can be found at `GELU <https://en.wikipedia.org/wiki/
Activation_function#/media/File:Activation_gelu.png>`_.
Inputs:
- **input_data** (Tensor) - The input of GELU.
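A hedged sketch of :math:`GELU(x_i) = x_i*P(X < x_i)` using the standard normal CDF from SciPy (illustration only; the actual cell may use an approximation):

import numpy as np
from scipy.stats import norm

def gelu(x):
    # x_i * P(X < x_i), with X ~ N(0, 1)
    return x * norm.cdf(x)

gelu(np.array([-1.0, 0.0, 1.0]))  # ~[-0.159, 0.0, 0.841]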
@ -410,6 +419,9 @@ class Sigmoid(Cell):
Sigmoid function is defined as:
:math:`\text{sigmoid}(x_i) = \frac{1}{1 + \exp(-x_i)}`, where :math:`x_i` is the element of the input.
A picture of Sigmoid can be found at `Sigmoid <https://en.wikipedia.org/wiki/
Sigmoid_function#/media/File:Logistic-curve.svg>`_.
Inputs:
- **input_data** (Tensor) - The input of Sigmoid.
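And the corresponding one-line NumPy sketch of the sigmoid formula above:

import numpy as np

def sigmoid(x):
    # 1 / (1 + exp(-x_i)), applied element-wise
    return 1.0 / (1.0 + np.exp(-x))

sigmoid(np.array([0.0]))  # [0.5]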
@ -448,6 +460,9 @@ class PReLU(Cell):
Parameter :math:`w` has dimensionality of the argument channel. If called without argument
channel, a single parameter :math:`w` will be shared across all channels.
A picture of PReLU can be found at `PReLU <https://en.wikipedia.org/wiki/
Activation_function#/media/File:Activation_prelu.svg>`_.
Args:
channel (int): The dimension of input. Default: 1.
w (float): The initial value of w. Default: 0.25.
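A rough sketch of the PReLU computation with a single shared weight `w` (per-channel `w` is a straightforward broadcast; this is an illustrative assumption, not the `nn.PReLU` cell itself):

import numpy as np

def prelu(x, w=0.25):
    # max(0, x) + w * min(0, x); w is learnable in the real layer
    return np.maximum(0, x) + w * np.minimum(0, x)

prelu(np.array([-4.0, 2.0]))  # [-1.0, 2.0]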

@ -340,6 +340,9 @@ class BatchNorm2d(_BatchNorm):
Note:
The implementation of BatchNorm is different in graph mode and pynative mode, therefore the mode cannot be
changed after the net is initialized.
Note that the formula for updating the running_mean and running_var is
:math:`\hat{x}_\text{new} = (1 - \text{momentum}) \times x_t + \text{momentum} \times \hat{x}`,
where :math:`\hat{x}` is the estimated statistic and :math:`x_t` is the new observed value.
Args:
num_features (int): `C` from an expected input of size (N, C, H, W).
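The running-statistics rule in the Note above can be read as a simple exponential moving average; a minimal sketch:

def update_running_stat(running_stat, batch_stat, momentum=0.9):
    # hat{x}_new = (1 - momentum) * x_t + momentum * hat{x},
    # where hat{x} is the running estimate and x_t the statistic of the current batch.
    return (1 - momentum) * batch_stat + momentum * running_stat

With momentum close to 1 the running estimate changes slowly; momentum of 0 would simply copy the current batch statistic.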

@ -122,7 +122,7 @@ class Fbeta(Metric):
class F1(Fbeta):
r"""
Calculates the F1 score. F1 is a special case of Fbeta when beta is 1.
Refer to class `Fbeta` for more details.
Refer to :class:`mindspore.nn.Fbeta` for more details.
.. math::
F_1=\frac{2\cdot true\_positive}{2\cdot true\_positive + false\_negative + false\_positive}
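A small worked example of the formula above, counting true positives, false positives and false negatives directly:

def f1_score(tp, fp, fn):
    # F1 = 2*TP / (2*TP + FN + FP)
    return 2 * tp / (2 * tp + fn + fp)

f1_score(tp=8, fp=2, fn=2)  # 16 / 20 = 0.8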

@ -42,51 +42,51 @@ class RMSProp(Optimizer):
"""
Implements Root Mean Squared Propagation (RMSProp) algorithm.
Update `params` according to the RMSProp algorithm.
The equation is as follows:
.. math::
s_{t} = \\rho s_{t-1} + (1 - \\rho)(\\nabla Q_{i}(w))^2
.. math::
m_{t} = \\beta m_{t-1} + \\frac{\\eta} {\\sqrt{s_{t} + \\epsilon}} \\nabla Q_{i}(w)
.. math::
w = w - m_{t}
The first equation calculates the moving average of the squared gradient for
each weight; the gradient is then divided by :math:`\\sqrt{s_{t} + \\epsilon}`.
If `centered` is True:
.. math::
g_{t} = \\rho g_{t-1} + (1 - \\rho)\\nabla Q_{i}(w)
.. math::
s_{t} = \\rho s_{t-1} + (1 - \\rho)(\\nabla Q_{i}(w))^2
.. math::
m_{t} = \\beta m_{t-1} + \\frac{\\eta} {\\sqrt{s_{t} - g_{t}^2 + \\epsilon}} \\nabla Q_{i}(w)
.. math::
w = w - m_{t}
where :math:`w` represents `params`, which will be updated.
:math:`g_{t}` is the mean gradient, :math:`g_{t-1}` is the last moment of :math:`g_{t}`.
:math:`s_{t}` is the mean squared gradient, :math:`s_{t-1}` is the last moment of :math:`s_{t}`,
:math:`m_{t}` is the moment, the delta of `w`, :math:`m_{t-1}` is the last moment of :math:`m_{t}`.
:math:`\\rho` represents `decay`. :math:`\\beta` is the momentum term, representing `momentum`.
:math:`\\epsilon` is a smoothing term to avoid division by zero, representing `epsilon`.
:math:`\\eta` is the learning rate, representing `learning_rate`. :math:`\\nabla Q_{i}(w)` is the gradient,
representing `gradients`.
Note:
When separating parameter groups, the weight decay in each group will be applied on the parameters if the
weight decay is positive. When not separating parameter groups, the `weight_decay` in the API will be applied
on the parameters without 'beta' or 'gamma' in their names if `weight_decay` is positive.
To improve parameter groups performance, the customized order of parameters can be supported.
Args:
params (Union[list[Parameter], list[dict]]): When the `params` is a list of `Parameter` which will be updated,
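For readers who want to map the equations to code, a minimal NumPy sketch of one non-centered update step; it only mirrors the math above and is not the `nn.RMSProp` implementation, and the default values here are illustrative rather than the API defaults:

import numpy as np

def rmsprop_step(w, grad, s, m, lr=0.1, decay=0.9, momentum=0.9, eps=1e-10):
    s = decay * s + (1 - decay) * grad ** 2          # s_t
    m = momentum * m + lr / np.sqrt(s + eps) * grad  # m_t
    w = w - m                                        # w <- w - m_t
    return w, s, m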
