!4388 Third round of enhancement of API comment & README_CN

Merge pull request !4388 from Simson/enhancement-API
mindspore-ci-bot 5 years ago committed by Gitee
commit 15496ff5a4

@ -1,7 +1,9 @@
![MindSpore Logo](docs/MindSpore-logo.png "MindSpore logo")
- [What Is MindSpore?](#what-is-mindspore)
- [What Is MindSpore](#what-is-mindspore)
- [Automatic Differentiation](#automatic-differentiation)
- [Automatic Parallel](#automatic-parallel)
- [Installation](#installation)

@ -0,0 +1,220 @@
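![MindSpore Logo](docs/MindSpore-logo.png "MindSpore logo")
[View English](./README.md)
- [Introduction to MindSpore](#mindspore介绍)
- [Automatic Differentiation](#自动微分)
- [Automatic Parallel](#自动并行)
- [Installation](#安装)
- [Binaries](#二进制文件)
- [From Source](#来源)
- [Docker Image](#docker镜像)
- [Quickstart](#快速入门)
- [Docs](#文档)
- [Community](#社区)
- [Governance](#治理)
- [Communication](#交流)
- [Contributing](#贡献)
- [Release Notes](#版本说明)
- [License](#许可证)
## Introduction to MindSpore
MindSpore provides a friendly design and efficient execution, aiming to improve the development experience of data scientists and algorithm engineers. It also offers native support for Ascend AI processors together with software-hardware co-optimization.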
<img src="docs/MindSpore-architecture.png" alt="MindSpore Architecture" width="600"/>
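### Automatic Differentiation
- **Conversion based on static computational graphs**: The network is converted into a static dataflow graph at compile time, and the chain rule is applied to the dataflow graph to implement automatic differentiation.
- **Conversion based on dynamic computational graphs**: The execution trace of the network is recorded during forward execution through operator overloading, and the chain rule is applied to the dynamically generated dataflow graph to implement automatic differentiation.
- **Conversion based on source code**: This technique evolved from functional programming frameworks. It performs automatic differential transformation on the intermediate representation (the expression form of a program during compilation) in the form of just-in-time (JIT) compilation, and supports complex control flow scenarios, higher-order functions, and closures.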
<img src="docs/Automatic-differentiation.png" alt="Automatic Differentiation" width="600"/>
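MindSpore's implementation of automatic differentiation can be understood as symbolic differentiation of the program itself. MindSpore IR is a functional intermediate representation that corresponds intuitively to composite functions in basic algebra. The formula of a composite function is composed of arbitrary differentiable basic functions. Each primitive operation in MindSpore IR corresponds to a basic function in basic algebra, from which more complex flow control can be built.
### Automatic Parallel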
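The dynamic-graph approach described above (recording operations through operator overloading, then applying the chain rule in reverse) can be illustrated with a minimal pure-Python sketch. The `Var` class and `backward` function are hypothetical names invented for this illustration; they are not MindSpore APIs and say nothing about how MindSpore implements differentiation internally.

```python
class Var:
    """Scalar that records the operations applied to it via operator overloading."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (parent Var, local derivative)
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))


def backward(output):
    """Apply the chain rule over the recorded graph in reverse topological order."""
    order, seen = [], set()

    def visit(node):
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)

    visit(output)
    output.grad = 1.0
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += node.grad * local


x, y = Var(2.0), Var(3.0)
z = x * y + x            # z = x*y + x
backward(z)
print(x.grad, y.grad)    # dz/dx = y + 1 = 4.0, dz/dy = x = 2.0
```

Processing nodes in reverse topological order ensures that a node's gradient is complete before it is propagated to its parents.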
<img src="docs/Automatic-parallel.png" alt="Automatic Parallel" width="600"/>
## Installation
### Binaries
| Hardware Platform | Operating System | Status |
| :------------ | :-------------- | :--- |
| Ascend 910 | Ubuntu-x86 | ✔️ |
| | EulerOS-x86 | ✔️ |
| | EulerOS-aarch64 | ✔️ |
| GPU CUDA 10.1 | Ubuntu-x86 | ✔️ |
| CPU | Ubuntu-x86 | ✔️ |
| | Windows-x86 | ✔️ |
1. Download and install the whl package from the [MindSpore download page](https://www.mindspore.cn/versions).
pip install https://ms-release.obs.cn-north-4.myhuaweicloud.com/0.6.0-beta/MindSpore/cpu/ubuntu_x86/mindspore-0.6.0-cp37-cp37m-linux_x86_64.whl
2. Run the following command to verify the installation.
import numpy as np
import mindspore.context as context
import mindspore.nn as nn
from mindspore import Tensor
from mindspore.ops import operations as P
context.set_context(mode=context.GRAPH_MODE, device_target="CPU")
class Mul(nn.Cell):
    def __init__(self):
        super(Mul, self).__init__()
        self.mul = P.Mul()

    def construct(self, x, y):
        return self.mul(x, y)
x = Tensor(np.array([1.0, 2.0, 3.0]).astype(np.float32))
y = Tensor(np.array([4.0, 5.0, 6.0]).astype(np.float32))
mul = Mul()
print(mul(x, y))
[ 4. 10. 18.]
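### From Source
### Docker Image
MindSpore Docker images are hosted on [Docker Hub](https://hub.docker.com/r/mindspore).
| Hardware Platform | Docker Image Repository | Tag | Description |
| :----- | :------------------------ | :----------------------- | :--------------------------------------- |
| CPU | `mindspore/mindspore-cpu` | `x.y.z` | Production environment with the MindSpore `x.y.z` CPU release pre-installed. |
| | | `devel` | Development environment for building MindSpore from source (with the `CPU` backend). For installation details, see https://www.mindspore.cn/install. |
| | | `runtime` | Runtime environment for installing the MindSpore binary package (with the `CPU` backend). |
| GPU | `mindspore/mindspore-gpu` | `x.y.z` | Production environment with the MindSpore `x.y.z` GPU release pre-installed. |
| | | `devel` | Development environment for building MindSpore from source (with the `GPU CUDA10.1` backend). For installation details, see https://www.mindspore.cn/install. |
| | | `runtime` | Runtime environment for installing the MindSpore binary package (with the `GPU CUDA10.1` backend). |
| Ascend | <center>&mdash;</center> | <center>&mdash;</center> | Coming soon. |
> **Note:** Installing the whl package directly after building from source in the GPU `devel` Docker image is not recommended. We strongly recommend transferring and installing the whl package inside the GPU `runtime` Docker image.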
docker pull mindspore/mindspore-cpu:0.6.0-beta
docker run -it mindspore/mindspore-cpu:0.6.0-beta /bin/bash
DISTRIBUTION=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$DISTRIBUTION/nvidia-docker.list | tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit nvidia-docker2
sudo systemctl restart docker
docker pull mindspore/mindspore-gpu:0.6.0-beta
docker run -it --runtime=nvidia --privileged=true mindspore/mindspore-gpu:0.6.0-beta /bin/bash
import numpy as np
import mindspore.context as context
from mindspore import Tensor
from mindspore.ops import functional as F
x = Tensor(np.ones([1,3,3,4]).astype(np.float32))
y = Tensor(np.ones([1,3,3,4]).astype(np.float32))
print(F.tensor_add(x, y))
[[[ 2. 2. 2. 2.],
[ 2. 2. 2. 2.],
[ 2. 2. 2. 2.]],
[[ 2. 2. 2. 2.],
[ 2. 2. 2. 2.],
[ 2. 2. 2. 2.]],
[[ 2. 2. 2. 2.],
[ 2. 2. 2. 2.],
[ 2. 2. 2. 2.]]]
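To learn more about how the MindSpore Docker images are built, see the [docker](docker/README.md) repo for details.
## Quickstart
## Docs
## Community
### Governance
### Communication
- [MindSpore Slack](https://join.slack.com/t/mindspore/shared_invite/zt-dgk65rli-3ex4xvS4wHX7UDmsQmfu8w): communication platform for developers.
- `#mindspore` IRC channel (for meeting minutes only)
- Video conferencing: TBD
- Mailing list: <https://mailweb.mindspore.cn/postorius/lists>
## Contributing
## Release Notes
## License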
[Apache License 2.0](LICENSE)

@ -150,7 +150,7 @@ TensorPtr TensorPy::MakeTensor(const py::array &input, const TypePtr &type_ptr)
// Get tensor shape.
std::vector<int> shape(buf.shape.begin(), buf.shape.end());
if (data_type == buf_type) {
// Use memory copy if input data type is same as the required type.
// Use memory copy if input data type is the same as the required type.
return std::make_shared<Tensor>(data_type, shape, buf.ptr, buf.size * buf.itemsize);
// Create tensor with data type converted.

@ -546,9 +546,11 @@ def set_context(**kwargs):
Attribute name is required for setting attributes.
The mode is not recommended to be changed after the net was initialized because the implementations of some
operations are different in graph mode and pynative mode. Default: PYNATIVE_MODE.
mode (int): Running in GRAPH_MODE(0) or PYNATIVE_MODE(1). Default: PYNATIVE_MODE.
mode (int): Running in GRAPH_MODE(0) or PYNATIVE_MODE(1).
device_target (str): The target device to run, support "Ascend", "GPU", "CPU". Default: "Ascend".
device_id (int): Id of target device, the value must be in [0, device_num_per_host-1],
while device_num_per_host should be no more than 4096. Default: 0.

@ -148,7 +148,7 @@ class Cell:
def update_cell_type(self, cell_type):
Update the current cell type mainly identify if quantization aware training network.
The current cell type is updated when a quantization aware training network is encountered.
After being invoked, it can set the cell type to 'cell_type'.
@ -936,7 +936,7 @@ class GraphKernel(Cell):
Base class for GraphKernel.
A `GraphKernel` is a composite of basic primitives and can be compiled into a fused kernel automatically when
enable_graph_kernel in context is set to True.
>>> class Relu(GraphKernel):

@ -661,7 +661,7 @@ class LogSoftmax(GraphKernel):
Log Softmax activation function.
Applies the Log Softmax function to the input tensor on the specified axis.
Suppose a slice along the given aixs :math:`x` then for each element :math:`x_i`
Suppose a slice in the given axis :math:`x`, then for each element :math:`x_i`
the Log Softmax function is shown as follows:
.. math::
@ -987,10 +987,10 @@ class LayerNorm(Cell):
Applies Layer Normalization over a mini-batch of inputs.
Layer normalization is widely used in recurrent neural networks. It applies
normalization over a mini-batch of inputs for each single training case as described
normalization on a mini-batch of inputs for each single training case as described
in the paper `Layer Normalization <https://arxiv.org/pdf/1607.06450.pdf>`_. Unlike batch
normalization, layer normalization performs exactly the same computation at training and
testing times. It can be described using the following formula. It is applied across all channels
testing time. It can be described using the following formula. It is applied across all channels
and pixels but only one batch size.
.. math::
@ -1139,9 +1139,9 @@ class LambNextMV(GraphKernel):
Tuple of 2 Tensor.
- **add3** (Tensor) - The shape is same as the shape after broadcasting, and the data type is
- **add3** (Tensor) - The shape is the same as the shape after broadcasting, and the data type is
the one with high precision or high digits among the inputs.
- **realdiv4** (Tensor) - The shape is same as the shape after broadcasting, and the data type is
- **realdiv4** (Tensor) - The shape is the same as the shape after broadcasting, and the data type is
the one with high precision or high digits among the inputs.

@ -55,7 +55,7 @@ class Softmax(Cell):
.. math::
\text{softmax}(x_{i}) = \frac{\exp(x_i)}{\sum_{j=0}^{n-1}\exp(x_j)},
where :math:`x_{i}` is the :math:`i`-th slice along the given dim of the input Tensor.
where :math:`x_{i}` is the :math:`i`-th slice in the given dimension of the input Tensor.
axis (Union[int, tuple[int]]): The axis to apply Softmax operation, -1 means the last dimension. Default: -1.
@ -87,11 +87,11 @@ class LogSoftmax(Cell):
Applies the LogSoftmax function to n-dimensional input tensor.
The input is transformed with Softmax function and then with log function to lie in range[-inf,0).
The input is transformed by the Softmax function and then by the log function to lie in range[-inf,0).
Logsoftmax is defined as:
:math:`\text{logsoftmax}(x_i) = \log \left(\frac{\exp(x_i)}{\sum_{j=0}^{n-1} \exp(x_j)}\right)`,
where :math:`x_{i}` is the :math:`i`-th slice along the given dim of the input Tensor.
where :math:`x_{i}` is the :math:`i`-th slice in the given dimension of the input Tensor.
axis (int): The axis to apply LogSoftmax operation, -1 means the last dimension. Default: -1.
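The Softmax and LogSoftmax formulas above can be checked on a single slice with a short pure-Python sketch. The function names are illustrative only, and subtracting the maximum is a common numerical-stability technique that is assumed here rather than taken from the docstrings.

```python
import math


def softmax(xs):
    """softmax(x_i) = exp(x_i) / sum_j exp(x_j) over one slice."""
    shift = max(xs)  # subtracting the max leaves the result unchanged but avoids overflow
    exps = [math.exp(x - shift) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def log_softmax(xs):
    """logsoftmax(x_i) = x_i - log(sum_j exp(x_j)), computed stably."""
    shift = max(xs)
    log_sum = shift + math.log(sum(math.exp(x - shift) for x in xs))
    return [x - log_sum for x in xs]


values = [1.0, 2.0, 3.0]
probs = softmax(values)
log_probs = log_softmax(values)
print(probs)      # entries sum to 1
print(log_probs)  # every entry is below 0
```

Computing `log_softmax` directly, instead of taking `log(softmax(x))`, avoids taking the logarithm of values that may have underflowed to zero.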
@ -123,7 +123,7 @@ class ELU(Cell):
Exponential Linear Unit activation function.
Applies the exponential linear unit function element-wise.
The activation function defined as:
The activation function is defined as:
.. math::
E_{i} =
@ -162,7 +162,7 @@ class ReLU(Cell):
Applies the rectified linear unit function element-wise. It returns
element-wise :math:`\max(0, x)`. Specifically, the neurons with the negative output
will suppressed and the active neurons will stay the same.
will be suppressed and the active neurons will stay the same.
- **input_data** (Tensor) - The input of ReLU.
@ -197,7 +197,7 @@ class ReLU6(Cell):
- **input_data** (Tensor) - The input of ReLU6.
Tensor, which has the same type with `input_data`.
Tensor, which has the same type as `input_data`.
>>> input_x = Tensor(np.array([-1, -2, 0, 2, 1]), mindspore.float16)
@ -234,7 +234,7 @@ class LeakyReLU(Cell):
- **input_x** (Tensor) - The input of LeakyReLU.
Tensor, has the same type and shape with the `input_x`.
Tensor, has the same type and shape as the `input_x`.
>>> input_x = Tensor(np.array([[-1.0, 4.0, -8.0], [2.0, -5.0, 9.0]]), mindspore.float32)
@ -365,7 +365,7 @@ class PReLU(Cell):
PReLU is defined as: :math:`prelu(x_i)= \max(0, x_i) + w * \min(0, x_i)`, where :math:`x_i`
is an element of a channel of the input.
Here :math:`w` is an learnable parameter with default initial value 0.25.
Here :math:`w` is a learnable parameter with a default initial value of 0.25.
Parameter :math:`w` has dimensionality of the argument channel. If called without argument
channel, a single parameter :math:`w` will be shared across all channels.
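As a quick illustration of the PReLU formula above, the following pure-Python sketch applies it to scalars and to per-channel data. `prelu` and `prelu_channels` are hypothetical helper names, not MindSpore functions.

```python
def prelu(x, w=0.25):
    """prelu(x) = max(0, x) + w * min(0, x)."""
    return max(0.0, x) + w * min(0.0, x)


def prelu_channels(rows, weights):
    """rows[c] holds the values of channel c; weights[c] is that channel's slope."""
    return [[prelu(v, w) for v in row] for row, w in zip(rows, weights)]


outputs = [prelu(v) for v in [-2.0, 0.0, 3.0]]
print(outputs)  # [-0.5, 0.0, 3.0]

per_channel = prelu_channels([[-2.0, 4.0], [-2.0, 4.0]], weights=[0.25, 0.5])
print(per_channel)  # [[-0.5, 4.0], [-1.0, 4.0]]
```

Positive inputs pass through unchanged, while negative inputs are scaled by the slope `w` of their channel.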
@ -413,7 +413,7 @@ class PReLU(Cell):
class HSwish(Cell):
rHard swish activation function.
Hard swish activation function.
Applies hswish-type activation element-wise. The input is a Tensor with any valid shape.
@ -422,7 +422,7 @@ class HSwish(Cell):
.. math::
\text{hswish}(x_{i}) = x_{i} * \frac{ReLU6(x_{i} + 3)}{6},
where :math:`x_{i}` is the :math:`i`-th slice along the given dim of the input Tensor.
where :math:`x_{i}` is the :math:`i`-th slice in the given dimension of the input Tensor.
- **input_data** (Tensor) - The input of HSwish.
@ -456,7 +456,7 @@ class HSigmoid(Cell):
.. math::
\text{hsigmoid}(x_{i}) = max(0, min(1, \frac{x_{i} + 3}{6})),
where :math:`x_{i}` is the :math:`i`-th slice along the given dim of the input Tensor.
where :math:`x_{i}` is the :math:`i`-th slice in the given dimension of the input Tensor.
- **input_data** (Tensor) - The input of HSigmoid.
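The hswish and hsigmoid formulas above are simple enough to evaluate by hand. A minimal pure-Python sketch follows, with illustrative function names that are not part of the MindSpore API.

```python
def relu6(x):
    """Clamp x to the range [0, 6]."""
    return min(max(x, 0.0), 6.0)


def hswish(x):
    """hswish(x) = x * ReLU6(x + 3) / 6."""
    return x * relu6(x + 3.0) / 6.0


def hsigmoid(x):
    """hsigmoid(x) = max(0, min(1, (x + 3) / 6))."""
    return max(0.0, min(1.0, (x + 3.0) / 6.0))


for v in (-4.0, 0.0, 3.0):
    print(v, hswish(v), hsigmoid(v))
```

Both functions are piecewise-linear approximations: for inputs at or above 3, `hsigmoid` saturates at 1 and `hswish` returns its input unchanged.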

@ -65,7 +65,7 @@ class Dropout(Cell):
dtype (:class:`mindspore.dtype`): Data type of input. Default: mindspore.float32.
ValueError: If keep_prob is not in range (0, 1).
ValueError: If `keep_prob` is not in range (0, 1).
- **input** (Tensor) - An N-D Tensor.
@ -373,8 +373,8 @@ class OneHot(Cell):
axis is created at dimension `axis`.
axis (int): Features x depth if axis == -1, depth x features
if axis == 0. Default: -1.
axis (int): Features x depth if axis is -1, depth x features
if axis is 0. Default: -1.
depth (int): A scalar defining the depth of the one hot dimension. Default: 1.
on_value (float): A scalar defining the value to fill in output[i][j]
when indices[j] = i. Default: 1.0.
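The effect of `axis` on the output layout can be sketched in pure Python for a 1-D list of indices. The `one_hot` helper and its `off_value` argument are assumptions made for this illustration; only `axis`, `depth` and `on_value` appear in the docstring above.

```python
def one_hot(indices, depth, on_value=1.0, off_value=0.0, axis=-1):
    """One-hot encode a 1-D list of indices (sketch limited to axis -1 or 0)."""
    rows = [[on_value if i == idx else off_value for i in range(depth)]
            for idx in indices]
    if axis == -1:
        return rows                               # features x depth
    if axis == 0:
        return [list(col) for col in zip(*rows)]  # depth x features
    raise ValueError("this sketch only handles axis -1 or 0")


last_axis = one_hot([0, 2], depth=3)
first_axis = one_hot([0, 2], depth=3, axis=0)
print(last_axis)   # [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
print(first_axis)  # [[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]]
```

With `axis=0`, `output[i][j]` equals `on_value` exactly when `indices[j] == i`, which matches the description of `on_value`.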
@ -492,18 +492,18 @@ class Unfold(Cell):
The input tensor must be a 4-D tensor and the data format is NCHW.
ksizes (Union[tuple[int], list[int]]): The size of sliding window, should be a tuple or list of int,
ksizes (Union[tuple[int], list[int]]): The size of sliding window, should be a tuple or a list of integers,
and the format is [1, ksize_row, ksize_col, 1].
strides (Union[tuple[int], list[int]]): Distance between the centers of the two consecutive patches,
should be a tuple or list of int, and the format is [1, stride_row, stride_col, 1].
rates (Union[tuple[int], list[int]]): In each extracted patch, the gap between the corresponding dim
pixel positions, should be a tuple or list of int, and the format is [1, rate_row, rate_col, 1].
rates (Union[tuple[int], list[int]]): In each extracted patch, the gap between the corresponding dimension
pixel positions, should be a tuple or a list of integers, and the format is [1, rate_row, rate_col, 1].
padding (str): The type of padding algorithm, is a string whose value is "same" or "valid",
not case sensitive. Default: "valid".
- same: Means that the patch can take the part beyond the original image, and this part is filled with 0.
- valid: Means that the patch area taken must be completely contained in the original image.
- valid: Means that the patch area taken must be completely contained within the original image.
- **input_x** (Tensor) - A 4-D tensor whose shape is [in_batch, in_depth, in_row, in_col] and
@ -511,7 +511,7 @@ class Unfold(Cell):
Tensor, a 4-D tensor whose data type is the same as 'input_x',
and the shape is [out_batch, out_depth, out_row, out_col], the out_batch is same as the in_batch.
and the shape is [out_batch, out_depth, out_row, out_col], the out_batch is the same as the in_batch.
>>> net = Unfold(ksizes=[1, 2, 2, 1], strides=[1, 1, 1, 1], rates=[1, 1, 1, 1])
@ -556,11 +556,11 @@ class MatrixDiag(Cell):
Returns a batched diagonal tensor with a given batched diagonal values.
- **x** (Tensor) - The diagonal values. It can be of the following data types:
float32, float16, int32, int8, uint8.
- **x** (Tensor) - The diagonal values. It can be one of the following data types:
float32, float16, int32, int8, and uint8.
Tensor, same type as input `x`. The shape should be x.shape + (x.shape[-1], ).
Tensor, has the same type as input `x`. The shape should be x.shape + (x.shape[-1], ).
>>> x = Tensor(np.array([1, -1]), mstype.float32)
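The shape rule `x.shape + (x.shape[-1], )` can be illustrated with nested Python lists. `matrix_diag` and `batched_matrix_diag` are hypothetical helpers for this sketch and do not reproduce the MindSpore operator.

```python
def matrix_diag(values):
    """Build a square matrix with `values` on the diagonal and zeros elsewhere."""
    n = len(values)
    return [[values[i] if i == j else 0.0 for j in range(n)] for i in range(n)]


def batched_matrix_diag(batch):
    """Apply matrix_diag to every row of a 2-D batch: (B, N) -> (B, N, N)."""
    return [matrix_diag(row) for row in batch]


diag = matrix_diag([1.0, -1.0])
print(diag)  # [[1.0, 0.0], [0.0, -1.0]]

batched = batched_matrix_diag([[1.0, 2.0], [3.0, 4.0]])
print(batched)  # [[[1.0, 0.0], [0.0, 2.0]], [[3.0, 0.0], [0.0, 4.0]]]
```

An input of shape `(2,)` yields shape `(2, 2)`, and an input of shape `(2, 2)` yields shape `(2, 2, 2)`.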
@ -587,11 +587,11 @@ class MatrixDiagPart(Cell):
Returns the batched diagonal part of a batched tensor.
- **x** (Tensor) - The batched tensor. It can be of the following data types:
float32, float16, int32, int8, uint8.
- **x** (Tensor) - The batched tensor. It can be one of the following data types:
float32, float16, int32, int8, and uint8.
Tensor, same type as input `x`. The shape should be x.shape[:-2] + [min(x.shape[-2:])].
Tensor, has the same type as input `x`. The shape should be x.shape[:-2] + [min(x.shape[-2:])].
>>> x = Tensor([[[-1, 0], [0, 1]], [[-1, 0], [0, 1]], [[-1, 0], [0, 1]]], mindspore.float32)
@ -617,12 +617,12 @@ class MatrixSetDiag(Cell):
Modify the batched diagonal part of a batched tensor.
- **x** (Tensor) - The batched tensor. It can be of the following data types:
float32, float16, int32, int8, uint8.
- **x** (Tensor) - The batched tensor. It can be one of the following data types:
float32, float16, int32, int8, and uint8.
- **diagonal** (Tensor) - The diagonal values.
Tensor, same type as input `x`. The shape same as `x`.
Tensor, has the same type and shape as input `x`.
>>> x = Tensor([[[-1, 0], [0, 1]], [[-1, 0], [0, 1]], [[-1, 0], [0, 1]]], mindspore.float32)

@ -72,7 +72,7 @@ class SequentialCell(Cell):
args (list, OrderedDict): List of subclass of Cell.
TypeError: If arg is not of type list or OrderedDict.
TypeError: If the type of the argument is not list or OrderedDict.
- **input** (Tensor) - Tensor with shape according to the first Cell in the sequence.

@ -131,7 +131,7 @@ class Conv2d(_Conv):
in_channels (int): The number of input channel :math:`C_{in}`.
out_channels (int): The number of output channel :math:`C_{out}`.
kernel_size (Union[int, tuple[int]]): The data type is int or tuple with 2 integers. Specifies the height
kernel_size (Union[int, tuple[int]]): The data type is int or a tuple of 2 integers. Specifies the height
and width of the 2D convolution window. Single int means the value is for both the height and the width of
the kernel. A tuple of 2 ints means the first value is for the height and the other is for the
width of the kernel.
@ -147,7 +147,7 @@ class Conv2d(_Conv):
last extra padding will be done from the bottom and the right side. If this mode is set, `padding`
must be 0.
- valid: Adopts the way of discarding. The possibly largest height and width of output will be returned
- valid: Adopts the way of discarding. The largest possible height and width of output will be returned
without padding. Extra pixels will be discarded. If this mode is set, `padding`
must be 0.
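The arithmetic behind the `valid` mode can be sketched as follows, assuming the usual convolution output-size formula. `conv_output_size` and its `stride` and `dilation` arguments are hypothetical names for this illustration and are not part of the MindSpore API.

```python
def conv_output_size(input_size, kernel_size, stride=1, dilation=1):
    """Output length along one spatial dimension with 'valid' (no) padding."""
    # A dilated kernel spans dilation * (kernel_size - 1) + 1 input pixels.
    effective_kernel = dilation * (kernel_size - 1) + 1
    if input_size < effective_kernel:
        raise ValueError("kernel window does not fit inside the input")
    # Floor division discards the extra pixels that cannot fill a full window.
    return (input_size - effective_kernel) // stride + 1


print(conv_output_size(32, 3))              # 30
print(conv_output_size(32, 3, stride=2))    # 15
print(conv_output_size(32, 3, dilation=2))  # 28
```

The floor division is what "extra pixels will be discarded" amounts to: positions that cannot host a complete kernel window produce no output.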
@ -158,7 +158,7 @@ class Conv2d(_Conv):
the padding of top, bottom, left and right is the same, equal to padding. If `padding` is a tuple
with four integers, the padding of top, bottom, left and right will be equal to padding[0],
padding[1], padding[2], and padding[3] accordingly. Default: 0.
dilation (Union[int, tuple[int]]): The data type is int or tuple with 2 integers. Specifies the dilation rate
dilation (Union[int, tuple[int]]): The data type is int or a tuple of 2 integers. Specifies the dilation rate
to use for dilated convolution. If set to be :math:`k > 1`, there will
be :math:`k - 1` pixels skipped for each sampling location. Its value should
be greater than or equal to 1 and bounded by the height and width of the
@ -451,7 +451,7 @@ class Conv2dTranspose(_Conv):
in_channels (int): The number of channels in the input space.
out_channels (int): The number of channels in the output space.
kernel_size (Union[int, tuple]): int or tuple with 2 integers, which specifies the height
kernel_size (Union[int, tuple]): int or a tuple of 2 integers, which specifies the height
and width of the 2D convolution window. Single int means the value is for both the height and the width of
the kernel. A tuple of 2 ints means the first value is for the height and the other is for the
width of the kernel.
@ -825,7 +825,7 @@ class DepthwiseConv2d(Cell):
in_channels (int): The number of input channel :math:`C_{in}`.
out_channels (int): The number of output channel :math:`C_{out}`.
kernel_size (Union[int, tuple[int]]): The data type is int or tuple with 2 integers. Specifies the height
kernel_size (Union[int, tuple[int]]): The data type is int or a tuple of 2 integers. Specifies the height
and width of the 2D convolution window. Single int means the value is for both the height and the width of
the kernel. A tuple of 2 ints means the first value is for the height and the other is for the
width of the kernel.
@ -841,7 +841,7 @@ class DepthwiseConv2d(Cell):
last extra padding will be done from the bottom and the right side. If this mode is set, `padding`
must be 0.
- valid: Adopts the way of discarding. The possibly largest height and width of output will be returned
- valid: Adopts the way of discarding. The largest possible height and width of output will be returned
without padding. Extra pixels will be discarded. If this mode is set, `padding`
must be 0.
@ -849,16 +849,16 @@ class DepthwiseConv2d(Cell):
Tensor borders. `padding` should be greater than or equal to 0.
padding (int): Implicit paddings on both sides of the input. Default: 0.
dilation (Union[int, tuple[int]]): The data type is int or tuple with 2 integers. Specifies the dilation rate
dilation (Union[int, tuple[int]]): The data type is int or a tuple of 2 integers. Specifies the dilation rate
to use for dilated convolution. If set to be :math:`k > 1`, there will
be :math:`k - 1` pixels skipped for each sampling location. Its value should
be greater or equal to 1 and bounded by the height and width of the
be greater than or equal to 1 and bounded by the height and width of the
input. Default: 1.
group (int): Split filter into groups, `in_channels` and `out_channels` should be
divisible by the number of groups. Default: 1.
has_bias (bool): Specifies whether the layer uses a bias vector. Default: False.
weight_init (Union[Tensor, str, Initializer, numbers.Number]): Initializer for the convolution kernel.
It can be a Tensor, a string, an Initializer or a numbers.Number. When a string is specified,
It can be a Tensor, a string, an Initializer or a number. When a string is specified,
values from 'TruncatedNormal', 'Normal', 'Uniform', 'HeUniform' and 'XavierUniform' distributions as well
as constant 'One' and 'Zero' distributions are possible. Alias 'xavier_uniform', 'he_uniform', 'ones'
and 'zeros' are acceptable. Uppercase and lowercase are both acceptable. Refer to the values of

@ -36,7 +36,7 @@ class Embedding(Cell):
the corresponding word embeddings.
When 'use_one_hot' is set to True, the input should be of type mindspore.int32.
When 'use_one_hot' is set to True, the type of the input should be mindspore.int32.
vocab_size (int): Size of the dictionary of embeddings.
@ -48,9 +48,9 @@ class Embedding(Cell):
dtype (:class:`mindspore.dtype`): Data type of input. Default: mindspore.float32.
- **input** (Tensor) - Tensor of shape :math:`(\text{batch_size}, \text{input_length})`. The element of
the Tensor should be integer and not larger than vocab_size. else the corresponding embedding vector is zero
if larger than vocab_size.
- **input** (Tensor) - Tensor of shape :math:`(\text{batch_size}, \text{input_length})`. The elements of
the Tensor should be integers and not larger than vocab_size. Otherwise, the corresponding embedding vector will
be zero.
Tensor of shape :math:`(\text{batch_size}, \text{input_length}, \text{embedding_size})`.

@ -253,7 +253,7 @@ class MSSSIM(Cell):
max_val (Union[int, float]): The dynamic range of the pixel values (255 for 8-bit grayscale images).
Default: 1.0.
power_factors (Union[tuple, list]): Iterable of weights for each of the scales.
power_factors (Union[tuple, list]): Iterable of weights for each scale.
Default: (0.0448, 0.2856, 0.3001, 0.2363, 0.1333). Default values obtained by Wang et al.
filter_size (int): The size of the Gaussian filter. Default: 11.
filter_sigma (float): The standard deviation of Gaussian kernel. Default: 1.5.

@ -35,7 +35,7 @@ class LSTM(Cell):
Applies an LSTM to the input.
There are two pipelines connecting two consecutive cells in an LSTM model; one is cell state pipeline
and another is hidden state pipeline. Denote two consecutive time nodes as :math:`t-1` and :math:`t`.
and the other is hidden state pipeline. Denote two consecutive time nodes as :math:`t-1` and :math:`t`.
Given an input :math:`x_t` at time :math:`t`, a hidden state :math:`h_{t-1}` and a cell
state :math:`c_{t-1}` of the layer at time :math:`{t-1}`, the cell state and hidden state at
time :math:`t` are computed using a gating mechanism. Input gate :math:`i_t` is designed to protect the cell
@ -68,18 +68,17 @@ class LSTM(Cell):
input_size (int): Number of features of input.
hidden_size (int): Number of features of hidden layer.
num_layers (int): Number of layers of stacked LSTM. Default: 1.
has_bias (bool): Specifies whether has bias `b_ih` and `b_hh`. Default: True.
has_bias (bool): Whether the cell has bias `b_ih` and `b_hh`. Default: True.
batch_first (bool): Specifies whether the first dimension of input is batch_size. Default: False.
dropout (float, int): If not 0, append `Dropout` layer on the outputs of each
LSTM layer except the last layer. Default: 0. The range of dropout is [0.0, 1.0].
bidirectional (bool): Specifies whether this is a bidirectional LSTM. If set True,
number of directions will be 2 otherwise number of directions is 1. Default: False.
bidirectional (bool): Specifies whether it is a bidirectional LSTM. Default: False.
- **input** (Tensor) - Tensor of shape (seq_len, batch_size, `input_size`).
- **hx** (tuple) - A tuple of two Tensors (h_0, c_0) both of data type mindspore.float32 or
mindspore.float16 and shape (num_directions * `num_layers`, batch_size, `hidden_size`).
Data type of `hx` should be the same of `input`.
Data type of `hx` should be the same as `input`.
Tuple, a tuple containing (`output`, (`h_n`, `c_n`)).
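To make the two pipelines (cell state and hidden state) concrete, here is a scalar sketch of one LSTM time step using the standard LSTM equations. This is an assumption-laden illustration: `lstm_step` is a hypothetical helper, the weights are plain floats rather than matrices, and the two biases `b_ih` and `b_hh` are merged into a single `b` per gate.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, w_x, w_h, b):
    """One time step of a scalar LSTM cell.

    w_x, w_h and b are dicts keyed by gate name: 'i', 'f', 'g', 'o'.
    """
    i = sigmoid(w_x["i"] * x + w_h["i"] * h_prev + b["i"])    # input gate
    f = sigmoid(w_x["f"] * x + w_h["f"] * h_prev + b["f"])    # forget gate
    g = math.tanh(w_x["g"] * x + w_h["g"] * h_prev + b["g"])  # candidate cell value
    o = sigmoid(w_x["o"] * x + w_h["o"] * h_prev + b["o"])    # output gate
    c = f * c_prev + i * g   # cell state pipeline
    h = o * math.tanh(c)     # hidden state pipeline
    return h, c


gates = ("i", "f", "g", "o")
w_x = {k: 0.5 for k in gates}
w_h = {k: 0.1 for k in gates}
b = {k: 0.0 for k in gates}

h, c = 0.0, 0.0
for x_t in [1.0, -1.0, 2.0]:
    h, c = lstm_step(x_t, h, c, w_x, w_h, b)
    print(round(h, 4), round(c, 4))
```

The returned pair `(h, c)` plays the role of `(h_n, c_n)` after the last step; a full layer would apply the same update with weight matrices over vectors.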
@ -205,7 +204,7 @@ class LSTMCell(Cell):
Applies an LSTM layer to the input.
There are two pipelines connecting two consecutive cells in an LSTM model; one is cell state pipeline
and another is hidden state pipeline. Denote two consecutive time nodes as :math:`t-1` and :math:`t`.
and the other is hidden state pipeline. Denote two consecutive time nodes as :math:`t-1` and :math:`t`.
Given an input :math:`x_t` at time :math:`t`, a hidden state :math:`h_{t-1}` and a cell
state :math:`c_{t-1}` of the layer at time :math:`{t-1}`, the cell state and hidden state at
time :math:`t` are computed using a gating mechanism. Input gate :math:`i_t` is designed to protect the cell
@ -238,7 +237,7 @@ class LSTMCell(Cell):
input_size (int): Number of features of input.
hidden_size (int): Number of features of hidden layer.
layer_index (int): Index of the current layer of stacked LSTM. Default: 0.
has_bias (bool): Specifies whether has bias `b_ih` and `b_hh`. Default: True.
has_bias (bool): Whether the cell has bias `b_ih` and `b_hh`. Default: True.
batch_first (bool): Specifies whether the first dimension of input is batch_size. Default: False.
dropout (float, int): If not 0, append `Dropout` layer on the outputs of each
LSTM layer except the last layer. Default: 0. The range of dropout is [0.0, 1.0].

@ -243,6 +243,10 @@ class BatchNorm1d(_BatchNorm):
.. math::
y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta
The implementation of BatchNorm is different in graph mode and pynative mode, therefore the mode is not
recommended to be changed after the net was initialized.
num_features (int): `C` from an expected input of size (N, C).
eps (float): A value added to the denominator for numerical stability. Default: 1e-5.
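The normalization formula above can be reproduced for one feature column in pure Python. This sketch assumes training-style statistics (the mean and biased variance of the current batch); `batch_norm` is a hypothetical helper, and `gamma` and `beta` are fixed scalars here rather than learned parameters.

```python
def batch_norm(column, gamma=1.0, beta=0.0, eps=1e-5):
    """y = (x - E[x]) / sqrt(Var[x] + eps) * gamma + beta for one feature column."""
    n = len(column)
    mean = sum(column) / n
    var = sum((v - mean) ** 2 for v in column) / n  # biased variance over the batch
    return [(v - mean) / (var + eps) ** 0.5 * gamma + beta for v in column]


normalized = batch_norm([1.0, 2.0, 3.0, 4.0])
print(normalized)  # roughly zero mean and unit variance
```

The `eps` term keeps the denominator away from zero when a column is constant, which is the numerical-stability role described for the `eps` argument.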
@ -319,6 +323,10 @@ class BatchNorm2d(_BatchNorm):
.. math::
y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta
The implementation of BatchNorm is different in graph mode and pynative mode, therefore the mode cannot be
changed after the net was initialized.
num_features (int): `C` from an expected input of size (N, C, H, W).
eps (float): A value added to the denominator for numerical stability. Default: 1e-5.
@ -384,8 +392,8 @@ class GlobalBatchNorm(_BatchNorm):
Global normalization layer over a N-dimension input.
Global Normalization is cross device synchronized batch normalization. Batch Normalization implementation
only normalize the data within each device. Global normalization will normalize the input within the group.
Global Normalization is cross device synchronized batch normalization. The implementation of Batch Normalization
only normalizes the data within each device. Global normalization will normalize the input within the group.
It has been described in the paper `Batch Normalization: Accelerating Deep Network Training by
Reducing Internal Covariate Shift <https://arxiv.org/abs/1502.03167>`_. It rescales and recenters the
feature using a mini-batch of data and the learned parameters which can be described in the following formula.
@ -467,10 +475,10 @@ class LayerNorm(Cell):
Applies Layer Normalization over a mini-batch of inputs.
Layer normalization is widely used in recurrent neural networks. It applies
normalization over a mini-batch of inputs for each single training case as described
normalization on a mini-batch of inputs for each single training case as described
in the paper `Layer Normalization <https://arxiv.org/pdf/1607.06450.pdf>`_. Unlike batch
normalization, layer normalization performs exactly the same computation at training and
testing times. It can be described using the following formula. It is applied across all channels
testing time. It can be described using the following formula. It is applied across all channels
and pixels but only one batch size.
.. math::
@ -545,7 +553,7 @@ class GroupNorm(Cell):
Group Normalization over a mini-batch of inputs.
Group normalization is widely used in recurrent neural networks. It applies
normalization over a mini-batch of inputs for each single training case as described
normalization on a mini-batch of inputs for each single training case as described
in the paper `Group Normalization <https://arxiv.org/pdf/1803.08494.pdf>`_. Group normalization
divides the channels into groups and computes within each group the mean and variance for normalization,
and it performs very stably over a wide range of batch sizes. It can be described using the following formula.
@ -557,7 +565,7 @@ class GroupNorm(Cell):
num_groups (int): The number of groups to be divided along the channel dimension.
num_channels (int): The number of channels per group.
eps (float): A value added to the denominator for numerical stability. Default: 1e-5.
affine (bool): A bool value, this layer will has learnable affine parameters when set to true. Default: True.
affine (bool): A bool value, this layer will have learnable affine parameters when set to true. Default: True.
gamma_init (Union[Tensor, str, Initializer, numbers.Number]): Initializer for the gamma weight.
The values of str refer to the function `initializer` including 'zeros', 'ones', 'xavier_uniform',
'he_uniform', etc. Default: 'ones'.

@ -61,7 +61,7 @@ class Conv2dBnAct(Cell):
in_channels (int): The number of input channel :math:`C_{in}`.
out_channels (int): The number of output channel :math:`C_{out}`.
kernel_size (Union[int, tuple]): The data type is int or tuple with 2 integers. Specifies the height
kernel_size (Union[int, tuple]): The data type is int or a tuple of 2 integers. Specifies the height
and width of the 2D convolution window. Single int means the value is for both height and width of
the kernel. A tuple of 2 ints means the first value is for the height and the other is for the
width of the kernel.
@ -292,19 +292,19 @@ class BatchNormFoldCell(Cell):
class FakeQuantWithMinMax(Cell):
Quantization aware op. This OP provide Fake quantization observer function on data with min and max.
Quantization aware op. This OP provides the fake quantization observer function on data with min and max.
min_init (int, float): The dimension of channel or 1(layer). Default: -6.
max_init (int, float): The dimension of channel or 1(layer). Default: 6.
ema (bool): Exponential Moving Average algorithm update min and max. Default: False.
ema (bool): Whether the Exponential Moving Average algorithm is used to update min and max. Default: False.
ema_decay (float): Exponential Moving Average algorithm parameter. Default: 0.999.
per_channel (bool): Quantization granularity based on layer or on channel. Default: False.
channel_axis (int): Quantization by channel axis. Default: 1.
num_channels (int): Declares the min and max channel size. Default: 1.
num_bits (int): The quantization number bit, support 4 and 8bit. Default: 8.
symmetric (bool): The quantization algorithm is symmetric or not. Default: False.
narrow_range (bool): The quantization algorithm uses narrow range or not. Default: False.
num_bits (int): The bit number of quantization, supporting 4 and 8 bits. Default: 8.
symmetric (bool): Whether the quantization algorithm is symmetric or not. Default: False.
narrow_range (bool): Whether the quantization algorithm uses narrow range or not. Default: False.
quant_delay (int): Quantization delay parameters according to the global step. Default: 0.
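A rough illustration of how `min_init`, `max_init`, `num_bits` and `narrow_range` could interact in a min/max fake-quantization step. This is a simplified pure-Python sketch, not the MindSpore kernel: the function name `fake_quant` is hypothetical, and the symmetric mode, EMA update and per-channel handling are left out on purpose.

```python
def fake_quant(values, min_val=-6.0, max_val=6.0, num_bits=8, narrow_range=False):
    """Clamp each value to [min_val, max_val], snap it to a grid, and return floats.

    Assumption: narrow_range drops the lowest integer level, so an 8-bit
    grid has 254 steps instead of 255.
    """
    quant_min = 1 if narrow_range else 0
    quant_max = 2 ** num_bits - 1
    # Width of one quantization step in the float domain.
    scale = (max_val - min_val) / (quant_max - quant_min)
    out = []
    for x in values:
        clamped = min(max(x, min_val), max_val)
        level = round((clamped - min_val) / scale)  # integer step index
        out.append(min_val + level * scale)         # back to float ("fake" quantization)
    return out


quantized = fake_quant([-10.0, 0.3, 10.0])
```

Values outside the range saturate at `min_val` or `max_val`, and values inside it move by at most half a step.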
@ -431,7 +431,7 @@ class Conv2dBnFoldQuant(Cell):
variance vector. Default: 'ones'.
fake (bool): Whether Conv2dBnFoldQuant Cell adds FakeQuantWithMinMax op. Default: True.
per_channel (bool): FakeQuantWithMinMax Parameters. Default: False.
num_bits (int): The quantization number bit, support 4 and 8bit. Default: 8.
num_bits (int): The number of quantization bits, supporting 4 and 8 bits. Default: 8.
symmetric (bool): The quantization algorithm is symmetric or not. Default: False.
narrow_range (bool): The quantization algorithm uses narrow range or not. Default: False.
quant_delay (int): The Quantization delay parameters according to the global step. Default: 0.
@ -614,7 +614,7 @@ class Conv2dBnWithoutFoldQuant(Cell):
Default: 'normal'.
bias_init (Union[Tensor, str, Initializer, numbers.Number]): Initializer for the bias vector. Default: 'zeros'.
per_channel (bool): FakeQuantWithMinMax Parameters. Default: False.
num_bits (int): The quantization number bit, support 4 and 8bit. Default: 8.
num_bits (int): The number of quantization bits, supporting 4 and 8 bits. Default: 8.
symmetric (bool): The quantization algorithm is symmetric or not. Default: False.
narrow_range (bool): The quantization algorithm uses narrow range or not. Default: False.
quant_delay (int): Quantization delay parameters according to the global step. Default: 0.
@ -736,7 +736,7 @@ class Conv2dQuant(Cell):
Default: 'normal'.
bias_init (Union[Tensor, str, Initializer, numbers.Number]): Initializer for the bias vector. Default: 'zeros'.
per_channel (bool): FakeQuantWithMinMax Parameters. Default: False.
num_bits (int): The quantization number bit, support 4 and 8bit. Default: 8.
num_bits (int): The number of quantization bits, supporting 4 and 8 bits. Default: 8.
symmetric (bool): The quantization algorithm is symmetric or not. Default: False.
narrow_range (bool): The quantization algorithm uses narrow range or not. Default: False.
quant_delay (int): Quantization delay parameters according to the global step. Default: 0.
@ -845,7 +845,7 @@ class DenseQuant(Cell):
has_bias (bool): Specifies whether the layer uses a bias vector. Default: True.
activation (str): The activation function applied to the output of the layer, e.g. 'relu'. Default: None.
per_channel (bool): FakeQuantWithMinMax Parameters. Default: False.
num_bits (int): The quantization number bit, support 4 and 8bit. Default: 8.
num_bits (int): The number of quantization bits, supporting 4 and 8 bits. Default: 8.
symmetric (bool): The quantization algorithm is symmetric or not. Default: False.
narrow_range (bool): The quantization algorithm uses narrow range or not. Default: False.
quant_delay (int): Quantization delay parameters according to the global step. Default: 0.
@ -947,15 +947,14 @@ class ActQuant(_QuantActivation):
Quantization aware training activation function.
Add Fake Quant OP after activation. Not Recommand to used these cell for Fake Quant Op
Will climp the max range of the activation and the relu6 do the same operation.
This part is a more detailed overview of ReLU6 op.
Add the fake quant op to the end of activation op, by which the output of activation op will be truncated.
Please check `FakeQuantWithMinMax` for more details.
activation (Cell): Activation cell class.
ema_decay (float): Exponential Moving Average algorithm parameter. Default: 0.999.
per_channel (bool): Quantization granularity based on layer or on channel. Default: False.
num_bits (int): The quantization number bit, support 4 and 8bit. Default: 8.
num_bits (int): The number of quantization bits, supporting 4 and 8 bits. Default: 8.
symmetric (bool): The quantization algorithm is symmetric or not. Default: False.
narrow_range (bool): The quantization algorithm uses narrow range or not. Default: False.
quant_delay (int): Quantization delay parameters according to the global steps. Default: 0.
@ -1010,7 +1009,7 @@ class LeakyReLUQuant(_QuantActivation):
activation (Cell): Activation cell class.
ema_decay (float): Exponential Moving Average algorithm parameter. Default: 0.999.
per_channel (bool): Quantization granularity based on layer or on channel. Default: False.
num_bits (int): The quantization number bit, support 4 and 8bit. Default: 8.
num_bits (int): The number of quantization bits, supporting 4 and 8 bits. Default: 8.
symmetric (bool): The quantization algorithm is symmetric or not. Default: False.
narrow_range (bool): The quantization algorithm uses narrow range or not. Default: False.
quant_delay (int): Quantization delay parameters according to the global step. Default: 0.
@ -1080,9 +1079,9 @@ class HSwishQuant(_QuantActivation):
activation (Cell): Activation cell class.
ema_decay (float): Exponential Moving Average algorithm parameter. Default: 0.999.
per_channel (bool): Quantization granularity based on layer or on channel. Default: False.
num_bits (int): The quantization number bit, support 4 and 8bit. Default: 8.
symmetric (bool): The quantization algorithm is symmetric or not. Default: False.
narrow_range (bool): The quantization algorithm uses narrow range or not. Default: False.
num_bits (int): The number of quantization bits, supporting 4 and 8 bits. Default: 8.
symmetric (bool): Whether the quantization algorithm is symmetric or not. Default: False.
narrow_range (bool): Whether the quantization algorithm uses narrow range or not. Default: False.
quant_delay (int): Quantization delay parameters according to the global step. Default: 0.
@ -1149,9 +1148,9 @@ class HSigmoidQuant(_QuantActivation):
activation (Cell): Activation cell class.
ema_decay (float): Exponential Moving Average algorithm parameter. Default: 0.999.
per_channel (bool): Quantization granularity based on layer or on channel. Default: False.
num_bits (int): The quantization number bit, support 4 and 8bit. Default: 8.
symmetric (bool): The quantization algorithm is symmetric or not. Default: False.
narrow_range (bool): The quantization algorithm uses narrow range or not. Default: False.
num_bits (int): The number of quantization bits, supporting 4 and 8 bits. Default: 8.
symmetric (bool): Whether the quantization algorithm is symmetric or not. Default: False.
narrow_range (bool): Whether the quantization algorithm uses narrow range or not. Default: False.
quant_delay (int): Quantization delay parameters according to the global step. Default: 0.
@ -1217,7 +1216,7 @@ class TensorAddQuant(Cell):
ema_decay (float): Exponential Moving Average algorithm parameter. Default: 0.999.
per_channel (bool): Quantization granularity based on layer or on channel. Default: False.
num_bits (int): The quantization number bit, support 4 and 8bit. Default: 8.
num_bits (int): The number of quantization bits, supporting 4 and 8 bits. Default: 8.
symmetric (bool): The quantization algorithm is symmetric or not. Default: False.
narrow_range (bool): The quantization algorithm uses narrow range or not. Default: False.
quant_delay (int): Quantization delay parameters according to the global step. Default: 0.
@ -1269,7 +1268,7 @@ class MulQuant(Cell):
ema_decay (float): Exponential Moving Average algorithm parameter. Default: 0.999.
per_channel (bool): Quantization granularity based on layer or on channel. Default: False.
num_bits (int): The quantization number bit, support 4 and 8bit. Default: 8.
num_bits (int): The number of quantization bits, supporting 4 and 8 bits. Default: 8.
symmetric (bool): The quantization algorithm is symmetric or not. Default: False.
narrow_range (bool): The quantization algorithm uses narrow range or not. Default: False.
quant_delay (int): Quantization delay parameters according to the global step. Default: 0.

@ -80,7 +80,7 @@ class L1Loss(_Loss):
When argument reduction is 'sum', the sum of :math:`L(x, y)` will be returned. :math:`N` is the batch size.
reduction (str): Type of reduction to apply to loss. The optional values are "mean", "sum", "none".
reduction (str): Type of reduction to be applied to loss. The optional values are "mean", "sum", and "none".
Default: "mean".
@ -107,7 +107,7 @@ class L1Loss(_Loss):
class MSELoss(_Loss):
MSELoss create a criterion to measures the mean squared error (squared L2-norm) between :math:`x` and :math:`y`
MSELoss creates a criterion to measure the mean squared error (squared L2-norm) between :math:`x` and :math:`y`
by element, where :math:`x` is the input and :math:`y` is the target.
For simplicity, let :math:`x` and :math:`y` be 1-dimensional Tensor with length :math:`N`,
@ -120,7 +120,7 @@ class MSELoss(_Loss):
When argument reduction is 'sum', the sum of :math:`L(x, y)` will be returned. :math:`N` is the batch size.
reduction (str): Type of reduction to apply to loss. The optional values are "mean", "sum", "none".
reduction (str): Type of reduction to be applied to loss. The optional values are "mean", "sum", and "none".
Default: "mean".
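The three `reduction` values named in the docstring can be pictured with a small pure-Python sketch. The helper `mse_loss` is hypothetical and works on plain lists rather than Tensors; it only illustrates what "mean", "sum" and "none" do to the element-wise squared errors.

```python
def mse_loss(x, y, reduction="mean"):
    """Squared error per element, then reduced as requested."""
    losses = [(a - b) ** 2 for a, b in zip(x, y)]
    if reduction == "none":
        return losses                 # one loss value per element
    total = sum(losses)
    if reduction == "sum":
        return total
    if reduction == "mean":
        return total / len(losses)    # N is the number of elements
    raise ValueError("reduction must be 'mean', 'sum' or 'none'")


per_element = mse_loss([1.0, 2.0, 3.0], [1.0, 1.0, 1.0], reduction="none")
```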
@ -210,14 +210,14 @@ class SoftmaxCrossEntropyWithLogits(_Loss):
While the target classes are mutually exclusive, i.e., only one class is positive in the target, the predicted
probabilities need not be exclusive. All that is required is that the predicted probability distribution
probabilities need not be exclusive. It is only required that the predicted probability distribution
of entry is a valid one.
is_grad (bool): Specifies whether to calculate grad only. Default: True.
sparse (bool): Specifies whether labels use sparse format or not. Default: False.
reduction (Union[str, None]): Type of reduction to apply to loss. Support 'sum' or 'mean' If None,
do not reduction. Default: None.
reduction (Union[str, None]): Type of reduction to be applied to loss. Supports 'sum' and 'mean'. If None,
no reduction is performed. Default: None.
smooth_factor (float): Label smoothing factor. It is an optional input which should be in range [0, 1].
Default: 0.
num_classes (int): The number of classes in the task. It is an optional input. Default: 2.
@ -225,7 +225,7 @@ class SoftmaxCrossEntropyWithLogits(_Loss):
- **logits** (Tensor) - Tensor of shape (N, C).
- **labels** (Tensor) - Tensor of shape (N, ). If `sparse` is True, the type of
`labels` is mindspore.int32. If `sparse` is False, the type of `labels` is same as the type of `logits`.
`labels` is mindspore.int32. If `sparse` is False, the type of `labels` is the same as the type of `logits`.
Tensor, a tensor of the same shape as logits with the component-wise
@ -282,8 +282,8 @@ class SoftmaxCrossEntropyExpand(Cell):
where :math:`x_i` is a 1D score Tensor, :math:`t_i` is the target class.
When argument sparse is set to True, the format of label is the index
range from :math:`0` to :math:`C - 1` instead of one-hot vectors.
When argument sparse is set to True, the format of the label is the index
ranging from :math:`0` to :math:`C - 1` instead of one-hot vectors.
sparse (bool): Specifies whether labels use sparse format or not. Default: False.
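To make the `sparse` flag concrete: with sparse labels each target is a class index in the range 0 to C - 1, and without it each target is a one-hot vector of length C. The pure-Python sketch below is a textbook softmax cross entropy, not the MindSpore implementation, and the name `softmax_cross_entropy` is hypothetical. It shows that both label formats give the same per-sample loss.

```python
import math


def softmax_cross_entropy(logits, labels, sparse=False):
    """Per-sample loss for a batch of logits (list of rows of length C)."""
    losses = []
    for row, label in zip(logits, labels):
        if sparse:
            # Expand the class index into a one-hot vector.
            label = [1.0 if c == label else 0.0 for c in range(len(row))]
        peak = max(row)  # subtract the max for numerical stability
        log_sum = math.log(sum(math.exp(v - peak) for v in row))
        log_probs = [v - peak - log_sum for v in row]
        losses.append(-sum(t * lp for t, lp in zip(label, log_probs)))
    return losses


logits = [[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]]
sparse_loss = softmax_cross_entropy(logits, [2, 0], sparse=True)
dense_loss = softmax_cross_entropy(logits, [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]])
```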

@ -69,7 +69,7 @@ def names():
def get_metric_fn(name, *args, **kwargs):
Gets the metric method base on the input name.
Gets the metric method based on the input name.
name (str): The name of metric method. Refer to the '__factory__'

@ -82,7 +82,7 @@ class Metric(metaclass=ABCMeta):
def clear(self):
A interface describes the behavior of clearing the internal evaluation result.
An interface that describes the behavior of clearing the internal evaluation result.
All subclasses should override this interface.
@ -92,7 +92,7 @@ class Metric(metaclass=ABCMeta):
def eval(self):
A interface describes the behavior of computing the evaluation result.
An interface that describes the behavior of computing the evaluation result.
All subclasses should override this interface.
@ -102,7 +102,7 @@ class Metric(metaclass=ABCMeta):
def update(self, *inputs):
A interface describes the behavior of updating the internal evaluation result.
An interface that describes the behavior of updating the internal evaluation result.
All subclasses should override this interface.
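The `clear` / `update` / `eval` contract described in these docstrings can be sketched as follows. Both classes are hypothetical stand-ins written for illustration: `SketchMetric` mirrors the abstract interface, and `RunningMean` is an invented subclass, not a metric shipped with MindSpore.

```python
from abc import ABCMeta, abstractmethod


class SketchMetric(metaclass=ABCMeta):
    """Hypothetical stand-in for the Metric base class."""

    @abstractmethod
    def clear(self):
        """Reset the internal evaluation result."""

    @abstractmethod
    def eval(self):
        """Compute the evaluation result from the internal state."""

    @abstractmethod
    def update(self, *inputs):
        """Fold new inputs into the internal state."""


class RunningMean(SketchMetric):
    """Hypothetical subclass that averages every number passed to update()."""

    def __init__(self):
        self.clear()

    def clear(self):
        self._total = 0.0
        self._count = 0

    def update(self, *inputs):
        for value in inputs:
            self._total += float(value)
            self._count += 1

    def eval(self):
        if self._count == 0:
            raise RuntimeError("update() must be called before eval()")
        return self._total / self._count


metric = RunningMean()
metric.update(2.0, 4.0)
mean_so_far = metric.eval()
```

Because all three methods are abstract, a subclass that forgets to override one of them cannot be instantiated.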

@ -36,8 +36,8 @@ def _update_run_op(beta1, beta2, eps, lr, weight_decay, param, m, v, gradient, d
Update parameters.
beta1 (Tensor): The exponential decay rate for the 1st moment estimates. Should be in range (0.0, 1.0).
beta2 (Tensor): The exponential decay rate for the 2nd moment estimates. Should be in range (0.0, 1.0).
beta1 (Tensor): The exponential decay rate for the 1st moment estimations. Should be in range (0.0, 1.0).
beta2 (Tensor): The exponential decay rate for the 2nd moment estimations. Should be in range (0.0, 1.0).
eps (Tensor): Term added to the denominator to improve numerical stability. Should be greater than 0.
lr (Tensor): Learning rate.
weight_decay (Number): Weight decay. Should be equal to or greater than 0.
@ -180,12 +180,12 @@ class Adam(Optimizer):
the order will be followed in the optimizer. There are no other keys in the `dict` and the parameters
which in the 'order_params' should be in one of group parameters.
learning_rate (Union[float, Tensor, Iterable, LearningRateSchedule]): A value or graph for the learning rate.
When the learning_rate is a Iterable or a Tensor with dimension of 1, use the dynamic learning rate, then
learning_rate (Union[float, Tensor, Iterable, LearningRateSchedule]): A value or a graph for the learning rate.
When the learning_rate is an Iterable or a 1-D Tensor, use the dynamic learning rate, then
the i-th step will take the i-th value as the learning rate. When the learning_rate is LearningRateSchedule,
use dynamic learning rate, the i-th learning rate will be calculated during the process of training
according to the formula of LearningRateSchedule. When the learning_rate is a float or a Tensor with
dimension of 0, use fixed learning rate. Other cases are not supported. The float learning rate should be
according to the formula of LearningRateSchedule. When the learning_rate is a float or a 0-D Tensor,
use fixed learning rate. Other cases are not supported. The float learning rate should be
equal to or greater than 0. If the type of `learning_rate` is int, it will be converted to float.
Default: 1e-3.
beta1 (float): The exponential decay rate for the 1st moment estimations. Should be in range (0.0, 1.0).
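The `learning_rate` rules above (fixed value, per-step sequence, or schedule) can be sketched in plain Python. The helper `learning_rate_at` is hypothetical, lists stand in for 1-D Tensors, and a callable stands in for `LearningRateSchedule`; the real optimizer resolves these cases inside the framework.

```python
from collections.abc import Iterable


def learning_rate_at(learning_rate, step):
    """Return the learning rate used at the given step (0-based)."""
    if isinstance(learning_rate, bool):
        raise TypeError("learning_rate must not be a bool")
    if isinstance(learning_rate, (int, float)):
        lr = float(learning_rate)          # int is converted to float
        if lr < 0:
            raise ValueError("learning_rate should be equal to or greater than 0")
        return lr                          # fixed learning rate
    if callable(learning_rate):
        return float(learning_rate(step))  # schedule: computed from a formula
    if isinstance(learning_rate, Iterable):
        return float(list(learning_rate)[step])  # dynamic: i-th value at i-th step
    raise TypeError("unsupported learning_rate type")


fixed = learning_rate_at(0.1, step=5)
dynamic = learning_rate_at([0.1, 0.05, 0.01], step=1)
scheduled = learning_rate_at(lambda s: 0.1 / (1 + s), step=1)
```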
@ -195,11 +195,11 @@ class Adam(Optimizer):
eps (float): Term added to the denominator to improve numerical stability. Should be greater than 0. Default:
use_locking (bool): Whether to enable a lock to protect updating variable tensors.
If True, updating of the var, m, and v tensors will be protected by a lock.
If False, the result is unpredictable. Default: False.
If true, updates of the var, m, and v tensors will be protected by a lock.
If false, the result is unpredictable. Default: False.
use_nesterov (bool): Whether to use Nesterov Accelerated Gradient (NAG) algorithm to update the gradients.
If True, update the gradients using NAG.
If False, update the gradients without using NAG. Default: False.
If true, update the gradients using NAG.
If false, update the gradients without using NAG. Default: False.
weight_decay (float): Weight decay (L2 penalty). It should be equal to or greater than 0. Default: 0.0.
loss_scale (float): A floating point value for the loss scale. Should be greater than 0. Default: 1.0.
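For orientation, here is how `beta1`, `beta2` and `eps` enter a single Adam update on one scalar parameter. This is the textbook Adam rule with bias correction, not a copy of the MindSpore kernel: Nesterov, locking, weight decay and loss scaling are omitted, the function name `adam_step` is hypothetical, and the `eps` default of 1e-8 is an assumption because the value is cut off in the docstring above.

```python
import math


def adam_step(param, grad, m, v, step, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One textbook Adam update; step counts from 1."""
    m = beta1 * m + (1 - beta1) * grad          # 1st moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # 2nd moment estimate
    m_hat = m / (1 - beta1 ** step)             # bias correction
    v_hat = v / (1 - beta2 ** step)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


new_param, new_m, new_v = adam_step(param=1.0, grad=1.0, m=0.0, v=0.0, step=1)
```

On the first step the bias-corrected moments equal the gradient and its square, so the parameter moves by roughly `lr` regardless of the gradient's magnitude.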
@ -304,12 +304,12 @@ class AdamWeightDecay(Optimizer):
the order will be followed in the optimizer. There are no other keys in the `dict` and the parameters
which in the 'order_params' should be in one of group parameters.
learning_rate (Union[float, Tensor, Iterable, LearningRateSchedule]): A value or graph for the learning rate.
When the learning_rate is a Iterable or a Tensor with dimension of 1, use the dynamic learning rate, then
learning_rate (Union[float, Tensor, Iterable, LearningRateSchedule]): A value or a graph for the learning rate.
When the learning_rate is an Iterable or a 1-D Tensor, use the dynamic learning rate, then
the i-th step will take the i-th value as the learning rate. When the learning_rate is LearningRateSchedule,
use dynamic learning rate, the i-th learning rate will be calculated during the process of training
according to the formula of LearningRateSchedule. When the learning_rate is a float or a Tensor with
dimension of 0, use fixed learning rate. Other cases are not supported. The float learning rate should be
according to the formula of LearningRateSchedule. When the learning_rate is a float or a 0-D Tensor,
use fixed learning rate. Other cases are not supported. The float learning rate should be
equal to or greater than 0. If the type of `learning_rate` is int, it will be converted to float.
Default: 1e-3.
beta1 (float): The exponential decay rate for the 1st moment estimations. Default: 0.9.

@ -114,12 +114,12 @@ class FTRL(Optimizer):
than or equal to zero. Use fixed learning rate if lr_power is zero. Default: -0.5.
l1 (float): l1 regularization strength, must be greater than or equal to zero. Default: 0.0.
l2 (float): l2 regularization strength, must be greater than or equal to zero. Default: 0.0.
use_locking (bool): If True use locks for update operation. Default: False.
use_locking (bool): If True, use locks for the update operation. Default: False.
loss_scale (float): Value for the loss scale. It should be equal to or greater than 1.0. Default: 1.0.
weight_decay (float): Weight decay value to multiply weight, must be zero or positive value. Default: 0.0.
- **grads** (tuple[Tensor]) - The gradients of `params` in optimizer, the shape is as same as the `params`
- **grads** (tuple[Tensor]) - The gradients of `params` in the optimizer; the shape is the same as that of the `params`
in the optimizer.

@ -39,8 +39,8 @@ def _update_run_op(beta1, beta2, eps, global_step, lr, weight_decay, param, m, v
Update parameters.
beta1 (Tensor): The exponential decay rate for the 1st moment estimates. Should be in range (0.0, 1.0).
beta2 (Tensor): The exponential decay rate for the 2nd moment estimates. Should be in range (0.0, 1.0).
beta1 (Tensor): The exponential decay rate for the 1st moment estimations. Should be in range (0.0, 1.0).
beta2 (Tensor): The exponential decay rate for the 2nd moment estimations. Should be in range (0.0, 1.0).
eps (Tensor): Term added to the denominator to improve numerical stability. Should be greater than 0.
lr (Tensor): Learning rate.
weight_decay (Number): Weight decay. Should be equal to or greater than 0.
@ -122,8 +122,8 @@ def _update_run_op_graph_kernel(beta1, beta2, eps, global_step, lr, weight_decay
Update parameters.
beta1 (Tensor): The exponential decay rate for the 1st moment estimates. Should be in range (0.0, 1.0).
beta2 (Tensor): The exponential decay rate for the 2nd moment estimates. Should be in range (0.0, 1.0).
beta1 (Tensor): The exponential decay rate for the 1st moment estimations. Should be in range (0.0, 1.0).
beta2 (Tensor): The exponential decay rate for the 2nd moment estimations. Should be in range (0.0, 1.0).
eps (Tensor): Term added to the denominator to improve numerical stability. Should be greater than 0.
lr (Tensor): Learning rate.
weight_decay (Number): Weight decay. Should be equal to or greater than 0.
@ -184,7 +184,7 @@ def _check_param_value(beta1, beta2, eps, prim_name):
class Lamb(Optimizer):
Lamb Dynamic LR.
Lamb Dynamic Learning Rate.
LAMB is an optimization algorithm employing a layerwise adaptive large batch
optimization technique. Refer to the paper `LARGE BATCH OPTIMIZATION FOR DEEP LEARNING: TRAINING BERT IN 76
@ -214,16 +214,16 @@ class Lamb(Optimizer):
the order will be followed in optimizer. There are no other keys in the `dict` and the parameters which
in the value of 'order_params' should be in one of group parameters.
learning_rate (Union[float, Tensor, Iterable, LearningRateSchedule]): A value or graph for the learning rate.
When the learning_rate is a Iterable or a Tensor with dimension of 1, use dynamic learning rate, then
learning_rate (Union[float, Tensor, Iterable, LearningRateSchedule]): A value or a graph for the learning rate.
When the learning_rate is an Iterable or a 1-D Tensor, use dynamic learning rate, then
the i-th step will take the i-th value as the learning rate. When the learning_rate is LearningRateSchedule,
use dynamic learning rate, the i-th learning rate will be calculated during the process of training
according to the formula of LearningRateSchedule. When the learning_rate is a float or a Tensor with
dimension of 0, use fixed learning rate. Other cases are not supported. The float learning rate should be
according to the formula of LearningRateSchedule. When the learning_rate is a float or a 0-D Tensor,
use fixed learning rate. Other cases are not supported. The float learning rate should be
equal to or greater than 0. If the type of `learning_rate` is int, it will be converted to float.
beta1 (float): The exponential decay rate for the 1st moment estimates. Default: 0.9.
beta1 (float): The exponential decay rate for the 1st moment estimations. Default: 0.9.
Should be in range (0.0, 1.0).
beta2 (float): The exponential decay rate for the 2nd moment estimates. Default: 0.999.
beta2 (float): The exponential decay rate for the 2nd moment estimations. Default: 0.999.
Should be in range (0.0, 1.0).
eps (float): Term added to the denominator to improve numerical stability. Default: 1e-6.
Should be greater than 0.

@ -58,12 +58,12 @@ class LARS(Optimizer):
epsilon (float): Term added to the denominator to improve numerical stability. Default: 1e-05.
coefficient (float): Trust coefficient for calculating the local learning rate. Default: 0.001.
use_clip (bool): Whether to use clip operation for calculating the local learning rate. Default: False.
lars_filter (Function): A function to determine whether apply lars algorithm. Default:
lars_filter (Function): A function to determine whether to apply the LARS algorithm. Default:
lambda x: 'LayerNorm' not in x.name and 'bias' not in x.name.
- **gradients** (tuple[Tensor]) - The gradients of `params` in optimizer, the shape is
as same as the `params` in optimizer.
- **gradients** (tuple[Tensor]) - The gradients of `params` in the optimizer, the shape is
the same as the `params` in the optimizer.
Union[Tensor[bool], tuple[Parameter]], it depends on the output of `optimizer`.
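The default `lars_filter` quoted in the docstring selects parameters by name. The sketch below applies that exact lambda to a few made-up parameter names; `Param` is a hypothetical stand-in for a real `Parameter`, which only needs a `name` attribute here.

```python
from collections import namedtuple

# Hypothetical stand-in: only the `name` attribute matters to the filter.
Param = namedtuple("Param", ["name"])

# Default filter as given in the docstring.
lars_filter = lambda x: 'LayerNorm' not in x.name and 'bias' not in x.name

params = [
    Param("conv1.weight"),
    Param("conv1.bias"),
    Param("LayerNorm.gamma"),
    Param("fc.weight"),
]

# Parameters for which the LARS algorithm would be applied under this filter.
selected = [p.name for p in params if lars_filter(p)]
```

With this default, bias and LayerNorm parameters are skipped and only the weight parameters are selected.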

@ -127,26 +127,26 @@ class LazyAdam(Optimizer):
the order will be followed in optimizer. There are no other keys in the `dict` and the parameters which
in the value of 'order_params' should be in one of group parameters.
learning_rate (Union[float, Tensor, Iterable, LearningRateSchedule]): A value or graph for the learning rate.
When the learning_rate is a Iterable or a Tensor with dimension of 1, use dynamic learning rate, then
learning_rate (Union[float, Tensor, Iterable, LearningRateSchedule]): A value or a graph for the learning rate.
When the learning_rate is an Iterable or a 1-D Tensor, use dynamic learning rate, then
the i-th step will take the i-th value as the learning rate. When the learning_rate is LearningRateSchedule,
use dynamic learning rate, the i-th learning rate will be calculated during the process of training
according to the formula of LearningRateSchedule. When the learning_rate is a float or a Tensor with
dimension of 0, use fixed learning rate. Other cases are not supported. The float learning rate should be
according to the formula of LearningRateSchedule. When the learning_rate is a float or a 0-D Tensor,
use fixed learning rate. Other cases are not supported. The float learning rate should be
equal to or greater than 0. If the type of `learning_rate` is int, it will be converted to float.
Default: 1e-3.
beta1 (float): The exponential decay rate for the 1st moment estimates. Should be in range (0.0, 1.0). Default:
beta2 (float): The exponential decay rate for the 2nd moment estimates. Should be in range (0.0, 1.0). Default:
beta1 (float): The exponential decay rate for the 1st moment estimations. Should be in range (0.0, 1.0).
Default: 0.9.
beta2 (float): The exponential decay rate for the 2nd moment estimations. Should be in range (0.0, 1.0).
Default: 0.999.
eps (float): Term added to the denominator to improve numerical stability. Should be greater than 0. Default:
use_locking (bool): Whether to enable a lock to protect updating variable tensors.
If True, updating of the var, m, and v tensors will be protected by a lock.
If False, the result is unpredictable. Default: False.
If true, updates of the var, m, and v tensors will be protected by a lock.
If false, the result is unpredictable. Default: False.
use_nesterov (bool): Whether to use Nesterov Accelerated Gradient (NAG) algorithm to update the gradients.
If True, updates the gradients using NAG.
If False, updates the gradients without using NAG. Default: False.
If true, update the gradients using NAG.
If false, update the gradients without using NAG. Default: False.
weight_decay (float): Weight decay (L2 penalty). Default: 0.0.
loss_scale (float): A floating point value for the loss scale. Should be equal to or greater than 1. Default:

@ -83,12 +83,12 @@ class Momentum(Optimizer):
the order will be followed in optimizer. There are no other keys in the `dict` and the parameters which
in the value of 'order_params' should be in one of group parameters.
learning_rate (Union[float, Tensor, Iterable, LearningRateSchedule]): A value or graph for the learning rate.
When the learning_rate is a Iterable or a Tensor with dimension of 1, use dynamic learning rate, then
learning_rate (Union[float, Tensor, Iterable, LearningRateSchedule]): A value or a graph for the learning rate.
When the learning_rate is an Iterable or a 1-D Tensor, use dynamic learning rate, then
the i-th step will take the i-th value as the learning rate. When the learning_rate is LearningRateSchedule,
use dynamic learning rate, the i-th learning rate will be calculated during the process of training
according to the formula of LearningRateSchedule. When the learning_rate is a float or a Tensor with
dimension of 0, use fixed learning rate. Other cases are not supported. The float learning rate should be
according to the formula of LearningRateSchedule. When the learning_rate is a float or a 0-D Tensor,
use fixed learning rate. Other cases are not supported. The float learning rate should be
equal to or greater than 0. If the type of `learning_rate` is int, it will be converted to float.
momentum (float): Hyperparameter of type float, means momentum for the moving average.
It should be at least 0.0.

@ -40,8 +40,6 @@ class Optimizer(Cell):
Base class for all optimizers.
This class defines the API to add Ops to train a model.
This class defines the API to add Ops to train a model. Never use
this class directly, but instead instantiate one of its subclasses.
@ -55,12 +53,12 @@ class Optimizer(Cell):
To improve parameter groups performance, the customized order of parameters can be supported.
learning_rate (Union[float, Tensor, Iterable, LearningRateSchedule]): A value or graph for the learning
rate. When the learning_rate is a Iterable or a Tensor with dimension of 1, use dynamic learning rate, then
learning_rate (Union[float, Tensor, Iterable, LearningRateSchedule]): A value or a graph for the learning
rate. When the learning_rate is an Iterable or a 1-D Tensor, use dynamic learning rate, then
the i-th step will take the i-th value as the learning rate. When the learning_rate is LearningRateSchedule,
use dynamic learning rate, the i-th learning rate will be calculated during the process of training
according to the formula of LearningRateSchedule. When the learning_rate is a float or a Tensor with
dimension of 0, use fixed learning rate. Other cases are not supported. The float learning rate should be
according to the formula of LearningRateSchedule. When the learning_rate is a float or a 0-D Tensor,
use fixed learning rate. Other cases are not supported. The float learning rate should be
equal to or greater than 0. If the type of `learning_rate` is int, it will be converted to float.
parameters (Union[list[Parameter], list[dict]]): When the `parameters` is a list of `Parameter` which will be
updated, the element in `parameters` should be class `Parameter`. When the `parameters` is a list of `dict`,
@ -84,8 +82,8 @@ class Optimizer(Cell):
type of `loss_scale` input is int, it will be converted to float. Default: 1.0.
ValueError: If the learning_rate is a Tensor, but the dims of tensor is greater than 1.
TypeError: If the learning_rate is not any of the three types: float, Tensor, Iterable.
ValueError: If the learning_rate is a Tensor, but the dimension of tensor is greater than 1.
TypeError: If the learning_rate is not one of the three types: float, Tensor, or Iterable.
def __init__(self, learning_rate, parameters, weight_decay=0.0, loss_scale=1.0):
@ -179,7 +177,7 @@ class Optimizer(Cell):
An approach to reduce the overfitting of a deep learning neural network model.
gradients (tuple[Tensor]): The gradients of `self.parameters`, and have the same shape with
gradients (tuple[Tensor]): The gradients of `self.parameters`, and have the same shape as
@ -204,7 +202,7 @@ class Optimizer(Cell):
gradients (tuple[Tensor]): The gradients of `self.parameters`, and have the same shape with
gradients (tuple[Tensor]): The gradients of `self.parameters`, and have the same shape as

Some files were not shown because too many files have changed in this diff.
