Merge branch 'develop' into cross_entropy_over_beam

revert-3824-remove_grad_op_type
caoying03 8 years ago
commit 3d1b87193d

@ -51,7 +51,7 @@ ExternalProject_Add(
${EXTERNAL_PROJECT_LOG_ARGS}
DEPENDS ${MKLDNN_DEPENDS}
GIT_REPOSITORY "https://github.com/01org/mkl-dnn.git"
GIT_TAG "v0.9"
GIT_TAG "v0.10"
PREFIX ${MKLDNN_SOURCES_DIR}
UPDATE_COMMAND ""
CMAKE_ARGS -DCMAKE_INSTALL_PREFIX=${MKLDNN_INSTALL_DIR}

@ -28,7 +28,7 @@ INCLUDE(ExternalProject)
SET(MKLML_PROJECT "extern_mklml")
SET(MKLML_VER "mklml_lnx_2018.0.20170720")
SET(MKLML_URL "https://github.com/01org/mkl-dnn/releases/download/v0.9/${MKLML_VER}.tgz")
SET(MKLML_URL "https://github.com/01org/mkl-dnn/releases/download/v0.10/${MKLML_VER}.tgz")
SET(MKLML_SOURCE_DIR "${THIRD_PARTY_PATH}/mklml")
SET(MKLML_DOWNLOAD_DIR "${MKLML_SOURCE_DIR}/src/${MKLML_PROJECT}")
SET(MKLML_DST_DIR "mklml")
@ -54,7 +54,8 @@ ExternalProject_Add(
${EXTERNAL_PROJECT_LOG_ARGS}
PREFIX ${MKLML_SOURCE_DIR}
DOWNLOAD_DIR ${MKLML_DOWNLOAD_DIR}
DOWNLOAD_COMMAND wget --no-check-certificate -qO- ${MKLML_URL} | tar xz -C ${MKLML_DOWNLOAD_DIR}
DOWNLOAD_COMMAND wget --no-check-certificate ${MKLML_URL} -c -q -O ${MKLML_VER}.tgz
&& tar zxf ${MKLML_VER}.tgz
DOWNLOAD_NO_PROGRESS 1
UPDATE_COMMAND ""
CMAKE_ARGS -DCMAKE_INSTALL_PREFIX=${MKLML_INSTALL_ROOT}

@ -1,11 +0,0 @@
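About PaddlePaddle
==================
PaddlePaddle is a parallel, distributed deep learning platform originally developed by Baidu scientists and engineers. It combines ease of use, efficiency, flexibility, and scalability, and is already widely used across many product lines inside Baidu.
PaddlePaddle is now open source but far from complete; we hope to keep improving, extending, and expanding it on this foundation.
We also hope that developers will actively provide feedback and contribute source code, so that an active open source community can form.
Credits
--------
Special thanks to [all contributors](https://github.com/PaddlePaddle/Paddle/graphs/contributors) of PaddlePaddle.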

@ -1,14 +0,0 @@
ABOUT
=======
PaddlePaddle is an easy-to-use, efficient, flexible and scalable deep learning platform,
which was originally developed by Baidu scientists and engineers for the purpose of applying deep learning to many products at Baidu.
PaddlePaddle is now open source but far from complete; it is intended to be built upon, improved, scaled, and extended.
We hope to build an active open source community in which developers both provide feedback and actively contribute to the source code.
Credits
--------
We owe many thanks to `all contributors and developers <https://github.com/PaddlePaddle/Paddle/graphs/contributors>`_ of PaddlePaddle!

@ -419,9 +419,14 @@ multi_binary_label_cross_entropy_cost
.. autoclass:: paddle.v2.layer.multi_binary_label_cross_entropy_cost
:noindex:
huber_cost
----------
.. autoclass:: paddle.v2.layer.huber_cost
huber_regression_cost
-------------------------
.. autoclass:: paddle.v2.layer.huber_regression_cost
:noindex:
huber_classification_cost
-------------------------
.. autoclass:: paddle.v2.layer.huber_classification_cost
:noindex:
lambda_cost

@ -6,14 +6,12 @@
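Installation
++++++++++++
PaddlePaddle provides several precompiled binaries for installation, including Docker images and Ubuntu deb packages. We recommend using the Docker image to deploy the environment, and we welcome contributions of more installation packages.
PaddlePaddle provides Docker images for deploying the environment.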
.. toctree::
:maxdepth: 1
docker_install_cn.rst
ubuntu_install_cn.rst
Building

@ -8,14 +8,13 @@ Install PaddlePaddle
:maxdepth: 1
docker_install_en.rst
ubuntu_install_en.rst
Build from Source
-----------------
.. warning::
Please use the :code:`deb` package or :code:`docker` image to install paddle. The build guide is intended for hacking on or contributing to the PaddlePaddle source code.
Please use the :code:`docker` image to install paddle. The build guide is intended for hacking on or contributing to the PaddlePaddle source code.
.. toctree::
:maxdepth: 1

@ -1,71 +0,0 @@
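Deploying PaddlePaddle on Ubuntu
===================================
PaddlePaddle provides a deb package for Ubuntu 14.04.
Installation
------------
The packages can be downloaded from\: https://github.com/PaddlePaddle/Paddle/releases
Four versions are available\:
* cpu: supports mainstream x86 processors and uses the AVX instruction set.
* cpu-noavx: supports mainstream x86 processors and does not use the AVX instruction set.
* gpu: supports mainstream x86 processors and the NVIDIA CUDA platform, and uses the AVX instruction set.
* gpu-noavx: supports mainstream x86 processors and the NVIDIA CUDA platform, and does not use the AVX instruction set.
After downloading the package, run: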
.. code-block:: shell
sudo apt-get install gdebi
gdebi paddle-*-cpu.deb
Or:
.. code-block:: shell
dpkg -i paddle-*-cpu.deb
apt-get install -f
:code:`dpkg -i` may report missing-dependency errors; this is expected,
and :code:`apt-get install -f` will then finish installing PaddlePaddle.
After installation, run :code:`paddle version` to check the installed PaddlePaddle version:
.. code-block:: shell
PaddlePaddle 0.8.0b1, compiled with
with_avx: ON
with_gpu: OFF
with_double: OFF
with_python: ON
with_rdma: OFF
with_timer: OFF
with_predict_sdk:
Troubleshooting
---------------
libcudart.so/libcudnn.so not found
++++++++++++++++++++++++++++++++++
After installation, running :code:`paddle train` reports the following error\:
.. code-block:: shell
0831 12:36:04.151525 1085 hl_dso_loader.cc:70] Check failed: nullptr != *dso_handle For Gpu version of PaddlePaddle, it couldn't find CUDA library: libcudart.so Please make sure you already specify its path.Note: for training data on Cpu using Gpu version of PaddlePaddle,you must specify libcudart.so via LD_LIBRARY_PATH.
The cause is that the CUDA runtime environment variables are not set. If you use the GPU version of PaddlePaddle, install CUDA 7.5 and cuDNN 5 locally and set:
.. code-block:: shell
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/cuda/lib:$LD_LIBRARY_PATH
export PATH=/usr/local/cuda/bin:$PATH

@ -1,25 +0,0 @@
Debian Package installation guide
=================================
PaddlePaddle supports the :code:`deb` package. Installation of this :code:`deb` package is tested on Ubuntu 14.04, but it should also work on other Debian-based Linux distributions.
There are four versions of the Debian package: :code:`cpu`, :code:`gpu`, :code:`cpu-noavx`, :code:`gpu-noavx`. The :code:`noavx` versions support CPUs that lack :code:`AVX` instructions. The download URL of the :code:`deb` package is \: https://github.com/baidu/Paddle/releases/
After downloading the PaddlePaddle deb package, you can install it with :code:`gdebi`.
.. code-block:: bash
gdebi paddle-*.deb
If :code:`gdebi` is not installed, you can use :code:`sudo apt-get install gdebi` to install it.
Or you can use the following commands to install PaddlePaddle.
.. code-block:: bash
dpkg -i paddle-*.deb
apt-get install -f
If you use the GPU version of the deb package, you need to install the CUDA toolkit and cuDNN and set the related environment variables (such as LD_LIBRARY_PATH) first. It is normal for `dpkg -i` to report errors; `apt-get install -f` will then continue installing paddle along with its dependencies.

@ -5,12 +5,13 @@
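- [Defining the ProtoMaker class](#定义ProtoMaker类)
- [Defining the Operator class](#定义Operator类)
- [Defining the OpKernel class](#定义OpKernel类)
- [Registering the classes](#注册类)
- [Registering the Operator](#注册Operator)
- [Compiling](#编译)
- [Python binding](#绑定Python)
- [Implementing unit tests](#实现单元测试)
- [Forward Operator unit test](#前向Operator单测)
- [Backward Operator unit test](#反向Operator单测)
- [Compiling and running](#编译和执行)
## Concepts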
@ -22,19 +23,17 @@
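- `framework::OperatorWithKernel`: inherits from OperatorBase; the Op has a compute function, which is referred to as having a Kernel.
- `class OpProtoAndCheckerMaker`: describes the Op's inputs, outputs, attributes, and comments; mainly used to generate the Python API.
Ops fall into two kinds depending on whether they contain a kernel: Ops with a Kernel and Ops without one. The former inherit from `OperatorWithKernel`, the latter from `OperatorBase`. This tutorial mainly covers how to write an Op with a Kernel, summarized as follows:
Ops fall into two kinds depending on whether they contain a kernel: Ops with a Kernel and Ops without one. The former inherit from `OperatorWithKernel`, the latter from `OperatorBase`. This tutorial mainly covers how to write an Op with a Kernel; what an Op needs to contain is summarized below:
A Forward Op needs to contain:
- OpProtoMaker definition
- Op definition
- Kernel implementation
Content | Where it is defined
-------------- | :----------------------
OpProtoMaker definition | `.cc` file; a Backward Op does not need to define an OpProtoMaker
Op definition | `.cc` file
Kernel implementation | If CPU and GPU share the Kernel, in the `.h` file; otherwise the CPU version can go in the `.cc` file and the GPU version in the `.cu` file.
Op registration | The Op is registered in the `.cc` file; Kernels are registered in the `.cc` file for CPU and in the `.cu` file for GPU
The corresponding Backward Op contains:
- Op definition
- Kernel implementation
The following uses matrix multiplication, i.e. [MulOp](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/mul_op.cc), as an example to show how to write an Operator with a Kernel.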
@ -137,8 +136,9 @@ MulOp(const std::string &type, const framework::VariableNameMap &inputs,
```
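You also need to override the `InferShape` interface. `InferShape` is a const function and cannot modify the Op's member variables. Its parameter is `const framework::InferShapeContext &ctx`, through which the inputs, outputs, and attributes can be obtained. Its purpose is to:
- 1). Perform checks and fail early: verify that the input dimensions, types, etc. are valid
- 2). Set the shape of the output Tensor
- 1). Perform checks and fail early: verify that the input dimensions, types, etc. are valid.
- 2). Set the shape of the output Tensor.
Usually the definitions of the `OpProtoMaker` and `Op` classes are written in the `.cc` file, together with the registration function described later.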
@ -172,7 +172,7 @@ class MulKernel : public framework::OpKernel {
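This completes the forward Op implementation; the op and kernel now need to be registered in the `.cc` file. The class and Kernel definitions of the backward Op are similar to those of the forward Op and are not repeated here. Note, however, that a backward Op has no `ProtoMaker`.
### 4. Registration
### 4. Registering the Operator
Register the forward and backward Op classes and the CPU Kernel in the `.cc` file.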
@ -297,4 +297,28 @@ class TestMulOp(unittest.TestCase):
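- Call `create_op("mul")` to create the forward Op that corresponds to the backward Op.
- Define the input `inputs`.
- Call `compare_grad` to compare the CPU and GPU results.
- Call `check_grad` to check gradient stability.
- Call `check_grad` to check gradient stability; here the correctness of the gradient is verified numerically.
- First argument `op`: the forward op.
- Second argument `inputs`: the input dictionary, whose keys match those defined in `ProtoMaker`.
- Third argument `set(["X", "Y"])`: specifies that gradients are checked for the input variables `X` and `Y`.
- Fourth argument `"Out"`: specifies `Out` as the final output target variable of the forward network.
### Compiling and running
Once the unit test is written, add it to [`python/paddle/v2/framework/tests/CMakeLists.txt`](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/v2/framework/tests/CMakeLists.txt) so that it is built: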
```
py_test(test_mul_op SRCS test_mul_op.py)
```
`WITH_TESTING` must be enabled at build time, i.e. `cmake paddle_dir -DWITH_TESTING=ON`. After a successful build, run the unit test with:
```
make test ARGS="-R test_mul_op -V"
```
Or:
```
ctest -R test_mul_op
```
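The numerical gradient check mentioned for `check_grad` can be illustrated with a small stand-alone sketch. This is not the PaddlePaddle test helper: `NumericGradient` and `MaxAbsDiff` are hypothetical names, and the central-difference scheme and step size are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper: approximates df/dx_i with central differences,
// (f(x + eps * e_i) - f(x - eps * e_i)) / (2 * eps).
std::vector<double> NumericGradient(
    const std::function<double(const std::vector<double>&)>& f,
    std::vector<double> x, double eps) {
  assert(eps > 0.0);
  std::vector<double> grad(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    const double saved = x[i];
    x[i] = saved + eps;
    const double plus = f(x);
    x[i] = saved - eps;
    const double minus = f(x);
    x[i] = saved;  // restore before perturbing the next coordinate
    grad[i] = (plus - minus) / (2.0 * eps);
  }
  return grad;
}

// Hypothetical helper: largest element-wise difference between two gradients.
double MaxAbsDiff(const std::vector<double>& a, const std::vector<double>& b) {
  assert(a.size() == b.size());
  double worst = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    const double d = std::abs(a[i] - b[i]);
    if (d > worst) worst = d;
  }
  return worst;
}
```

A test would compute the analytic gradient from the backward Op, compute the numeric one as above, and require `MaxAbsDiff` to stay below a small tolerance.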

@ -7,4 +7,3 @@ PaddlePaddle Documentation
getstarted/index_en.rst
howto/index_en.rst
api/index_en.rst
about/index_en.rst

@ -124,6 +124,9 @@ static std::unique_ptr<OperatorBase> BackwardRecursive(
std::list<Pos> insert_position;
for (auto& dup_output_op : dup_output_ops) {
const std::string& name = dup_output_op.first;
// a duplicated @Empty@ does not need to be added
if (name == kEmptyVarName) continue;
auto& dup_op = dup_output_op.second;
// no duplicate output
if (dup_op.size() == 1) continue;
@ -209,7 +212,7 @@ std::unique_ptr<OperatorBase> Backward(
const OperatorBase& forwardOp,
const std::unordered_set<std::string>& no_grad_vars) {
std::unordered_set<std::string> no_grad_names;
no_grad_names.reserve(no_grad_vars.size());
no_grad_names.reserve(no_grad_vars.size() + 1);
no_grad_names.insert(std::string(kEmptyVarName) + kGradVarSuffix);

@ -1,23 +1,53 @@
## Operator/expression's Backward
# Operator/expression's Backward
### Motivation
## Motivation
In a neural network, the backpropagation algorithm follows the chain rule, so we need to compose the fundamental gradient operators/expressions according to the chain rule. Every forward network needs a backward network to construct the full computation lineage; the operator/expression's Backward feature generates the backward pass with respect to the forward pass.
In a neural network, the backpropagation algorithm follows the chain rule, so we need to compose the fundamental gradient operators/expressions according to the chain rule. Every forward network needs a backward network to construct the full computation graph; the operator/expression's backward pass is generated with respect to the forward pass.
## Backward Operator Registry
### Implement : gradient operator registry
A backward network is built up from several backward operators. Backward operators take the forward operators' inputs, outputs, and output gradients, and then calculate their input gradients.
| | forward operator | backward operator |
| ---------------------- | ---------------- | -------------------------------- |
| **Operator::inputs_** | Inputs | Inputs, Outputs, OutputGradients |
| **Operator::outputs_** | Outputs | InputGradients |
| | forward operator | backward operator |
| ---------------------- | ---------------- | -------------------------------- |
| **Operator::inputs_** | Inputs | Inputs, Outputs, OutputGradients |
| **Operator::outputs_** | Outputs | InputGradients |
Inputs/Outputs are the inputs/outputs of the operator, and InputGradients/OutputGradients are the gradients with respect to the forward operator's inputs/outputs. Forward and backward operators are isomorphic; each saves what it needs in its member attributes.
In most cases, there is a one-to-one correspondence between forward and backward operators. These correspondences are recorded in a global hash map (`OpInfoMap`). To follow the philosophy of a minimal core and make operators pluggable, the registry mechanism is introduced.
We use a global hash map to record the available gradient operators, following the philosophy of a minimal core and making each operator a pluggable unit. Each gradient is an operator and needs to register itself.
For example, given a `mul_op`, we can register its information and its corresponding backward operator with the following macro:
grad_op_builder(fengjiayi)
```cpp
REGISTER_OP(mul, MulOp, MulOpMaker, mul_grad, MulOpGrad);
```
### Implement : Backward network
`mul` is the operator's type. `MulOp` and `MulOpMaker` are the operator class and the operator maker class respectively.
`mul_grad` is the type of backward operator, and `MulOpGrad` is its class name.
## Backward Operator Creating
Given a certain forward operator, we can get its corresponding backward operator by calling:
```cpp
OperatorBase* bwd_op = BuildGradOp(const OperatorBase* fwd_op);
```
The function `BuildGradOp` executes the following steps in sequence:
1. Get the `type_` of the given forward operator, and then get the corresponding backward operator's type by looking it up in the `OpInfoMap`.
2. Build two maps named `inputs` and `outputs` to temporarily store the backward operator's inputs and outputs. Copy the forward operator's `inputs_` and `outputs_` to the map `inputs`, except those that are not necessary for gradient computing.
3. Add the forward inputs' gradient variables to the map `outputs`, and add the forward outputs' gradient variables to the map `inputs`.
4. Build the backward operator with `inputs`, `outputs`, and the forward operator's attributes.
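The four steps can be sketched roughly as follows. This is a simplified illustration rather than PaddlePaddle's `BuildGradOp`: the `OpDesc` struct, the `grad_type_of` lookup table (standing in for `OpInfoMap`), and the function name `BuildGradOpSketch` are hypothetical, the `@GRAD` suffix is assumed for illustration, and the sketch copies every forward input and output instead of pruning the ones that gradient computation does not need.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, simplified operator description (not OperatorBase).
typedef std::map<std::string, std::vector<std::string> > VarNameMap;

struct OpDesc {
  std::string type;
  VarNameMap inputs;
  VarNameMap outputs;
  std::map<std::string, float> attrs;
};

// Assumed gradient-variable suffix, for illustration only.
const char kGradSuffixSketch[] = "@GRAD";

std::vector<std::string> GradNames(const std::vector<std::string>& names) {
  std::vector<std::string> result;
  result.reserve(names.size());
  for (std::size_t i = 0; i < names.size(); ++i) {
    result.push_back(names[i] + kGradSuffixSketch);
  }
  return result;
}

OpDesc BuildGradOpSketch(
    const OpDesc& fwd,
    const std::map<std::string, std::string>& grad_type_of) {
  assert(!fwd.type.empty());
  // Step 1: look up the backward operator's type.
  std::map<std::string, std::string>::const_iterator it =
      grad_type_of.find(fwd.type);
  if (it == grad_type_of.end()) {
    throw std::runtime_error("no backward operator for " + fwd.type);
  }
  OpDesc bwd;
  bwd.type = it->second;
  // Step 2: forward inputs and outputs both become backward inputs.
  bwd.inputs = fwd.inputs;
  for (VarNameMap::const_iterator o = fwd.outputs.begin();
       o != fwd.outputs.end(); ++o) {
    bwd.inputs[o->first] = o->second;
  }
  // Step 3: output gradients are inputs, input gradients are outputs.
  for (VarNameMap::const_iterator o = fwd.outputs.begin();
       o != fwd.outputs.end(); ++o) {
    bwd.inputs[o->first + kGradSuffixSketch] = GradNames(o->second);
  }
  for (VarNameMap::const_iterator in = fwd.inputs.begin();
       in != fwd.inputs.end(); ++in) {
    bwd.outputs[in->first + kGradSuffixSketch] = GradNames(in->second);
  }
  // Step 4: the backward operator reuses the forward attributes.
  bwd.attrs = fwd.attrs;
  return bwd;
}
```

For a `mul` operator with inputs `X`, `Y` and output `Out`, the sketch yields a `mul_grad` description whose inputs are `X`, `Y`, `Out`, `Out@GRAD` and whose outputs are `X@GRAD`, `Y@GRAD`.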
## Backward Network Building
A backward network is a series of backward operators. The main idea of building a backward network is to create the backward operators in the reverse order of the forward operators and put them together.
In our design, the network itself is also a kind of operator, so the operators contained in a big network may themselves be smaller networks.
Given a forward network, it generates the backward network. We only care about the gradients: `OutputGradients` and `InputGradients`.
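The reverse-order construction, including nested networks, might look like the sketch below. The `Node` type, the `_grad` naming, and `BackwardSketch` are hypothetical simplifications; they are not the `Backward` function from this change, which also handles duplicated outputs and no-gradient variables.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical node: a plain operator, or a network when it has children.
struct Node {
  std::string type;
  std::vector<Node> children;  // empty for a plain operator
};

// Builds the backward counterpart: children are visited in reverse order,
// and nested networks are handled by recursion.
Node BackwardSketch(const Node& fwd) {
  assert(!fwd.type.empty());
  Node bwd;
  bwd.type = fwd.type + "_grad";
  for (std::vector<Node>::const_reverse_iterator it = fwd.children.rbegin();
       it != fwd.children.rend(); ++it) {
    bwd.children.push_back(BackwardSketch(*it));
  }
  return bwd;
}
```

A network `[fc, relu, softmax]` therefore maps to `[softmax_grad, relu_grad, fc_grad]`, which matches the order in which the chain rule consumes gradients.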

@ -21,6 +21,8 @@ if(USE_NNPACK)
endif()
endif()
list(APPEND cpp_files neon/NeonDepthwiseConv.cpp)
add_library(paddle_function STATIC ${cpp_files} ${cu_objs})
add_dependencies(paddle_function ${external_project_dependencies})
add_dependencies(paddle_function paddle_proto)
@ -42,11 +44,11 @@ if(WITH_GPU)
add_simple_unittest(RowConvOpTest)
add_simple_unittest(BlockExpandOpTest)
add_simple_unittest(CropOpTest)
add_simple_unittest(DepthwiseConvOpTest)
endif()
add_simple_unittest(Im2ColTest)
add_simple_unittest(GemmConvOpTest)
add_simple_unittest(DepthwiseConvOpTest)
endif()
add_style_check_target(paddle_function ${h_files})

@ -34,4 +34,13 @@ TEST(DepthwiseConv, BackwardFilter) {
}
#endif
#if defined(__ARM_NEON__) || defined(__ARM_NEON)
TEST(DepthwiseConv, Forward) {
DepthwiseConvolution<DEVICE_TYPE_CPU, DEVICE_TYPE_CPU>(
"GemmConv-CPU", "NeonDepthwiseConv-CPU", forward);
}
#endif
} // namespace paddle

@ -16,6 +16,7 @@ limitations under the License. */
#include "TensorShape.h"
#include "TensorType.h"
#include "neon/neon_util.h"
namespace paddle {
@ -93,4 +94,95 @@ public:
int paddingWidth);
};
template <class T>
struct Padding {
static void run(const T* src,
T* dest,
int channels,
int inputHeight,
int inputWidth,
int paddingHeight,
int paddingWidth) {
const int destWidth = inputWidth + 2 * paddingWidth;
for (int c = 0; c < channels; c++) {
if (paddingHeight > 0) {
memset(dest, 0, destWidth * paddingHeight * sizeof(T));
dest += destWidth * paddingHeight;
}
for (int i = 0; i < inputHeight; i++) {
// padding head
for (int j = 0; j < paddingWidth; j++) {
*dest++ = T(0);
}
memcpy(dest, src, inputWidth * sizeof(T));
dest += inputWidth;
src += inputWidth;
// padding tail
for (int j = 0; j < paddingWidth; j++) {
*dest++ = T(0);
}
}
if (paddingHeight > 0) {
memset(dest, 0, destWidth * paddingHeight * sizeof(T));
dest += destWidth * paddingHeight;
}
}
}
};
#if defined(__ARM_NEON__) || defined(__ARM_NEON)
template <>
struct Padding<float> {
static void run(const float* src,
float* dest,
int channels,
int inputHeight,
int inputWidth,
int paddingHeight,
int paddingWidth) {
const int destWidth = inputWidth + 2 * paddingWidth;
for (int c = 0; c < channels; c++) {
if (paddingHeight > 0) {
memset(dest, 0, destWidth * paddingHeight * sizeof(float));
dest += destWidth * paddingHeight;
}
for (int i = 0; i < inputHeight; i++) {
// padding head
for (int j = 0; j < paddingWidth; j++) {
*dest++ = float(0);
}
int step = inputWidth >> 2;
int remain = inputWidth & 3;
for (int s = 0; s < step; s++) {
float32x4_t s0 = vld1q_f32(src);
vst1q_f32(dest, s0);
src += 4;
dest += 4;
}
for (int r = 0; r < remain; r++) {
*dest++ = *src++;
}
// padding tail
for (int j = 0; j < paddingWidth; j++) {
*dest++ = float(0);
}
}
if (paddingHeight > 0) {
memset(dest, 0, destWidth * paddingHeight * sizeof(float));
dest += destWidth * paddingHeight;
}
}
}
};
#endif
} // namespace paddle
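To make the memory layout produced by `Padding::run` easier to check, here is an index-based sketch of the same zero padding on a channels x height x width buffer. `PadSketch` is a hypothetical stand-alone function that uses `std::vector` instead of raw pointers; it is meant as a readable reference, not as a replacement for the pointer-walking or NEON versions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Zero-pads every channel of a CHW buffer by padH rows above and below and
// padW columns left and right, writing into a freshly zeroed destination.
std::vector<float> PadSketch(const std::vector<float>& src, int channels,
                             int height, int width, int padH, int padW) {
  assert(src.size() == static_cast<std::size_t>(channels) * height * width);
  const int dstH = height + 2 * padH;
  const int dstW = width + 2 * padW;
  std::vector<float> dst(static_cast<std::size_t>(channels) * dstH * dstW,
                         0.0f);
  for (int c = 0; c < channels; ++c) {
    for (int i = 0; i < height; ++i) {
      for (int j = 0; j < width; ++j) {
        const std::size_t s =
            (static_cast<std::size_t>(c) * height + i) * width + j;
        const std::size_t d =
            (static_cast<std::size_t>(c) * dstH + i + padH) * dstW + j + padW;
        dst[d] = src[s];
      }
    }
  }
  return dst;
}
```

For a single 2x2 channel `{1, 2, 3, 4}` with one row and one column of padding, the result is a 4x4 plane whose inner 2x2 block holds the original values and whose border is zero.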

File diff suppressed because it is too large

@ -0,0 +1,47 @@
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#if defined(__ARM_NEON__) || defined(__ARM_NEON)
#include <arm_neon.h>
namespace paddle {
namespace neon {
inline float32x4_t vld1q_f32_aligned(const float* p) {
return vld1q_f32(
(const float*)__builtin_assume_aligned(p, sizeof(float32x4_t)));
}
#ifndef __aarch64__
inline float32_t vaddvq_f32(float32x4_t a) {
float32x2_t v = vadd_f32(vget_high_f32(a), vget_low_f32(a));
return vget_lane_f32(vpadd_f32(v, v), 0);
}
inline float32x4_t vmlaq_laneq_f32(float32x4_t a,
float32x4_t b,
float32x4_t v,
const int lane) {
return vmlaq_n_f32(a, b, vgetq_lane_f32(v, lane));
}
#endif
} // namespace neon
} // namespace paddle
#endif
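For readers without an ARM toolchain, the two non-AArch64 fallbacks above can be read against a scalar reference. The sketch below models a `float32x4_t` as a four-element array; `Float4`, `AddAcrossLanes`, and `MulAddByLane` are hypothetical names that describe the lane-wise arithmetic of `vaddvq_f32` and `vmlaq_laneq_f32` as used in this header, not the intrinsics themselves.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

typedef std::array<float, 4> Float4;

// Scalar meaning of vaddvq_f32: the sum of all four lanes. The pairing
// follows the fallback: (high + low) halves first, then a pairwise add.
float AddAcrossLanes(const Float4& a) {
  const float v0 = a[2] + a[0];
  const float v1 = a[3] + a[1];
  return v0 + v1;
}

// Scalar meaning of vmlaq_laneq_f32: out[i] = a[i] + b[i] * v[lane].
Float4 MulAddByLane(const Float4& a, const Float4& b, const Float4& v,
                    std::size_t lane) {
  assert(lane < 4);
  Float4 out;
  for (std::size_t i = 0; i < 4; ++i) {
    out[i] = a[i] + b[i] * v[lane];
  }
  return out;
}
```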

@ -572,13 +572,8 @@ void MultiBinaryLabelCrossEntropy::backwardImp(Matrix& output,
}
}
//
// Huber loss for robust 2-classes classification
//
REGISTER_LAYER(huber, HuberTwoClass);
bool HuberTwoClass::init(const LayerMap& layerMap,
const ParameterMap& parameterMap) {
bool HuberCost::init(const LayerMap& layerMap,
const ParameterMap& parameterMap) {
CostLayer::init(layerMap, parameterMap);
if (useGpu_) {
tmpCpuInput_.reserve(inputLayers_.size());
@ -589,7 +584,7 @@ bool HuberTwoClass::init(const LayerMap& layerMap,
return true;
}
void HuberTwoClass::forwardImp(Matrix& output, Argument& label, Matrix& cost) {
void HuberCost::forwardImp(Matrix& output, Argument& label, Matrix& cost) {
if (useGpu_) {
for (size_t i = 0; i < inputLayers_.size(); i++) {
tmpCpuInput_[i].resizeAndCopyFrom(
@ -597,13 +592,87 @@ void HuberTwoClass::forwardImp(Matrix& output, Argument& label, Matrix& cost) {
}
hl_stream_synchronize(HPPL_STREAM_DEFAULT);
}
forwardImpIn(output, label, cost);
}
void HuberTwoClass::forwardImpIn(Matrix& output,
Argument& label,
Matrix& target) {
//
// Huber loss for robust regression.
//
REGISTER_LAYER(huber_regression, HuberRegressionLoss);
bool HuberRegressionLoss::init(const LayerMap& layerMap,
const ParameterMap& parameterMap) {
HuberCost::init(layerMap, parameterMap);
delta_ = config_.delta();
return true;
}
void HuberRegressionLoss::forwardImp(Matrix& output,
Argument& label,
Matrix& target) {
HuberCost::forwardImp(output, label, target);
size_t numSamples = target.getHeight();
size_t dim = output.getWidth();
CHECK(label.value);
CHECK_EQ((*label.value).getHeight(), numSamples);
CHECK_EQ(output.getHeight(), numSamples);
CHECK_EQ(dim, (*label.value).getWidth());
CHECK_EQ(target.getWidth(), (size_t)1);
real* out = useGpu_ ? tmpCpuInput_[0].value->getData() : output.getData();
real* lbl =
useGpu_ ? tmpCpuInput_[1].value->getData() : (*label.value).getData();
std::vector<real> cost(numSamples, 0);
for (size_t i = 0; i < numSamples; ++i) {
for (size_t j = 0; j < dim; ++j) {
int index = i * dim + j;
real a = std::abs(lbl[index] - out[index]);
if (a <= delta_)
cost[i] += a * a / 2;
else
cost[i] += delta_ * (a - delta_ / 2);
}
}
target.copyFrom(cost.data(), numSamples);
}
void HuberRegressionLoss::backwardImp(Matrix& output,
Argument& label,
Matrix& outputG) {
size_t numSamples = output.getHeight();
size_t dim = output.getWidth();
real* out = useGpu_ ? tmpCpuInput_[0].value->getData() : output.getData();
real* lbl =
useGpu_ ? tmpCpuInput_[1].value->getData() : (*label.value).getData();
real* grad = useGpu_ ? tmpCpuInput_[0].grad->getData() : outputG.getData();
for (size_t i = 0; i < numSamples; ++i) {
for (size_t j = 0; j < dim; ++j) {
int index = i * dim + j;
real a = lbl[index] - out[index];
if (std::abs(a) <= delta_)
grad[index] += -a;
else
grad[index] += a > 0 ? -delta_ : delta_;
}
}
if (useGpu_) outputG.copyFrom(grad, numSamples * dim);
}
//
// Huber loss for robust 2-classes classification
//
REGISTER_LAYER(huber_classification, HuberTwoClassification);
bool HuberTwoClassification::init(const LayerMap& layerMap,
const ParameterMap& parameterMap) {
return HuberCost::init(layerMap, parameterMap);
}
void HuberTwoClassification::forwardImp(Matrix& output,
Argument& label,
Matrix& target) {
HuberCost::forwardImp(output, label, target);
size_t numSamples = target.getHeight();
CHECK(label.ids);
CHECK_EQ((*label.ids).getSize(), numSamples);
CHECK_EQ(output.getHeight(), numSamples);
CHECK_EQ(output.getWidth(), (size_t)1);
@ -611,47 +680,35 @@ void HuberTwoClass::forwardImpIn(Matrix& output,
real* out = useGpu_ ? tmpCpuInput_[0].value->getData() : output.getData();
int* lbl = useGpu_ ? tmpCpuInput_[1].ids->getData() : (*label.ids).getData();
std::vector<real> cost(numSamples);
std::vector<real> cost(numSamples, 0);
for (size_t i = 0; i < numSamples; ++i) {
int y = 2 * lbl[i] - 1;
if (out[i] * y < -1)
cost[i] = -4 * out[i] * y;
else if (out[i] * y < 1)
cost[i] = (1 - out[i] * y) * (1 - out[i] * y);
else
cost[i] = 0;
real a = out[i] * y;
if (a < -1)
cost[i] = -4 * a;
else if (a < 1)
cost[i] = (1 - a) * (1 - a);
}
target.copyFrom(cost.data(), numSamples);
}
void HuberTwoClass::backwardImp(Matrix& outputValue,
Argument& label,
Matrix& outputGrad) {
if (useGpu_) {
backwardImpIn(
*tmpCpuInput_[0].value, tmpCpuInput_[1], *tmpCpuInput_[0].grad);
outputGrad.copyFrom(*tmpCpuInput_[0].grad);
} else {
backwardImpIn(outputValue, label, outputGrad);
}
}
void HuberTwoClass::backwardImpIn(Matrix& output,
Argument& label,
Matrix& outputG) {
void HuberTwoClassification::backwardImp(Matrix& output,
Argument& label,
Matrix& outputG) {
size_t numSamples = output.getHeight();
real* out = output.getData();
real* grad = outputG.getData();
int* lbl = (*label.ids).getData();
real* out = useGpu_ ? tmpCpuInput_[0].value->getData() : output.getData();
int* lbl = useGpu_ ? tmpCpuInput_[1].ids->getData() : (*label.ids).getData();
real* grad = useGpu_ ? tmpCpuInput_[0].grad->getData() : outputG.getData();
for (size_t i = 0; i < numSamples; ++i) {
int y = 2 * lbl[i] - 1;
if (y * out[i] < -1)
real a = out[i] * y;
if (a < -1)
grad[i] += -4 * y;
else if (y * out[i] < 1)
grad[i] += -2 * (1 - y * out[i]) * y;
else if (a < 1)
grad[i] += -2 * (1 - a) * y;
}
if (useGpu_) outputG.copyFrom(grad, numSamples);
}
/**
* This cost layer compute the sum of its input as loss.
* \f[
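The per-element arithmetic of `HuberRegressionLoss` in the hunk above can be isolated as a small sketch. `HuberLossSketch` and `HuberGradSketch` are hypothetical free functions that mirror the branches of `forwardImp` and `backwardImp` for a single (label, output) pair; the layer itself sums the loss over the output dimension and accumulates the gradient into the existing buffer.

```cpp
#include <cassert>
#include <cmath>

// Loss for one element: quadratic near zero, linear beyond delta.
double HuberLossSketch(double label, double output, double delta) {
  assert(delta > 0.0);
  const double a = std::abs(label - output);
  return a <= delta ? 0.5 * a * a : delta * (a - 0.5 * delta);
}

// Derivative of the loss with respect to the output for one element.
double HuberGradSketch(double label, double output, double delta) {
  assert(delta > 0.0);
  const double a = label - output;
  if (std::abs(a) <= delta) return -a;
  return a > 0 ? -delta : delta;
}
```

Both branches agree at `abs(label - output) == delta`, where the loss equals `0.5 * delta * delta`, so the loss is continuous and its gradient is bounded by `delta` in magnitude.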

@ -304,37 +304,70 @@ public:
Matrix& outputGrad) override;
};
/**
* Huber loss for robust 2-classes classification.
*
* For label={0, 1}, let y=2*label-1. Given output f, the loss is:
* \f[
* Loss =
* \left\{\begin{matrix}
* 4 * y * f & \textit{if} \ \ y* f < -1 \\
* (1 - y * f)^2 & \textit{if} \ \ -1 < y * f < 1 \\
* 0 & \textit{otherwise}
* \end{matrix}\right.
* \f]
/*
* A base layer for HuberRegressionLoss and HuberTwoClassification.
*/
class HuberTwoClass : public CostLayer {
class HuberCost : public CostLayer {
public:
std::vector<Argument> tmpCpuInput_;
public:
explicit HuberTwoClass(const LayerConfig& config) : CostLayer(config) {}
explicit HuberCost(const LayerConfig& config) : CostLayer(config) {}
bool init(const LayerMap& layerMap,
const ParameterMap& parameterMap) override;
void forwardImp(Matrix& output, Argument& label, Matrix& cost) override;
void forwardImpIn(Matrix& output, Argument& label, Matrix& cost);
void backwardImp(Matrix& outputValue,
Argument& label,
Matrix& outputGrad) override {}
};
/**
* Huber loss for robust regression.
*
* Given output f(x), label y and delta, the loss is:
* Loss = 0.5 * (y - f)^2, if abs(y - f) <= delta \\
* Loss = delta * abs(y - f) - 0.5 * delta^2, otherwise
*/
class HuberRegressionLoss : public HuberCost {
public:
explicit HuberRegressionLoss(const LayerConfig& config) : HuberCost(config) {}
bool init(const LayerMap& layerMap,
const ParameterMap& parameterMap) override;
void forwardImp(Matrix& output, Argument& label, Matrix& cost) override;
void backwardImp(Matrix& outputValue,
Argument& label,
Matrix& outputGrad) override;
void backwardImpIn(Matrix& outputValue, Argument& label, Matrix& outputGrad);
protected:
real delta_;
};
/**
* Huber loss for robust 2-classes classification.
*
* For label={0, 1}, let y=2*label-1. Given output f(x), the loss is:
* Loss = -4 * y * f, if y * f < -1 \\
* Loss = (1 - y * f)^2, if -1 <= y * f < 1 \\
* Loss = 0, otherwise
*/
class HuberTwoClassification : public HuberCost {
public:
explicit HuberTwoClassification(const LayerConfig& config)
: HuberCost(config) {}
bool init(const LayerMap& layerMap,
const ParameterMap& parameterMap) override;
void forwardImp(Matrix& output, Argument& label, Matrix& cost) override;
void backwardImp(Matrix& outputValue,
Argument& label,
Matrix& outputGrad) override;
};
typedef std::shared_ptr<CostLayer> CostLayerPtr;
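The two-class Huber loss documented for `HuberTwoClassification` can be checked with a per-sample sketch. `HuberTwoClassLossSketch` and `HuberTwoClassGradSketch` are hypothetical free functions that follow the branches of the layer's `forwardImp` and `backwardImp` for one sample with label in {0, 1}; note that the linear branch evaluates to `-4 * y * f`, which is positive when `y * f < -1`.

```cpp
#include <cassert>
#include <cmath>

// Loss for one sample; the label 0/1 is mapped to y = -1/+1.
double HuberTwoClassLossSketch(int label, double f) {
  assert(label == 0 || label == 1);
  const int y = 2 * label - 1;
  const double a = f * y;
  if (a < -1) return -4 * a;
  if (a < 1) return (1 - a) * (1 - a);
  return 0.0;
}

// Derivative of the loss with respect to the output f for one sample.
double HuberTwoClassGradSketch(int label, double f) {
  assert(label == 0 || label == 1);
  const int y = 2 * label - 1;
  const double a = f * y;
  if (a < -1) return -4.0 * y;
  if (a < 1) return -2 * (1 - a) * y;
  return 0.0;
}
```

At `y * f == -1` both the linear and the quadratic branch give a loss of 4, and at `y * f == 1` the quadratic branch reaches 0, so the pieces join continuously.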

@ -29,6 +29,10 @@ namespace paddle {
REGISTER_LAYER(exconv, ExpandConvLayer);
REGISTER_LAYER(exconvt, ExpandConvLayer);
inline bool isDepthwiseConv(int channels, int groups) {
return channels == groups;
}
bool ExpandConvLayer::init(const LayerMap &layerMap,
const ParameterMap &parameterMap) {
/* Initialize the basic convolutional parent class */
@ -47,14 +51,27 @@ bool ExpandConvLayer::init(const LayerMap &layerMap,
std::vector<size_t> paddings = {(size_t)paddingY_[i], (size_t)padding_[i]};
std::vector<size_t> strides = {(size_t)strideY_[i], (size_t)stride_[i]};
if (useGpu_ && (size_t)groups_[i] == (size_t)channels_[i] && !isDeconv_) {
// Convolution Layer uses the GemmConv function by default.
convType = "GemmConv";
convGradInputType = "GemmConvGradInput";
convGradFilterType = "GemmConvGradFilter";
// If depth wise convolution and useGpu == true
if (useGpu_ && isDepthwiseConv(channels_[i], groups_[i]) && !isDeconv_) {
convType = "DepthwiseConv";
convGradInputType = "DepthwiseConvGradInput";
convGradFilterType = "DepthwiseConvGradFilter";
} else {
convType = "GemmConv";
convGradInputType = "GemmConvGradInput";
convGradFilterType = "GemmConvGradFilter";
}
// If depth wise convolution and useGpu == false and ARM-NEON
if (!useGpu_ && isDepthwiseConv(channels_[i], groups_[i]) && !isDeconv_) {
#if defined(__ARM_NEON__) || defined(__ARM_NEON)
if ((filterSize_[i] == filterSizeY_[i]) &&
(filterSize_[i] == 3 || filterSize_[i] == 4) &&
(stride_[i] == strideY_[i]) && (stride_[i] == 1 || stride_[i] == 2)) {
convType = "NeonDepthwiseConv";
}
#endif
}
if (FLAGS_use_nnpack && !isDeconv_) {

@ -41,7 +41,7 @@ namespace paddle {
Layer::Layer(const LayerConfig& config, bool useGpu)
: config_(config),
useGpu_(useGpu),
deviceId_(-1),
deviceId_(CPU_DEVICE),
needSequenceInfo_(true) {}
bool Layer::init(const LayerMap& layerMap, const ParameterMap& parameterMap) {

@ -59,7 +59,12 @@ protected:
LayerConfig config_;
/// whether to use GPU
bool useGpu_;
/// Device Id. CPU is -1, and GPU is 0, 1, 2 ...
/// Paddle device ID, MKLDNN is -2, CPU is -1
enum PADDLE_DEVICE_ID {
MKLDNN_DEVICE = -2,
CPU_DEVICE = -1,
};
/// Device Id. MKLDNN is -2, CPU is -1, and GPU is 0, 1, 2 ...
int deviceId_;
/// Input layers
std::vector<LayerPtr> inputLayers_;
@ -77,6 +82,7 @@ protected:
Argument output_;
/// Several outputs stored on different devices, used in 'parallel_nn' case,
/// and record them by deviceId_.
/// Also used in 'use_mkldnn' case.
std::vector<Argument> outputOtherDevice_;
/// If there are several outputs, map them by each name.
std::map<std::string, Argument*> outputMap_;
@ -172,6 +178,13 @@ protected:
return inputLayer.getOutput(deviceId_);
}
/**
* Get the argument of input layer with deviceId.
*/
const Argument& getInput(size_t inputIndex, int deviceId) const {
return inputLayers_[inputIndex]->getOutput(deviceId);
}
/**
* Get the forward-input value.
*/
@ -186,6 +199,13 @@ protected:
return inputLayer.getOutput(deviceId_).value;
}
/**
* Get the forward-input value with deviceId.
*/
const MatrixPtr& getInputValue(int inputIndex, int deviceId) {
return inputLayers_[inputIndex]->getOutput(deviceId).value;
}
/**
* Get the forward-input grad.
*/
@ -200,6 +220,13 @@ protected:
return inputLayer.getOutput(deviceId_).grad;
}
/**
* Get the forward-input grad with deviceId.
*/
const MatrixPtr& getInputGrad(int inputIndex, int deviceId) {
return inputLayers_[inputIndex]->getOutput(deviceId).grad;
}
/**
* Get the forward-input label.
*/

File diff suppressed because it is too large

@ -32,16 +32,13 @@ protected:
// if has already init the weight
bool hasInitedWgt_;
// if input layer has image size info (ih>1 && iw>1)
bool hasSpatial_;
// fc weight and bias
std::unique_ptr<Weight> weight_;
std::unique_ptr<Weight> biases_;
public:
explicit MKLDNNFcLayer(const LayerConfig& config)
: MKLDNNLayer(config), hasInitedWgt_(false), hasSpatial_(true) {}
: MKLDNNLayer(config), hasInitedWgt_(false) {}
~MKLDNNFcLayer() {}
@ -75,6 +72,8 @@ protected:
* only would be called when needed
*/
void resetBwd();
void convertOutputToOtherDevice() override;
};
} // namespace paddle

Some files were not shown because too many files have changed in this diff
