Merge branch 'develop' into cmake_protobuf

refactor_docs
Liu Yiqun 8 years ago
commit c27e71e2fd

@@ -1,11 +1,103 @@
# Release v0.10.0
We are glad to release version 0.10.0. In this version, we are happy to release the new
[Python API](http://research.baidu.com/paddlepaddles-new-api-simplifies-deep-learning-programs/).

- Our old Python API is rather outdated. It is hard to learn and hard to
use. To write a PaddlePaddle program using the old API, we had to write
at least two Python files: one `data provider` and another that defines
the network topology. Users start a PaddlePaddle job by running the
`paddle_trainer` C++ program, which calls the Python interpreter to run the
network topology configuration script and then starts the training loop,
which iteratively calls the data provider function to load minibatches.
This prevents us from writing a Python program in a modern way, e.g., in a
Jupyter Notebook.
- The new API, which we often refer to as the *v2 API*, allows us to write
much shorter Python programs that define the network and the data in a single
`.py` file. Such a program can also run in a Jupyter Notebook, since the entry
point is a Python program and PaddlePaddle runs as a shared library loaded
and invoked by that program.
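
As an illustration, here is a minimal sketch of what a v2-style program looks
like, training a softmax classifier on MNIST. It assumes the dataset package
listed under New Features below; exact helper names (e.g., `paddle.batch`) may
differ slightly in this release.

```python
import paddle.v2 as paddle

# PaddlePaddle runs as a shared library inside this Python process.
paddle.init(use_gpu=False, trainer_count=1)

# The data layout and the network topology live in the same .py file.
images = paddle.layer.data(
    name='pixel', type=paddle.data_type.dense_vector(784))
label = paddle.layer.data(
    name='label', type=paddle.data_type.integer_value(10))
prediction = paddle.layer.fc(
    input=images, size=10, act=paddle.activation.Softmax())
cost = paddle.layer.classification_cost(input=prediction, label=label)

# Create parameters, pick an optimizer, and run the training loop.
parameters = paddle.parameters.create(cost)
trainer = paddle.trainer.SGD(
    cost=cost,
    parameters=parameters,
    update_equation=paddle.optimizer.Momentum(momentum=0.9))
trainer.train(
    reader=paddle.batch(paddle.dataset.mnist.train(), batch_size=128),
    num_passes=5)
```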
Based on the new API, we delivered an online interactive
book, [Deep Learning 101](http://book.paddlepaddle.org/index.en.html),
and [its Chinese version](http://book.paddlepaddle.org/).

We also worked on updating our online documentation to describe the new API,
but this is ongoing work. We will release more documentation improvements
in the next version.

We also worked on bringing the new API to distributed model training (via MPI and
Kubernetes). This work is ongoing, too. We will release more about it in the next
version.
## New Features
* We released the [new Python API](http://research.baidu.com/paddlepaddles-new-api-simplifies-deep-learning-programs/).
* Deep Learning 101 book in [English](http://book.paddlepaddle.org/index.en.html) and [Chinese](http://book.paddlepaddle.org/).
* Support rectangular input for CNNs.
* Support stride pooling for seqlastin and seqfirstin.
* Expose `seq_concat_layer/seq_reshape_layer` in `trainer_config_helpers`.
* Add the dataset package: CIFAR, MNIST, IMDB, WMT14, CONLL05, movielens, imikolov.
* Add a Priorbox layer for Single Shot Multibox Detection.
* Add a smooth L1 cost.
* Add a data reader creator and data reader decorators for the v2 API (see the sketch after this list).
* Add a CPU implementation of the cmrnorm projection.
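
The reader convention mentioned above is plain Python: a *reader creator*
returns a reader, i.e., a zero-argument callable whose generator yields one
example at a time, and a *decorator* wraps a reader and returns a new one. The
sketch below reimplements the idea for illustration; `counting_reader_creator`
and the local `shuffle` are hypothetical helpers, not the shipped APIs.

```python
import random

def counting_reader_creator(n):
    """A reader creator: returns a reader that yields the examples 0..n-1."""
    def reader():
        for i in range(n):
            yield i  # each yielded item is one training example
    return reader

def shuffle(reader, buf_size):
    """A reader decorator: shuffles examples inside a fixed-size buffer."""
    def decorated():
        buf = []
        for item in reader():
            buf.append(item)
            if len(buf) >= buf_size:
                random.shuffle(buf)
                for b in buf:
                    yield b
                buf = []
        random.shuffle(buf)  # flush the remainder
        for b in buf:
            yield b
    return decorated

train_reader = shuffle(counting_reader_creator(100), buf_size=32)
print(list(train_reader())[:5])  # five shuffled examples
```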
## Improvements
* Support Python virtualenv for `paddle_trainer`.
* Add pre-commit hooks to automatically format our code.
* Upgrade protobuf to version 3.x.
* Add an option to check data types in the Python data provider (see the sketch after this list).
* Speed up the backward pass of the average layer on GPU.
* Documentation refinement.
* Check for dead links in documents using Travis-CI.
* Add an example explaining `sparse_vector`.
* Simplify the data processing flow for Quick Start.
* Support cuDNN Deconv.
* Add a data feeder in the v2 API.
* Support predicting samples from `sys.stdin` for the sentiment demo.
* Provide a multi-process interface for image preprocessing.
* Add a benchmark document for the v1 API.
* Add ReLU in `layer_math.py`.
* Add packages for automatically downloading public datasets.
* Rename `Argument::sumCost` to `Argument::sum`, since class `Argument` is not specific to cost.
* Expose `Argument::sum` to Python.
* Add a new `TensorExpression` implementation for matrix-related expression evaluations.
* Add lazy assignment for optimizing the calculation of a batch of multiple expressions.
* Add the abstract class `Function` and its implementations:
  * `PadFunc` and `PadGradFunc`.
  * `ContextProjectionForwardFunc` and `ContextProjectionBackwardFunc`.
  * `CosSimBackward` and `CosSimBackwardFunc`.
  * `CrossMapNormalFunc` and `CrossMapNormalGradFunc`.
  * `MulFunc`.
* Add the classes `AutoCompare` and `FunctionCompare`, which make it easier to write unit tests comparing the GPU and CPU versions of a function.
* Generate `libpaddle_test_main.a` and remove the main function from test files.
* Support dense numpy vectors in PyDataProvider2.
* Clean the code base, removing some copy-and-pasted code snippets:
  * Extract a `RowBuffer` class for `SparseRowMatrix`.
  * Clean the interface of `GradientMachine`.
  * Use the `override` keyword in layers.
  * Simplify `Evaluator::create`; use `ClassRegister` to create `Evaluator`s.
* Check the MD5 checksum when downloading demo datasets.
* Add `paddle::Error`, which is intended to replace `LOG(FATAL)` in Paddle.
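
For the two PyDataProvider2 items above, here is a hedged sketch; it assumes
the type-checking option is exposed as a `check` argument of the `@provider`
decorator, and the provider body is synthetic rather than reading a real file.

```python
import numpy as np
from paddle.trainer.PyDataProvider2 import provider, dense_vector, integer_value

# check=True validates every yielded sample against the declared input_types;
# dense numpy vectors can be yielded directly in this release.
@provider(
    input_types={'pixel': dense_vector(784), 'label': integer_value(10)},
    check=True)
def process(settings, filename):
    for _ in range(10):  # synthetic samples for illustration
        yield {'pixel': np.random.rand(784).astype('float32'),
               'label': int(np.random.randint(10))}
```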
## Bug Fixes
* Check layer input types for `recurrent_group`.
* Don't run `clang-format` on `.cu` source files.
* Fix bugs in `LogActivation`.
* Fix the bug that caused `test_layerHelpers` to run multiple times.
* Fix the bug that made the seq2seq demo exceed the protobuf message size limit.
* Fix a bug in the data provider converter in GPU mode.
* Fix a bug in `GatedRecurrentLayer`.
* Fix a bug in `BatchNorm` when testing more than one model.
* Fix the broken unit test of `paramRelu`.
* Fix some compile-time warnings about `CpuSparseMatrix`.
* Fix a `MultiGradientMachine` error when `trainer_count > batch_size`.
* Fix bugs that prevented asynchronous data loading in `PyDataProvider2`.
# Release v0.9.0

@@ -44,7 +44,6 @@ if(MKL_INC_DIR AND MKL_CORE_LIB AND MKL_SEQUENTIAL_LIB AND MKL_INTEL_LP64)
message(STATUS "Found MKL (include: ${CBLAS_INC_DIR}, library: ${CBLAS_LIBRARIES})")
set(CBLAS_FOUND ON)
if(${MKL_LAPACK_INC_DIR})
add_definitions(-DPADDLE_USE_LAPACK)
message(STATUS "Found lapack in MKL (include: ${MKL_LAPACK_INC_DIR})")
endif()
return() # return file.
@@ -80,7 +79,6 @@ if(ATLAS_INC_DIR AND ATLAS_CBLAS_LIB AND ATLAS_LIB AND NOT CBLAS_FOUND)
message(STATUS "Found ATLAS (include: ${CBLAS_INC_DIR}, library: ${CBLAS_LIBRARIES})")
set(CBLAS_FOUND ON)
if(ATLAS_CLAPACK_INC_DIR)
add_definitions(-DPADDLE_USE_LAPACK)
set(CBLAS_INC_DIR ${CBLAS_INC_DIR} ${ATLAS_CLAPACK_INC_DIR})
message(STATUS "Found lapack in ATLAS (include: ${ATLAS_CLAPACK_INC_DIR})")
endif()
@@ -115,7 +113,6 @@ if(OPENBLAS_INC_DIR AND OPENBLAS_LIB)
message(STATUS "Found OpenBLAS (include: ${CBLAS_INC_DIR}, library: ${CBLAS_LIBRARIES})")
set(CBLAS_FOUND ON)
if(OPENBLAS_LAPACKE_INC_DIR)
add_definitions(-DPADDLE_USE_LAPACK)
message(STATUS "Found lapack in OpenBLAS (include: ${OPENBLAS_LAPACKE_INC_DIR})")
endif()
return()

@@ -24,45 +24,17 @@ IF(NOT ${CBLAS_FOUND})
SET(CBLAS_LIBRARIES "${CBLAS_INSTALL_DIR}/lib/${LIBRARY_PREFIX}openblas${STATIC_LIBRARY_SUFFIX}"
CACHE FILEPATH "openblas library." FORCE)
# check fortran compiler and library
SET(COMMON_ARGS CC=${CMAKE_C_COMPILER} NO_LAPACK=1 NO_SHARED=1)
IF(ANDROID)
SET(OPENBLAS_COMMIT "b5c96fcfcdc82945502a2303116a64d89985daf5")
SET(OPTIONAL_ARGS HOSTCC=${HOST_C_COMPILER} TARGET=ARMV7 ARM_SOFTFP_ABI=1 NOFORTRAN=1 USE_THREAD=0 libs)
SET(OPTIONAL_ARGS HOSTCC=${HOST_C_COMPILER} TARGET=ARMV7 ARM_SOFTFP_ABI=1 USE_THREAD=0 libs)
ELSEIF(RPI)
SET(OPENBLAS_COMMIT "v0.2.19")
SET(OPTIONAL_ARGS HOSTCC=${HOST_C_COMPILER} TARGET=ARMV7 NOFORTRAN=1 USE_THREAD=0 libs)
SET(OPTIONAL_ARGS HOSTCC=${HOST_C_COMPILER} TARGET=ARMV7 USE_THREAD=0 libs)
ELSE()
IF(CMAKE_COMPILER_IS_GNUCC)
ENABLE_LANGUAGE(Fortran)
if (NOT CMAKE_Fortran_COMPILER_VERSION)
# cmake < 3.4 cannot get CMAKE_Fortran_COMPILER_VERSION directly.
execute_process(COMMAND ${CMAKE_Fortran_COMPILER} -dumpversion
OUTPUT_VARIABLE CMAKE_Fortran_COMPILER_VERSION)
endif()
string(REGEX MATCHALL "[0-9]+" Fortran_VERSION ${CMAKE_Fortran_COMPILER_VERSION})
list(GET Fortran_VERSION 0 Fortran_MAJOR)
list(GET Fortran_VERSION 1 Fortran_MINOR)
find_library(GFORTRAN_LIBRARY NAMES gfortran PATHS
/lib
/usr/lib
/usr/lib/gcc/x86_64-linux-gnu/${Fortran_MAJOR}.${Fortran_MINOR}/
/usr/lib/gcc/x86_64-linux-gnu/${Fortran_MAJOR}/)
if (NOT GFORTRAN_LIBRARY)
message(FATAL_ERROR "Cannot found gfortran library which it is used by openblas")
endif()
find_package(Threads REQUIRED)
LIST(APPEND CBLAS_LIBRARIES ${GFORTRAN_LIBRARY} ${CMAKE_THREAD_LIBS_INIT})
ENDIF(CMAKE_COMPILER_IS_GNUCC)
IF(NOT CMAKE_Fortran_COMPILER)
MESSAGE(FATAL_ERROR "To build lapack in libopenblas, "
"you need to set gfortran compiler: cmake .. -DCMAKE_Fortran_COMPILER=...")
ENDIF(NOT CMAKE_Fortran_COMPILER)
ADD_DEFINITIONS(-DPADDLE_USE_LAPACK)
SET(OPENBLAS_COMMIT "v0.2.19")
SET(OPENBLAS_ARGS FC=${CMAKE_Fortran_COMPILER} DYNAMIC_ARCH=1 libs netlib)
SET(OPENBLAS_ARGS DYNAMIC_ARCH=1 libs)
ENDIF()
ExternalProject_Add(
@@ -73,7 +45,7 @@ IF(NOT ${CBLAS_FOUND})
PREFIX ${CBLAS_SOURCES_DIR}
INSTALL_DIR ${CBLAS_INSTALL_DIR}
BUILD_IN_SOURCE 1
BUILD_COMMAND ${CMAKE_MAKE_PROGRAM} CC=${CMAKE_C_COMPILER} NO_SHARED=1 ${OPTIONAL_ARGS}
BUILD_COMMAND ${CMAKE_MAKE_PROGRAM} ${COMMON_ARGS} ${OPTIONAL_ARGS}
INSTALL_COMMAND ${CMAKE_MAKE_PROGRAM} install NO_SHARED=1 PREFIX=<INSTALL_DIR>
UPDATE_COMMAND ""
CONFIGURE_COMMAND ""

@@ -1,5 +1,4 @@
set(CPACK_PACKAGE_NAME paddle)
set(CPACK_PACKAGE_DESCRIPTION_SUMMARY "")
set(CPACK_PACKAGE_VERSION_MAJOR ${PADDLE_MAJOR_VERSION})
set(CPACK_PACKAGE_VERSION_MINOR ${PADDLE_MINOR_VERSION})
set(CPACK_PACKAGE_VERSION_PATCH ${PADDLE_PATCH_VERSION})
@@ -10,8 +9,9 @@ set(CPACK_DEBIAN_PACKAGE_ARCHITECTURE amd64)
set(CPACK_DEBIAN_PACKAGE_MAINTAINER PaddlePaddle Dev <paddle-dev@baidu.com>)
set(CPACK_PACKAGE_DESCRIPTION_SUMMARY "Paddle")
set(CPACK_PACKAGE_DESCRIPTION "")
set(CPACK_DEBIAN_PACKAGE_DEPENDS "libatlas3-base, libgflags2, libgoogle-glog0, libprotobuf8, libpython2.7, libstdc++6, python-numpy, python-pip, python-pip-whl, python-protobuf")
set(CPACK_DEBIAN_PACKAGE_DEPENDS "libpython2.7-dev, libstdc++6, python-pip, curl, libgfortran3, python-pip-whl")
set(CPACK_DEBIAN_PACKAGE_SECTION Devel)
set(CPACK_DEBIAN_PACKAGE_VERSION ${PADDLE_VERSION})
set(CPACK_DEBIAN_PACKAGE_CONTROL_EXTRA "${PROJ_ROOT}/paddle/scripts/deb/postinst")
#set(CPACK_GENERATOR "DEB")
# Start cpack

@@ -29,7 +29,7 @@ settings(
batch_size=128,
learning_rate=2e-3,
learning_method=AdamOptimizer(),
average_window=0.5,
model_average=ModelAverage(0.5),
regularization=L2Regularization(8e-4),
gradient_clipping_threshold=25)

@@ -69,7 +69,8 @@ def gru_encoder_decoder(data_conf,
encoder_size=512,
decoder_size=512,
beam_size=3,
max_length=250):
max_length=250,
error_clipping=50):
"""
A wrapper for an attention version of GRU Encoder-Decoder network
is_generating: whether this config is used for generating
@@ -90,9 +91,19 @@ def gru_encoder_decoder(data_conf,
input=src_word_id,
size=word_vector_dim,
param_attr=ParamAttr(name='_source_language_embedding'))
src_forward = simple_gru(input=src_embedding, size=encoder_size)
src_forward = simple_gru(
input=src_embedding,
size=encoder_size,
naive=True,
gru_layer_attr=ExtraLayerAttribute(
error_clipping_threshold=error_clipping))
src_backward = simple_gru(
input=src_embedding, size=encoder_size, reverse=True)
input=src_embedding,
size=encoder_size,
reverse=True,
naive=True,
gru_layer_attr=ExtraLayerAttribute(
error_clipping_threshold=error_clipping))
encoded_vector = concat_layer(input=[src_forward, src_backward])
with mixed_layer(size=decoder_size) as encoded_proj:
@@ -117,11 +128,13 @@ def gru_encoder_decoder(data_conf,
decoder_inputs += full_matrix_projection(input=context)
decoder_inputs += full_matrix_projection(input=current_word)
gru_step = gru_step_layer(
gru_step = gru_step_naive_layer(
name='gru_decoder',
input=decoder_inputs,
output_mem=decoder_mem,
size=decoder_size)
size=decoder_size,
layer_attr=ExtraLayerAttribute(
error_clipping_threshold=error_clipping))
with mixed_layer(
size=target_dict_dim, bias_attr=True,

@@ -2,7 +2,8 @@
============
.. toctree::
:maxdepth: 2
:maxdepth: 1
build_and_install/index_cn.rst
basic_usage/index_cn.rst
- `Deep Learning 101 (Chinese edition) <http://book.paddlepaddle.org/>`_

@@ -2,7 +2,8 @@ GET STARTED
============
.. toctree::
:maxdepth: 2
:maxdepth: 1
build_and_install/index_en.rst
basic_usage/index_en.rst
- `Deep Learning 101 <http://book.paddlepaddle.org/index.en.html>`_

@@ -19,18 +19,18 @@
In PaddlePaddle, the following layers can accept a double-level sequence as input and perform the corresponding computation.
pooling_layer
==============
pooling
========
An example of using pooling_layer follows; see the :ref:`api_trainer_config_helpers_layers_pooling_layer` configuration API for details.
An example of using pooling follows; see the :ref:`api_v2.layer_pooling` configuration API for details.
.. code-block:: bash
seq_pool = pooling_layer(input=layer,
seq_pool = pooling(input=layer,
pooling_type=AvgPooling(),
pooling_type=pooling.Max(),
agg_level=AggregateLevel.EACH_SEQUENCE)
- `pooling_type` currently supports two kinds: MaxPooling() and AvgPooling().
- `pooling_type` currently supports two kinds: pooling.Max() and pooling.Avg().
- When `agg_level=AggregateLevel.EACH_TIMESTEP` (the default):
@@ -47,7 +47,7 @@ An example of using pooling_layer follows; see the :ref:`api_trainer_config_helpers
last_seq and first_seq
=====================
An example of using last_seq follows ( :ref:`api_trainer_config_helpers_layers_first_seq` is similar); see the :ref:`api_trainer_config_helpers_layers_last_seq` configuration API for details.
An example of using last_seq follows ( :ref:`api_v2.layer_first_seq` is similar); see the :ref:`api_v2.layer_last_seq` configuration API for details.
.. code-block:: bash
@@ -65,16 +65,16 @@ An example of using last_seq follows ( :ref:`api_trainer_config_helpers_layers_first_
- Input: must be a double-level sequence.
- Output: a single-level sequence, in which each element is the last (or first) element of each subseq in the double-level sequence.
expand_layer
============
expand
======
An example of using expand_layer follows; see the :ref:`api_trainer_config_helpers_layers_expand_layer` configuration API for details.
An example of using expand follows; see the :ref:`api_v2.layer_expand` configuration API for details.
.. code-block:: bash
expand = expand_layer(input=layer1,
ex = expand(input=layer1,
expand_as=layer2,
expand_level=ExpandLevel.FROM_TIMESTEP)
- When `expand_level=ExpandLevel.FROM_TIMESTEP` (the default):

@@ -4,7 +4,6 @@ RNN-related models
.. toctree::
:maxdepth: 1
rnn_config_cn.rst
recurrent_group_cn.md
hierarchical_layer_cn.rst
hrnn_rnn_api_compare_cn.rst

@@ -1,7 +1,2 @@
RNN Models
==========
.. toctree::
:maxdepth: 1
rnn_config_en.rst

@@ -5,7 +5,6 @@ PaddlePaddle Documentation
:maxdepth: 1
getstarted/index_cn.rst
tutorials/index_cn.md
howto/index_cn.rst
api/index_cn.rst
faq/index_cn.rst

@@ -5,8 +5,6 @@ PaddlePaddle Documentation
:maxdepth: 1
getstarted/index_en.rst
tutorials/index_en.md
howto/index_en.rst
api/index_en.rst
about/index_en.rst

@@ -114,10 +114,7 @@
</ul>
</div>
<ul class="site-page-links">
<li><a>Home</a></li>
<li><a href="/">Home</a></li>
<li><a>Get Started</a></li>
<li class="active"><a>Documentation</a></li>
<li><a>About Us</a></li>
</ul>
</div>
<div class="doc-module">
@@ -137,7 +134,7 @@
{{ toctree }}
{% endblock %}
</nav>
{% if toc %}
{% if False %}
<nav class="local-toc">{{ toc }}</nav>
{% endif %}
<section class="doc-content-wrap">
@@ -168,7 +165,8 @@
VERSION:'{{ release|e }}',
COLLAPSE_INDEX:false,
FILE_SUFFIX:'{{ '' if no_search_suffix else file_suffix }}',
HAS_SOURCE: {{ has_source|lower }}
HAS_SOURCE: {{ has_source|lower }},
SOURCELINK_SUFFIX: ".txt",
};
</script>
{%- for scriptfile in script_files %}

@@ -21,16 +21,13 @@ set(CUDA_CXX_WITH_GPU_SOURCES
if(WITH_GPU)
set(CUDA_CXX_SOURCES
src/hl_dso_loader.cc
src/hl_warpctc_wrap.cc
${CUDA_CXX_WITH_GPU_SOURCES})
set_source_files_properties(${CUDA_CXX_SOURCES}
PROPERTIES COMPILE_FLAGS "-D__NVCC__")
else()
set(CUDA_CXX_SOURCES
set(CUDA_CXX_SOURCES src/hl_warpctc_wrap.cc)
src/hl_dso_loader.cc
src/hl_warpctc_wrap.cc)
endif()
set(CUDA_CU_SOURCES
@@ -47,7 +44,6 @@ set(CUDA_CU_SOURCES
set(CUDA_HEADERS
include/hl_time.h
include/hl_dso_loader.h
include/hl_warpctc_wrap.h
include/hl_sequence.h
include/hl_cuda_cublas.h

@@ -40,18 +40,18 @@ public:
namespace gpu {
static __device__ Active<real>::forward forward[] = HPPL_ACTIVE_FUNCTION;
static __device__ Active<real>::backward backward[] = HPPL_ACTIVE_FUNCTION;
}
}  // namespace gpu
#else
namespace cpu {
static Active<real>::forward forward[] = HPPL_ACTIVE_FUNCTION;
static Active<real>::backward backward[] = HPPL_ACTIVE_FUNCTION;
}
}  // namespace cpu
#ifdef __AVX__
namespace avx {
static Active<__m256>::forward forward[] = HPPL_ACTIVE_FUNCTION;
static Active<__m256>::backward backward[] = HPPL_ACTIVE_FUNCTION;
}
}  // namespace avx
#endif
#endif

@@ -273,23 +273,23 @@ extern void hl_bilinear_forward(const real* inData,
const real ratioW);
/**
* @brief Bilinear interpolation backward.
*
* @param[out] inGrad input gradient.
* @param[in] inImgH input image height.
* @param[in] inImgW input image width.
* @param[in] inputH input batchSize.
* @param[in] inputW input image data dim.
* @param[in] outGrad output gradient.
* @param[in] outImgH output image height.
* @param[in] outImgW output image width.
* @param[in] outputH output batchSize.
* @param[in] outputW output image data dim.
* @param[in] numChannels number of channels.
* @param[in] ratioH inImgH / outImgH.
* @param[in] ratioW inImgW / outImgW.
*
*/
extern void hl_bilinear_backward(real* inGrad,
const size_t inImgH,
const size_t inImgW,

@@ -14,10 +14,9 @@ limitations under the License. */
#include "hl_cuda_cublas.h"
#include <sys/time.h>
#include <mutex>
#include "hl_cuda.h"
#include "hl_dso_loader.h"
#include "hl_thread.ph"
#include "paddle/utils/DynamicLoader.h"
#include "paddle/utils/Logging.h"
namespace dynload {

@@ -15,10 +15,9 @@ limitations under the License. */
#include "hl_cuda_cudnn.h"
#include <cudnn.h>
#include <gflags/gflags.h>
#include <mutex>
#include "hl_cuda_cudnn.ph"
#include "hl_dso_loader.h"
#include "hl_thread.ph"
#include "paddle/utils/DynamicLoader.h"
#include "paddle/utils/Logging.h"
DEFINE_int32(cudnn_conv_workspace_limit_in_mb,

@@ -21,11 +21,10 @@ limitations under the License. */
#include <sys/syscall.h>
#include <sys/time.h>
#include <unistd.h>
#include <mutex>
#include "hl_cuda.ph"
#include "hl_thread.ph"
#include "hl_dso_loader.h"
#include "paddle/utils/Logging.h"
#include "paddle/utils/DynamicLoader.h"
// clang-format on
namespace dynload {

@@ -14,7 +14,7 @@ limitations under the License. */
#include "hl_warpctc_wrap.h"
#include <mutex>
#include "hl_dso_loader.h"
#include "paddle/utils/DynamicLoader.h"
#include "paddle/utils/Logging.h"
namespace dynload {

@@ -12,7 +12,7 @@ endif()
add_library(paddle_function STATIC ${cpp_files} ${cu_objs})
add_dependencies(paddle_function ${external_project_dependencies})
add_dependencies(paddle_function gen_proto_cpp)
if(WITH_GPU)
if(WITH_TESTING)

@@ -74,9 +74,9 @@ TEST(MulOp, DDDMatrixMul) {
}
/**
* C += A * B, B, C dense, A sparse
* dense = sparse * dense
*/
void testFuncDSparseDMatrix(
size_t dimM, size_t dimN, size_t dimK, size_t nnz, SparseFormat FORMAT) {
real scaleT = 1.0;
@@ -119,9 +119,9 @@ TEST(MuLOp, DSparseDMul) {
}
/**
* C += A * B, A, C dense, B sparse
* dense = dense * sparse
*/
void testFuncDDSparseMatrix(
size_t dimM, size_t dimN, size_t dimK, size_t nnz, SparseFormat FORMAT) {
real scaleT = 1.0;
@@ -165,9 +165,9 @@ TEST(MulOp, DDSparseMul) {
}
/**
* C += A * B, A sparse, B, C dense
* sparse = dense * dense
*/
void testFuncSparseDDMatrix(
size_t dimM, size_t dimN, size_t dimK, size_t nnz, SparseFormat FORMAT) {
real scaleT = 1.0;

@@ -21,7 +21,6 @@ limitations under the License. */
#include "MultiGradientMachine.h"
#include "MultiNetwork.h"
#include "NeuralNetwork.h"
#include "NeuralNetwork.h"
#include "ParallelNeuralNetwork.h"
#include "hl_gpu.h"

@@ -637,7 +637,7 @@ void RecurrentGradientMachine::removeBeamSearchStatisticsCallbacks() {
/* create scattered id infomation for all realLayer of inFrameLines one time.
* If hasSubseq, will also create scattered sequenceStartPositions infomation
* for all realLayer of inFrameLines one time.
*/
void RecurrentGradientMachine::createInFrameInfo(int inlinkId,
const Argument& input,

Some files were not shown because too many files have changed in this diff.