* Implement the operator with sprase matrix multiply
* Update the URL of mklml library.
test=develop
* Disable MKLML implematation when using no-linux.
test=develop
* Ignore the deprecated status for windows
test=develop
* test=develop,fix the inference library compilation bug on windows
* test=develop,Fix the inference library compilation bug on windows
* test=develop,fix the bug that PYTHON_EXECUTABLE not exists
* update anakin-engine interfaces for content-dnn
test=develop
* support only-gpu mode of Anakin
modify eltwise parse
test=develop
* modification for thread-safe
test=develop
* Integrated template instance
test=develop
* increase template parameters
test=develop
* support MLU predictor
test=develop
* update anakin cmake files
test=develop
* update TargetWrapper::set_device
* update the initialization of anakin subgraph
test=develop
* use the default constructor of base class
test=develop
* load model from buffer with length
test=develop
* modify the access level of class
test=develop
* support anakin for bitmain arch
test=develop
* remove files
* checkout cmakelists
test=develop
* Optimize the concat and split kernel for special cases that the number of inputs/outputs is 2.
test=develop
* Refine codes.
test=develop
* Correct the condition.
test=develop
* Move the define of tmp_data outside the if statement.
* Print the cudnn minor version.
test=develop
* Fix the case when in_num/o_num is 1 in concat/split op.
test=develop
* Remove const_cast.
test=develop
* fuse mul and elementwise add to fc
* Reimplement the FC forward operator
* Fix FC MKLDNN integration by transposing weights
* Add FC MKLDNN Pass
test=develop
* FC MKLDNN Pass: change memcpy to std::copy
* Fix MKLDNN FC handling of mismatch input and weights dims
* Lower tolerance for MKL-DNN in resnet50 test
test=develop
* Adjust FC to support MKLDNN Op placement
test=develop
* Adjust Placement Op to set use_mkldnn attribute for graph
test=develop
* MKLDNN FC: fix weights format so that gemm version is called
test=develop
* FC MKLDNN: Remove tolerance decrease from tester_helper
* FC MKL-DNN: Refactor the code, change input reorder to weight reorder
* MKL-DNN FC: Introduce operator caching
test=develop
* FC MKL-DNN: Fix the tensor type in ExpectedKernelType
test=develop
* FC MKL-DNN: fix style changes
test=develop
* FC MKL-DNN: fallback to native on non-supported dim sizes
test=develop
* FC MKLDNN: fix CMake paths
test=develop
* FC MKLDNN: Refine placement pass graph mkldnn attribute
test=develop
* Fix Transpiler error for fuse_conv_eltwise
test=develop
* Fix missing STL includes in files
test=develop
* FC MKL-DNN: Enable new output size computation
Also, refine pass to comply with newest interface.
test=develop
* FC MKL-DNN: enable only when fc_mkldnn_pass is enabled
* FC MKL-DNN: Allow Weights to use oi or io format
* FC MKL-DNN: Adjust UT to work with correct dims
test=develop
* Enable MKL DEBUG for resnet50 analyzer
test=develop
* FC MKL-DNN: Improve Hashing function
test=develop
* FC MKL-DNN: Fix shape for fc weights in transpiler
* FC MKL-DNN: Update input pointer in re-used fc primitive
* Add log for not handling fc fuse for unsupported dims
test=develop
* FC MKL-DNN: Move transpose from pass to Op Kernel
test=develop
* FC MKL-DNN: Disable transpose in unit test
test=develop
* FC MKL-DNN: Remove fc_mkldnn_pass from default list
* Correct Flag for fake data analyzer tests
test=develop
* FC MKL-DNN: Add comment about fc mkldnn pass disablement
test=develop
* FC MKL-DNN: Disable fc in int8 tests
test=develop
* link the libwbaes.so into paddle
* polish detail, test=develop
* try fix mac_pr_ci error, test=develop
* add compile option, test=develop
* fix ci error, test=develop
* ignore failed to find mac lib, test=develop
* change cdn to bj, cdn can't get the latest version
* trigger ci, test=develop
* temporary delete win32 lib linking, test=develop
* change https to http, test=develop
* turn compile option on to off
* turn compile option off to on, test=develop
* try lib compiled by gcc4.8, test=develop
* update lib version, test=develop
* link other lib, test=develop
* add setup config
* delete false, test=develop
* delete no_soname, test=develop
* recover so name set
* fix, test=develop
* adjust make config, test=develop
* remove link to wbaes, test=develop
* remove useless define, test=develop
* Support Sync Batch Norm.
* Note, do not enable it in one device.
Usage:
build_strategy = fluid.BuildStrategy()
build_strategy.sync_batch_norm = True
binary = fluid.compiler.CompiledProgram(tp).with_data_parallel(
loss_name=loss_mean.name,
build_strategy=build_strategy)
* Upgrade MKLDNN to v0.18-rc and fix issue caused by lib/lib64
Upgrade MKLDNN to v0.18-rc
Also fix the issue during upgrade
test=develop
* Rebase MKLDNN to rls-v0.18 branch
Some issues in v0.18-rc which caused INT8 conv op unit test failure was fixed
in rls-v0.18 branch
test=develop
* Upgrade MKLDNN from v0.18rc to formal v0.18 tag
test=develop
* Fix the windows compile issue.
test=develop
* Optimize for gelu operator
* Set up the low accuracy mode of MKL ERF function.
test=develop
* Only enable MKLML ERF when OS is linux
* Use the speical mklml version included vmsErf function to verify gelu mkl kernel.
test=develop
* Add the CUDA macro to avoid NVCC's compile issue.
test=develop
* Add the TODO comments for mklml library modification.
test=develop
* Clean Code
test=develop
* Add the comment of marco for NVCC compiler.
test=develop
* Simplify the compare op for CPU.
* Use asynchronous tensor copy in reshape_op's kernel.
* Optimize while_op for test, avoiding creating variables every time.
test=develop
* Enable the cache of kernel type and kernel function.
test=develop
* Enable profiling with gperftools.
* Remove flags for testing, and fix the linking error.
test=develop
* Delete the codes of ChooseKernel.
test=develop
* Fix bug when preparing ExecutorPrepareContext for while_op.
* Fix missing depending on grpc libraries.
* Remove the redundant print.
test=develop
* Follow comments.
* Remove the codes related to prepare the ExecutorPrepareContext for while_op.
test=develop
* Inception fusion operator.
* Support horizontal layer fusion in conv_fusion_op.
* Search conv algo strategy for variable-length input.
search N times and cache the searched algos. For other input, choose the algo of input whose area is closest to this input.
* HIP cmake.
Enable whole archieve build for pybind library.
Disable two warning.
Rollback to C++11.
Link RCCL to WA gpu kernel loading issue.
Update eigen to fix build failure.
Add more include directories.
Fix O3 build failure.
Update eigen.
fix tensor_util_test segment fault issue
add more macro check in hip.cmake.
we may consider refine hip.cmake to inherit all add_definitions() in parrent scope, in the future.
Fix rocRAND load.
Update eigen to fix gru_unit_op and reduce_op.
Add HIP support to testing.
Update eigen to support int16 and int8 in arg min and arg max.
* add rocprim as cub library used by nv implementation
* Reduce build time in rocprim.
* Add rocprim introduction, remove useless cmake code.
* Remove useless flags and format cmake file.
* add recordio support
* disable the openblas multi-thread on windows since no support
adjust the python script
* code style
* code style
test=develop
* add create_recordio_file_reader back
* fix code style
test=develop
* fix the gtest.cmake on windows
* fix cc_test on windows
* fix the win build
test=develop
* remove fused compile support on windows
test=develop
* add the jit support
test=develop
* add the jit support, test=develop
* add the jit support, test=develop
* add the jit back
fix compile error on windows
* rollback test=develop
* test case fix
* disable DSO by default on windows
* exclude warpctc_op on windows
* exclude the dynload_warpctc out on windows
test=develop
* fix the scripts error
test=develop
* disable avx on windows by default
test=develop
* re-organize the cmake file
* disable mkl on windows by default
* add warp_ctc back
* fix the dependency
* fix the dependency
* fix the build issue on windows
* remove unsupported flag on windows
* code style
* code style
test=develop
* fix issue
* add profiler, parallel_executor back
* clean up the pre-definitions on windows
* fix build issue
* test=develop