Zhaolong Xing
88b52a27fe
Inference: fix mask rcnn model diff, optim memory usage, memory leak. ( #18532 )
...
* Fix Mask rcnn predictor
1. refine memory optim algorithm to support the model with the block op.
2. output diff : modify the affine channel fuse
3. add condition_block_infer op
add interface for setting trt calib table dir
test=develop
* add the missing files.
test=develop
6 years ago
Yi Liu
a873fa84ce
supports collective training with programs ( #18392 )
...
1. Since the allreduce op has 4 reduce types, we split these four reduce types into four ops.
2. We also refined the collective op code, e.g. we separated the collective op kernel into CPUKernel and CUDAKernel, and removed the device-specific DeviceContext parameter from the template, since the target DeviceContext is already known.
3. We removed the newly added Collective op role to reduce the complexity of program and graph analysis.
7 years ago
xsrobin
47e2ef38e9
add "import paddle.fluid as fluid" to examples that lack it
7 years ago
lujun
fd6631ef2f
Fix dygraph show style ( #18297 )
...
Fix dygraph show style for FluidDoc.
7 years ago
tangwei12
999d9a59a5
fix communicator with pyreader ( #18350 )
...
* add is_running in communicator, test=develop
7 years ago
HaoRen
b7128bac5f
supports collective communicated training ( #18175 )
...
* fix prepare context redundant code problem, optimize executor by caching create_variables
test=develop
* supports collective training in executor
* make fetch_list runnable with variables, add more unittest for use_program_cache
test=develop
* fix comment
test=develop
* use unique name for nccl_id
* supports output to stream in program_to_code
* insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code
* set op role in collective training
* add collective op role
* remove orig file
* add build optimizer by strategy
* add collective strategy
* refine collective strategy
* add multi-process role maker
* refine strategy building factory so that we can easily plug in more strategies
* scale loss grad in collective sgd transpiler
* add support for distributed fc
* code format
* revert some features for dist fc
* add support for distributed fc training
* fix prepare context redundant code problem, optimize executor by caching create_variables
test=develop
* supports collective training in executor
* make fetch_list runnable with variables, add more unittest for use_program_cache
test=develop
* use unique name for nccl_id
* supports output to stream in program_to_code
* insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code
* set op role in collective training
* add collective op role
* fix comment
test=develop
* remove orig file
* add build optimizer by strategy
* add collective strategy
* refine collective strategy
* add multi-process role maker
* refine strategy building factory so that we can easily plug in more strategies
* scale loss grad in collective sgd transpiler
* add support for distributed fc
* code format
* revert some features for dist fc
* add support for distributed fc training
* test=develop
add collective op unittest standard
* test=develop
remove the test_collective directory
* test=develop
remove the test_collective directory
* remove slicegather test
* code format for reducescatter
* update attr of shard_index_op
* Modify macro nccl_helper
* remove test without distribute
* macro collective_helper
* macro update
* test=develop
update support python3.5
* test=develop change gpu memory use to 0.1 when test
* test=develop
update ut equal func
* test=develop
set flags to 1.5
* test=develop fix pickle dump py35
* test=develop
fix divide in slice and add sync_comm_stream
update atol and rtol to 1e-05
rm shard_index op and test
modify read input from file to read from memory
remove origin_program in framework and add i/o in c_sync_calc_stream
* test=develop update unittest sync operator I/O
7 years ago
Zeng Jinle
5826b72e06
Refine CUDAPlace error message. ( #18343 )
...
* refine cuda place error msg, test=develop
* use LOG(ERROR)+exit(-1), test=develop
7 years ago
jiaqi
3f8031e256
dataset ( #17973 )
...
(1) use channel instead of vector/BlockingQueue in Dataset, to stay consistent with the existing implementation and make the code more readable and flexible (dataset with a single output channel or multiple output channels). One previous memory-out-of-limit problem was caused by not releasing memory after training.
(2) add Record because MultiSlotType costs too much memory (80B); this fixes the memory-out-of-limit problem.
(3) add Channel, Archive in paddle/fluid/framework
(4) change dataset from shared_ptr to unique_ptr in pybind
(5) move create/destroy readers from trainer to dataset
(6) move shuffle from datafeed to dataset. dataset holds memory; datafeed only loads data and feeds it to the network.
(7) fix thread num bug of Dataset when filelist size < thread num
(8) support set_queue_num in InMemoryDataset
7 years ago
chengduo
25f3cd6486
Update execution_strategy option default value ( #18183 )
...
* update execution_strategy option default value
test=develop
* fix doc error
test=develop
7 years ago
Zeng Jinle
25ab23be28
Fix dygraph mem leak ( #18082 )
...
* fix dygraph mem leak, test=develop
* polish msg, test=develop
7 years ago
Sylwester Fraczek
accb132f0f
fix slim int8 mkldnn multithreading issue ( #18009 )
7 years ago
tensor-tang
5c06bff222
combine noavx and avx package ( #17889 )
...
* support avx and noavx core
* add catch and give some log
test=develop
* fix build
test=develop
* add missing package
test=develop
* fix pybind name
test=develop
* fix import error
test=develop
* combine noavx core
test=develop
* add requirements
test=develop
* fix unknown message
test=develop
* fix api spec
test=develop
* refine and clean
test=develop
* update
* pass dist ut
* follow comments
test=develop
* refine scripts
test=develop
7 years ago
Jiabin Yang
4d5f6937c3
Feature/refine api for dygraph ( #17907 )
...
* WIP
* WIP
* test=develop, add api doc and example code for dygraph
7 years ago
gongweibao
fbbdc9ccad
Add backward and optimizer operator dependency pass. ( #17746 )
7 years ago
wopeizl
453a49b1bc
Make ParallelExecutor support Windows GPU ( #17787 )
...
* fix the ParallelExecutor on Windows
test=develop
* restrict to use one GPU only under windows
7 years ago
翟飞跃
993c703bcc
INT8 MKL-DNN v2 integrate to slim ( #17634 )
...
* refactor PR 16865
* delete mergetool files
* test=develop
* test=develop
* test=develop
* test=develop
* create dir for int8 model before call SaveOptimModel
* test=develop
* mkldnn int8 only supports linux; test=develop
* refine code; test=develop
* remove comment; test=develop
* refine code; test=develop
* fix bug; test=develop
* add exception for mkldnn_post_training_strategy
* reuse int8v2 CAPI dataset; test=develop
* fix accuracy check bug; test=develop
* remove tab
* convert files to unix format
* test=develop
* reduce CI time;test=develop
* reduce CI time and refine code;test=develop
* refine comment; test=develop
* add cmake FLAGS;test=develop
* remove predict_num;test=develop
7 years ago
wopeizl
841553e13f
use pyreader to read data in dygraph mode ( #17314 )
...
* use pyreader to read data
* add return_list to PyReader to support return value represented as list
7 years ago
Zeng Jinle
674e0ce2d6
Use Python C-API to speed up dygraph trace ( #17837 )
...
* use python api to reduce python time cost, test=develop
* fix travis ci, test=develop
* fix Py_None error,test=develop
7 years ago
Jiabin Yang
3b70f870e2
Using Smart pointer to optimize memory usage of dyGraph ( #17768 )
...
* for debug
* test=develop, memory optimize for dygraph using shared_ptr
* test=develop, fix travis ci showed error
* test=develop, fix bug for recurrent usage of varbase
* test=develop, init varbase when it need to be Add
7 years ago
guru4elephant
d52391094d
fix prepare context redundant code problem, optimize executor by cach… ( #17743 )
...
* fix prepare context redundant code problem, optimize executor by caching create_variables
test=develop
* cache sub_scope, program, var when use_program_cache=True is set
* make fetch_list runnable with variables, add more unittest for use_program_cache
7 years ago
Zeng Jinle
432ac70124
clean code of py_layer in dygraph mode,test=develop ( #17661 )
7 years ago
gongweibao
65bbf950ee
Add multi-ncclcomm and 2D ncclallreduce support. ( #17263 )
7 years ago
Zhaolong Xing
61221ebc28
TRT: Support set dynamic range in int8 mode. ( #17524 )
...
* fluid int8 train and trt int8 predict align.
trt int8 predict init
op converter
* 2. align fluid int8 train and trt int8 inference.
enhance quant dequant fuse pass
enhance op converter, trt engine, trt engine op, trt subgraph pass.
* 3. add delete_quant_dequant_pass for trt
test=develop
* 4. add the missing file
test=develop
* 5. I modified the C++ interface but forgot to modify the pybind code
fix the IS_TRT_VERSION_GE bug, and fix elementwise op converter
test=develop
7 years ago
wopeizl
6724a652f3
add __str__ method for tensor and lodtensor to support print test=dev… ( #17588 )
...
* add __str__ method for tensor and lodtensor to support print test=develop
7 years ago
guru4elephant
326bf8291a
add Run Prepared Ctx ( #17616 )
...
add Run Prepared Ctx, fix pybind problem
7 years ago
flame
2280f185d7
BuildStrategy api comment ( #17348 )
...
Python examples of fluid.layers.io.double_buffer and some BuildStrategy's methods.
7 years ago
guru4elephant
7f8bc49d00
polish_executor_and_add_ctx_cache ( #17536 )
...
* polish_executor_and_add_ctx_cache
7 years ago
Zeng Jinle
c6189637cd
Fix allocator bug ( #16712 )
...
* Revert "Revert "Fix allocator bug""
This reverts commit 174d0d0b90 .
* Revert "fix travis ci"
This reverts commit 5656fa9f7c .
test=develop
* add inlined_vector.h, test=develop
* add inlined_vector_test,test=develop
7 years ago
Qiao Longfei
92e7d5d7cc
fix distribute doc test=develop ( #17318 )
...
* fix distribute doc
7 years ago
Qiao Longfei
58f7695ab2
Async exe support communicator ( #17386 )
...
Async exe support communicator
7 years ago
Tao Luo
32da5e9c3d
remove unused expected_kernel_cache_pass ( #17486 )
...
test=develop
7 years ago
Yan Xu
0217555530
polish parallel dygraph code ( #17164 )
...
* add var grad hook test=develop
7 years ago
Jiabin Yang
d7df4e5e5b
Fix/Fix memory leak in dygraph ( #17394 )
...
* test=develop, add gradient sort backward strategy
* test=develop, fix test by add FLAGS_cudnn_deterministic on new tests
* test=develop, fix memory leak in dygraph mode
* test=develop, fix memory leak in dygraph mode
* test=develop, polish code
* test=develop, polish code
* test=develop, polish code
7 years ago
Zhen Wang
4a1b7fec96
Add setting Scope function for the graph class ( #17417 )
...
* add set_not_owned function for graph
* add scope set. test=develop
* add scope_ptr enforce not null before setting.test=develop
7 years ago
jiaqi
66d51206b1
add save/load model, shrink table, cvm, config file & fix pull dense bug ( #17118 )
...
* add save/load model, shrink table, cvm, config file & fix pull dense bug
test=develop
* fix global shuffle bug, fix pull dense bug, fix release memory bug, fix shrink error
add client flush, add get data size
test=develop
* fix global shuffle bug
test=develop
* fix global shuffle bug
test=develop
* fix code style
test=develop
* fix code style & modify pslib cmake
test=develop
* fix error of _role_maker
test=develop
* fix code style
test=develop
* fix code style
test=develop
* fix code style
test=develop
* fix code style
test=develop
* fix code style
test=develop
* fix windows compile error of fleet
test=develop
* fix global shuffle bug
* add comment
test=develop
* update pslib.cmake
test=develop
* fix fill sparse bug
test=develop
* fix push sparse bug
test=develop
7 years ago
Tao Luo
68ec0a6f74
make parallel_executor support FLAGS_use_mkldnn ( #17341 )
...
* make parallel_executor support FLAGS_use_mkldnn
test=develop
* add warning when set mkldnn_enabled_op_types_ in non-mkldnn env
test=develop
7 years ago
Jiabin Yang
4624d7c642
test=develop, add gradient sort backward strategy ( #17125 )
...
* test=develop, add gradient sort backward strategy
* test=develop, fix test by add FLAGS_cudnn_deterministic on new tests
7 years ago
chengduo
bc833945a4
Add DropLocalExeScopes in ParallelExecutor ( #17297 )
...
* reset drop local scope counter
test=develop
7 years ago
qingqing01
e32c9888f5
Double backward of conv2d. ( #17211 )
...
* Add conv2d_grad_grad_op
* Extract the cuDNN conv algo searching code into conv_cudnn_helper.h.
- Now use it in conv2d_grad_grad.
- Will simplify the searching code in conv2d and conv2d_grad in the next PR.
* Enhance and fix a bug in the unit testing of gradient_checker.
* Support fetching empty variables; return None in Python.
7 years ago
lujun
e388a1fb66
Repair api example ( #17221 )
...
Fix the following API examples:
paddle.fluid.scope_guard
paddle.fluid.backward.append_backward
paddle.fluid.cpu_places
paddle.fluid.cuda_pinned_places
paddle.fluid.cuda_places
paddle.fluid.in_dygraph_mode
paddle.fluid.CUDAPlace
paddle.fluid.CPUPlace
paddle.fluid.CUDAPinnedPlace
7 years ago
chengduo
04bd413acb
Code Clean: Move all pass to paddle::framework::ir ( #17228 )
...
* move pass to ir
* polish code
test=develop
* fix dependency
test=develop
7 years ago
Zeng Jinle
f2fa3f7300
fix api doc,test=develop ( #17241 )
7 years ago
石晓伟
a72dbe9abf
Cherry-pick benchmark related changes from release/1.4 ( #17156 )
...
* cherry-pick commit from 8877054
* cherry-pick commit from 3f0b97d
* cherry-pick from 16691:Anakin subgraph support yolo_v3 and faster-rcnn
(cherry picked from commit 8643dbc233 )
* Cherry-Pick from 16662 : Anakin subgraph cpu support
(cherry picked from commit 7ad182e16c )
* Cherry-pick from 1662, 16797.. : add anakin int8 support
(cherry picked from commit e14ab180fe )
* Cherry-pick from 16813 : change singleton to graph RegistBlock
test=release/1.4
(cherry picked from commit 4b9fa42307 )
* Cherry Pick : 16837 Support ShuffleNet and MobileNet-v2
Support ShuffleNet and MobileNet-v2, test=release/1.4
(cherry picked from commit a6fb066f90 )
* Cherry-pick : anakin subgraph add opt config layout argument #16846
test=release/1.4
(cherry picked from commit 8121b3eccb )
* 1. add shuffle_channel_detect
(cherry picked from commit 6efdea8997 )
* update shuffle_channel op convert, test=release/1.4
(cherry picked from commit e4726a066f )
* Modify symbol export rules
test=develop
7 years ago
Zeng Jinle
c5eeecca7c
Fix tensor_py.h ( #17195 )
...
* fix tensor_py,test=develop
* change class name,test=develop
7 years ago
Zeng Jinle
5dfe2ab9e8
Fix mem leak when converting Tensor to numpy array ( #17182 )
...
* fix mem leak when converting Tensor to numpy array
test=develop
* remove unused unittest,test=develop
* follow comments, test=develop
* fix dygraph bug,test=develop
7 years ago
Yan Xu
0b07eef118
ParallelDyGraph with GPU collective mode ( #16827 )
...
implement dygraph.parallel.DataParallel to hook reduce op.
7 years ago
guru4elephant
03d469ad98
Merge pull request #17005 from wopeizl/fix_ncclwrapper_win1
...
fix nccl wrapper on windows
7 years ago
liuwei1031
a770ce0615
add doc for memory_optimize, test=develop ( #17010 )
...
* add doc for memory_optimize, test=develop
* update doc, test=develop
* doc update, test=develop
7 years ago
qingqing01
ea42e431f8
Speed unit testing. ( #16978 )
...
* Speed affine_channel_op unit testing
* Add check in tensor_py
* Fix ONLY_CPU Compiling
7 years ago
wopeizl
51a0243a56
fix nccl wrapper on windows
...
test=develop
7 years ago
Zeng Jinle
1202d3fc74
Refine model gpu memory ( #16993 )
...
* speedup gc and inplace softmax_with_cross_entropy_grad
test=develop
* refine models gpu mem
Merge skip vars and warning messages of mem opt
remove relu mem opt
test=develop
* follow comments
test=develop
7 years ago
guru4elephant
bbc6c5714f
Merge pull request #16887 from guru4elephant/add_nccl_context_pybind
...
Add nccl context pybind
7 years ago
gongweibao
cbdb8a17b1
Polish DGC code ( #16818 )
7 years ago
dongdaxiang
466d177d09
add pybind dependency
...
test=develop
7 years ago
dongdaxiang
4aa6f679b5
add pybind dependency
...
test=develop
7 years ago
dongdaxiang
b091139049
add nccl wrapper for python API
7 years ago
Yiqun Liu
112f16143b
Add an option to enable the cache of expected kernel in train phase. ( #16724 )
...
* Add an option to enable the cache of expected kernel in train phase.
test=develop
* Change the default value of cache_expected_kernel to true.
7 years ago
chengduo
55b15db5af
Add unit test for fuse all_reduce ops ( #16699 )
...
* test fuse all_reduce
7 years ago
Yiqun Liu
3fe8cb0dd7
Enable the runtime_context_cache pass in train phase ( #16640 )
...
* Try to enable the runtime_context_cache pass in train phase.
* Put the append of runtime_context_cache pass ahead of multi_dev passes.
test=develop
7 years ago
guru4elephant
7d653f0aed
Merge pull request #16652 from xjqbest/dataset_merge_develop
...
fix dataset bug
7 years ago
xjqbest
6a57e8075a
remove trainer_id in datafeed and dataset
...
test=develop
7 years ago
Yan Xu
b4c3a6aa0b
[Imperative] implement imperative NCCLParallelContext ( #16477 )
...
add NCCLParallelContext for parallel dygraph
7 years ago
xjqbest
271b7147cc
fix dataset bug
...
test=develop
7 years ago
chengduo
b75a69bad6
Add Stream for fetch op handle ( #16600 )
...
* expose fuse broadcast ops
7 years ago
乔龙飞 Qiao Longfei
21622ca30b
Merge pull request #16172 from jacquesqiao/add-async-ssa-graph-executor-communicator
...
Add async ssa graph executor communicator
7 years ago
sneaxiy
10249c0b78
Merge develop
...
test=develop
7 years ago
Qiao Longfei
adf272bcec
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-async-ssa-graph-executor-communicator
...
test=develop
7 years ago
xjqbest
9b84e8e66b
fix code style
...
test=develop
7 years ago
xjqbest
a99c8d0c29
fix client to client communication bug
...
test=develop
7 years ago
sneaxiy
33473890f3
Merge develop
...
test=develop
7 years ago
dongdaxiang
720647e17f
rebase current develop and fix conflict
...
test=develop
7 years ago
dongdaxiang
45eb6f0765
run pre-commit check files and fix code style problem
...
test=develop
7 years ago
xjqbest
e95cafd9a7
fix code style & add dataset testcase
...
test=develop
7 years ago
xjqbest
be74de2c61
fix code style & fix register bug & add release_memory
...
test=develop
7 years ago
xujiaqi01
a5b1a0e12b
support multi dataset && add init model && fix bug
7 years ago
dongdaxiang
b7a202aa38
add distributed optimizer factory
7 years ago
dongdaxiang
f612877797
add incubate for unified API
7 years ago
dongdaxiang
317eb0aad3
add incubate for unified API
7 years ago
xujiaqi01
ecfc7df913
add dataset factory && fix style
7 years ago
xujiaqi01
3cea00bd52
store memory data in Dataset && fix bug
7 years ago
dongdaxiang
cc4def6ba5
fix some conflict for compilation
7 years ago
heqiaozhi
9bca1926c1
refactor & fix bug
7 years ago
xjqbest
2e9a836c6f
add DataSet and InMemoryDataFeed, support load data into memory and shuffle data
7 years ago
dongdaxiang
e36bbcc871
fix some typo and CMakefile.txt
7 years ago
xjqbest
824b84d185
add DataSet and InMemoryDataFeed, support load data into memory and shuffle data
7 years ago
dongdaxiang
be757096da
add pybind for fleet
7 years ago
Qiao Longfei
d8974e6da0
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-async-ssa-graph-executor-communicator
...
test=develop
7 years ago
chengduo
1096746cbf
Fuse Adam And SGD ops ( #15933 )
...
* fuse optimizer
7 years ago
sneaxiy
2c836ff914
check default grad maker
...
test=develop
7 years ago
Zeng Jinle
69cb9792ea
Merge pull request #16506 from sneaxiy/revert-16424-fix_allocator_bug
...
Revert "Fix allocator bug"
7 years ago
chengduo
ed61d67c73
Fix the interface of Pass::Apply ( #16484 )
...
* modify the interface of Pass::Apply
test=develop
* Polish code
test=develop
* Fix Travis CI
test=develop
* fix Pass::Apply interface
test=develop
* Fix Travis CI
test=develop
7 years ago
Zeng Jinle
174d0d0b90
Revert "Fix allocator bug"
...
add include headers to fix travis-ci
test=develop
7 years ago
gongweibao
eb83abeac3
Add DGC(Deep Gradient Compression) interface. ( #15841 )
7 years ago
Zeng Jinle
644e8af4cf
Merge pull request #16424 from sneaxiy/fix_allocator_bug
...
Fix allocator bug
7 years ago
Zeng Jinle
c7c6eeb44e
Merge pull request #16409 from sneaxiy/feature/advance_gc
...
Enhance gc to support deleting tensor buffer in advance
7 years ago
wopeizl
c300b1ba69
Tensor index ( #16223 )
...
* extend the slice function for python
test=develop
7 years ago
Xin Pan
f8c279b11c
Merge pull request #16454 from panyx0718/imperative2
...
polish deepCF model to support real dataset
7 years ago
Qiao Longfei
30618409db
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-async-ssa-graph-executor-communicator
7 years ago
chengduo
4f2278f032
Add doc for CPUPlace CUDAPlace CUDAPinnedPlace ( #16442 )
...
test=develop
7 years ago
sneaxiy
78fb3a62e0
fix env variable setting bug
...
test=develop
7 years ago
sneaxiy
2d92b6be98
merge develop
...
test=develop
7 years ago
Xin Pan
fd24ab47ab
polish
...
test=develop
7 years ago
sneaxiy
a7d0ac50b8
Merge develop
7 years ago
sneaxiy
7000ec85d9
fix some op grad maker
...
fix ctest eager deletion disable bug
test=develop
7 years ago
sneaxiy
f8ed2c229e
try to fix ci error
...
test=develop
7 years ago
sneaxiy
c20db6357b
split PR
...
test=develop
7 years ago
sneaxiy
2f54d9f995
Merge develop
...
test=develop
7 years ago
sneaxiy
a93a9eef8f
add op registry type
...
refine gc code
test=develop
7 years ago
sneaxiy
953214ad97
add more unittest
...
modify allocator strategy
remove changes of legacy buddy_allocator
test=develop
7 years ago
chengduo
f26ba5bddd
Fuse AllReduce ( #15921 )
...
* fuse all_reduce
test=develop
* add fuse_parameter_groups_size
test=develop
* Polish code
test=develop
* Fix travis-ci
test=develop
* Add SetGroupAccordingToLayers and SetGroupAccordingToGroupSize
test=develop
* Add SetGroupAccordingToMemorySize
test=develop
* fix multi_devices_graph
test=develop
* reset params_grads
test=develop
* Polish code
test=develop
7 years ago
Tao Luo
7d2740db83
Revert "cache runtime_context"
7 years ago
sneaxiy
fd23262e0c
merge develop, fix conflict
...
test=develop
7 years ago
Qiyang Min
c7f1f3ed0c
Merge pull request #16214 from velconia/imperative_infer_var_type
...
Implement imperative infer var type
7 years ago
Tao Luo
dbb92ee4b1
Merge pull request #16002 from luotao1/runtime_context
...
cache runtime_context
7 years ago
sneaxiy
161b8ddcaa
Merge develop
7 years ago
minqiyang
b40e41fbd1
Polish code style
...
test=develop
7 years ago
Qiyang Min
8e4ad008fb
Merge pull request #16198 from velconia/imperative_train_speed
...
Improve imperative mode training speed
7 years ago
minqiyang
36dce65bb3
Take DataType and VarType apart
...
test=develop
7 years ago
minqiyang
438bca9c3d
Implement Runtime Var Type Inference
...
test=develop
7 years ago
luotao1
1b59bed989
Merge branch 'develop' into runtime_context
7 years ago
qingqing01
8ad672a287
Support sync batch norm. ( #16121 )
...
* Support Sync Batch Norm.
* Note, do not enable it in one device.
Usage:
build_strategy = fluid.BuildStrategy()
build_strategy.sync_batch_norm = True
binary = fluid.compiler.CompiledProgram(tp).with_data_parallel(
    loss_name=loss_mean.name,
    build_strategy=build_strategy)
7 years ago
minqiyang
7355d41834
1. Add imperative gperf profiler
...
2. Add binutils 2.27 in manylinux support
test=develop
7 years ago
luotao1
b2898c0f57
Merge branch 'develop' into runtime_context
...
test=develop
7 years ago
minqiyang
98dfb492bb
Release GIL lock
7 years ago
sneaxiy
ac0e0f5181
merge develop
...
test=develop
7 years ago
minqiyang
42e96a029f
Accelerate CPU part
7 years ago
sneaxiy
682f2dbf29
merge develop
...
test=develop
7 years ago
sneaxiy
2c4fcaa683
merge develop
7 years ago
luotao1
d94fd97230
add runtime_context_cache_pass
...
test=develop
7 years ago
Yan Xu
30568473ec
fix broadcast on mp mode ( #15951 )
...
* fix broadcast with mp mode
* polish code test=develop
* fix bcast strategy test=develop
* fix cpplint test=develop
* fix py3 failed test=develop
* fix comment test=develop
* update comment test=develop
7 years ago
baojun
e3c37bd564
remove const_cast and refactor ngraph engine code ( #15925 )
...
* remove const_cast and refactor code test=develop
* reduce flag use test=develop
7 years ago
Zhen Wang
ac6ef06ffa
Add the Clone method in Graph. test=develop
7 years ago
Zhen Wang
01eddf125c
Not add graph copy construction method. test=develop
7 years ago
Zhen Wang
1b9c8d5f06
add clone function for IrGraph. test=develop
7 years ago
Qiyang Min
1f4aa7a202
Imperative remove all descs ( #16045 )
...
* Remove Desc in Forward Pass
* Refactor VarBase
* Add dbg info
* Only check type in imperative mode
* Polish code and support optimizer
test=develop
* Fix stop gradient problem in PyLayer
test=develop
7 years ago
Zeng Jinle
472f16b5aa
Merge pull request #16063 from sneaxiy/enhance_gc
...
Enhance gc
7 years ago
wopeizl
a38db3cb99
Fix recordio ( #16124 )
...
* fix recordio on win
test=develop
* test=develop
* test=develop
* fix code style
test=develop
* test=develop
7 years ago
sneaxiy
b80d76f784
merge develop
7 years ago
sneaxiy
732fa00eaf
disable gc in recurrent_op currently
...
test=develop
7 years ago
Tao Luo
6f2581e4c5
Merge pull request #16090 from lidanqing-intel/paddle-int32
...
Add PaddleDType INT32 support
7 years ago
Zhaolong Xing
3d63aa0a11
Merge pull request #15729 from NHZlX/add_static_model_load_for_trt
...
Four points for enhancing Paddle-TRT
7 years ago
nhzlx
a9ed427749
cannot pass ci
...
add if use static engine for trt
test=develop
7 years ago
lidanqing
4aeb261da9
Add INT32 support. INT32 in last switch case
...
test=develop
7 years ago
sneaxiy
2a639d5c2a
add allocator chain to fix bug
...
test=develop
7 years ago
Qiao Longfei
8744f9a083
fix parallel executor async mode
7 years ago
Qiao Longfei
e70b1727ef
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-async-ssa-graph-executor
7 years ago
Qiao Longfei
847e4f4e85
pure async mode train
7 years ago
sneaxiy
3334c279d0
add sample_generator
...
test=develop
7 years ago
Qiyang Min
187cffd019
Merge pull request #15928 from velconia/imperative_backward_hooks
...
Imperative backward hooks
7 years ago
minqiyang
ac88c62a5b
Reset output var's pre_op pointer when op was destructed
7 years ago