hutuxian
c756b5d231
Paddlebox Framework ( #18982 )
* Support looking up embeddings from BoxPS.
* Add a _pull_box_sparse op, for now this op is not exposed to users.
* Add a BoxHelper class, providing 'BeginPass', 'EndPass', 'FeedPass' functions and so on.
* Add 'BoxPSDataset' in python code.
* Add a compile option WITH_BOX_PS and a macro PADDLE_WITH_BOX_PS.
* Add UT.
* For more details, please refer to: https://github.com/PaddlePaddle/Paddle/pull/18982
6 years ago
Thunderbrook
1fe468d319
support debug each output of each ins ( #19004 )
* dump slot
* test
* proto
* dump slot
* test
* proto
* code style
* code style
* code style
* style
* add delete after unseen days
* add unseen days
* code style
* conflict solve
test=develop
* add clear model
* code style
test=develop
* code style
test=develop
* support debug tensor of each ins
test=develop
* support debug tensor of each ins
test=develop
* learning rate
* code style
* code style
* code style
* code style
* code style
* code style
* code style
* code style
* code style
* code style
* code style
* code style
* code style
test=develop
* code style
test=develop
* unittest
* style
* style
* multi phase
* add channel
* code style
* style
* style
* unittest
* style
* define
* define
test=develop
* style
test=develop
* rm define
test=develop
* linux
* linux
test=develop
* style
test=develop
* output format
test=develop
* windows ci
test=develop
6 years ago
Leo Chen
6fb310ae29
Fix bug of getting bool Flags from os.environ ( #19349 )
* fix bug of getting bool Flags from os.environ, test=develop
* add empty loss_name in CompiledProgram for inplace grad test, test=develop
6 years ago
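The commit above does not show what the bug looked like. A common pitfall of this kind is that every value in os.environ is a string, so bool("0") and bool("False") both evaluate to True. The following is a minimal Python sketch of that bug class, assuming it resembles what #19349 fixed. The helper parse_bool_env and the variable FLAGS_example_bool are hypothetical and are not Paddle's actual implementation.

```python
import os


def parse_bool_env(name, default=False):
    """Hypothetical helper: read a boolean flag from the environment.

    os.environ values are always strings, so bool(os.environ[name]) would be
    True for any non-empty value, including "0" and "False".
    """
    value = os.environ.get(name)
    if value is None:
        # Variable not set: fall back to the caller's default.
        return default
    normalized = value.strip().lower()
    if normalized in ("1", "true", "yes", "on"):
        return True
    if normalized in ("0", "false", "no", "off", ""):
        return False
    raise ValueError(f"cannot interpret {name}={value!r} as a boolean")
```

For example, after setting os.environ["FLAGS_example_bool"] = "0", the naive bool(os.environ["FLAGS_example_bool"]) is True, while parse_bool_env("FLAGS_example_bool") is False. Raising on unrecognized values, rather than guessing, is a design choice of this sketch.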
liu zhengxi
32598ffd8f
Python infer api update and add unit test ( #19353 )
* Python inference API supports numpy; add unit tests; fix unit test failures in test_slim_int8_googlenet and test_slim_int8_mobilenet
6 years ago
Leo Chen
a9d5fc5142
Enhance OpTest to check the consistency of operators when using and not using inplace ( #19101 )
* add pybind interface to get all inplace ops, test=develop
* enhance OpTest to check the consistency of operators when using and not using inplace, test=develop
* handle corner cases in op_test, test=develop
* support outputs without tensor holder_, like XShape in reshape_op, test=develop
* fix bug: some ops have a GradOpMaker but actually have no grad_op in OpInfoMap, test=develop
* use reshape_grad instead of reshape in FlattenGradOp, test=develop
* fix error debug dims info for variables like XShape, test=develop
* change computational order in sum_op to reduce computation differences when using inplace, test=develop
* add inplace_atol to check group_norm, and skip inplace_grad for mkldnn, test=develop
* follow sneaxiy's comments, test=develop
* remove unused DefaultGradOpDescMaker in mkldnn op, test=develop
6 years ago
Zeng Jinle
5b6673c44d
merge develop to solve conflict, also fix API doc, test=develop ( #18823 )
6 years ago
Tao Luo
5f5648a8ff
Revert "Python inference API support numpy ( #19009 )" ( #19160 )
test=develop
6 years ago
flame
b7e1a1d7e7
Python inference API support numpy ( #19009 )
test=develop
6 years ago
yaoxuefeng
9150cf50fc
add save cache model api in fleet & add slots shuffle in dataset module & add metric op to calculate ctr related metrics ( #18871 )
* add ctr related metric layer test=develop
* add save cache and slots shuffle test=develop
* add save cache and slots shuffle test=develop
* fix error
* fix error
* fix style for ci
* fix for comments
* change SlotsShuffle input to std::string for generality
* fix style
* fix style
* fix style
* fix style
* fix style
* fix style
* fix style
* fix style
* fix style
* fix style
* fix style
* fix style
* fix style
* fix style
* fix style
* fix style
* fix style
* fix style
* fix style
* fix style
* change non-const reference to pointer
* fix style
* fix style
* fix style test=develop
* fix style test=develop
* add return ins num in ctr metric op
* change dtype to float in metric_op.py
* fix error test=develop
* fix style test=develop
* fix API spec
* fix API spec
* fix API spec test=develop
* add UT test=develop
6 years ago
Zeng Jinle
88f111f885
remove unused inplace act codes, test=develop ( #19079 )
6 years ago
jiaqi
a99bc64c63
add fleet util, add some interface in hdfs util ( #18752 )
* add fleet util (fleet/utils/fleet_util.py): functions for users' convenience
* add some interfaces in hdfs util: hdfs is_file, hdfs cat
6 years ago
Leo Chen
8f53735437
Fix memory overwriting of tensors returned by executor ( #19030 )
* fix memory overlapping of fetch var (return of executor.run), test=develop
* fix wrong usage of ParallelExecutor in op_test, test=develop
* remove useless parameter and simplify code
* avoid destructing tensors prematurely, test=develop
* add testcase independent of OpTest, test=develop
6 years ago
liuwei1031
a43a763b54
fix warpctc.dll not found issue ( #18761 )
* fix warpctc.dll not found issue, test=develop
* revert the linux platform change, test=develop
* delete warpctc_lib_path.h.in, test=develop
* add SetPySitePackagePath function
* fix warpctc.dylib not found issue on Mac, test=develop
* improve the paddle lib path setting logic, test=develop
* fix mac ci issue caused by test_warpctc_op unittest, test=develop
* tweak code, test=develop
6 years ago
flame
65d987527d
python inference enable_memory_optim ( #18817 )
python inference API supports enable_memory_optim
6 years ago
Zhaolong Xing
61238d31f7
Trt fp16 support ( #18860 )
* Fix Mask R-CNN predictor
1. refine memory optim algorithm to support models with block ops.
2. fix output diff: modify the affine channel fuse
3. add condition_block_infer op
add interface for setting the TRT calib table dir
test=develop
* add the missing files.
test=develop
* 1. add TRT fp16 support
test=develop
6 years ago
chengduo
20859c08e8
[DyGraph] Make multi-card program faster ( #18892 )
* update parallel.py
test=develop
6 years ago
Zeng Jinle
8008ab4e6b
Remove legacy C++ memory optimization codes ( #18834 )
* remove legacy memory optimization codes, test=develop
* follow huihuang's comments,test=develop
* follow luotao's comments, test=develop
6 years ago
Thunderbrook
52c1431eee
add clear_model interface in fleetwrapper ( #18815 )
* dump slot
* test
* proto
* dump slot
* test
* proto
* code style
* code style
* code style
* style
* add delete after unseen days
* add unseen days
* code style
* conflict solve
test=develop
* add clear model
* code style
test=develop
* code style
test=develop
6 years ago
chengduo
292dfbce63
fix build strategy doc ( #18725 )
test=develop
6 years ago
jiaqi
d18aabb472
support patch data, add load_one_table, fix bug ( #18509 )
(1) support patch data (merge slots of instances with the same line id, modify dense layers that change their size)
(2) add fleet load_one_table interface, supporting loading from a paddle model and from a pslib model
(3) fix push sparse bug that made push sparse cost more time (about 10% in my test case)
(4) when some slots are not in one of your networks (join/update, etc.), data feed, collecting label info, and push/pull sparse will skip these slots instead of throwing an error.
(5) add more debug info in TrainFilesWithProfiler
6 years ago
chengduo
fd3aad6cb3
Make fuse_optimizer_op_pass also work when the model contains sparse gradients. ( #18664 )
* support sparse gradients
test=develop
6 years ago
Zeng Jinle
ae58afc546
Feature/auto_growth_allocator ( #18561 )
* feature/auto_growth_allocator, test=develop
* add unittest of AlignedAllocator, test=develop
* try to turn on auto_growth to test on CI, test=develop
* fix segmentation fault in mixed_vector.h, test=develop
* add unittests, test=develop
6 years ago
guru4elephant
d714bf037c
remove async executor and add data_feed.proto to the deps of train demo ( #18659 )
* remove async executor and add data_feed.proto to the deps of train demo
6 years ago
123malin
b414645a65
fix #17430: unexpected training behavior with int64-type attrs ( #18264 )
* fix int64_t
* update fill constant op unittest
* add empty line
6 years ago
gongweibao
c0a82748cf
Polish backwards optimizer dependency codes and use more default values. ( #18255 )
6 years ago
Zeng Jinle
d3003a1620
Feature/buffer_shared_inplace ( #17911 )
* feature/buffer_shared_inplace, test=develop
* refine code, test=develop
* fix elementwise_add op cpu inplace and sum inplace bug, test=develop
* add unittest and debug log, test=develop
* fix parallel_executor scope bug, polish code, test=develop
* fix sum op, activation op, single_in_place_inference bug, test=develop
* remove kLocalExecScopeName, test=develop
* fix unittest,test=develop
* fix out_var first version bug, test=develop
* follow comments,test=develop
6 years ago
Zhaolong Xing
88b52a27fe
Inference: fix mask rcnn model diff, optim memory usage, memory leak. ( #18532 )
* Fix Mask R-CNN predictor
1. refine memory optim algorithm to support models with block ops.
2. fix output diff: modify the affine channel fuse
3. add condition_block_infer op
add interface for setting the TRT calib table dir
test=develop
* add the missing files.
test=develop
6 years ago
Yi Liu
a873fa84ce
supports collective training with programs ( #18392 )
1. Since the allreduce op has 4 reduce types, we split these four reduce types into four ops
2. We also refined the collective op code, e.g. we separated the collective op kernel into CPUKernel and CUDAKernel, and removed the device-specific DeviceContext template parameter, since the target DeviceContext is already known
3. We removed the newly added Collective op role to reduce the complexity of program and graph analysis
6 years ago
xsrobin
47e2ef38e9
add "import paddle.fluid as fluid" to examples lacking it
6 years ago
lujun
fd6631ef2f
Fix dygraph show style ( #18297 )
Fix dygraph show style for FluidDoc.
6 years ago
tangwei12
999d9a59a5
fix communicator with pyreader ( #18350 )
* add is_running in communicator, test=develop
6 years ago
HaoRen
b7128bac5f
supports collective communicated training ( #18175 )
* fix prepare context redundant code problem, optimize executor by caching create_variables
test=develop
* supports collective training in executor
* make fetch_list runnable with variables, add more unittest for use_program_cache
test=develop
* fix comment
test=develop
* use unique name for nccl_id
* supports output to stream in program_to_code
* insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code
* set op role in collective training
* add collective op role
* remove orig file
* add build optimizer by strategy
* add collective strategy
* refine collective strategy
* add multi-process role maker
* refine strategy building factory so that we can easily plug in more strategies
* scale loss grad in collective sgd transpiler
* add support for distributed fc
* code format
* revert some features for dist fc
* add support for distributed fc training
* fix prepare context redundant code problem, optimize executor by caching create_variables
test=develop
* supports collective training in executor
* make fetch_list runnable with variables, add more unittest for use_program_cache
test=develop
* use unique name for nccl_id
* supports output to stream in program_to_code
* insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code
* set op role in collective training
* add collective op role
* fix comment
test=develop
* remove orig file
* add build optimizer by strategy
* add collective strategy
* refine collective strategy
* add multi-process role maker
* refine strategy building factory so that we can easily plug in more strategies
* scale loss grad in collective sgd transpiler
* add support for distributed fc
* code format
* revert some features for dist fc
* add support for distributed fc training
* test=develop
add collective op unittest standard
* test=develop
remove the test_collective directory
* test=develop
remove the test_collective directory
* remove slicegather test
* code format for reducescatter
* update attr of shard_index_op
* Modify macro nccl_helper
* remove test without distribute
* macro collective_helper
* macro update
* test=develop
update support python3.5
* test=develop change GPU memory usage to 0.1 in tests
* test=develop
update ut equal func
* test=develop
set flags to 1.5
* test=develop fix pickle dump on py35
* test=develop
fix divide in slice and add sync_comm_stream
update atol and rtol to 1e-05
rm shard_index op and test
modify read input from file to read from memory
remove origin_program in framework and add i/o in c_sync_calc_stream
* test=develop update unittest sync operator I/O
6 years ago
Zeng Jinle
5826b72e06
Refine CUDAPlace error message. ( #18343 )
* refine cuda place error msg, test=develop
* use LOG(ERROR)+exit(-1), test=develop
6 years ago
jiaqi
3f8031e256
dataset ( #17973 )
(1) use channel instead of vector/BlockingQueue in Dataset, to stay consistent with the existing implementation and make the code more readable and flexible (dataset single output channel or multi output channel). One previous out-of-memory problem was caused by not releasing memory after training.
(2) add Record because MultiSlotType costs too much memory (80B), fixing the out-of-memory problem.
(3) add Channel, Archive in paddle/fluid/framework
(4) change dataset from shared_ptr to unique_ptr in pybind
(5) move create/destroy readers from trainer to dataset
(6) move shuffle from datafeed to dataset. Dataset holds memory; datafeed only loads data and feeds it to the network.
(7) fix thread num bug of Dataset when filelist size < thread num
(8) support set_queue_num in InMemoryDataset
6 years ago
chengduo
25f3cd6486
Update execution_strategy option default value ( #18183 )
* update execution_strategy option default value
test=develop
* fix doc error
test=develop
6 years ago
Zeng Jinle
25ab23be28
Fix dygraph mem leak ( #18082 )
* fix dygraph mem leak, test=develop
* polish msg, test=develop
6 years ago
Sylwester Fraczek
accb132f0f
fix slim int8 mkldnn multithreading issue ( #18009 )
6 years ago
tensor-tang
5c06bff222
combine noavx and avx package ( #17889 )
* support avx and noavx core
* add catch and give some log
test=develop
* fix build
test=develop
* add missing package
test=develop
* fix pybind name
test=develop
* fix import error
test=develop
* combine noavx core
test=develop
* add requirements
test=develop
* fix unknown message
test=develop
* fix api spec
test=develop
* refine and clean
test=develop
* update
* pass dist ut
* follow comments
test=develop
* refine scripts
test=develop
6 years ago
Jiabin Yang
4d5f6937c3
Feature/refine api for dygraph ( #17907 )
* WIP
* WIP
* test=develop, add api doc and example code for dygraph
6 years ago
gongweibao
fbbdc9ccad
Add backward and optimizer operator dependency pass. ( #17746 )
6 years ago
wopeizl
453a49b1bc
Make ParallelExecutor support Windows GPU ( #17787 )
* fix the ParallelExecutor on Windows
test=develop
* restrict to use one GPU only under windows
6 years ago
翟飞跃
993c703bcc
INT8 MKL-DNN v2 integrate to slim ( #17634 )
* refactor PR 16865
* delete mergetool files
* test=develop
* test=develop
* test=develop
* test=develop
* create dir for int8 model before call SaveOptimModel
* test=develop
* mkldnn int8 only supports linux; test=develop
* refine code; test=develop
* remove comment; test=develop
* refine code; test=develop
* fix bug; test=develop
* add exception for mkldnn_post_training_strategy
* reuse int8v2 CAPI dataset; test=develop
* fix accuracy check bug; test=develop
* remove tab
* convert files to unix format
* test=develop
* reduce CI time;test=develop
* reduce CI time and refine code;test=develop
* refine comment; test=develop
* add cmake FLAGS;test=develop
* remove predict_num;test=develop
6 years ago
wopeizl
841553e13f
use pyreader to read data in dygraph mode ( #17314 )
* use pyreader to read data
* add return_list to PyReader to support return value represented as list
6 years ago
Zeng Jinle
674e0ce2d6
Use Python C-API to speed up dygraph trace ( #17837 )
* use python api to reduce python time cost, test=develop
* fix travis ci, test=develop
* fix Py_None error,test=develop
6 years ago
Jiabin Yang
3b70f870e2
Use smart pointers to optimize memory usage of dygraph ( #17768 )
* for debug
* test=develop, memory optimize for dygraph using shared_ptr
* test=develop, fix error shown by travis ci
* test=develop, fix bug for recurrent usage of varbase
* test=develop, init varbase when it needs to be added
6 years ago
guru4elephant
d52391094d
fix prepare context redundant code problem, optimize executor by cach… ( #17743 )
* fix prepare context redundant code problem, optimize executor by caching create_variables
test=develop
* cache sub_scope, program, var when use_program_cache=True is set
* make fetch_list runnable with variables, add more unittest for use_program_cache
6 years ago
Zeng Jinle
432ac70124
clean code of py_layer in dygraph mode,test=develop ( #17661 )
6 years ago
gongweibao
65bbf950ee
Add multi-ncclcomm and 2D ncclallreduce support. ( #17263 )
6 years ago
Zhaolong Xing
61221ebc28
TRT: Support set dynamic range in int8 mode. ( #17524 )
* fluid int8 train and trt int8 predict align.
trt int8 predict init
op converter
* 2. align fluid int8 train and trt int8 inference.
enhance quant dequant fuse pass
enhance op converter, trt engine, trt engine op, trt subgraph pass.
* 3. add delete_quant_dequant_pass for trt
test=develop
* 4. add the missing file
test=develop
* 5. I modified the C++ interface but forgot to modify the pybind code
fix the IS_TRT_VERSION_GE bug, and fix elementwise op converter
test=develop
6 years ago
wopeizl
6724a652f3
add __str__ method for tensor and lodtensor to support print test=dev… ( #17588 )
* add __str__ method for tensor and lodtensor to support print test=develop
6 years ago
guru4elephant
326bf8291a
add Run Prepared Ctx ( #17616 )
add Run Prepared Ctx, fix pybind problem
6 years ago
flame
2280f185d7
BuildStrategy api comment ( #17348 )
Python examples of fluid.layers.io.double_buffer and some BuildStrategy's methods.
6 years ago
guru4elephant
7f8bc49d00
polish_executor_and_add_ctx_cache ( #17536 )
* polish_executor_and_add_ctx_cache
6 years ago
Zeng Jinle
c6189637cd
Fix allocator bug ( #16712 )
* Revert "Revert "Fix allocator bug""
This reverts commit 174d0d0b90.
* Revert "fix travis ci"
This reverts commit 5656fa9f7c.
test=develop
* add inlined_vector.h, test=develop
* add inlined_vector_test,test=develop
6 years ago
Qiao Longfei
92e7d5d7cc
fix distribute doc test=develop ( #17318 )
* fix distribute doc
6 years ago
Qiao Longfei
58f7695ab2
Async exe support communicator ( #17386 )
Async exe support communicator
6 years ago
Tao Luo
32da5e9c3d
remove unused expected_kernel_cache_pass ( #17486 )
test=develop
6 years ago
Yan Xu
0217555530
polish parallel dygraph code ( #17164 )
* add var grad hook test=develop
6 years ago
Jiabin Yang
d7df4e5e5b
Fix/Fix memory leak in dygraph ( #17394 )
* test=develop, add gradient sort backward strategy
* test=develop, fix test by adding FLAGS_cudnn_deterministic on new tests
* test=develop, fix memory leak in dygraph mode
* test=develop, fix memory leak in dygraph mode
* test=develop, polish code
* test=develop, polish code
* test=develop, polish code
6 years ago
Zhen Wang
4a1b7fec96
Add setting Scope function for the graph class ( #17417 )
* add set_not_owned function for graph
* add scope set. test=develop
* add scope_ptr enforce not null before setting. test=develop
6 years ago
jiaqi
66d51206b1
add save/load model, shrink table, cvm, config file & fix pull dense bug ( #17118 )
* add save/load model, shrink table, cvm, config file & fix pull dense bug
test=develop
* fix global shuffle bug, fix pull dense bug, fix release memory bug, fix shrink error
add client flush, add get data size
test=develop
* fix global shuffle bug
test=develop
* fix global shuffle bug
test=develop
* fix code style
test=develop
* fix code style & modify pslib cmake
test=develop
* fix error of _role_maker
test=develop
* fix code style
test=develop
* fix code style
test=develop
* fix code style
test=develop
* fix code style
test=develop
* fix code style
test=develop
* fix windows compile error of fleet
test=develop
* fix global shuffle bug
* add comment
test=develop
* update pslib.cmake
test=develop
* fix fill sparse bug
test=develop
* fix push sparse bug
test=develop
6 years ago
Tao Luo
68ec0a6f74
make parallel_executor support FLAGS_use_mkldnn ( #17341 )
* make parallel_executor support FLAGS_use_mkldnn
test=develop
* add warning when set mkldnn_enabled_op_types_ in non-mkldnn env
test=develop
6 years ago
Jiabin Yang
4624d7c642
test=develop, add gradient sort backward strategy ( #17125 )
* test=develop, add gradient sort backward strategy
* test=develop, fix test by adding FLAGS_cudnn_deterministic on new tests
6 years ago
chengduo
bc833945a4
Add DropLocalExeScopes in ParallelExecutor ( #17297 )
* reset drop local scope counter
test=develop
6 years ago
qingqing01
e32c9888f5
Double backward of conv2d. ( #17211 )
* Add conv2d_grad_grad_op
* Extract the cuDNN conv algo searching code into conv_cudnn_helper.h.
- Now use it in conv2d_grad_grad.
- Will simplify the searching code in conv2d and conv2d_grad in the next PR.
* Enhance and fix bug in unit testing of gradient_checker.
* Support fetching empty variables; return None in Python.
6 years ago
lujun
e388a1fb66
Repair api example ( #17221 )
Fix the following API examples:
paddle.fluid.scope_guard
paddle.fluid.backward.append_backward
paddle.fluid.cpu_places
paddle.fluid.cuda_pinned_places
paddle.fluid.cuda_places
paddle.fluid.in_dygraph_mode
paddle.fluid.CUDAPlace
paddle.fluid.CPUPlace
paddle.fluid.CUDAPinnedPlace
6 years ago
chengduo
04bd413acb
Code Clean: Move all pass to paddle::framework::ir ( #17228 )
* move pass to ir
* polish code
test=develop
* fix dependency
test=develop
6 years ago
Zeng Jinle
f2fa3f7300
fix api doc,test=develop ( #17241 )
6 years ago
石晓伟
a72dbe9abf
Cherry-pick benchmark related changes from release/1.4 ( #17156 )
* cherry-pick commit from 8877054
* cherry-pick commit from 3f0b97d
* cherry-pick from 16691: Anakin subgraph support yolo_v3 and faster-rcnn
(cherry picked from commit 8643dbc233)
* Cherry-Pick from 16662 : Anakin subgraph cpu support
(cherry picked from commit 7ad182e16c)
* Cherry-pick from 1662, 16797.. : add anakin int8 support
(cherry picked from commit e14ab180fe)
* Cherry-pick from 16813 : change singleton to graph RegistBlock
test=release/1.4
(cherry picked from commit 4b9fa42307)
* Cherry Pick : 16837 Support ShuffleNet and MobileNet-v2
Support ShuffleNet and MobileNet-v2, test=release/1.4
(cherry picked from commit a6fb066f90)
* Cherry-pick : anakin subgraph add opt config layout argument #16846
test=release/1.4
(cherry picked from commit 8121b3eccb)
* 1. add shuffle_channel_detect
(cherry picked from commit 6efdea8997)
* update shuffle_channel op convert, test=release/1.4
(cherry picked from commit e4726a066f)
* Modify symbol export rules
test=develop
6 years ago
Zeng Jinle
c5eeecca7c
Fix tensor_py.h ( #17195 )
* fix tensor_py,test=develop
* change class name,test=develop
6 years ago
Zeng Jinle
5dfe2ab9e8
Fix mem leak when converting Tensor to numpy array ( #17182 )
* fix mem leak when converting Tensor to numpy array
test=develop
* remove unused unittest,test=develop
* follow comments, test=develop
* fix dygraph bug,test=develop
6 years ago
Yan Xu
0b07eef118
ParallelDyGraph with GPU collective mode ( #16827 )
implement dygraph.parallel.DataParallel to hook reduce op.
6 years ago
guru4elephant
03d469ad98
Merge pull request #17005 from wopeizl/fix_ncclwrapper_win1
fix nccl wrapper on windows
6 years ago
liuwei1031
a770ce0615
add doc for memory_optimize, test=develop ( #17010 )
* add doc for memory_optimize, test=develop
* update doc, test=develop
* doc update, test=develop
6 years ago
qingqing01
ea42e431f8
Speed up unit testing. ( #16978 )
* Speed up affine_channel_op unit testing
* Add check in tensor_py
* Fix ONLY_CPU Compiling
6 years ago
wopeizl
51a0243a56
fix nccl wrapper on windows
test=develop
6 years ago
Zeng Jinle
1202d3fc74
Refine model gpu memory ( #16993 )
* speedup gc and inplace softmax_with_cross_entropy_grad
test=develop
* refine models gpu mem
Merge skip vars and warning messages of mem opt
remove relu mem opt
test=develop
* follow comments
test=develop
6 years ago
guru4elephant
bbc6c5714f
Merge pull request #16887 from guru4elephant/add_nccl_context_pybind
Add nccl context pybind
6 years ago
gongweibao
cbdb8a17b1
Polish DGC code ( #16818 )
6 years ago
dongdaxiang
466d177d09
add pybind dependency
test=develop
6 years ago
dongdaxiang
4aa6f679b5
add pybind dependency
test=develop
6 years ago
dongdaxiang
b091139049
add nccl wrapper for python API
6 years ago
Yiqun Liu
112f16143b
Add an option to enable the cache of expected kernel in train phase. ( #16724 )
* Add an option to enable the cache of expected kernel in train phase.
test=develop
* Change the default value of cache_expected_kernel to true.
6 years ago
chengduo
55b15db5af
Add unit test for fuse all_reduce ops ( #16699 )
* test fuse all_reduce
6 years ago
Yiqun Liu
3fe8cb0dd7
Enable the runtime_context_cache pass in train phase ( #16640 )
* Try to enable the runtime_context_cache pass in train phase.
* Put the append of runtime_context_cache pass ahead of multi_dev passes.
test=develop
6 years ago
guru4elephant
7d653f0aed
Merge pull request #16652 from xjqbest/dataset_merge_develop
fix dataset bug
6 years ago
xjqbest
6a57e8075a
remove trainer_id in datafeed and dataset
test=develop
6 years ago
Yan Xu
b4c3a6aa0b
[Imperative] implement imperative NCCLParallelContext ( #16477 )
add NCCLParallelContext for parallel dygraph
6 years ago
xjqbest
271b7147cc
fix dataset bug
test=develop
6 years ago
chengduo
b75a69bad6
Add Stream for fetch op handle ( #16600 )
* expose fuse broadcast ops
6 years ago
乔龙飞 Qiao Longfei
21622ca30b
Merge pull request #16172 from jacquesqiao/add-async-ssa-graph-executor-communicator
Add async ssa graph executor communicator
6 years ago
sneaxiy
10249c0b78
Merge develop
test=develop
6 years ago
Qiao Longfei
adf272bcec
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-async-ssa-graph-executor-communicator
test=develop
6 years ago
xjqbest
9b84e8e66b
fix code style
test=develop
6 years ago
xjqbest
a99c8d0c29
fix client to client communication bug
test=develop
6 years ago
sneaxiy
33473890f3
Merge develop
test=develop
6 years ago
dongdaxiang
720647e17f
rebase current develop and fix conflict
test=develop
6 years ago
dongdaxiang
45eb6f0765
run pre-commit check files and fix code style problem
test=develop
6 years ago
xjqbest
e95cafd9a7
fix code style & add dataset testcase
test=develop
6 years ago
xjqbest
be74de2c61
fix code style & fix register bug & add release_memory
test=develop
6 years ago
xujiaqi01
a5b1a0e12b
support multi dataset && add init model && fix bug
6 years ago
dongdaxiang
b7a202aa38
add distributed optimizer factory
6 years ago
dongdaxiang
f612877797
add incubate for unified API
6 years ago
dongdaxiang
317eb0aad3
add incubate for unified API
6 years ago
xujiaqi01
ecfc7df913
add dataset factory && fix style
6 years ago
xujiaqi01
3cea00bd52
store memory data in Dataset && fix bug
6 years ago
dongdaxiang
cc4def6ba5
fix some conflict for compilation
6 years ago
heqiaozhi
9bca1926c1
refactor & fix bug
6 years ago
xjqbest
2e9a836c6f
add DataSet and InMemoryDataFeed, support load data into memory and shuffle data
6 years ago
dongdaxiang
e36bbcc871
fix some typos and CMakefile.txt
6 years ago
xjqbest
824b84d185
add DataSet and InMemoryDataFeed, support load data into memory and shuffle data
6 years ago
dongdaxiang
be757096da
add pybind for fleet
6 years ago
Qiao Longfei
d8974e6da0
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-async-ssa-graph-executor-communicator
test=develop
6 years ago
chengduo
1096746cbf
Fuse Adam And SGD ops ( #15933 )
* fuse optimizer
6 years ago
sneaxiy
2c836ff914
check default grad maker
test=develop
6 years ago
Zeng Jinle
69cb9792ea
Merge pull request #16506 from sneaxiy/revert-16424-fix_allocator_bug
Revert "Fix allocator bug"
6 years ago
chengduo
ed61d67c73
Fix the interface of Pass::Apply ( #16484 )
* modify the interface of Pass::Apply
test=develop
* Polish code
test=develop
* Fix Travis CI
test=develop
* fix Pass::Apply interface
test=develop
* Fix Travis CI
test=develop
6 years ago
Zeng Jinle
174d0d0b90
Revert "Fix allocator bug"
add include headers to fix travis-ci
test=develop
6 years ago
gongweibao
eb83abeac3
Add DGC(Deep Gradient Compression) interface. ( #15841 )
6 years ago
Zeng Jinle
644e8af4cf
Merge pull request #16424 from sneaxiy/fix_allocator_bug
Fix allocator bug
6 years ago
Zeng Jinle
c7c6eeb44e
Merge pull request #16409 from sneaxiy/feature/advance_gc
Enhance gc to support deleting tensor buffer in advance
6 years ago
wopeizl
c300b1ba69
Tensor index ( #16223 )
* extend the slice function for python
test=develop
6 years ago
Xin Pan
f8c279b11c
Merge pull request #16454 from panyx0718/imperative2
polish deepCF model to support real dataset
6 years ago
Qiao Longfei
30618409db
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-async-ssa-graph-executor-communicator
6 years ago
chengduo
4f2278f032
Add doc for CPUPlace, CUDAPlace, CUDAPinnedPlace ( #16442 )
test=develop
6 years ago
sneaxiy
78fb3a62e0
fix env variable setting bug
test=develop
6 years ago
sneaxiy
2d92b6be98
merge develop
test=develop
6 years ago
Xin Pan
fd24ab47ab
polish
test=develop
6 years ago
sneaxiy
a7d0ac50b8
Merge develop
6 years ago
sneaxiy
7000ec85d9
fix some op grad maker
fix ctest eager deletion disable bug
test=develop
6 years ago
sneaxiy
f8ed2c229e
try to fix ci error
test=develop
6 years ago
sneaxiy
c20db6357b
split PR
test=develop
6 years ago
sneaxiy
2f54d9f995
Merge develop
test=develop
6 years ago
sneaxiy
a93a9eef8f
add op registry type
refine gc code
test=develop
6 years ago
sneaxiy
953214ad97
add more unittest
modify allocator strategy
remove changes of legacy buddy_allocator
test=develop
6 years ago
chengduo
f26ba5bddd
Fuse AllReduce ( #15921 )
* fuse all_reduce
test=develop
* add fuse_parameter_groups_size
test=develop
* Polish code
test=develop
* Fix travis-ci
test=develop
* Add SetGroupAccordingToLayers and SetGroupAccordingToGroupSize
test=develop
* Add SetGroupAccordingToMemorySize
test=develop
* fix multi_devices_graph
test=develop
* reset params_grads
test=develop
* Polish code
test=develop
6 years ago
Tao Luo
7d2740db83
Revert "cache runtime_context"
6 years ago
sneaxiy
fd23262e0c
merge develop, fix conflict
test=develop
6 years ago
Qiyang Min
c7f1f3ed0c
Merge pull request #16214 from velconia/imperative_infer_var_type
Implement imperative infer var type
6 years ago
Tao Luo
dbb92ee4b1
Merge pull request #16002 from luotao1/runtime_context
cache runtime_context
6 years ago
sneaxiy
161b8ddcaa
Merge develop
6 years ago
minqiyang
b40e41fbd1
Polish code style
test=develop
6 years ago
Qiyang Min
8e4ad008fb
Merge pull request #16198 from velconia/imperative_train_speed
...
Improve imperative mode training speed
6 years ago
minqiyang
36dce65bb3
Take DataType and VarType apart
test=develop
6 years ago
minqiyang
438bca9c3d
Implement Runtime Var Type Inference
test=develop
6 years ago
luotao1
1b59bed989
Merge branch 'develop' into runtime_context
6 years ago
qingqing01
8ad672a287
Support sync batch norm. ( #16121 )
* Support Sync Batch Norm.
* Note: do not enable it on a single device.
Usage:
    build_strategy = fluid.BuildStrategy()
    build_strategy.sync_batch_norm = True
    binary = fluid.compiler.CompiledProgram(tp).with_data_parallel(
        loss_name=loss_mean.name,
        build_strategy=build_strategy)
6 years ago
minqiyang
7355d41834
1. Add imperative gperf profiler
2. Add binutils 2.27 in manylinux support
test=develop
6 years ago
luotao1
b2898c0f57
Merge branch 'develop' into runtime_context
test=develop
6 years ago
minqiyang
98dfb492bb
Release GIL lock
6 years ago
sneaxiy
ac0e0f5181
merge develop
test=develop
6 years ago
minqiyang
42e96a029f
Accelerate CPU part
6 years ago
sneaxiy
682f2dbf29
merge develop
test=develop
6 years ago
sneaxiy
2c4fcaa683
merge develop
6 years ago
luotao1
d94fd97230
add runtime_context_cache_pass
test=develop
6 years ago
Yan Xu
30568473ec
fix broadcast on mp mode ( #15951 )
* fix broadcast with mp mode
* polish code test=develop
* fix bcast strategy test=develop
* fix cpplint test=develop
* fix py3 failure test=develop
* fix comment test=develop
* update comment test=develop
6 years ago
baojun
e3c37bd564
remove const_cast and refactor ngraph engine code ( #15925 )
* remove const_cast and refactor code test=develop
* reduce flag use test=develop
6 years ago
Zhen Wang
ac6ef06ffa
Add the Clone method in Graph. test=develop
6 years ago
Zhen Wang
01eddf125c
Do not add a graph copy constructor. test=develop
6 years ago
Zhen Wang
1b9c8d5f06
add clone function for IrGraph. test=develop
6 years ago
Qiyang Min
1f4aa7a202
Imperative remove all descs ( #16045 )
* Remove Desc in Forward Pass
* Refactor VarBase
* Add dbg info
* Only check type in imperative mode
* Polish code and support optimizer
test=develop
* Fix stop gradient problem in PyLayer
test=develop
6 years ago
Zeng Jinle
472f16b5aa
Merge pull request #16063 from sneaxiy/enhance_gc
Enhance gc
6 years ago
wopeizl
a38db3cb99
Fix recordio ( #16124 )
* fix recordio on win
test=develop
* test=develop
* test=develop
* fix code style
test=develop
* test=develop
6 years ago
sneaxiy
b80d76f784
merge develop
6 years ago
sneaxiy
732fa00eaf
disable gc in recurrent_op for now
test=develop
6 years ago
Tao Luo
6f2581e4c5
Merge pull request #16090 from lidanqing-intel/paddle-int32
Add PaddleDType INT32 support
6 years ago
Zhaolong Xing
3d63aa0a11
Merge pull request #15729 from NHZlX/add_static_model_load_for_trt
Four points for enhancing Paddle-TRT
6 years ago
nhzlx
a9ed427749
cannot pass ci
add a switch for whether to use the static engine for trt
test=develop
6 years ago
lidanqing
4aeb261da9
Add INT32 support. INT32 in last switch case
test=develop
6 years ago
sneaxiy
2a639d5c2a
add allocator chain to fix bug
test=develop
6 years ago
Qiao Longfei
8744f9a083
fix parallel executor async mode
6 years ago
Qiao Longfei
e70b1727ef
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-async-ssa-graph-executor
6 years ago
Qiao Longfei
847e4f4e85
pure async mode training
6 years ago
sneaxiy
3334c279d0
add sample_generator
test=develop
6 years ago
Qiyang Min
187cffd019
Merge pull request #15928 from velconia/imperative_backward_hooks
Imperative backward hooks
6 years ago
minqiyang
ac88c62a5b
Reset output var's pre_op pointer when op is destructed
6 years ago
sneaxiy
69b1ebdfa5
merge develop
test=develop
6 years ago
mozga-intel
68a9ead17a
The mkldnn flag is enabled iff it is necessary
test=develop
6 years ago
Zhen Wang
e00c7a2e26
Merge pull request #15830 from wzzju/add_ir_node_encapsulation
add IrNode&IrVarNode&IrOpNode. test=develop
6 years ago
Qiao Longfei
f768fbf715
support multi graph
test=develop
6 years ago
minqiyang
efb2f2baf8
Fix bugs
test=develop
6 years ago
Qiao Longfei
cf0511f21e
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-async-ssa-graph-executor
test=develop
6 years ago
Zhen Wang
548931456c
update some functions' names according to the suggestion. test=develop
6 years ago
sneaxiy
c545f1ed8f
unify API
test=develop
6 years ago
minqiyang
b420ec3a92
invoke backward_hooks after reduce op's depcounts map
test=develop
6 years ago
Qiyang Min
4bd28b304b
Merge pull request #15831 from velconia/imperative_engine
Imperative training network to the end
6 years ago
sneaxiy
b17541a9c1
fix hang bug
6 years ago
minqiyang
84bf4d7b06
Move ClearBlock into OpBase and VarBase's destructor
test=develop
6 years ago
minqiyang
2b3510bc50
Add imperative python tracer
6 years ago
minqiyang
a15a3fc314
Polish code
test=develop
6 years ago
sneaxiy
1e4c0a6f72
merge develop
6 years ago
minqiyang
9dc64edfd9
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into imperative_engine
test=develop
6 years ago
Xin Pan
32d5a16036
resolve conflicts
test=develop
6 years ago
Xin Pan
26e32e095a
allow compiler to use graph
test=develop
6 years ago
minqiyang
8fe0c0c52c
implement backward refs
6 years ago
Qiao Longfei
cc71e89499
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-async-ssa-graph-executor
test=develop
6 years ago
minqiyang
74551758cc
Polish code
test=develop
6 years ago
minqiyang
f53e1d5c4b
implement ClearBlock
6 years ago
sneaxiy
7160cb0f32
decoupled reader
test=develop
6 years ago
sneaxiy
d331e97af8
fix place comparison in compiler
test=develop
6 years ago