42 Commits (e312a1ff6ed897743f29a64a61ad03b6275ceed7)

Author SHA1 Message Date
Qi Li 50967135a5
[ROCM] update fluid framework for rocm (part3), test=develop (#31011)
4 years ago
wanghuancoder 35c5b23f68
use iwyu clean include second time, test=develop (#30829)
4 years ago
Leo Chen 81217a94d8
unify calling cudaSetDevice (#30470)
4 years ago
liuyuhui 843dc3cdbd
[Kunlun]PR3: add xpu executor, multi xpu card train function optimization (#30317)
4 years ago
liuyuhui 254ad61959
fix xpu pe sync, test=notest (#30095)
4 years ago
liuyuhui 4427df37cf
[Kunlun] PR2: Support MultiDevicePass and BKCL in parallel executor (#29574)
5 years ago
liuyuhui f13c3a9cd7
[Kunlun] PR1:Support one Kunlun card training in parallel executor (#29337)
5 years ago
Leo Chen 1f3be85914
Fix bug of fetch_async_op_handle when fetching the feed variable (#28194)
5 years ago
Leo Chen 35074963e3
Refine error msg in paddle/fluid/framework/details [part 2] (#27429)
5 years ago
Leo Chen aba759ba16
[Feature] Enhance inplace addto strategy for gradient accumulation in static graph (#27112)
5 years ago
Chen Weihang 4061aa6488
Polish ParallelExecutor exception process logic (#25449)
5 years ago
Chen Weihang aa0f254fbe
Add macro BOOST_GET to enrich the error information of boost :: get (#24175)
5 years ago
Zeng Jinle cdb3d27985
Fix warn of gcc8 (#21205)
6 years ago
Zeng Jinle a9c8bdad7b
refine pe codes, test=develop (#20479)
6 years ago
Zeng Jinle d3003a1620
Feature/buffer_shared_inplace (#17911)
6 years ago
gongweibao fbbdc9ccad
Add backward and optimizer operator dependency pass. (#17746)
6 years ago
gongweibao 65bbf950ee
Add multi-ncclcomm and 2D ncclallreduce support. (#17263)
6 years ago
chengduo ea2a2f778a
Fix the bug of AllReduceDepPass (#16393)
6 years ago
Wu Yi 9ffd5eecef
test fix fetch bar place for ce (#16406)
6 years ago
chengduo a6a3b2fbbc
[Speed]Refine ParallelExecutor (#16190)
6 years ago
chengduo ed087f8232
refine op_handle (#14178)
7 years ago
yuyang18 d49763a87d
Stash
7 years ago
Xin Pan 37e514432b
op compose node and update nodes.
7 years ago
yuyang18 2d0e5592b5
Use std::map for Place <--> DeviceContext
7 years ago
fengjiayi ff4317cee9
follow comments
7 years ago
fengjiayi 47388020a2
fix bugs
7 years ago
chengduo da556ed6d4
enhance ParallelExecutor stable (#11637)
7 years ago
chengduoZH c99fca5f90
Add No Mutex
7 years ago
chengduoZH aadaadf735
replace use_event with use_cuda, because use_event means the program running with CUDA, so use_cuda maybe more intuitive.
7 years ago
chengduoZH a584bc86dd
add fuse var op handle
7 years ago
chengduoZH a89cd46700
Wait VarDummyHandle generated
7 years ago
chengduoZH 9eec2c7509
refine pe
7 years ago
Yu Yang 4452ff76b7
Fix CPU compile
7 years ago
Yu Yang 79be06045c
Support CPU/GPU mixture for ParallelExecutor
7 years ago
Yu Yang 6b20b35589
Fix Transformer Hang Problem
7 years ago
Yu Yang 084cdd1f4f
Rename code
7 years ago
Yu Yang 76570c2e96
Wait fetch op
7 years ago
Yu Yang b6ca3711b4
Get error
7 years ago
Yu Yang f385228f05
Add Paddle Enforce
7 years ago
Yu Yang 54bd17fe7b
Complete Flowers
7 years ago
Yu Yang 15f5f10ed5
AddInput/AddOutput for OpHandle
7 years ago
Yu Yang fe7ed285d1
Extract NCCLCtxMap
7 years ago