* - First set of modifications
- Compilation fixes
- compilation fix
- Another compilation fix
- Moved AcquireSoftmaxPrimitiveDescriptor call into handler
- MKL-DNN Softmax PD refactor
test=develop
- Compilation fix
test=develop
- another compilation fix
- cosmetcis
test=develop
- Compilation fix
- Fix to crash when softmax backward is created
* - Fixes after review of softmax refactoring
test=develop
* Support looking up embeddings from BoxPS.
* Add a _pull_box_sparse op, for now this op is not exposed to users.
* Add a BoxHelper class, providing 'BeginPass', 'EndPass', 'FeedPass' functions and so on.
* Add 'BoxPSDataset' in python code.
* Add a compile options WITH_BOX_PS and a MACRO PADDLE_WITH_BOX_PS.
* Add UT.
* More concrete information pls refer to: https://github.com/PaddlePaddle/Paddle/pull/18982
- Refactor step 1
- Compilation fix
- Yet another compilation fix
- Even more compilation fix
- Lint fixes
test=develop
- Removed deprectaed PADDLE_ENFORCE occurance
test=develop
- Candidate fix to BN forward
- Lint fixes
test=develop
- Refactoring in data_layout_transform
- compilation fix
- Another comppilation fix
- Step further into darkness
- Yet another compilation fix
- Yet another compilation fix
- missing header
- compilation fix
- Added MKLDNN -> Paddle conversion in fetch op
test=develop
- Compilation fix
test=develop
- Lint
test=develop
- Mul fix
- Fix to MKLDNN MUL op and Elementwise MUL UT
test=develop
- Workaround for diffrent weights with groups representation Paddle vs
MKL-DNN.
test=develop
- Candidate fix for 5D convolution with groups
- Refactor of fix for conv3d and conv2d in fetch op
test=develop
- Compilation fix
- Still same compilation fix
- Compilation fix
- Compilation fix
- Reverted refactoring of fixes
- Adapted test_conv2d_int8_mkldnn so it exects data in NCHW format
not NHWC
test=develop
- minor fix in UT
test=develop
- Lint fixes
test=develop
* Add simplify_with_basic_ops_pass to replace dropout_op with scale_op when is_test is true.
test=develop
* Delete dropout_op directly when upscale_in_train is true.
test=develop
* Improve the debug string, adding the print of op_desc information.
* Fix the case when dropout's input x is reused as the next op's output.
* Add the pass to inference.
test=develop
* Change the log level.
test=develop
* Add unittest for inplace case.
* Add comment to explain the pass.
* Apply the pass for CPU inference.
test=develop
* Fix the typo.
test=develop
* Add the check of AttrType.
test=develop
* fix con2d transpose bias by create and init it in build_onee
* fix API spec
* test=develop, invoke ci
* fix bias_attr and act has no effect error on layer norm, conv2dTranpose, billinearTensorProduct, sequece_conv. fix original_mode not used error on GRUunit. fix sample_weight not set error on NCE. Add ut for all thoese layer
* test=develop, change success standard for conv2dTranspose
* test=develop, fix test_layers to invoke some error branch
* test=develop, fix sample code
* test=develop, fix BilinearTensorProduct failed in dygraph mode
* test=develop, fix test_layers segment fault error
* fix correctness of the communicator
* fix a bug in send thread when sending var context is empty, test=develop
* add lookup_table_prefetch_op and prefetch optimize, test=develop
* remove remote prefetch GPU supported
* word2vec force with CPU, test=develop
* test dist remote lookup table force with CPU, test=develop
* supports multiple NCCL communicators preserved in NCCLCommContext
test=develop
* add ut for c_comm_init_all operator and fix cuda resource release problem
test=develop
* support tensor input with padding for warpctc op
* merge with develop
* test=develop
* modified python API examples test=develop
* nn.py is modified for code coverage test=develop
* update documents info about warpctc op in API.spec test=develop
* add test_warpctc_with_padding in test_layers test=develop
* add warning log for cuda_version back to warpctc_op.cc
* modify API.spec for warpctc op test=develop
* modify API.spec
* update warpctc test to new CompiledProgram API test=develop
* modify code examples for warpctc op test=develop
* modify API.spec for warpctc op test=develop
* modify API.spec for warpctc op test=develop
* add local user data conversion into full_pascalvoc_test_preprocess.py
test=develop
* change PADDLE_ENFORCE to PADDLE_ENFORCE_GE
test=develop
* change according to reviews
test=develop
* Implement the operator with sprase matrix multiply
* Update the URL of mklml library.
test=develop
* Disable MKLML implematation when using no-linux.
test=develop
* optimize bp with mkl sparse matrix
test=develop
* add pybind interface to get all inplace ops, test=develop
* enhance OpTest to check whether the consistency of operator when using and not using inplace, test=develop
* handle corner cases in op_test, test=develop
* support outputs without tensor holder_, like XShape in reshape_op, test=develop
* fix bug, some op has GradOpMaker, but actually no grad_op in OpInfoMap, test=develop
* use reshape_grad instead of reshape in FlattenGradOp, test=develop
* fix error debug dims info for variables like XShape, test=develop
* change computational order in sum_op to relieve computation difference using inplace, test=develop
* add inplace_atol to check group_norm, and skip inplace_grad for mkldnn, test=develop
* follow sneaxiy's comments, test=develop
* remove unused DefaultGradOpDescMaker in mkldnn op, test=develop
* replace part of PADDLE_ASSERT to PADDLE_ENFORCE
test=develop
* remove unused fallback_alloc_size_
* add unit-test of CUDAPinnedAllocator
test=develop
* Implement the operator with sprase matrix multiply
* Update the URL of mklml library.
test=develop
* Disable MKLML implematation when using no-linux.
test=develop
* Ignore the deprecated status for windows
test=develop
add fl_listen_and_serv op for Federated_learning and fl_distribute_transpiler add this op to pserver program . This op just listen the endpoint and sum&scale.
modify checkMacPython2 and checkMacPython3 function to ensure python version used for paddle installation is EQ/GT than 2.7.15 on MacOS; modify checkMacPaddleVersion function to ensure paddle version is release version, because paddle don't have develop version on MacOS.
* change the default value of summarize from -1 to 20 in Print op to improve ease of use, test=develop
* change the doc of API Print to make the document easier to understand, test=develop
* Enhance the error message when GrapOpMaker is null.
test=develop
* Call Proto() instead of directly using proto_ pointer.
test=develop
* Rollback to use proto_ directly, because some sepecial grad ops, such some double grad ops, donot have proto.
test=develop
* remove unused DefaultGradOpDescMaker in REGISTER_OPERATOR(), test=develop
* remove SplitIdsOpGradMaker since it is buggy and not tested, update spec file, test=develop
* instag lod tensor impl
* First PR for instag
* First PR for instag
* Before adding Selection Rows.
* Change name from instag to filter_instag, add upgrade the impl of filter_instag
* Change name from instag to filter_instag, add upgrade the impl of filter_instag
* Fix yapf error in gradient_checker.py to pass Travis-CI
* Fix Filter Instag Grad test=develop
* Fix Filter Instag Grad test=develop
* 1) Fix API.spec, add filter_instag Op. 2) Add Vector Support for CUDA. test=develop
* Impl Loss_weight and empty output handler
* change Loss Weight datatype to Float32, and add Loss Weight as 2nd output
* 1) Support Tensor Input(without LOD) 2) Add Unit test
* Filter By Instag Final test=develop
* Update API.spec for filter_by_instag test=develop
* Update API.spec for filter_by_instag 2 test=develop
* Add Filter By Instag Coverage
* code format of test_layers.py
* code format test_layers.py test=develop
* Make API args more readable test=develop
* Make API args more readable and pass code format test=develop
* Filter By Instag Op, Rename Map to Index Map test=develop
* Filter By Instag Op, code format err in filter_by_instag_op.cc test=develop
* Filter by instag op: code format of cpp files test=develop
* Filter by instag Op: Api spec modification test=develop
* Filter by instag Op: Api spec doc id modification test=develop
* Filter by instag Op: Api spec and doc preview test=develop test=document_preview
* Filter By Instag Op, fix doc erro test=document_preview test=develop
* Filter By Instag Op, fix doc err and Api spec test=document_preview test=develop
* Filter By Instag Op, fix Api spec test=document_preview test=develop
* Filter By Instag Op, fix Paddle Encoforce deprecated warning test=document_preview test=develop
* Filter By Instag Op, fix Paddle Encoforce deprecated and code format warning test=document_preview test=develop
* add hard_swish activation op (new op)
test=develop
* remove redundancy files
* modify document content of HardSwish OP
* add API test in test_layers.py
* add dynamic_graph for test_hard_swish
* add a place field in DataFeed to denote which place it will feed data to.
* abstract the copy process in CopyToFeedTensor function
* add UT for float32 type and for CUDAPlace
* Add call stack info during runtime and compile time
test=develop
* Rename operator_call_stack
test=develop
* Add unit test
test=develop
* follow comment
test=develop
* add train demo for imdb text classification task
* make inference library release data_feed dataset dataset_factory data_feed_factory
* add String Data Generator
* new feature of train demo: save model params
* New feature of train demo: set training config using gflags
* change code style for CI
* add readme and dataset for imdb demo trainer
* fix QueueDataset queue size,set queue size = batch size * 100, to avoid too many instances in channel when training is much slower than reading data.
* fix warpctc.dll not found issue, test=develop
* revert the linux platform change, test=develop
* delete warpctc_lib_path.h.in, test=develop
* add SetPySitePackagePath function
* fix warpctc.dylib not found issue on Mac, test=develop
* improve the paddle lib path setting logic, test=develop
* fix mac ci issue caused by test_warpctc_op unittest, test=develop
* tweak code, test=develop
* open gc by default, test=develop
* fix test_train_recognize_digits and disable gc when ngraph is enabled, test=develop
* fix conditional_block op eager deletion bug, test=develop
* add some comments to reviewers, test=develop
* Fix Mask rcnn predictor
1. refine memory optim algorithm to support the model with the block op.
2. output diff : modify the affine channel fuse
3. add condition_block_infer op
add interface for setting trt calib table dir
test=develop
* add the missing files.
test=develop
* 1 add trt fp16 support
test=develop
* fix trt fp16 ce error
test=develop
* add an vlog if the user use trt4 and specify fp16.
test=develop
* support filelist size < trainer num
* pull dense when stop, to make sure local dense params are same as pserver, so save paddle model will save dense model same as pserver
* enable QueueDataset train same filelist for serveral times
* Fix memory leak in test
test=develop
* Fix memory leak in test
test=develop
* Fix memory leak in test
test=develop
* Pull out vars of the loops
test=develop
* test=develop
Add the op of unique_with_counts, the op is calc the unqiue input of data, and output the corresponding indices and count of data.
* test=develop
Check the input and dtype in the op of unique_with_counts
* test=develop
test=document_preview
update the API.spec for `unique_with_counts`, at the same time, optimize the python api in the op of `unique_with_count`
* test=develop
test=document_preview
Fix some python api problem in the op of `unique_with_counts`, and change the error messsage in this op.
* Fix some API problem in the op of `unique_with_counts`
test=develop
test=document_preview
* test=develop
test=document_preview
Fix the api sample of op `unique_with_counts`, and update api.spec
test=develop
- Extracted key generation from FWD and GRAD into separate function
test=develop
- Compilation fix
test=develop
- another compilation
test=develop
* fix security issue, test=develop
* bug fix, test=develop
* throw an exception when null pointer data with non-zero length PaddleBuf is passed, test=develop
* Fix Mask rcnn predictor
1. refine memory optim algorithm to support the model with the block op.
2. output diff : modify the affine channel fuse
3. add condition_block_infer op
add interface for setting trt calib table dir
test=develop
* add the missing files.
test=develop
* 1 add trt fp16 support
test=develop
* support center loss
* change tensor copy api to high level api tensorcopy
* test=develop rewrite the center_loss cuda_kernel to make it faster
and add document of the center loss api,also update test function
* test=document_preview test=develop
update document of center loss
* test=document_preview test=develop
modify API.spec modify test code remove nouse const_cast
* change INT8 to template so that checking dst_dt with if-else could be removed. CI will be enabled after fixing reviews
* reverse user_residual_memory_p and user_bias_memory_p declaration scope
test=develop
* extend matmul op to support multiple head multiplication
With the support of multiple head, the multiplication of two big matrixes is
split into multiplication of several (head_number) small matrixes. e.g. if
Mat A is [3, 24] and Mat B is [24, 4], when multiple A and B with head_number
as 4, Mat A will be split as 4 matrix of [3, 6] and Mat B will be 4 matrix of
[6, 4]. The result of final matrix will be 4 matrix of [3, 4], i.e. [3, 16].
* update paddle-trt for:
1. fix bug: when batch > 2, core in split plugin.
2. add leaky_relu trt5.0 support (yolov3 from 65ms to 42ms.)
3. add new attr to dropout.
4. shuffle channel, swish, relu6 support
test=develop
* 1. fix ci
test=develop
The change includes 2 things:
1. save delta model and shrink table are control by the same parameter before, now add delete_after_unseen_days to control shrink table.
2. value in sparse table has no slot before, now add slot in sparse table, and add DownpureCtrAccessor to support the new meta.
test=develop
(1)support patch data (merge slots of instances of same line id, modify dense layer which
changes its size)
(2)add fleet load_one_table interface, support load from paddle model and load from pslib model
(3)fix push sparse bug which cause push sparse cost more time(about 10% in my testcase)
(4)when some slots are not in one of your network (join/update, etc.),data feed、collect label info、push/pull sparse will skip these slots, instead of throw error.
(5)add more debug info in TrainFilesWithProfiler