Compare commits


272 Commits
master ... r0.3

Author SHA1 Message Date
mindspore-ci-bot ad0a705489 !2952 udpate release notes for r0.3.1
6 years ago
jonyguo d3fbc1523b update r0.3.1 release notes
6 years ago
mindspore-ci-bot 1c9ed09fd0 !2812 Remove submodule akg.
6 years ago
mindspore-ci-bot 1ccedcde65 !2826 add libtiff notice info to r0.3
6 years ago
xulei2020 921e7de987 add libtiff notice to r0.3
6 years ago
Tron Zhang d5d9e92336 remove submodule akg
6 years ago
mindspore-ci-bot 9343746ef7 !2607 Modify long description format of whl package
6 years ago
zhoufeng e2593466fc Modify long description format of whl package
6 years ago
mindspore-ci-bot 2e5a76e0df !2606 Update version to 0.3.1
6 years ago
zhoufeng 067b619034 Update version to 0.3.1
6 years ago
mindspore-ci-bot 5e5c66e300 !2540 Move LayerNormGrad split pass ahead of kernel select
6 years ago
mindspore-ci-bot d8969d243e !2568 update run_train.sh of mobilenetv2_quant && resnet50_quant
6 years ago
wandongdong f03e88c26f update run_train.sh
6 years ago
mindspore-ci-bot 8c30045178 !2553 add mindrecord to mobilenetv2_quant && resnet50_quant
6 years ago
mindspore-ci-bot 095d86e16f !2556 fix: change field name from 'data' to 'image' - sync
6 years ago
jonyguo 862bc22b38 fix: change field name from data to image in mindrecord for imagenet
6 years ago
wandongdong a6590d1866 add mindrecord
6 years ago
mindspore-ci-bot 3f0a350d68 !2510 THOR ops modified
6 years ago
huanghui e936d5cd4b place layernormgrad split pass before kernel select
6 years ago
mindspore-ci-bot 46700bec69 !2500 add output activation quant in mobilenetv2 and resnet50
6 years ago
zongha 0920094f81 fine img2col_impl
6 years ago
chenzupeng bf0673003b add dense quant activation fake
6 years ago
mindspore-ci-bot 3a40ac6521 !2435 fix perchannel num_channels not set bug and adjust quant.py params order
6 years ago
wangdongxu f110c7616b fix perchannel num_channels not set bug and adjust quant.py params order
6 years ago
mindspore-ci-bot 3e3cbbba0f !2447 asyn save checkpoint to file
6 years ago
mindspore-ci-bot e368d0524b !2455 add perchannel quant train
6 years ago
changzherui 966f05231d asyn save checkpoint to file merge to r0.3
6 years ago
chenzupeng e9ee59c7ad add perchannel quant train
6 years ago
mindspore-ci-bot 4bbd4414c4 !1734 deal with resnet50_THOR train print many ERROR&WARNING log and produce many ir file
6 years ago
mindspore-ci-bot cf7c60a5ed !2431 update README
6 years ago
panfengfeng 87cc57d3aa update readme
6 years ago
mindspore-ci-bot 2d35511d7c !2423 Adapt module akg's change
6 years ago
mindspore-ci-bot fded8732ea !2422 reshuffle all data and shard again whe use MindDataset distribute
6 years ago
tronzhang 109a21d520 Adapt change of module akg
6 years ago
jonyguo 0f380b559e enhance: add full reshuffle per epoch and fix: random_device failed
6 years ago
mindspore-ci-bot e519317622 !2407 change Q.BNTrainingReduce() to P.BNTrainingReduce()
6 years ago
chenzomi 2fab2492bc change Q.BNTrainingReduce() to P.BNTrainingReduce()
6 years ago
mindspore-ci-bot 11caa3aec8 !2340 fix random_crop_resize_2
6 years ago
panfengfeng 25827a8619 fix random_crop_and_resize
6 years ago
mindspore-ci-bot 91c856e5ee !2334 remove dataset send from data exec for r0.3
6 years ago
wangnan39@huawei.com 20049bbea6 send data after model init
6 years ago
mindspore-ci-bot cb6211f25d !2291 remove _quant_op.py from __init__.py
6 years ago
mindspore-ci-bot 24d61337c0 !2302 improve summary performance
6 years ago
Margaret_wangrui 69b32e4dca improve summary performance
6 years ago
chenzomi 9be52e0a1b remove _quant_op.py from __init__.py
6 years ago
mindspore-ci-bot 53d7e622f9 !2261 modify log level from warning to info
6 years ago
mindspore-ci-bot 1127ace7ec !2228 cache get_dataset_size value
6 years ago
mindspore-ci-bot ab39708929 !2099 fix summary nodes memory reuse refcount
6 years ago
mindspore-ci-bot 147d0cde07 !2277 fix arithmetic simplify
6 years ago
Xian Weizhao 9c70861343 fix arithmetic simplify
6 years ago
jjfeing e78e819b7c modify log level from warning to info
6 years ago
mindspore-ci-bot f3bb991ce9 !2232 split correction_mul ops
6 years ago
mindspore-ci-bot dcb90588b0 !2248 bind summary nodes to KernelGraph in order to memory reuse
6 years ago
wandongdong c742384a39 split correction_mul op
6 years ago
yanghaitao1 038040750d store get dataset size
6 years ago
mindspore-ci-bot 476671b1cf !2196 fix log level too high: conversion of const tensor is normal
6 years ago
mindspore-ci-bot c749f513ac !2195 [r0.3 branch] fix FakeQuantPerLayer/FakeQuantPerLayerGrad symmetric=True calculation error bug
6 years ago
王东旭 7995189c72 fix FakeQuantPerLayer/FakeQuantPerLayerGrad symmetric bug and remove BNTrainingReduceGrad/BNTrainingUpdateGrad
6 years ago
Margaret_wangrui 6f5303f0d9 bind summary nodes to KernelGraph in order to memory reuse
6 years ago
tronzhang ac7197d33e fix log level: const tensor conversion is normal
6 years ago
laiyongqiang 8d0691aaf9 fix summary nodes memory reuse refcount
6 years ago
mindspore-ci-bot 1e90e7be05 !2172 fix some info
6 years ago
jonyguo 5e2953247f fix: verify info
6 years ago
mindspore-ci-bot ff500c678e !2122 add set_dataset_size for MindDataset
6 years ago
jonyguo 488b74e92f 1. add set_dataset_size for MindDataset 2. modify parameter dupe_factor from 5 to 10
6 years ago
mindspore-ci-bot fba21459a7 !2115 change readme.md
6 years ago
mindspore-ci-bot 6d04e1a8e5 !2115 change readme.md
6 years ago
chenzomi d6bd690d34 change readme.md
6 years ago
mindspore-ci-bot 9fc00ca521 !2031 add sync bewteen hcom
6 years ago
mindspore-ci-bot 7c77bb8782 !2104 change mobilenet V2 readme.md
6 years ago
chenzomi 077d21f055 change mobilenet V2 readme.
6 years ago
mindspore-ci-bot da9530f7f7 !2090 resnet quant dataset aug change
6 years ago
panfengfeng 690db9a515 resnet_quant data aug change
6 years ago
mindspore-ci-bot 653519630a !2079 Feat(GraphKernel): Init GraphKernel.
6 years ago
mindspore-ci-bot 3c2f4df87c !2087 data aug changes from c to py
6 years ago
mindspore-ci-bot 62fae9befa !2082 MindDataset with padded mode print reshuffle error info
6 years ago
panfengfeng e20d687e7a using py_transform for data aug.
6 years ago
mindspore-ci-bot 23d103a122 !2085 remove unused code in quant train
6 years ago
chenzupeng 52a90f2587 remove unused code in quant train
6 years ago
mindspore-ci-bot e21a0aad69 !2073 add resnet50 quant model
6 years ago
gong chen 13a2d6d49e Init GraphKernel.
6 years ago
jonyguo f3ebc7319c fix: MindDataset padded log error
6 years ago
wandongdong df65f16812 add resnet50_quant
6 years ago
mindspore-ci-bot dc9a51aad5 !2070 adapt quantization aware train for r0.3
6 years ago
chenzupeng cc497424fc adapt for mobilenetV2 quantization awared train in r0.3
6 years ago
mindspore-ci-bot b3f09b1d45 !1995 remove the useless transdata and cast connected with control depend
6 years ago
mindspore-ci-bot f05da3aae9 !1948 fix resnet50 distribute bug
6 years ago
WilliamLian 0ac5911910 remove the useless transdata and cast connected with control depend
6 years ago
mindspore-ci-bot fb65a1a929 !2049 update mobilenetv2 scripts
6 years ago
mindspore-ci-bot 7d965477a1 !2041 add mobilenetC2 quant
6 years ago
panfengfeng 68c3c73fab update mobilenetV2 dataset codes
6 years ago
mindspore-ci-bot 7ffcc606c9 !2035 add example for zhwiki, CLUERNER2020 and enwiki to mindrecord
6 years ago
mindspore-ci-bot aa4c4f51ac !2025 fix remove reshape pair pass
6 years ago
mindspore-ci-bot 854e16f0f8 !2033 fix mindrecord seekg failed
6 years ago
chenzomi 60dc921186 add mobilenetC2 quant
6 years ago
jonyguo 16e9da5ae5 enhance: add example for zhwiki, CLUERNER2020 and enwiki to mindrecord
6 years ago
jonyguo a48a97208b fix: mindrecord seekg failed when shift raw page
6 years ago
liubuyu e3145f18b0 fix remove reshape pair pass
6 years ago
gukecai c4abebafcc add sync bewteen hcom
6 years ago
mindspore-ci-bot 0e4fab2368 !2011 fake quant debug
6 years ago
chenzomi 5a26546b56 fake quant debug
6 years ago
mindspore-ci-bot a40e9e6fae !2001 fix MindDataset distribute shuffle error
6 years ago
jonyguo 07f7d1ae62 fix: MindDataset distribute shuffle bug
6 years ago
mindspore-ci-bot 9944abe99d !1963 bug fix in fake quant training in r0.3
6 years ago
chenzomi bb58ea35b9 bug fix in fake quant training in r0.3
6 years ago
mindspore-ci-bot eaaacfea4c !1941 Add order function in group params in r0.3
6 years ago
mindspore-ci-bot 676e717edf !1952 use VisitKernelWithReturnType instead of VisitKernel to get node's input in mem_reuse
6 years ago
laiyongqiang 9bdf017379 use VisitKernelWithReturnType instead of VisitKernel to get node's input
6 years ago
zhaoting b37184050f fix resnet50 distribute bug
6 years ago
zhaoting 4d92e2b579 Revert "Revert "add pattern AdjustAllReduceMulAdduse the old opadd test case for bugtemp fix try""
6 years ago
mindspore-ci-bot ba125f9673 !1925 bug fix in fake quant
6 years ago
chenzomi e0fa277a05 fix bug in fake quant grad
6 years ago
mindspore-ci-bot eac1f93ee4 !1889 add dropout special kernel selected rules
6 years ago
mindspore-ci-bot 40e1e3843f !1894 fix lars weight decay computation error
6 years ago
Ziyan fdb2a915b9 fix weight decay in lars
6 years ago
WilliamLian 159119cb2a add dropout special kernel selected rules
6 years ago
guohongzilong f213c3a6ad add order function in group params
6 years ago
mindspore-ci-bot 1f34378b9c !1837 [MD] support padding samples in minddataset
6 years ago
liyong d915d46d79 pad samples in mindrecord
6 years ago
mindspore-ci-bot 6ce8a4ab20 !1836 update register info of BiasAddGrad and modify adam optimizer&softmax_grad to match fusion rules
6 years ago
shibeiji 188c9feca4 update register info of BiasAddGrad and modify adam optimizer&softmax_grad to match fusion rules
6 years ago
mindspore-ci-bot 29deeca343 !1818 Add SoftmaxGradExt fusion pass from master to r0.3
6 years ago
huanghui cc582f5e30 add SoftmaxGradExt fusion pass
6 years ago
mindspore-ci-bot bbdc44a0cc !1646 reorder independent nodes for stream parallel
6 years ago
mindspore-ci-bot c8d31b0889 !1754 Add 5 patterns for AdamApplyOneWithDecay fusion pass
6 years ago
mindspore-ci-bot 3baa52717f !1795 fix compile bugs in mobilenetv2 quant aware training for r0.3
6 years ago
wandongdong 5485976f61 fix compile bugs for quant
6 years ago
mindspore-ci-bot 2109bb68b3 !1756 modify widedeep
6 years ago
mindspore-ci-bot 07166d11af !1751 fixed SoftmaxGradExt
6 years ago
huanghui 05afc22ffa add newly 5 patterns for AdamApplyOneWithDecayRule fusion pass
6 years ago
wukesong bb4b06946f modify widedeep
6 years ago
jiangjinsheng 022d391e3c fixed SoftmaxGradExt
6 years ago
mindspore-ci-bot 4cff81ee2d !1733 change some settings in SSD
6 years ago
zhaoting ac12df82d2 change some settings in SSD
6 years ago
mindspore-ci-bot 9cb129ac99 !1720 add reducemean's special kernel fileter rule
6 years ago
mindspore-ci-bot 5adcbf6e23 !1727 move add graph manager to gpu session
6 years ago
z00478463 491ba51b8b set save graphs False and add bprop for op cholesky trsm
6 years ago
mindspore-ci-bot db6bb720df !1716 fix bug introduced by gpu support
6 years ago
lizhenyu c0aa7602e0 move add graph manager to gpu session
6 years ago
WilliamLian ba48964f2a add reduce mean kernel filter function
6 years ago
mindspore-ci-bot 0a4a449e8f !1711 fix log1p
6 years ago
gengdongjie 4f50cb3a9b fix bug introduced by gpu support
6 years ago
mindspore-ci-bot 5d0cc35792 !1567 lstm&transpose_r0.3
6 years ago
baihuawei b85c310ea1 add lstm & transpose
6 years ago
jiangjinsheng a9de8012df fixed log1p
6 years ago
mindspore-ci-bot d85262e03c !1686 update r0.3 relase notes
6 years ago
jonyguo 22158fc703 update r0.3 release notes and install path
6 years ago
mindspore-ci-bot e3a7f8f21c !1698 bugfix:get nullptr from graph manager
6 years ago
lizhenyu df04230e13 fix get nullptr when use graph manager
6 years ago
mindspore-ci-bot 20d26b17f8 !1684 dataset: repair get_sampler_size problem
6 years ago
mindspore-ci-bot 85012ceedd !1677 TopK fusion pass bug fix
6 years ago
mindspore-ci-bot 527f1d70ce !1680 fix resource release bug of memory swap
6 years ago
mindspore-ci-bot cab2612c23 !1662 fix get_dataset_size error for GeneratorDataset
6 years ago
linqingke a339fac777 topk bug fix
6 years ago
mindspore-ci-bot 723c66bb66 !1683 modify dataset.py and add autp parallel split
6 years ago
wanghua 298ff4adc1 modify dataset.py and add autp parallel split
6 years ago
ms_yan 09fd47a256 repair get_sampler_size problem
6 years ago
mindspore-ci-bot e0510928f1 !1676 GPU fix resnet script
6 years ago
mindspore-ci-bot 9205271347 !1671 Add DeepLabV3 network
6 years ago
lizhenyu 13bda4caf1 fix resource release bug of memory swap
6 years ago
gukecai 8d68bd874e reorder independent nodes
6 years ago
VectorSL 60dadd6d21 gpu fix resnet script
6 years ago
mindspore-ci-bot 0e4574af6b !1656 fix bug for mobilenet in model_zoo
6 years ago
mindspore-ci-bot 3ae2f8d12c !1664 revert parameter set kernel build info
6 years ago
mindspore-ci-bot a74e238e21 !1664 revert parameter set kernel build info
6 years ago
yangyongjie 0a97cb8acd add deeplabv3 to model zoo
6 years ago
WilliamLian 344f2ef4df revert don't set parameter's format when it's has been setted before
6 years ago
mindspore-ci-bot 6f3758f313 !1657 add readme
6 years ago
yanghaitao 1187411af1 a
6 years ago
mindspore-ci-bot f7acf0ed6f !1633 modify ssd script for merging backbone
6 years ago
chenzomi e658eb7f24 bug fix
6 years ago
mindspore-ci-bot 5cb99aadf5 !1645 ModelZoo WideDeep r0.3
6 years ago
mindspore-ci-bot dc5b04846f !1566 sync lstm ops code from master to r0.3
6 years ago
mindspore-ci-bot 72a166ff8c !1624 Remove WARNING log in pynative mode
6 years ago
z00478463 894e329218 add the readme
6 years ago
yao_yf 4ac88b6bcc modelzoo_widedeep_r0.3
6 years ago
mindspore-ci-bot 67fecef6a8 !1651 GPU fix example scripts resnet r0.3
6 years ago
mindspore-ci-bot c32c17bbad !1644 dataset: re-fix some format problem in take and split
6 years ago
VectorSL a9db68db3a fix gpu resnet script
6 years ago
chengxianbin 4fb3ab7882 modify ssd script for merging backbone
6 years ago
mindspore-ci-bot b2f0135224 !1629 add cpu stridedslice
6 years ago
ms_yan 5a1fba5103 repair api format problem
6 years ago
mindspore-ci-bot 69dd996278 !1620 Add protection in cross entropy kernel.
6 years ago
mindspore-ci-bot 9dd3c1f77d !1621 upload fasterrcnn scripts
6 years ago
sunsuodong ba39d53c22 sync lstm ops code from master to r0.3
6 years ago
mindspore-ci-bot b47847167d !1630 Add DeepFM scripts
6 years ago
mindspore-ci-bot 41e179cc51 !1640 Fix lenet hang problem on windows
6 years ago
mindspore-ci-bot 01d9ce3e5d !1622 change mobilenet file struct.
6 years ago
xiefangqi 2c42665e90 fix lenet hang problem on windows
6 years ago
mindspore-ci-bot 803a91596a !1614 LSTM network adapt to cpu target.
6 years ago
yangyongjie 8f79f0cce8 add DeepFM
6 years ago
kswang 808d5947d5 add cpu strided slice
6 years ago
mindspore-ci-bot 9274daec9c !1610 fix subset random sampler error
6 years ago
mindspore-ci-bot ce57e02db3 !1562 don't set parameter's format when it's has been setted before
6 years ago
mindspore-ci-bot 07724c7080 !1608 add get_dataset_size for CelebADataset
6 years ago
chenzomi 9853294aaa change mobilenet struct
6 years ago
caifubi 6c491b8d3e Only release runtime resource in GRAPH_MODE
6 years ago
meixiaowei 24fb17895a upload fasterrcnn scripts
6 years ago
ZPaC 42641f17ab Add protection in cross entropy kernel.
6 years ago
mindspore-ci-bot a8efea5c81 !1588 GPU upadate resnet50 script in example
6 years ago
caojian05 600d052ac1 LSTM network adapt to cpu target.
6 years ago
mindspore-ci-bot 6599cc1aca !1579 recitify pretrained path and revert AdjustAllReduceMulAdduse
6 years ago
mindspore-ci-bot a14be2254b !1594 refine data copy in multi-graph
6 years ago
mindspore-ci-bot 1289c3e4db !1592 bug fix while evaluation
6 years ago
mindspore-ci-bot b70b2da675 !1582 add topk and randomchoicewithmask op data type for aicpu
6 years ago
mindspore-ci-bot 02914ba0b9 !1581 fix flatten grad error with reshape
6 years ago
WilliamLian 085d8f1233 don't set parameter's format when it's has been setted before
6 years ago
yanghaitao a379c668f5 fix subsetrandomsampler
6 years ago
mindspore-ci-bot 00a4e188b7 !1590 dataset: fix some format problem in take and split
6 years ago
yanghaitao 415afe09f5 add get_dataset_size to celebadataset
6 years ago
mindspore-ci-bot 94872b7678 !1570 Check the size of topk input names before converting input to attr
6 years ago
mindspore-ci-bot 2f936166c9 !1575 VocDataset support split ops
6 years ago
mindspore-ci-bot 76befd5703 !1577 fix reshape reshape case in auto parallel for r0.3
6 years ago
mindspore-ci-bot 78909200ed !1589 fix bert performance
6 years ago
lizhenyu 32dbbc1de2 refine data copy in multi-graph
6 years ago
chenzomi 97610885d0 bug fix while evaluation
6 years ago
ms_yan 27712eafaf repair some format problem in API
6 years ago
chenhaozhe 04bc2a938e fix performance of bert
6 years ago
VectorSL b5ce6c55a5 gpu update example resnet
6 years ago
mindspore-ci-bot 63479f8e7c !1574 fix tfreadop hang
6 years ago
zhaozhenlong 6cd15ea553 use reshape as flatten grad
6 years ago
yanzhenxiang2020 d5af2f23b2 add topk and randomchoicewithmask data type for aicpu
6 years ago
gengdongjie 217d801c12 bugfix for resnet50_imagenet pretrained_ckpt
6 years ago
gengdongjie 135e90b135 Revert "add pattern AdjustAllReduceMulAdduse the old opadd test case for bugtemp fix try"
6 years ago
mindspore-ci-bot 431bc8bf4b !1553 change hook function grad input to tuple
6 years ago
mindspore-ci-bot 771a88d490 !1569 fix multi-graph run out of device resource
6 years ago
mindspore-ci-bot b298c515a6 !1559 Voc dataset support split ops
6 years ago
yao_yf dcb91b0ef6 fix reshape reshape case in auto parallel
6 years ago
yanghaitao f30928f084 fix tfreaderop hang
6 years ago
yujianfeng 1fb2cce274 Check the size of topk input names before converting input to attr
6 years ago
caifubi f6ad679ef9 fix multi-graph device resource run out bug
6 years ago
mindspore-ci-bot 0f22140331 !1548 [session]make manager for every graph
6 years ago
kingfo a5e66e159e change hook grad input to tuple
6 years ago
chenfei fce296eb38 make manager for every graph
6 years ago
mindspore-ci-bot e5c45bd339 !1538 add custom tbe ops for quant aware training
6 years ago
mindspore-ci-bot af1fde399b !1509 dataset: PR1457 fix 3 bug reports for split
6 years ago
mindspore-ci-bot 1deb091c0f !1529 support tensor set item the number value type is similar as tensor dtype
6 years ago
wandongdong 0a52fd052b add custom tbe ops for quant aware training
6 years ago
mindspore-ci-bot cf20b3443c !1514 fix ssd run failed problem
6 years ago
mindspore-ci-bot a0e552e75c !1524 fix compilation order
6 years ago
mindspore-ci-bot ffdb11f548 !1526 Move graph_map_schema.py to example directory
6 years ago
mindspore-ci-bot fac36e6a1a !1527 THOR ops master -> r0.3
6 years ago
jjfeing 642761c2b1 adapte Second order optimization ops
6 years ago
mindspore-ci-bot cfe87d9563 !1519 [Data]Updated UA, RandSharp and RandColor parameter check, Updated UA code and description.
6 years ago
buxue 5ab32c33e4 support tensor set item the number value type is similar as tensor dtype not same
6 years ago
mindspore-ci-bot e056799467 !1517 Add check for empty group parameters
6 years ago
heleiwang 2be75b0c74 mv graph_map_schema.py to example
6 years ago
panfengfeng d0d7864ccc fix compilation order
6 years ago
Peilin Wang 9228384304 fixed bug for split, RandomSampler and some other cleanup
6 years ago
mindspore-ci-bot bdd9aec368 !1463 Updated UA, RandSharp and RandColor parameter check, Updated UA code and description.
6 years ago
guohongzilong 75c1e7f6af add check for group parameters
6 years ago
mindspore-ci-bot c6f309e125 !1507 remove print
6 years ago
chengxianbin c77ac8aa0b add mobilenet file for ssd net
6 years ago
buxue c821cc1ebb remove print
6 years ago
mindspore-ci-bot 6be8929f62 !1496 revert decoupled of 1313
6 years ago
mindspore-ci-bot f51a745931 !1486 add train and eval script for LSTM
6 years ago
c00499814 d3c848fc09 Revert "!1313 decoupled of insert transdata and deal ref and split transdata"
6 years ago
mindspore-ci-bot 702fcbbe99 !1467 Pynative can not add cell hook
6 years ago
mindspore-ci-bot 7013a9918a !1485 Fix fusion condition of transpose and reshape
6 years ago
yujianfeng f95052fd65 Fix fusion condition of transpose and reshape
6 years ago
caojian05 1fae83d746 add train and eval script for LSTM
6 years ago
lvliang 11303142b1 pynative-cell-hook-grad-abnormal
6 years ago
mindspore-ci-bot 5157063cbb !1470 Fix log and comment errors in graphdata
6 years ago
mindspore-ci-bot d210fbb7e9 !1471 Fix input check in graphdata
6 years ago
heleiwang b0a354830b fix input check
6 years ago
heleiwang 0ca8daa1a2 fix log error
6 years ago
mindspore-ci-bot 47039a6d98 !1449 fix kernel select
6 years ago
mindspore-ci-bot 74cdb91151 !1458 remove old buffer fusion pass
6 years ago
liubuyu 6f6fc75ba5 bug fix
6 years ago
etone-chan 24e5387973 remove old buffer fusion pass
6 years ago
mindspore-ci-bot 51a50e17b7 !1429 update version from 0.2 to 0.3
6 years ago
jonyguo 3d6802007a update version from 0.2 to 0.3
6 years ago

.gitmodules vendored

@@ -13,3 +13,6 @@
[submodule "graphengine"]
path = graphengine
url = https://gitee.com/mindspore/graphengine.git
[submodule "akg"]
path = akg
url = https://gitee.com/mindspore/akg.git

@@ -89,4 +89,4 @@ if (ENABLE_TESTCASES)
add_subdirectory(tests)
endif()
include(cmake/package.cmake)
include(cmake/package.cmake)

@@ -29,7 +29,7 @@ enrichment of the AI software/hardware application ecosystem.
<img src="docs/MindSpore-architecture.png" alt="MindSpore Architecture" width="600"/>
For more details please check out our [Architecture Guide](https://www.mindspore.cn/docs/en/0.2.0-alpha/architecture.html).
For more details please check out our [Architecture Guide](https://www.mindspore.cn/docs/en/0.3.0-alpha/architecture.html).
### Automatic Differentiation
@@ -76,7 +76,7 @@ For installation using `pip`, take `CPU` and `Ubuntu-x86` build version as an ex
1. Download whl from [MindSpore download page](https://www.mindspore.cn/versions/en), and install the package.
```
pip install https://ms-release.obs.cn-north-4.myhuaweicloud.com/0.2.0-alpha/MindSpore/cpu/x86_ubuntu/mindspore-0.2.0-cp37-cp37m-linux_x86_64.whl
pip install https://ms-release.obs.cn-north-4.myhuaweicloud.com/0.3.0-alpha/MindSpore/cpu/ubuntu_x86/mindspore-0.3.0-cp37-cp37m-linux_x86_64.whl
```
2. Run the following command to verify the install.
@@ -133,8 +133,8 @@ currently the containerized build options are supported as follows:
For `CPU` backend, you can directly pull and run the latest stable image using the below command:
```
docker pull mindspore/mindspore-cpu:0.2.0-alpha
docker run -it mindspore/mindspore-cpu:0.2.0-alpha /bin/bash
docker pull mindspore/mindspore-cpu:0.3.0-alpha
docker run -it mindspore/mindspore-cpu:0.3.0-alpha /bin/bash
```
* GPU
@@ -151,8 +151,8 @@ currently the containerized build options are supported as follows:
Then you can pull and run the latest stable image using the below command:
```
docker pull mindspore/mindspore-gpu:0.2.0-alpha
docker run -it --runtime=nvidia --privileged=true mindspore/mindspore-gpu:0.2.0-alpha /bin/bash
docker pull mindspore/mindspore-gpu:0.3.0-alpha
docker run -it --runtime=nvidia --privileged=true mindspore/mindspore-gpu:0.3.0-alpha /bin/bash
```
To test if the docker image works, please execute the python code below and check the output:
@@ -187,7 +187,7 @@ please check out [docker](docker/README.md) repo for the details.
## Quickstart
See the [Quick Start](https://www.mindspore.cn/tutorial/en/0.2.0-alpha/quick_start/quick_start.html)
See the [Quick Start](https://www.mindspore.cn/tutorial/en/0.3.0-alpha/quick_start/quick_start.html)
to implement the image classification.
## Docs

File diff suppressed because one or more lines are too long

@@ -3053,6 +3053,61 @@ Copyright 2003 Google Inc.
Copyright 2009 Google Inc.
Copyright (C) 1991-2, RSA Data Security, Inc. Created 1991. All
Software: libtiff 4.1.0
Copyright notice:
Copyright © 2015 Open Microscopy Environment / University of Dundee
Copyright (c) 2004, Andrey Kiselev <dron@ak4719.spb.edu>
Copyright (c) 1990-1997 Sam Leffler
Copyright (c) 1991-1997 Silicon Graphics, Inc.
Copyright (c) 1988-1997 Sam Leffler
Copyright (c) 1991-1997 Sam Leffler
Use and Copyright
Copyright (C) 1990, 1995 Frank D. Cringle.
Copyright (c) 1994-1997 Sam Leffler
Copyright (c) 1994-1997 Silicon Graphics, Inc.
Copyright (c) 1997 Greg Ward Larson
Copyright (c) 1997 Silicon Graphics, Inc.
Copyright (c) 2010, Andrey Kiselev <dron@ak4719.spb.edu>
Copyright (c) Joris Van Damme <info@awaresystems.be>
Copyright (c) AWare Systems <http:www.awaresystems.be/>
Copyright (c) 1996-1997 Sam Leffler
Copyright (c) 1996 Pixar
Copyright (c) 1995-1997 Sam Leffler
Copyright (c) 1995-1997 Silicon Graphics, Inc.
Copyright (c) 1988-1996 Sam Leffler
Copyright (c) 1991-1996 Silicon Graphics, Inc.
Copyright (c) 1992-1997 Sam Leffler
Copyright (c) 1992-1997 Silicon Graphics, Inc.
Copyright (c) 2018, Mapbox
Copyright (c) 2017, Planet Labs
Copyright (c) 1990 by Sun Microsystems, Inc.
Copyright 1990 by Digital Equipment Corporation, Maynard, Massachusetts.
Copyright 1991 by Digital Equipment Corporation, Maynard, Massachusetts.
Copyright (c) 2002, Andrey Kiselev <dron@ak4719.spb.edu>
Copyright (c) 2003 Ross Finlayson
Additions (c) Richard Nolde 2006-2010
Copyright (c) 2003, Andrey Kiselev <dron@ak4719.spb.edu>
Copyright (c) 2000, Frank Warmerdam
Copyright (c) 1987, 1993, 1994
Copyright (c) 1989, 1993
Copyright (c) 2009 Frank Warmerdam
Copyright (c) 1987, 1993
Copyright (c) 2005 The DragonFly Project. All rights reserved.
Copyright (c) 2003 Citrus Project,
All rights reserved.
Copyright (c) 1990, 1993
Copyright (c) 1996 Mike Johnson
Copyright (c) 1996 BancTec AB
Copyright (c) 2004, Andrey Kiselev <dron@ak4719.spb.edu>
Copyright (c) 2012, Frank Warmerdam <warmerdam@pobox.com>
Copyright (c) 2019, Even Rouault <even.rouault at spatialys.com>
Copyright (c) 2007, Frank Warmerdam <warmerdam@pobox.com>
Copyright (c) 2019, Thomas Bernard <miniupnp@free.fr>
Copyright (c) 2008, Andrey Kiselev <dron@ak4719.spb.edu>
Copyright (c) 1999, Frank Warmerdam
Copyright (c) 1991-1996 Sam Leffler
Copyright (c) 1996 USAF Phillips Laboratory
Software: opencv 4.2.0
Copyright notice:
Copyright (C) 2016, NVIDIA Corporation, all rights reserved.

@@ -25,7 +25,7 @@ usage()
echo "Usage:"
echo "bash build.sh [-d] [-r] [-v] [-c on|off] [-t on|off] [-g on|off] [-h] [-b ge] [-m infer|train] \\"
echo " [-a on|off] [-Q on|off] [-p on|off] [-i] [-L] [-R] [-D on|off] [-j[n]] [-e gpu|d|cpu] \\"
echo " [-P on|off] [-z [on|off]] [-M on|off] [-V 9.2|10.1] [-I] [-K]"
echo " [-P on|off] [-z [on|off]] [-M on|off] [-V 9.2|10.1] [-I]"
echo ""
echo "Options:"
echo " -d Debug mode"
@@ -52,7 +52,6 @@ usage()
echo " -M Enable MPI and NCCL for GPU training, default on"
echo " -V Specify the minimum required cuda version, default CUDA 9.2"
echo " -I Compile predict, default off"
echo " -K Compile with AKG, default off"
}
# check value of input is 'on' or 'off'
@@ -91,7 +90,6 @@ checkopts()
COMPILE_PREDICT="off"
USE_GLOG="on"
PREDICT_PLATFORM=""
ENABLE_AKG="off"
# Process the options
while getopts 'drvj:c:t:hsb:a:g:p:ie:m:I:LRP:Q:D:zM:V:K' opt
@@ -230,10 +228,6 @@ checkopts()
exit 1
fi
;;
K)
ENABLE_AKG="on"
echo "enable compile with akg"
;;
*)
echo "Unknown option ${opt}!"
usage
@@ -307,9 +301,6 @@ build_mindspore()
if [[ "X$USE_GLOG" = "Xon" ]]; then
CMAKE_ARGS="${CMAKE_ARGS} -DUSE_GLOG=ON"
fi
if [[ "X$ENABLE_AKG" = "Xon" ]]; then
CMAKE_ARGS="${CMAKE_ARGS} -DENABLE_AKG=ON"
fi
echo "${CMAKE_ARGS}"
if [[ "X$INC_BUILD" = "Xoff" ]]; then
cmake ${CMAKE_ARGS} ../..
@@ -433,9 +424,9 @@ build_predict()
cd "${BASEPATH}/predict/output/"
if [[ "$PREDICT_PLATFORM" == "x86_64" ]]; then
tar -cf MSPredict-0.2.0-linux_x86_64.tar.gz include/ lib/ --warning=no-file-changed
tar -cf MSPredict-0.3.0-linux_x86_64.tar.gz include/ lib/ --warning=no-file-changed
elif [[ "$PREDICT_PLATFORM" == "arm64" ]]; then
tar -cf MSPredict-0.2.0-linux_aarch64.tar.gz include/ lib/ --warning=no-file-changed
tar -cf MSPredict-0.3.0-linux_aarch64.tar.gz include/ lib/ --warning=no-file-changed
fi
echo "success to build predict project!"
}

@@ -16,7 +16,6 @@ option(ENABLE_DUMP_PROTO "Enable dump anf graph to file in ProtoBuffer format, d
option(ENABLE_DUMP_E2E "Enable dump e2e file, default on" OFF)
option(ENABLE_DUMP_IR "Enable dump funciton graph ir, default on" ON)
option(ENABLE_MPI "enable mpi" OFF)
option(ENABLE_AKG "enable akg" OFF)
if (CMAKE_CXX_COMPILER_ID STREQUAL "GNU")
if (WIN32)

@@ -0,0 +1,67 @@
FROM ubuntu:18.04
MAINTAINER leonwanghui <leon.wanghui@huawei.com>
# Set env
ENV PYTHON_ROOT_PATH /usr/local/python-3.7.5
ENV PATH /usr/local/bin:$PATH
# Install base tools
RUN apt update \
&& DEBIAN_FRONTEND=noninteractive apt install -y \
vim \
wget \
curl \
xz-utils \
net-tools \
openssh-client \
git \
ntpdate \
tzdata \
tcl \
sudo \
bash-completion
# Install compile tools
RUN DEBIAN_FRONTEND=noninteractive apt install -y \
gcc \
g++ \
zlibc \
make \
libgmp-dev \
patch \
autoconf \
libtool \
automake \
flex
# Set bash
RUN echo "dash dash/sh boolean false" | debconf-set-selections
RUN DEBIAN_FRONTEND=noninteractive dpkg-reconfigure dash
# Install python (v3.7.5)
RUN apt install -y libffi-dev libssl-dev zlib1g-dev libbz2-dev libncurses5-dev \
libgdbm-dev libgdbm-compat-dev liblzma-dev libreadline-dev libsqlite3-dev \
&& cd /tmp \
&& wget https://github.com/python/cpython/archive/v3.7.5.tar.gz \
&& tar -xvf v3.7.5.tar.gz \
&& cd /tmp/cpython-3.7.5 \
&& mkdir -p ${PYTHON_ROOT_PATH} \
&& ./configure --prefix=${PYTHON_ROOT_PATH} \
&& make -j4 \
&& make install -j4 \
&& rm -f /usr/local/bin/python \
&& rm -f /usr/local/bin/pip \
&& ln -s ${PYTHON_ROOT_PATH}/bin/python3.7 /usr/local/bin/python \
&& ln -s ${PYTHON_ROOT_PATH}/bin/pip3.7 /usr/local/bin/pip \
&& rm -rf /tmp/cpython-3.7.5 \
&& rm -f /tmp/v3.7.5.tar.gz
# Set pip source
RUN mkdir -pv /root/.pip \
&& echo "[global]" > /root/.pip/pip.conf \
&& echo "trusted-host=mirrors.aliyun.com" >> /root/.pip/pip.conf \
&& echo "index-url=http://mirrors.aliyun.com/pypi/simple/" >> /root/.pip/pip.conf
# Install MindSpore cpu whl package
RUN pip install --no-cache-dir https://ms-release.obs.cn-north-4.myhuaweicloud.com/0.3.0-alpha/MindSpore/cpu/ubuntu_x86/mindspore-0.3.0-cp37-cp37m-linux_x86_64.whl

@@ -0,0 +1,83 @@
FROM nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04
MAINTAINER leonwanghui <leon.wanghui@huawei.com>
# Set env
ENV PYTHON_ROOT_PATH /usr/local/python-3.7.5
ENV OMPI_ROOT_PATH /usr/local/openmpi-3.1.5
ENV PATH ${OMPI_ROOT_PATH}/bin:/usr/local/bin:$PATH
ENV LD_LIBRARY_PATH ${OMPI_ROOT_PATH}/lib:$LD_LIBRARY_PATH
# Install base tools
RUN apt update \
&& DEBIAN_FRONTEND=noninteractive apt install -y \
vim \
wget \
curl \
xz-utils \
net-tools \
openssh-client \
git \
ntpdate \
tzdata \
tcl \
sudo \
bash-completion
# Install compile tools
RUN DEBIAN_FRONTEND=noninteractive apt install -y \
gcc \
g++ \
zlibc \
make \
libgmp-dev \
patch \
autoconf \
libtool \
automake \
flex \
libnccl2=2.4.8-1+cuda10.1 \
libnccl-dev=2.4.8-1+cuda10.1
# Set bash
RUN echo "dash dash/sh boolean false" | debconf-set-selections
RUN DEBIAN_FRONTEND=noninteractive dpkg-reconfigure dash
# Install python (v3.7.5)
RUN apt install -y libffi-dev libssl-dev zlib1g-dev libbz2-dev libncurses5-dev \
libgdbm-dev libgdbm-compat-dev liblzma-dev libreadline-dev libsqlite3-dev \
&& cd /tmp \
&& wget https://github.com/python/cpython/archive/v3.7.5.tar.gz \
&& tar -xvf v3.7.5.tar.gz \
&& cd /tmp/cpython-3.7.5 \
&& mkdir -p ${PYTHON_ROOT_PATH} \
&& ./configure --prefix=${PYTHON_ROOT_PATH} \
&& make -j4 \
&& make install -j4 \
&& rm -f /usr/local/bin/python \
&& rm -f /usr/local/bin/pip \
&& ln -s ${PYTHON_ROOT_PATH}/bin/python3.7 /usr/local/bin/python \
&& ln -s ${PYTHON_ROOT_PATH}/bin/pip3.7 /usr/local/bin/pip \
&& rm -rf /tmp/cpython-3.7.5 \
&& rm -f /tmp/v3.7.5.tar.gz
# Set pip source
RUN mkdir -pv /root/.pip \
&& echo "[global]" > /root/.pip/pip.conf \
&& echo "trusted-host=mirrors.aliyun.com" >> /root/.pip/pip.conf \
&& echo "index-url=http://mirrors.aliyun.com/pypi/simple/" >> /root/.pip/pip.conf
# Install openmpi (v3.1.5)
RUN cd /tmp \
&& wget https://download.open-mpi.org/release/open-mpi/v3.1/openmpi-3.1.5.tar.gz \
&& tar -xvf openmpi-3.1.5.tar.gz \
&& cd /tmp/openmpi-3.1.5 \
&& mkdir -p ${OMPI_ROOT_PATH} \
&& ./configure --prefix=${OMPI_ROOT_PATH} \
&& make -j4 \
&& make install -j4 \
&& rm -rf /tmp/openmpi-3.1.5 \
&& rm -f /tmp/openmpi-3.1.5.tar.gz
# Install MindSpore cuda-10.1 whl package
RUN pip install --no-cache-dir https://ms-release.obs.cn-north-4.myhuaweicloud.com/0.3.0-alpha/MindSpore/gpu/ubuntu_x86/cuda-10.1/mindspore_gpu-0.3.0-cp37-cp37m-linux_x86_64.whl

@ -52,7 +52,7 @@ def create_bert_dataset(epoch_size=1, device_num=1, rank=0, do_shuffle="true", e
ds = ds.map(input_columns="input_ids", operations=type_cast_op)
# apply batch operations
ds = ds.batch(bert_net_cfg.batch_size, drop_remainder=True)
ds = ds.repeat(repeat_count)
ds = ds.repeat(new_repeat_count)
logger.info("data size: {}".format(ds.get_dataset_size()))
logger.info("repeatcount: {}".format(ds.get_repeat_count()))
return ds, new_repeat_count

@ -81,6 +81,11 @@ def run_pretrain():
context.reset_auto_parallel_context()
context.set_auto_parallel_context(parallel_mode=ParallelMode.DATA_PARALLEL, mirror_mean=True,
device_num=device_num)
from mindspore.parallel._auto_parallel_context import auto_parallel_context
if bert_net_cfg.num_hidden_layers == 12:
auto_parallel_context().set_all_reduce_fusion_split_indices([28, 55, 82, 109, 136, 163, 190, 205])
elif bert_net_cfg.num_hidden_layers == 24:
auto_parallel_context().set_all_reduce_fusion_split_indices([38, 93, 148, 203, 258, 313, 368, 397])
D.init()
rank = args_opt.device_id % device_num
else:
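The split indices above tell the data-parallel backend where to cut the gradient list into fused all-reduce buckets. A minimal sketch of that mapping (the `fusion_buckets` helper is hypothetical, for illustration only, not MindSpore code):

```python
def fusion_buckets(split_indices, total_params):
    """Map all-reduce fusion split indices to (start, end) bucket ranges.

    Hypothetical helper: index i means "close the current fused bucket
    after parameter tensor i". Not the MindSpore implementation.
    """
    bounds = [0] + [i for i in split_indices if i < total_params] + [total_params]
    return [(bounds[k], bounds[k + 1]) for k in range(len(bounds) - 1)]

# The 12-layer BERT split above groups 205 parameter tensors into 8 buckets.
buckets = fusion_buckets([28, 55, 82, 109, 136, 163, 190, 205], 205)
```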

@ -26,8 +26,8 @@ import os
import pickle
######## mindrecord_schema begin ##########
mindrecord_schema = {"label": {"type": "int64"},
"data": {"type": "bytes"},
mindrecord_schema = {"label": {"type": "int32"},
"image": {"type": "bytes"},
"file_name": {"type": "string"}}
######## mindrecord_schema end ##########
@ -121,5 +121,5 @@ def mindrecord_dict_data(task_id):
if not image_bytes:
print("The image file: {} is invalid.".format(file_name))
continue
data["data"] = image_bytes
data["image"] = image_bytes
yield data
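The hunks above rename the bytes field from `data` to `image` and narrow `label` to int32. A minimal pure-Python sketch of checking a sample against the new schema (the `validate` helper is illustrative, not the MindRecord API):

```python
mindrecord_schema = {"label": {"type": "int32"},
                     "image": {"type": "bytes"},
                     "file_name": {"type": "string"}}

# Rough Python equivalents of the MindRecord field types used here.
_PY_TYPES = {"int32": int, "int64": int, "bytes": bytes, "string": str}

def validate(record, schema=mindrecord_schema):
    """Return True if the record has exactly the schema's fields with matching types."""
    if set(record) != set(schema):
        return False
    return all(isinstance(record[name], _PY_TYPES[field["type"]])
               for name, field in schema.items())

ok = validate({"label": 3, "image": b"\x89PNG", "file_name": "n01440764_10026.JPEG"})
```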

@ -0,0 +1,132 @@
# DeepFM Description
This is an example of training DeepFM with the Criteo dataset in MindSpore.
[Paper](https://arxiv.org/pdf/1703.04247.pdf) Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li, Xiuqiang He
# Model architecture
The overall network architecture of DeepFM is shown below:
[Link](https://arxiv.org/pdf/1703.04247.pdf)
# Requirements
- Install [MindSpore](https://www.mindspore.cn/install/en).
- Download the Criteo dataset for training and evaluation. Convert it to the format expected by `src/dataset.py` (MindRecord, TFRecord, or H5) and move the files to a specified path.
- For more information, please check the resources below
- [MindSpore tutorials](https://www.mindspore.cn/tutorial/zh-CN/master/index.html)
- [MindSpore API](https://www.mindspore.cn/api/zh-CN/master/index.html)
# Script description
## Script and sample code
```python
├── deepfm
├── README.md
├── scripts
│ ├──run_train.sh
│ ├──run_eval.sh
├── src
│ ├──config.py
│ ├──dataset.py
│ ├──callback.py
│ ├──deepfm.py
├── train.py
├── eval.py
```
## Training process
### Usage
- sh run_train.sh [DEVICE_NUM] [DATASET_PATH] [MINDSPORE_HCCL_CONFIG_PATH]
- python train.py --dataset_path [DATASET_PATH]
### Launch
```
# distribute training example
sh scripts/run_distribute_train.sh 8 /opt/dataset/criteo /opt/mindspore_hccl_file.json
# standalone training example
sh scripts/run_standalone_train.sh 0 /opt/dataset/criteo
or
python train.py --dataset_path /opt/dataset/criteo > output.log 2>&1 &
```
### Result
Training results will be stored in the example path. By default,
checkpoints are saved under `./checkpoint`,
the training log is redirected to `./output.log`,
the loss log to `./loss.log`,
and the eval log to `./auc.log`.
## Eval process
### Usage
- sh run_eval.sh [DEVICE_ID] [DATASET_PATH] [CHECKPOINT_PATH]
### Launch
```
# infer example
sh scripts/run_eval.sh 0 ~/criteo/eval/ ~/train/deepfm-15_41257.ckpt
```
> The checkpoint file is produced during the training process.
### Result
Inference results will be stored in the example path; you can find results like the following in `auc.log`.
```
2020-05-27 20:51:35 AUC: 0.80577889065281, eval time: 35.55999s.
```
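Each `auc.log` entry follows the fixed format shown above, so it can be parsed back into structured values; a minimal sketch (the `parse_auc_line` helper is assumed, not part of the example scripts):

```python
import re

_AUC_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) AUC: ([\d.]+), eval time: ([\d.]+)s\.")

def parse_auc_line(line):
    """Extract timestamp, AUC, and eval time (seconds) from one auc.log line."""
    m = _AUC_RE.match(line)
    if m is None:
        return None
    ts, auc, secs = m.groups()
    return {"time": ts, "auc": float(auc), "eval_time": float(secs)}

rec = parse_auc_line("2020-05-27 20:51:35 AUC: 0.80577889065281, eval time: 35.55999s.")
```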
# Model description
## Performance
### Training Performance
| Parameters | DeepFM |
| -------------------------- | ------------------------------------------------------|
| Model Version | |
| Resource                   | Ascend 910; CPU 2.60 GHz, 96 cores; memory 1.5 TB      |
| uploaded Date | 05/27/2020 |
| MindSpore Version | 0.2.0 |
| Dataset | Criteo |
| Training Parameters | src/config.py |
| Optimizer | Adam |
| Loss Function | SoftmaxCrossEntropyWithLogits |
| outputs | |
| Loss | 0.4234 |
| Accuracy | AUC[0.8055] |
| Total time | 91 min |
| Params (M) | |
| Checkpoint for Fine tuning | |
| Model for inference | |
### Inference Performance
| Parameters | | |
| -------------------------- | ----------------------------- | ------------------------- |
| Model Version | | |
| Resource | Ascend 910 | Ascend 310 |
| uploaded Date | 05/27/2020 | 05/27/2020 |
| MindSpore Version | 0.2.0 | 0.2.0 |
| Dataset | Criteo | |
| batch_size | 1000 | |
| outputs | | |
| Accuracy | AUC[0.8055] | |
| Speed | | |
| Total time | 35.559s | |
| Model for inference | | |
# ModelZoo Homepage
[Link](https://gitee.com/mindspore/mindspore/tree/master/mindspore/model_zoo)

@ -0,0 +1,14 @@
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================

@ -0,0 +1,66 @@
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""train_criteo."""
import os
import sys
import time
import argparse
from mindspore import context
from mindspore.train.model import Model
from mindspore.train.serialization import load_checkpoint, load_param_into_net
from src.deepfm import ModelBuilder, AUCMetric
from src.config import DataConfig, ModelConfig, TrainConfig
from src.dataset import create_dataset
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
parser = argparse.ArgumentParser(description='CTR Prediction')
parser.add_argument('--checkpoint_path', type=str, default=None, help='Checkpoint file path')
parser.add_argument('--dataset_path', type=str, default=None, help='Dataset path')
args_opt, _ = parser.parse_known_args()
device_id = int(os.getenv('DEVICE_ID', '0'))
context.set_context(mode=context.GRAPH_MODE, device_target="Ascend", device_id=device_id)
def add_write(file_path, print_str):
with open(file_path, 'a+', encoding='utf-8') as file_out:
file_out.write(print_str + '\n')
if __name__ == '__main__':
data_config = DataConfig()
model_config = ModelConfig()
train_config = TrainConfig()
ds_eval = create_dataset(args_opt.dataset_path, train_mode=False,
epochs=1, batch_size=train_config.batch_size)
model_builder = ModelBuilder(ModelConfig, TrainConfig)
train_net, eval_net = model_builder.get_train_eval_net()
train_net.set_train()
eval_net.set_train(False)
auc_metric = AUCMetric()
model = Model(train_net, eval_network=eval_net, metrics={"auc": auc_metric})
param_dict = load_checkpoint(args_opt.checkpoint_path)
load_param_into_net(eval_net, param_dict)
start = time.time()
res = model.eval(ds_eval)
eval_time = time.time() - start
time_str = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())
out_str = f'{time_str} AUC: {list(res.values())[0]}, eval time: {eval_time}s.'
print(out_str)
add_write('./auc.log', str(out_str))

@ -0,0 +1,44 @@
#!/bin/bash
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
echo "Please run the script as: "
echo "sh scripts/run_distribute_train.sh DEVICE_NUM DATASET_PATH MINDSPORE_HCCL_CONFIG_PAHT"
echo "for example: sh scripts/run_distribute_train.sh 8 /dataset_path /rank_table_8p.json"
echo "After running the script, the network runs in the background, The log will be generated in logx/output.log"
export RANK_SIZE=$1
DATA_URL=$2
export MINDSPORE_HCCL_CONFIG_PATH=$3
for ((i=0; i<RANK_SIZE;i++))
do
export DEVICE_ID=$i
export RANK_ID=$i
rm -rf log$i
mkdir ./log$i
cp *.py ./log$i
cp -r src ./log$i
cd ./log$i || exit
echo "start training for rank $i, device $DEVICE_ID"
env > env.log
python -u train.py \
--dataset_path=$DATA_URL \
--ckpt_path="checkpoint" \
--eval_file_name='auc.log' \
--loss_file_name='loss.log' \
--do_eval=True > output.log 2>&1 &
cd ../
done
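The loop above starts one background training process per device; the environment each process inherits can be sketched in Python (the `rank_env` helper is hypothetical, mirroring the exports in the script):

```python
import os

def rank_env(rank_id, rank_size):
    """Per-process environment matching the exports in the launch loop above."""
    env = dict(os.environ)
    env.update({
        "RANK_SIZE": str(rank_size),
        "RANK_ID": str(rank_id),
        "DEVICE_ID": str(rank_id),  # the script maps rank i to device i
    })
    return env

envs = [rank_env(i, 8) for i in range(8)]
```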

@ -0,0 +1,32 @@
#!/bin/bash
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
echo "Please run the script as: "
echo "sh scripts/run_eval.sh DEVICE_ID DATASET_PATH CHECKPOINT_PATH"
echo "for example: sh scripts/run_eval.sh 0 /dataset_path /checkpoint_path"
echo "After running the script, the network runs in the background, The log will be generated in ms_log/eval_output.log"
export DEVICE_ID=$1
DATA_URL=$2
CHECKPOINT_PATH=$3
mkdir -p ms_log
CUR_DIR=`pwd`
export GLOG_log_dir=${CUR_DIR}/ms_log
export GLOG_logtostderr=0
python -u eval.py \
--dataset_path=$DATA_URL \
--checkpoint_path=$CHECKPOINT_PATH > ms_log/eval_output.log 2>&1 &

@ -0,0 +1,34 @@
#!/bin/bash
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
echo "Please run the script as: "
echo "sh scripts/run_standalone_train.sh DEVICE_ID DATASET_PATH"
echo "for example: sh scripts/run_standalone_train.sh 0 /dataset_path"
echo "After running the script, the network runs in the background, The log will be generated in ms_log/output.log"
export DEVICE_ID=$1
DATA_URL=$2
mkdir -p ms_log
CUR_DIR=`pwd`
export GLOG_log_dir=${CUR_DIR}/ms_log
export GLOG_logtostderr=0
python -u train.py \
--dataset_path=$DATA_URL \
--ckpt_path="checkpoint" \
--eval_file_name='auc.log' \
--loss_file_name='loss.log' \
--do_eval=True > ms_log/output.log 2>&1 &

@ -0,0 +1,14 @@
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================

@ -0,0 +1,107 @@
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""
Defined callback for DeepFM.
"""
import time
from mindspore.train.callback import Callback
def add_write(file_path, out_str):
with open(file_path, 'a+', encoding='utf-8') as file_out:
file_out.write(out_str + '\n')
class EvalCallBack(Callback):
"""
Monitor the loss in training.
If the loss is NAN or INF terminating training.
Note
If per_print_times is 0 do not print loss.
"""
def __init__(self, model, eval_dataset, auc_metric, eval_file_path):
super(EvalCallBack, self).__init__()
self.model = model
self.eval_dataset = eval_dataset
self.aucMetric = auc_metric
self.aucMetric.clear()
self.eval_file_path = eval_file_path
def epoch_end(self, run_context):
start_time = time.time()
out = self.model.eval(self.eval_dataset)
eval_time = int(time.time() - start_time)
time_str = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())
out_str = "{} EvalCallBack metric{}; eval_time{}s".format(
time_str, out.values(), eval_time)
print(out_str)
add_write(self.eval_file_path, out_str)
class LossCallBack(Callback):
"""
Monitor the loss in training.
If the loss is NAN or INF terminating training.
Note
If per_print_times is 0 do not print loss.
Args
loss_file_path (str) The file absolute path, to save as loss_file;
per_print_times (int) Print loss every times. Default 1.
"""
def __init__(self, loss_file_path, per_print_times=1):
super(LossCallBack, self).__init__()
if not isinstance(per_print_times, int) or per_print_times < 0:
raise ValueError("print_step must be int and >= 0.")
self.loss_file_path = loss_file_path
self._per_print_times = per_print_times
def step_end(self, run_context):
cb_params = run_context.original_args()
loss = cb_params.net_outputs.asnumpy()
cur_step_in_epoch = (cb_params.cur_step_num - 1) % cb_params.batch_num + 1
cur_num = cb_params.cur_step_num
if self._per_print_times != 0 and cur_num % self._per_print_times == 0:
with open(self.loss_file_path, "a+") as loss_file:
time_str = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())
loss_file.write("{} epoch: {} step: {}, loss is {}\n".format(
time_str, cb_params.cur_epoch_num, cur_step_in_epoch, loss))
print("epoch: {} step: {}, loss is {}\n".format(
cb_params.cur_epoch_num, cur_step_in_epoch, loss))
class TimeMonitor(Callback):
"""
Time monitor for calculating cost of each epoch.
Args
data_size (int) step size of an epoch.
"""
def __init__(self, data_size):
super(TimeMonitor, self).__init__()
self.data_size = data_size
def epoch_begin(self, run_context):
self.epoch_time = time.time()
def epoch_end(self, run_context):
epoch_mseconds = (time.time() - self.epoch_time) * 1000
per_step_mseconds = epoch_mseconds / self.data_size
print("epoch time: {0}, per step time: {1}".format(epoch_mseconds, per_step_mseconds), flush=True)
def step_begin(self, run_context):
self.step_time = time.time()
def step_end(self, run_context):
step_mseconds = (time.time() - self.step_time) * 1000
print(f"step time {step_mseconds}", flush=True)

@ -0,0 +1,62 @@
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""
Network config settings, used in train.py and eval.py.
"""
class DataConfig:
"""
Define parameters of dataset.
"""
data_vocab_size = 184965
train_num_of_parts = 21
test_num_of_parts = 3
batch_size = 1000
data_field_size = 39
# dataset format, 1: mindrecord, 2: tfrecord, 3: h5
data_format = 2
class ModelConfig:
"""
Define parameters of model.
"""
batch_size = DataConfig.batch_size
data_field_size = DataConfig.data_field_size
data_vocab_size = DataConfig.data_vocab_size
data_emb_dim = 80
deep_layer_args = [[400, 400, 512], "relu"]
init_args = [-0.01, 0.01]
weight_bias_init = ['normal', 'normal']
keep_prob = 0.9
class TrainConfig:
"""
Define parameters of training.
"""
batch_size = DataConfig.batch_size
l2_coef = 1e-6
learning_rate = 1e-5
epsilon = 1e-8
loss_scale = 1024.0
train_epochs = 15
save_checkpoint = True
ckpt_file_name_prefix = "deepfm"
save_checkpoint_steps = 1
keep_checkpoint_max = 15
eval_callback = True
loss_callback = True

File diff suppressed because it is too large

File diff suppressed because it is too large

@ -0,0 +1,91 @@
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""train_criteo."""
import os
import sys
import argparse
from mindspore import context, ParallelMode
from mindspore.communication.management import init
from mindspore.train.model import Model
from mindspore.train.callback import ModelCheckpoint, CheckpointConfig, TimeMonitor
from src.deepfm import ModelBuilder, AUCMetric
from src.config import DataConfig, ModelConfig, TrainConfig
from src.dataset import create_dataset, DataType
from src.callback import EvalCallBack, LossCallBack
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
parser = argparse.ArgumentParser(description='CTR Prediction')
parser.add_argument('--dataset_path', type=str, default=None, help='Dataset path')
parser.add_argument('--ckpt_path', type=str, default=None, help='Checkpoint path')
parser.add_argument('--eval_file_name', type=str, default="./auc.log", help='eval file path')
parser.add_argument('--loss_file_name', type=str, default="./loss.log", help='loss file path')
parser.add_argument('--do_eval', type=lambda x: str(x).lower() == 'true', default=True, help='Do evaluation or not.')
args_opt, _ = parser.parse_known_args()
device_id = int(os.getenv('DEVICE_ID', '0'))
context.set_context(mode=context.GRAPH_MODE, device_target="Ascend", device_id=device_id)
if __name__ == '__main__':
data_config = DataConfig()
model_config = ModelConfig()
train_config = TrainConfig()
rank_size = int(os.environ.get("RANK_SIZE", 1))
if rank_size > 1:
context.reset_auto_parallel_context()
context.set_auto_parallel_context(parallel_mode=ParallelMode.DATA_PARALLEL, mirror_mean=True)
init()
rank_id = int(os.environ.get('RANK_ID'))
else:
rank_size = None
rank_id = None
ds_train = create_dataset(args_opt.dataset_path,
train_mode=True,
epochs=train_config.train_epochs,
batch_size=train_config.batch_size,
data_type=DataType(data_config.data_format),
rank_size=rank_size,
rank_id=rank_id)
model_builder = ModelBuilder(ModelConfig, TrainConfig)
train_net, eval_net = model_builder.get_train_eval_net()
auc_metric = AUCMetric()
model = Model(train_net, eval_network=eval_net, metrics={"auc": auc_metric})
time_callback = TimeMonitor(data_size=ds_train.get_dataset_size())
loss_callback = LossCallBack(loss_file_path=args_opt.loss_file_name)
callback_list = [time_callback, loss_callback]
if train_config.save_checkpoint:
config_ck = CheckpointConfig(save_checkpoint_steps=train_config.save_checkpoint_steps,
keep_checkpoint_max=train_config.keep_checkpoint_max)
ckpt_cb = ModelCheckpoint(prefix=train_config.ckpt_file_name_prefix,
directory=args_opt.ckpt_path,
config=config_ck)
callback_list.append(ckpt_cb)
if args_opt.do_eval:
ds_eval = create_dataset(args_opt.dataset_path, train_mode=False,
epochs=train_config.train_epochs,
batch_size=train_config.batch_size,
data_type=DataType(data_config.data_format))
eval_callback = EvalCallBack(model, ds_eval, auc_metric,
eval_file_path=args_opt.eval_file_name)
callback_list.append(eval_callback)
model.train(train_config.train_epochs, ds_train, callbacks=callback_list)

@ -0,0 +1,66 @@
# Deeplab-V3 Example
## Description
This is an example of training DeepLabv3 with the PASCAL VOC 2012 dataset in MindSpore.
## Requirements
- Install [MindSpore](https://www.mindspore.cn/install/en).
- Download the VOC 2012 dataset for training.
> Notes:
If you are running a fine-tuning or evaluation task, prepare the corresponding checkpoint file.
## Running the Example
### Training
- Set options in config.py.
- Run `run_standalone_train.sh` for non-distributed training.
``` bash
sh scripts/run_standalone_train.sh DEVICE_ID EPOCH_SIZE DATA_DIR
```
- Run `run_distribute_train.sh` for distributed training.
``` bash
sh scripts/run_distribute_train.sh DEVICE_NUM EPOCH_SIZE DATA_DIR MINDSPORE_HCCL_CONFIG_PATH
```
### Evaluation
Set options in evaluation_config.py. Make sure 'data_file' and 'finetune_ckpt' are set to your own paths.
- Run run_eval.sh for evaluation.
``` bash
sh scripts/run_eval.sh DEVICE_ID DATA_DIR
```
## Options and Parameters
It contains parameters of the Deeplab-V3 model and options for training, which are set in config.py.
### Options:
```
config.py:
learning_rate Learning rate, default is 0.0014.
weight_decay Weight decay, default is 5e-5.
momentum Momentum, default is 0.97.
crop_size Image crop size [height, width] during training, default is 513.
eval_scales The scales to resize images for evaluation, default is [0.5, 0.75, 1.0, 1.25, 1.5, 1.75].
output_stride The ratio of input to output spatial resolution, default is 16.
ignore_label Ignore label value, default is 255.
seg_num_classes Number of semantic classes, including the background class (if it exists):
foreground classes + 1 background class. Default is 21 for the PASCAL VOC 2012 dataset.
fine_tune_batch_norm Fine tune the batch norm parameters or not, default is False.
atrous_rates Atrous rates for atrous spatial pyramid pooling, default is None.
decoder_output_stride The ratio of input to output spatial resolution when employing decoder
to refine segmentation results, default is None.
image_pyramid Input scales for multi-scale feature extraction, default is None.
```
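With the defaults above (`crop_size` 513, `output_stride` 16), the backbone feature map is `ceil(513 / 16) = 33` pixels per side; a quick check (plain arithmetic, not DeepLab code):

```python
import math

def feature_size(crop_size, output_stride):
    """Spatial side length of the backbone output for a square input crop."""
    return math.ceil(crop_size / output_stride)

size_16 = feature_size(513, 16)  # 33
size_8 = feature_size(513, 8)    # 65
```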
### Parameters:
```
Parameters for dataset and network:
distribute Run distributed training, default is false.
epoch_size Epoch size, default is 6.
batch_size Batch size of input dataset: N, default is 2.
data_url Train/evaluation data url, required.
checkpoint_url Checkpoint path, default is None.
enable_save_ckpt Enable saving checkpoints, default is true.
save_checkpoint_steps Save checkpoint steps, default is 1000.
save_checkpoint_num Number of checkpoints to save, default is 1.
```
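A minimal sketch of exposing the parameter list above as a command line (argument names are taken from the list; the parser itself is hypothetical, not the actual training script):

```python
import argparse

def build_parser():
    """Hypothetical CLI mirroring the dataset/network parameters above."""
    parser = argparse.ArgumentParser(description="Deeplab-V3 example")
    parser.add_argument("--distribute", type=str, default="false",
                        help="Run distributed training, default is false.")
    parser.add_argument("--epoch_size", type=int, default=6)
    parser.add_argument("--batch_size", type=int, default=2)
    parser.add_argument("--data_url", type=str, required=True,
                        help="Train/evaluation data url, required.")
    parser.add_argument("--checkpoint_url", type=str, default=None)
    parser.add_argument("--enable_save_ckpt", type=str, default="true")
    parser.add_argument("--save_checkpoint_steps", type=int, default=1000)
    parser.add_argument("--save_checkpoint_num", type=int, default=1)
    return parser

args = build_parser().parse_args(["--data_url", "/data/voc2012"])
```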

Some files were not shown because too many files have changed in this diff
