You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
|
5 years ago | |
---|---|---|
.. | ||
imagenet | 5 years ago | |
template | 5 years ago | |
README.md | 5 years ago | |
run_imagenet.sh | 5 years ago | |
run_template.sh | 5 years ago | |
writer.py | 5 years ago |
README.md
MindRecord generating guidelines
Create work space
Assume the dataset name is 'xyz'
- Create work space from template
cd ${your_mindspore_home}/example/convert_to_mindrecord cp -r template xyz
Implement data generator
Edit dictionary data generator
- Edit file
cd ${your_mindspore_home}/example/convert_to_mindrecord vi xyz/mr_api.py
Two API, 'mindrecord_task_number' and 'mindrecord_dict_data', must be implemented
- 'mindrecord_task_number()' returns number of tasks. Return 1 if data row is generated serially. Return N if generator can be split into N parallel-run tasks.
- 'mindrecord_dict_data(task_id)' yields dictionary data row by row. 'task_id' is 0..N-1, if N is return value of mindrecord_task_number()
Tricky for parallel run
- For imagenet, one directory can be a task.
- For TFRecord with multiple files, each file can be a task.
- For TFRecord with 1 file only, it could also be split into N tasks. Task_id=K means: data row is picked only if (count % N == K)
Run data generator
- run python script
cd ${your_mindspore_home}/example/convert_to_mindrecord python writer.py --mindrecord_script imagenet [...]