# Cluster Training Benchmark
## Setup

- Platform
  - Kubernetes: v1.6.2
  - Linux Kernel: v3.10.0
- Resource
  - CPU: 10 cores per Pod
  - Memory: 5 GB per Pod
- Docker Image

  We use a different base Docker image for each framework when running the benchmark on Kubernetes:
  - PaddlePaddle v2: `paddlepaddle/paddle:0.11.0`
  - PaddlePaddle Fluid: `paddlepaddle/paddle:[commit-id]`
  - TensorFlow: `tensorflow/tensorflow:1.5.0-rc0`
- Model

  The `vgg16` model is used in this benchmark (a minimal definition sketch follows).
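For reference, a VGG16 graph can be built in a few lines. The following is a minimal sketch using `tf.keras.applications.VGG16`, which is an assumption for illustration only and not the exact model configuration used by this benchmark:

```python
# Minimal sketch only -- assumes tf.keras is available; the benchmark's
# actual model definitions (PaddlePaddle v2/Fluid, TensorFlow) may differ.
import tensorflow as tf

# Randomly initialized VGG16 for throughput benchmarking (no pretrained weights).
model = tf.keras.applications.VGG16(
    weights=None,               # train from scratch; weights don't affect speed tests
    input_shape=(224, 224, 3),  # standard VGG16 input size
    classes=1000,               # ImageNet-sized output layer
)
model.compile(optimizer="sgd", loss="categorical_crossentropy")
```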
## Cases

- Variable
  - Batch size of the training data.
  - PServer count of the training job.
  - Trainer count of the training job.
- Invariant
  - The resources (CPU and memory) of each trainer/pserver Pod.
### Measure the Performance for Different Batch Sizes

- PServer Count: 40
- Trainer Count: 100
- Metrics: mini-batches / sec (see the measurement sketch after the table)

| Batch Size | 32 | 64 | 128 | 256 |
|---|---|---|---|---|
| PaddlePaddle Fluid | - | - | - | - |
| PaddlePaddle v2 | - | - | - | - |
| TensorFlow | - | - | - | - |
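As a rough illustration of how the mini-batches/sec metric could be collected, the helper below times a fixed number of training steps. Here `train_step` is a hypothetical zero-argument callable standing in for one synchronous mini-batch update; the real benchmark's measurement code is framework-specific:

```python
import time

def measure_throughput(train_step, num_batches=100, warmup=10):
    """Return throughput in mini-batches/sec for a training callable.

    train_step: hypothetical function that runs one mini-batch update
    (framework-specific in the real benchmark).
    """
    for _ in range(warmup):        # exclude startup/compilation cost
        train_step()
    start = time.time()
    for _ in range(num_batches):
        train_step()
    return num_batches / (time.time() - start)
```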
### Measure the Performance for Different PServer Counts

- Trainer Count: 100
- Batch Size: 64
- Metrics: mini-batches / sec

| PServer Count | 10 | 20 | 40 | 60 |
|---|---|---|---|---|
| PaddlePaddle Fluid | - | - | - | - |
| PaddlePaddle v2 | - | - | - | - |
| TensorFlow | - | - | - | - |
### Measure Parallel Efficiency by Increasing the Trainer Count

- PServer Count: 20
- Batch Size: 64
- Metrics: speedup and parallel efficiency. The speedup with $N$ trainers is

  $$S = \frac{T_1}{T_N}$$

  where $T_1$ and $T_N$ are the training times with 1 and with $N$ trainers, respectively. The parallel efficiency is then

  $$E = \frac{S}{N}$$

  (a worked calculation follows the table).
| Trainer Count | 1 | 10 | 20 | 30 | 40 | 50 | 60 | 70 | 80 | 90 | 100 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| PaddlePaddle Fluid | - | - | - | - | - | - | - | - | - | - | - |
| PaddlePaddle v2 | - | - | - | - | - | - | - | - | - | - | - |
| TensorFlow | - | - | - | - | - | - | - | - | - | - | - |
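A small sketch of how the speedup and efficiency formulas above translate to code; the timing numbers below are hypothetical and purely for illustration:

```python
def speedup_and_efficiency(t1, tn, n):
    """S = T1 / TN and E = S / N, per the formulas above."""
    s = t1 / tn
    return s, s / n

# Hypothetical example: 1 trainer takes 1000 s/pass, 50 trainers take 25 s/pass.
s, e = speedup_and_efficiency(t1=1000.0, tn=25.0, n=50)
print(s, e)  # 40.0 0.8  -> 80% parallel efficiency
```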
## Reproduce the Benchmark

TODO