Paddle/doc/design/evaluator.md

## Evaluator Design

### Problem Statement

During training or inference, we provide an evaluation function to measure model performance, for example, accuracy or precision. In the operator-based framework design, data passes through the network pipeline batch by batch, so an operator can only calculate the metrics for one mini-batch. We therefore need a mechanism to calculate the metrics over every N passes or batches, as the user specifies.

### Evaluator Design
Currently, every operation is expressed in the graph. We divide the evaluator process into three steps.

1. Initialize the metric state and add it into the block.

2. Calculate the metrics of interest for every mini-batch. A single evaluator operator is only responsible for calculating the necessary statistics for one mini-batch. For example, each run of the accuracy operator only calculates the accuracy for one mini-batch of data.


3. Merge the mini-batch statistics to form the evaluation result for multiple mini-batches. In distributed or multi-GPU training, also aggregate the values from the different devices.
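The three steps can be sketched in plain Python, independent of the framework. This is only an illustration for accuracy: the function names (`init_state`, `batch_statistics`, `merge`) and the dictionary layout are hypothetical and not part of the Paddle API, and the per-device states are simulated with ordinary dictionaries.

```python
def init_state():
    # Step 1: the metric state for accuracy is two counters.
    return {"total": 0, "correct": 0}


def batch_statistics(predictions, labels):
    # Step 2: statistics for a single mini-batch only.
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return {"total": len(labels), "correct": correct}


def merge(states):
    # Step 3: sum the counters over mini-batches (or over devices),
    # then derive the metric from the merged counters.
    merged = init_state()
    for s in states:
        merged["total"] += s["total"]
        merged["correct"] += s["correct"]
    accuracy = merged["correct"] / merged["total"] if merged["total"] else 0.0
    return merged, accuracy


# Two simulated devices, each having processed one mini-batch.
device_0 = batch_statistics([1, 0, 1, 1], [1, 0, 0, 1])  # 3 of 4 correct
device_1 = batch_statistics([0, 0], [1, 0])              # 1 of 2 correct
merged_state, accuracy = merge([device_0, device_1])
```

Merging counters instead of finished metric values means the same `merge` step works whether the states come from successive mini-batches or from different devices.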

### Implementation
This design is shown in the Python API below.
Each metric operator needs to calculate the metric statistics and return the batch-aware states. The Python side is responsible for accumulating the states for each pass.
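One plausible reason to accumulate states rather than finished per-batch metric values is that mini-batches can differ in size, for example the last batch of a pass. The following sketch, using made-up counts and illustrative variable names, compares the two approaches in plain Python.

```python
# (correct, total) counts returned for each mini-batch; the last batch is smaller.
batch_states = [(90, 100), (80, 100), (1, 10)]

# Averaging the per-batch accuracies weights the small batch as heavily
# as the full ones.
mean_of_batch_accuracies = sum(c / t for c, t in batch_states) / len(batch_states)

# Accumulating the states first gives the accuracy over all samples.
total_correct = sum(c for c, _ in batch_states)
total_samples = sum(t for _, t in batch_states)
accumulated_accuracy = total_correct / total_samples

print(mean_of_batch_accuracies)
print(accumulated_accuracy)
```

With these counts the mean of the batch accuracies is about 0.6, while the accuracy over all 210 samples is 171/210, about 0.81.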

    
```python
class Evaluator(object):
    """
    Evaluator base class.
    """

    def __init__(self, name, **kwargs):
        """
        Different evaluators may have different metric states. E.g., Accuracy
        needs two variables, the total and correct sample counts. Auc needs
        four variables: `true_positives`, `true_negatives`, `false_positives`
        and `false_negatives`. So every evaluator should create the variables
        it needs and append them to the main_program.

        The initialization of an Evaluator is responsible for creating the
        metric states and appending them to the main_program.
        """
        pass

    def _update_ops(self, input, label, **kwargs):
        """
        Add the operators that calculate the mini-batch metric to the main_program.
        Add increment operators to accumulate the metric states.
        """
        raise NotImplementedError()

    def reset(self, executor, reset_program=None):
        """
        Reset the metric states at the beginning of each pass, or of each
        user-specified number of batches.
        Execute the reset_program to reset the states.
        """
        raise NotImplementedError()

    def eval(self, executor, eval_program=None):
        """
        Merge the mini-batch statistics to form the evaluation result for
        multiple mini-batches.
        Execute the eval_program and return the result.
        """
        # Subclasses run the eval_program and return the merged result.
        raise NotImplementedError()
```

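To illustrate how such an interface might be used across passes, here is a toy stand-in written in plain Python. It is a sketch under stated assumptions, not the Paddle implementation: `ToyAccuracyEvaluator` and its `update` method are hypothetical, the states live in Python attributes rather than in variables of the main_program, and no executor or program is involved.

```python
class ToyAccuracyEvaluator(object):
    """Hypothetical in-memory evaluator that mirrors the reset/eval semantics."""

    def __init__(self, name):
        self.name = name
        # Metric states: total and correct sample counts.
        self.total = 0
        self.correct = 0

    def update(self, predictions, labels):
        # Plays the role of the operators added by _update_ops:
        # compute mini-batch statistics and increment the states.
        self.total += len(labels)
        self.correct += sum(1 for p, y in zip(predictions, labels) if p == y)

    def reset(self):
        # Called at the beginning of each pass.
        self.total = 0
        self.correct = 0

    def eval(self):
        # Merge the accumulated statistics into the final metric.
        if self.total == 0:
            return 0.0
        return self.correct / self.total


# Each pass is a list of (predictions, labels) mini-batches.
passes = [
    [([1, 0], [1, 1]), ([0, 0], [0, 0])],  # pass 0: 3 of 4 correct
    [([1, 1], [1, 1]), ([0, 1], [0, 1])],  # pass 1: 4 of 4 correct
]

evaluator = ToyAccuracyEvaluator("accuracy")
results = []
for batches in passes:
    evaluator.reset()
    for predictions, labels in batches:
        evaluator.update(predictions, labels)
    results.append(evaluator.eval())
```

Because `reset` runs at the start of every pass, the second result does not include the errors from the first pass.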