@@ -51,32 +51,34 @@ class SoftmaxWithCrossEntropyOpMaker
" the given labels as soft labels. " )
. SetDefault ( false ) ;
AddComment ( R " DOC(
-Cross entropy loss with softmax are used as the output layer extensively. This
+Softmax With Cross Entropy Operator.
+Cross entropy loss with softmax is used as the output layer extensively. This
 operator computes the softmax normalized values for each row of the input
-tensor, after which cross-entropy loss is then computed. This provides a more
+tensor, after which cross-entropy loss is computed. This provides a more
 numerically stable gradient.
-Because this operators performs a softmax on logits internally, it expects
-unscaled logits. Please do not call this op with the output of softmax operator,
-which will produce incorrect results.
+Because this operator performs a softmax on logits internally, it expects
+unscaled logits. This operator should not be used with the output of
+softmax operator since that would produce incorrect results.
 When the attribute softLabel is set false, this operators expects mutually
-exclusive hard labels, each sample in a batch is in exactly one class with
-probabilities 1. Each sample in the batch with one and only one label.
+exclusive hard labels, each sample in a batch is in exactly one class with a
+probability of 1.0. Each sample in the batch will have a single label.
-Equation:
+The equation is as follows:
-1) hard label (one-hot label)
+1) Hard label (one-hot label, so every sample has exactly one class)
-Loss_j = \f$ -\text{Logit}_{Label_j} +
+$$Loss_j = \f$ -\text{Logit}_{Label_j} +
 \log\left(\sum_{i=0}^{K}\exp(\text{Logit}_i)\right),
-j = 1, ..., K $\f
+j = 1, ..., K $\f$$
-2) soft label (a distribution over all classes)
+2) Soft label (each sample can have a distribution over all classes)
-Loss_j = \f$ -\sum_{i=0}^{K}\text{Label}_i\left(\text{Logit}_i -
+$$Loss_j = \f$ -\sum_{i=0}^{K}\text{Label}_i\left(\text{Logit}_i -
 \log\left(\sum_{i=0}^{K}\exp(\text{Logit}_i)\right)\right),
-j = 1, ..., K $\f
+j = 1, ..., K $\f$$
 )DOC");
  }
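For readers checking the two formulas in the DOC text above, here is a minimal standalone C++ sketch (not part of this patch; the helper names and example logits are made up for illustration). It evaluates the hard-label and soft-label losses using the usual log-sum-exp shift, which is the source of the numerical stability the comment mentions:

#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

// log(sum_i exp(logit_i)), computed stably by subtracting the row maximum.
double LogSumExp(const std::vector<double>& logits) {
  double max_logit = logits[0];
  for (double v : logits) max_logit = std::max(max_logit, v);
  double sum = 0.0;
  for (double v : logits) sum += std::exp(v - max_logit);
  return max_logit + std::log(sum);
}

// Hard-label case: Loss = -Logit[label] + log(sum_i exp(Logit_i)).
double HardLabelLoss(const std::vector<double>& logits, int label) {
  return -logits[label] + LogSumExp(logits);
}

// Soft-label case: Loss = -sum_i Label_i * (Logit_i - log(sum_j exp(Logit_j))).
double SoftLabelLoss(const std::vector<double>& logits,
                     const std::vector<double>& soft_label) {
  const double lse = LogSumExp(logits);
  double loss = 0.0;
  for (size_t i = 0; i < logits.size(); ++i) {
    loss -= soft_label[i] * (logits[i] - lse);
  }
  return loss;
}

int main() {
  // Unscaled logits for one sample; as the DOC text says, the operator expects
  // raw logits, not softmax outputs.
  std::vector<double> logits = {2.0, 1.0, 0.1};
  std::printf("hard-label loss: %f\n", HardLabelLoss(logits, 0));
  std::printf("soft-label loss: %f\n", SoftLabelLoss(logits, {0.7, 0.2, 0.1}));
  return 0;
}

In both helpers the inputs are unscaled logits, matching the operator's requirement that callers do not feed it the output of a softmax.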