@@ -641,9 +641,14 @@ class AdamOptimizer(Optimizer):
         beta1 (float): The exponential decay rate for the 1st moment estimates.
         beta2 (float): The exponential decay rate for the 2nd moment estimates.
         epsilon (float): a small float value for numerical stability.
-        regularization: A Regularizer, such as
-                        fluid.regularizer.L2DecayRegularizer.
+        regularization: A Regularizer, such as fluid.regularizer.L2DecayRegularizer.
         name: A optional name prefix.
+        lazy_mode(bool: false): The official Adam algorithm has two moving-average accumulators,
+            and both accumulators are updated at every step. In both dense and sparse mode, every
+            element of the two moving averages is updated each step, so the update can be very slow
+            when the parameter is very large. Lazy mode only updates the elements that have
+            gradients in the current mini-batch, which is much faster. However, it has different
+            semantics from the original Adam algorithm and may lead to different results.
 
     Examples:
         .. code-block:: python
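
Below is a minimal usage sketch (an editorial illustration, not part of the patch) of the situation lazy_mode targets: a large embedding table with sparse gradients, where only the rows looked up by the current mini-batch need their Adam accumulators updated. The layer names and sizes are illustrative, and it assumes the lazy_mode flag is exposed on fluid.optimizer.Adam exactly as this hunk documents.

    import paddle.fluid as fluid

    # A large embedding table with sparse gradient updates: only the rows
    # referenced by `ids` in the current mini-batch receive gradients.
    ids = fluid.layers.data(name='ids', shape=[1], dtype='int64')
    emb = fluid.layers.embedding(input=ids, size=[1000000, 64], is_sparse=True)
    loss = fluid.layers.reduce_mean(emb)

    # With lazy_mode=True, Adam's moment accumulators are updated only for
    # the rows that actually received gradients in this mini-batch, instead
    # of sweeping the whole table on every step.
    optimizer = fluid.optimizer.Adam(learning_rate=0.001, lazy_mode=True)
    optimizer.minimize(loss)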