Natural Evolution Strategies
(NES [3][9][10][17][19][21][27])
are well-grounded, evolution-strategy-inspired black-box optimization algorithms which,
instead of maintaining a population of search points, iteratively update a **search
distribution**.

The parameterized search distribution
is used to produce a batch of search points, and the fitness function is evaluated
at each such point. The distribution parameters (which include strategy parameters)
allow the algorithm to adaptively capture the (local) structure of the fitness function.
For example, in the case of a Gaussian distribution, this comprises the mean and
the covariance matrix. From the samples, NES estimates a search gradient on the
parameters towards higher expected fitness. NES then performs a gradient ascent
step along the **natural gradient**, a second-order method which, unlike the
plain gradient, renormalizes the update with respect to uncertainty. This step is
crucial, since it prevents oscillations, premature convergence, and undesired
effects stemming from a given parameterization. The entire process reiterates
until a stopping criterion is met.
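
For concreteness, below is a minimal NumPy sketch of this loop for a Gaussian search distribution with full covariance (essentially the xNES instantiation discussed below). The population size and learning rates follow the published defaults, but the function and variable names are illustrative; treat this as a sketch rather than reference code.

```python
import numpy as np
from scipy.linalg import expm

def xnes_sketch(fitness, mu, sigma=1.0, n_iter=200):
    """Minimal sketch of the NES loop for a full-covariance Gaussian.
    The covariance is factored as C = sigma^2 * B B^T (xNES-style)."""
    d = len(mu)
    B = np.eye(d)                              # normalized covariance factor
    n = 4 + int(3 * np.log(d))                 # default population size
    eta_mu = 1.0
    eta_sigma = eta_B = (9 + 3 * np.log(d)) / (5 * d * np.sqrt(d))
    # rank-based fitness shaping ("utilities"), summing to zero
    u = np.maximum(0.0, np.log(n / 2 + 1) - np.log(np.arange(1, n + 1)))
    u = u / u.sum() - 1.0 / n
    for _ in range(n_iter):
        s = np.random.randn(n, d)              # standard-normal samples
        z = mu + sigma * s @ B.T               # map into the search space
        order = np.argsort([-fitness(zk) for zk in z])  # best first
        s = s[order]
        # natural-gradient estimates in local coordinates
        g_mu = u @ s
        g_M = sum(uk * (np.outer(sk, sk) - np.eye(d)) for uk, sk in zip(u, s))
        g_sigma = np.trace(g_M) / d
        g_B = g_M - g_sigma * np.eye(d)
        # natural-gradient ascent steps on mean, scale, and shape
        mu = mu + eta_mu * sigma * B @ g_mu
        sigma = sigma * np.exp(eta_sigma / 2 * g_sigma)
        B = B @ expm(eta_B / 2 * g_B)
    return mu
```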

All members of the NES family operate based on the same principles.
They differ in the type of distribution and the gradient approximation method used.
Different search spaces require different search distributions; for
example, in low dimensionality it can be highly beneficial to model the full
covariance matrix (**xNES**). In high dimensions
a more scalable alternative is to limit the covariance to the diagonal only (**SNES**).
In addition, highly multi-modal search spaces may benefit from more heavy-tailed
distributions. A final distinction arises between distributions for which the natural
gradient can be computed analytically, and more general distributions for which it
must be estimated from samples.
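
For contrast, here is a minimal sketch of the diagonal case: keeping one standard deviation per coordinate reduces the strategy parameters from d(d+1)/2 to d, which is what makes the SNES-style update scale to high dimensions. Again, names and defaults are illustrative.

```python
import numpy as np

def snes_sketch(fitness, mu, sigma, n_iter=200):
    """Minimal sketch of the diagonal (SNES-style) update.
    sigma is a vector of per-coordinate standard deviations,
    e.g. snes_sketch(f, np.zeros(d), np.ones(d))."""
    d = len(mu)
    n = 4 + int(3 * np.log(d))
    eta_mu = 1.0
    eta_sigma = (3 + np.log(d)) / (5 * np.sqrt(d))
    u = np.maximum(0.0, np.log(n / 2 + 1) - np.log(np.arange(1, n + 1)))
    u = u / u.sum() - 1.0 / n
    for _ in range(n_iter):
        s = np.random.randn(n, d)
        z = mu + sigma * s                     # coordinates are independent
        order = np.argsort([-fitness(zk) for zk in z])
        s = s[order]
        mu = mu + eta_mu * sigma * (u @ s)     # natural gradient for the mean
        sigma = sigma * np.exp(eta_sigma / 2 * (u @ (s**2 - 1)))  # per-coordinate step sizes
    return mu
```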

| | xNES | SNES |
|---|---|---|
| When to use? | Small-to-medium problem dimension (< 100), especially powerful with highly correlated parameters. | Medium-to-large problem dimension, with approximately separable parameters. |
| Benchmarking results | BBOB 2012 [Pdf] | BBOB 2012 [Pdf] |
| Pseudo-code | [Pdf] | [Pdf] |
| Python | xnes.py by Tom Schaul (also xnes.py in PyBrain) | snes.py by Tom Schaul (also snes.py in PyBrain) |
| Matlab | xnes.m by Sun Yi | snes.m by Matt Luciw |
| Mathematica | nes.nb by Jan Koutník | |
| C++ | xnes.cpp by Tobias Glasmachers | |
| Lua | In the torch library (Clément Farabet) | |
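
As a usage illustration, the PyBrain versions can be invoked roughly as follows, assuming PyBrain's standard black-box optimizer interface (maximization by default, with `learn()` returning the best candidate and its fitness); treat this as a sketch rather than authoritative documentation.

```python
import numpy as np
from pybrain.optimization import XNES  # SNES exposes the same interface

def negsphere(x):
    # Toy fitness function; PyBrain optimizers maximize by default,
    # so the sphere function is negated to turn it into a maximization task.
    return -float(np.dot(x, x))

x0 = np.ones(10)                 # starting point in 10 dimensions
opt = XNES(negsphere, x0)
best, fitness = opt.learn()      # assumed to return (best point, its fitness)
print(best, fitness)
```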

A number of further variants of NES exist:

- **Cauchy-NES** ([Pseudocode]), which uses a multivariate Cauchy distribution; its heavy tails can improve the search on deceptive and multimodal functions (a sampling sketch follows this list).
- **rNES** (Radial NES, [Pseudocode]; [Mathematica] by Giuseppe Cuccu), the simplest variant, which adapts only a single parameter and lends itself to theoretical studies such as [27].
- **xNES-as** (xNES with adaptation sampling, [Pseudocode]), a variant that adapts the learning rate online, which often improves performance (see this comparison).
- **(1+1)-NES** ([Pseudocode]), a hill-climber variant.
- **MO-NES** ([Pseudocode]), a multi-objective extension.
- **R1-NES** ([Pseudocode]; [Matlab] by Sun Yi), xNES with a rank-one covariance matrix approximation.
- **DX-NES** ([Matlab] by Nobusumi Fukushima), distance-weighted xNES with substantial speed improvements; for details see *N. Fukushima, Y. Nagata, S. Kobayashi and I. Ono, presented at CEC 2011*.
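
To make the heavy-tailed idea behind Cauchy-NES concrete, here is a small sampling sketch (the helper name is hypothetical). A multivariate Cauchy draw is a multivariate t draw with one degree of freedom: a correlated Gaussian sample divided by the square root of an independent chi-squared(1) variable, which occasionally produces very long jumps that help escape deceptive local optima.

```python
import numpy as np

def multivariate_cauchy(mu, A, n):
    """Draw n samples from a multivariate Cauchy centered at mu,
    with scale factor A (covariance-like shape A A^T)."""
    d = len(mu)
    z = np.random.randn(n, d) @ A.T            # correlated Gaussian samples
    w = np.random.chisquare(1, size=(n, 1))    # independent chi^2(1) draws
    return mu + z / np.sqrt(w)                 # heavy-tailed: occasional long jumps
```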

Unless mentioned otherwise, all of this code is open source, free to use, and licensed under the BSD license.