Completely Derandomized Self-Adaptation in Evolution Strategies
Nikolaus Hansen and Andreas Ostermeier
http://www.lri.fr/~hansen
This paper puts forward two useful methods for self-adaptation
of the mutation distribution---the concepts of 'derandomization'
and 'cumulation'. Principle shortcomings of the concept of
*mutative* strategy parameter control and two levels of
derandomization are reviewed. Basic demands on the
self-adaptation of *arbitrary* (normal) mutation distributions
are developed. Applying arbitrary normal mutation distributions
is equivalent to applying a general linear problem encoding.
The underlying objective of mutative strategy parameter control
is roughly to favor previously selected mutation steps in
future. If this objective is pursued *rigorously*, a completely
derandomized self-adaptation scheme results, which adapts
arbitrary normal mutation distributions. This scheme, called
'covariance matrix adaptation' (CMA), meets the previously
stated demands. It can still be considerably improved by
cumulation---utilizing an 'evolution path' rather than single
search steps.
Simulations on various test functions reveal local and global
search properties of the evolution strategy with and without
covariance matrix adaptation. Their performances are comparable
only on perfectly scaled functions. On badly scaled,
non-separable functions usually a speed up factor of several
orders of magnitude is observed. On moderately mis-scaled
functions a speed up factor of three to ten can be expected.
In Evolutionary Computation, 9(2), pp. 159-195 (2001).
http://mitpress.mit.edu/EVCO
ERRATA:
Section 3, footnote 9: "We use the expectation of n/k ln(...)"
must be "We use the expectation of n/(2k) ln(...)".
(same error) Section 3, same page, description Performance:
"...where ln(...)..." must be "...where 1/2 ln(...)...".
ADDITIONAL COMMENTS:
Section 5.1, Table 1: The default parameter setting for w_i makes
sense only, if mu<=lambda/2. Otherwise weights become negative which
is unjustified in our context and was excluded by definition. You
may choose w_i=ln(max(lambda/2,mu)+1/2)-ln(i) instead. This setting
is applicable to any mu<=lambda.
Section 5.1, discussion of parameter c_c: "..., we suspect c_s <= c_c
<= 1 to be a sensible choice for c_c." In fact, up to now we found
no particular evidence against choosing c_c smaller than c_s.
In any case it is necessary to choose c_c > c_cov.
Section 6, Table 3: the input vector argument to functions 2-12 on
the left side of the equations should be rather x, instead of y.