Completely Derandomized Self-Adaptation in Evolution Strategies
            Nikolaus Hansen and Andreas Ostermeier
            http://www.lri.fr/~hansen

 This paper puts forward two useful methods for self-adaptation
 of the mutation distribution---the concepts of 'derandomization'
 and 'cumulation'. Principle shortcomings of the concept of
 *mutative* strategy parameter control and two levels of
 derandomization are reviewed.  Basic demands on the
 self-adaptation of *arbitrary* (normal) mutation distributions
 are developed. Applying arbitrary normal mutation distributions
 is equivalent to applying a general linear problem encoding.

 The underlying objective of mutative strategy parameter control
 is roughly to favor previously selected mutation steps in
 future. If this objective is pursued *rigorously*, a completely
 derandomized self-adaptation scheme results, which adapts
 arbitrary normal mutation distributions. This scheme, called
 'covariance matrix adaptation' (CMA), meets the previously
 stated demands. It can still be considerably improved by
 cumulation---utilizing an 'evolution path' rather than single
 search steps.

 Simulations on various test functions reveal local and global
 search properties of the evolution strategy with and without
 covariance matrix adaptation.  Their performances are comparable
 only on perfectly scaled functions. On badly scaled,
 non-separable functions usually a speed up factor of several
 orders of magnitude is observed. On moderately mis-scaled
 functions a speed up factor of three to ten can be expected.


In Evolutionary Computation, 9(2), pp. 159-195 (2001).
http://mitpress.mit.edu/EVCO


ERRATA:

Section 3, footnote 9: "We use the expectation of n/k ln(...)"
  must be "We use the expectation of n/(2k) ln(...)".

(same error) Section 3, same page, description Performance: 
  "...where ln(...)..." must be "...where 1/2 ln(...)...".

ADDITIONAL COMMENTS: 

Section 5.1, Table 1: The default parameter setting for w_i makes
  sense only, if mu<=lambda/2. Otherwise weights become negative which
  is unjustified in our context and was excluded by definition. You
  may choose w_i=ln(max(lambda/2,mu)+1/2)-ln(i) instead. This setting 
  is applicable to any mu<=lambda.

Section 5.1, discussion of parameter c_c: "..., we suspect c_s <= c_c
  <= 1 to be a sensible choice for c_c." In fact, up to now we found
  no particular evidence against choosing c_c smaller than c_s. 
  In any case it is necessary to choose c_c > c_cov. 

Section 6, Table 3: the input vector argument to functions 2-12 on 
  the left side of the equations should be rather x, instead of y.