Completely Derandomized Self-Adaptation in Evolution Strategies

Nikolaus Hansen and Andreas Ostermeier
http://www.lri.fr/~hansen

This paper puts forward two useful methods for self-adaptation of the mutation distribution: the concepts of 'derandomization' and 'cumulation'. Principal shortcomings of the concept of *mutative* strategy parameter control and two levels of derandomization are reviewed. Basic demands on the self-adaptation of *arbitrary* (normal) mutation distributions are developed. Applying arbitrary normal mutation distributions is equivalent to applying a general linear problem encoding. The underlying objective of mutative strategy parameter control is roughly to favor previously selected mutation steps in the future. If this objective is pursued *rigorously*, a completely derandomized self-adaptation scheme results, which adapts arbitrary normal mutation distributions. This scheme, called 'covariance matrix adaptation' (CMA), meets the previously stated demands. It can still be considerably improved by cumulation, that is, by utilizing an 'evolution path' rather than single search steps. Simulations on various test functions reveal local and global search properties of the evolution strategy with and without covariance matrix adaptation. Their performances are comparable only on perfectly scaled functions. On badly scaled, non-separable functions, a speed-up factor of several orders of magnitude is usually observed. On moderately mis-scaled functions, a speed-up factor of three to ten can be expected.

In Evolutionary Computation, 9(2), pp. 159-195 (2001).
http://mitpress.mit.edu/EVCO

ERRATA:

Section 3, footnote 9: "We use the expectation of n/k ln(...)" must be "We use the expectation of n/(2k) ln(...)".

Section 3, same page, description "Performance" (same error): "...where ln(...)..." must be "...where 1/2 ln(...)...".

ADDITIONAL COMMENTS:

Section 5.1, Table 1: The default parameter setting for w_i makes sense only if mu<=lambda/2.
Otherwise, weights become negative, which is unjustified in our context and was excluded by definition. You may choose w_i=ln(max(lambda/2,mu)+1/2)-ln(i) instead. This setting is applicable to any mu<=lambda.

Section 5.1, discussion of parameter c_c: "..., we suspect c_s <= c_c <= 1 to be a sensible choice for c_c." In fact, up to now we found no particular evidence against choosing c_c smaller than c_s. In any case it is necessary to choose c_c > c_cov.

Section 6, Table 3: The input vector argument to functions 2-12 on the left side of the equations should be x rather than y.
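
The alternative weight setting suggested in the comment on Section 5.1, Table 1 can be sketched in a few lines of Python. This is only an illustration of the formula w_i=ln(max(lambda/2,mu)+1/2)-ln(i) for i=1,...,mu; the function name `recombination_weights` and the argument names `lam` and `mu` are assumptions made for this sketch and do not come from the paper. The weights are left unnormalized, since any normalization is outside the scope of the comment.

```python
import math


def recombination_weights(lam, mu):
    """Return [w_1, ..., w_mu] with w_i = ln(max(lam/2, mu) + 1/2) - ln(i).

    Hypothetical helper for illustration; requires 1 <= mu <= lam.
    """
    if not 1 <= mu <= lam:
        raise ValueError("requires 1 <= mu <= lam")
    # The offset is the same for every rank i, so compute it once.
    offset = math.log(max(lam / 2, mu) + 0.5)
    # Rank i = 1 is the best individual and receives the largest weight.
    return [offset - math.log(i) for i in range(1, mu + 1)]


# Example with mu > lam/2: all weights stay positive.
weights = recombination_weights(lam=10, mu=8)
```

Because max(lambda/2,mu)+1/2 is always larger than mu, every weight up to rank mu is strictly positive. For mu<=lambda/2 the offset reduces to ln((lambda+1)/2), so the expression no longer depends on mu in that range.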