Revisited: The Article "Covariance Matrix Adaptation Revisited – the CMSA Evolution Strategy"
The paper Covariance Matrix Adaptation Revisited – the CMSA Evolution Strategy by Beyer and Sendhoff [1] proposes the so-called covariance matrix self-adaptation evolution strategy, CMSA-ES. Compared to CMA-ES, cumulative step-size adaptation of the global step-size is replaced by self-adaptation in CMSA-ES. Furthermore, the internal parameter setting is modified. More specifically, the CMSA-ES algorithm deviates from the default CMA-ES [3,4] in the following ways:
- Step-size σ is adapted via mutative step-size control (i.e., "pure" self-adaptation) instead of cumulative step-size adaptation (i.e., derandomized and non-local self-adaptation, also known as cumulative path-length control).
- The rank-one update of the covariance matrix is omitted. In the default CMA-ES this can be accomplished by setting the learning rate for the rank-one update of the covariance matrix to zero.
(The parameter is denoted c_{1} in [4] (see Appendix A), α_{cov}×c_{cov} in [2], c_{cov} / μ_{cov} in [3], and 1 / (τ_{c} μ_{eff}) in [1]. In the notation of [2] this means α_{cov} = 0; in the notation of [3], μ_{cov} = ∞; in the notation of [1] the option is not available, because μ_{eff} cannot be freely chosen, however setting τ_{p} = 1 comes close.)
- The recombination weights are set to 1/μ, with μ ≈ λ/4, as in [2]. The current default recombination weights in CMA-ES are positive and monotonically decreasing up to μ ≈ λ/2. In both cases μ_{eff} ≈ λ/4.
The latter two points are simply deviations from the current default parameter setting of CMA-ES, where the third point is likely of minor relevance.
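The first point can be illustrated with a minimal sketch of the two step-size rules. The formulas below follow the standard textbook/tutorial setting [4]; the function and variable names are ours, and the sketch omits the surrounding sampling and selection loop:

```python
import numpy as np

def mutative_sigma(sigma, n, rng):
    """Mutative self-adaptation (CMSA-ES style): each offspring perturbs
    the parental step-size log-normally before sampling its search step;
    the step-sizes of the selected offspring are then recombined."""
    tau = 1.0 / np.sqrt(2.0 * n)  # a common choice for the learning parameter
    return sigma * np.exp(tau * rng.standard_normal())

def csa_sigma(sigma, p_sigma, n, mu_eff):
    """Cumulative step-size adaptation (CMA-ES): increase sigma if the
    evolution path p_sigma is longer than expected under random
    selection, decrease it otherwise."""
    c_sigma = (mu_eff + 2.0) / (n + mu_eff + 5.0)
    d_sigma = 1.0 + 2.0 * max(0.0, np.sqrt((mu_eff - 1.0) / (n + 1.0)) - 1.0) + c_sigma
    chi_n = np.sqrt(n) * (1.0 - 1.0 / (4.0 * n) + 1.0 / (21.0 * n ** 2))  # E||N(0,I)||
    return sigma * np.exp((c_sigma / d_sigma) * (np.linalg.norm(p_sigma) / chi_n - 1.0))
```

Note that the damping d_sigma contains a term roughly proportional to (μ_{eff}/n)^{1/2}, so the CSA step-size change per iteration fades for very large populations.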
In [1], experimental results compare CMSA-ES with CMA-ES with population sizes λ = 8, 4n, and 4n^{2} on the following four functions:
- Sphere
- Schwefel's ellipsoid
- Rosenbrock
- Rastrigin (only for λ = 4n^{2})
The following relevant effects on the number of function evaluations needed to reach a small target function value can be observed.

- With λ = 8 and λ = 4n and n > 10, CMSA becomes worse than CMA on the Rosenbrock function by a factor of approximately ten.

- With λ = 4n^{2} and n > 10, CMSA becomes better than CMA on all functions but the Rosenbrock function. The effect increases with the dimension n and reaches about a factor of 15 in 80-D (with population size 25600).
The presented results suggest that on functions where covariance matrix adaptation does not play a significant role, CMSA-ES remarkably outperforms CMA-ES when the population size λ is at least about 1000 and λ ≫ n > 10. However, this conclusion does not hold.
Discussion
Changes to the algorithm

- Mutative step-size control (σ-self-adaptation), as used in CMSA-ES, yields (almost) invariably a smaller step-size than cumulative step-size adaptation as used in CMA-ES. This means we expect CMSA-ES to improve on rather simple functions, where fast convergence is the major concern, but to become worse on more difficult functions, where too-small step-sizes are a common problem. For very large population size λ, the effect of cumulative step-size adaptation in CMA-ES fades out (the damping has a component roughly proportional to (μ_{eff}/n)^{1/2}); however, summing a large number of steps can counteract this effect, and significant step-size increments can still be observed.

- Changing the default recombination weights is expected to have a small effect. The default setting is typically about 15% faster.

- Omitting the rank-one update (setting c_{1} = 0) has a severe effect with small population size: it impairs the scaling (of the number of needed f-evaluations) of CMA-ES on the class of cigar and smooth ridge-like functions from linear in the search-space dimension to subquadratic (see, e.g., Principled Design..., Sect. 7.2). Furthermore, on functions where the full covariance matrix needs to be adapted, the typical loss factor for setting c_{1} = 0 is 2 to 4. For very large population size, the relevance of the rank-one update is expected to diminish, because by default c_{1} → 0 for λ → ∞ (however, summing a larger number of steps might counteract this effect).
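The latter two points can be made concrete with a small sketch computing the default recombination weights, μ_{eff}, and the rank-one learning rate c_{1}. The formulas follow the tutorial [4] (positive weights only); the function names are ours:

```python
import numpy as np

def default_weights(lam):
    """Default positive CMA-ES recombination weights: log-decreasing
    over the best mu = lam // 2 offspring (cf. the tutorial [4])."""
    mu = lam // 2
    w = np.log((lam + 1) / 2.0) - np.log(np.arange(1, mu + 1))
    return w / w.sum()

def mu_eff(w):
    """Variance effective selection mass, 1 / sum(w_i^2)."""
    return 1.0 / np.sum(w ** 2)

def c1(n, mueff):
    """Default learning rate of the rank-one update; it vanishes
    as mu_eff (and hence lambda) grows."""
    return 2.0 / ((n + 1.3) ** 2 + mueff)
```

For λ = 320 the resulting μ_{eff} is close to λ/4, i.e., close to the equal-weights setting of CMSA-ES, which is one reason the change of weights matters little; and c1(n, μ_{eff}) shrinks with growing λ, which is the default c_{1} → 0 behavior mentioned above.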
Selection of test functions
With small λ, covariance matrix adaptation has a significant impact on the performance only on the Rosenbrock function, where it speeds up an isotropic ES by a factor of about 10 to 30 (the effect is more pronounced in smaller dimensions). On Schwefel's ellipsoid, the known speed-up is about a factor of two. This effect is small compared to the effect on actually ill-conditioned functions: the speed-up on the "classic ill-conditioned ellipsoid" (with condition number 10^{6} and log-uniform factors) is between a factor of 100 and 1000; on the cigar function the factor is 50000 (sic!) and independent of the dimension.
In order to solve the Rastrigin function, a large population size is needed. On the Rastrigin function, the distribution shape does not need to be adapted, but with a large population size covariance matrix adaptation becomes increasingly important for the convergence process.
On the tested functions, the shaping of the sample distribution is of remarkable benefit only on the Rosenbrock function.
Results
The following tables show the number of function evaluations to reach the target value 10^{−10} for dimension n = 80, taken from [1] and from [2] (HybrCMA, which uses the same equal recombination weights as CMSA-ES and an odd setting of the parameters for cumulative step-size adaptation), together with simulation results of CMA-ES with default settings according to [3] (where the defect in [2] was addressed by adjusting step-size parameters for large population size and by introducing H_{σ}, which indirectly modulates c_{1}) and [4] (with the significant adjustment of c_{c} = (4 + μ_{eff}/n) / (4 + n + 2μ_{eff}/n), Hansen 2009, for large population size, i.e., large μ_{eff}, analogous to the adjustment of c_{σ} in [3]).
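As a side note, the adjusted c_{c} formula quoted above can be compared with the older default in a short sketch (the value μ_{eff} ≈ λ/4 is taken from the settings under discussion; function names are ours):

```python
def cc_old(n):
    """Older default for the backward time horizon constant of the
    evolution path of the rank-one update, independent of mu_eff."""
    return 4.0 / (n + 4.0)

def cc_adjusted(n, mu_eff):
    """Adjusted c_c (Hansen 2009, used in [4]): approaches the old value
    for small mu_eff and grows toward 1/2 for mu_eff >> n^2, i.e., the
    path is updated faster when the population is very large."""
    return (4.0 + mu_eff / n) / (4.0 + n + 2.0 * mu_eff / n)

n = 80
for lam in (320, 25600):
    print(lam, round(cc_old(n), 4), round(cc_adjusted(n, lam / 4.0), 4))
```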
Table 1. Population Size λ = 4n = 320
function evaluations [× 10000] to reach the target function value 10^{−10} for n = 80

                   sphere   Schwefel   Rosenbrock
CMSA from [1]         6        37        1100
CMA from [1]         10        19         150
HybrCMA from [2]     20       N/A         200
CMA as in [3]        10        19         130
CMA as in [4]        10        19         130
Table 2. Population Size λ = 4n^{2} = 25600
function evaluations [× 25600] to reach the target function value 10^{−10} for n = 80

                   sphere   Schwefel   Rastrigin   Rosenbrock
CMSA from [1]        90       100         180         6000
CMA from [1]       1400      1500        3000         8000
HybrCMA from [2]    400       N/A         N/A         6000
CMA as in [3]       205       227         324         2250
CMA as in [4]       130       136         229         1033
Table 2 shows the setting (n = 80 and λ = 4n^{2}) where CMSA-ES seemed to excel the most (Table 1 serves as a cross-check for moderately large population size). The standard deviations for the simulated results (CMA as in [3] and [4]) are about ±1%. Results from the literature are read from graphs and can have an error of about ±10%.
The results for CMA-ES as presented in [1] for λ = 4n^{2} cannot be reproduced, even with the odd parameter settings from [2]. A review of the source code used in [1] reveals that the results are due to a bug: the update of the sample distribution from the covariance matrix was postponed λ times longer than advised for the original algorithm.
On the sphere, Schwefel's function, and Rastrigin's function, performance with very large population size is to a large extent determined by approaching the optimum fast (on the sphere function this is virtually the only concern). Self-adaptation of the global step-size in CMSA-ES reduces the step-size faster than cumulative step-size adaptation in CMA-ES; however, the performance difference hardly exceeds a factor of two. In CMA-ES, for very large population size, the major part of the variance reduction is due to covariance matrix adaptation rather than step-size control (because d_{σ} is large). This also explains why the bug strongly affects the performance on the sphere function.
The parameter correction for very large population size in [4] seems to be reasonably effective.
Why CMA-ES is considerably faster on the Rosenbrock function even for very large population size is not immediately obvious. Additional experiments reveal that the rank-one update in [4] is irrelevant for this result (in [3] it leads to the observed impairment) and that weighted recombination has only a minor effect (indeed, about 15%). Consequently, replacing cumulative step-size adaptation with σ-self-adaptation must be the reason for the performance decline of CMSA with λ = 4n^{2} on the Rosenbrock function. Without step-size adaptation, CMA-ES with λ = 4n^{2} fails to solve the Rosenbrock function (it still works with λ ≤ 4n). We find that also with λ = 4n, step-size adaptation is the main reason for the performance decline of CMSA-ES on the Rosenbrock function.
Conclusion
The simulation results in [1], which led to the conclusion that CMSA-ES is superior with large population size, are due to a bug in the source code used for CMA-ES. When comparing CMSA-ES with a correct implementation of CMA-ES, or with results from the literature, the most pronounced advantage does not exceed a factor of two. This advantage is obtained when a rapid decrease of the step-size is desirable. It can be explained by known algorithm properties: self-adaptation of the global step-size, as used in CMSA-ES, adapts smaller step-sizes and decreases the step-size more rapidly.
When adaptation of the covariance matrix is decisive, CMA-ES outperforms CMSA-ES, where the difference is less pronounced with larger population size and with smaller dimension. Above, we observed a factor of up to 8 on the Rosenbrock function. With population size λ ≥ 4n, self-adaptation of the global step-size has been identified as the main reason for the performance decline of CMSA-ES on the Rosenbrock function. Here, unlike on the sphere function, the smaller adapted step-size becomes a disadvantage.
References
[1] Beyer, H.-G. and B. Sendhoff (2008). Covariance Matrix Adaptation Revisited – the CMSA Evolution Strategy. In Rudolph et al. (eds.), Parallel Problem Solving from Nature, PPSN X, Proceedings, pp. 123–132, Springer.
[2] Hansen, N., S.D. Müller and P. Koumoutsakos (2003). Reducing the Time Complexity of the Derandomized Evolution Strategy with Covariance Matrix Adaptation (CMA-ES). Evolutionary Computation, 11(1), pp. 1–18.
[3] Hansen, N. and S. Kern (2004). Evaluating the CMA Evolution Strategy on Multimodal Test Functions. In Eighth International Conference on Parallel Problem Solving from Nature, PPSN VIII, Proceedings, pp. 282–291, Springer.
[4] Hansen, N. (2011). The CMA Evolution Strategy: A Tutorial. Continuously updated technical report, June 28, 2011.
last change: 2014-06-14 13:19:16 +0200 (Sat, 14 Jun 2014)