Revisited: The Article "Covariance Matrix Adaptation Revisited – the CMSA Evolution Strategy"
The paper Covariance Matrix Adaptation Revisited – the CMSA Evolution Strategy by Beyer and Sendhoff [1] proposes the so-called covariance matrix self-adaptation evolution strategy, CMSA-ES. Compared to CMA-ES, cumulative step-size adaptation of the global step-size is replaced by self-adaptation in CMSA-ES. Furthermore, the internal parameter setting is modified. More specifically, the CMSA-ES algorithm deviates from the default CMA-ES [3,4] in the following ways:
- Step-size σ is adapted via mutative step-size control (i.e., "pure" self-adaptation) instead of cumulative step-size adaptation (i.e., derandomized and non-local self-adaptation, also known as cumulative path-length control).
- The rank-one update of the covariance matrix is omitted. In the default CMA-ES this can be accomplished by setting the learning rate for the rank-one update of the covariance matrix to zero.
(The parameter is denoted c_{1} in [4] (see Appendix A), α_{cov}×c_{cov} in [2], c_{cov} / μ_{cov} in [3], and 1 / (τ_{c} μ_{eff}) in [1]. In the notation of [2] this means α_{cov} = 0; in the notation of [3], μ_{cov} = ∞; in the notation of [1] the option is not available, because μ_{eff} cannot be freely chosen, however setting τ_{p} = 1 comes close.)
- The recombination weights are set to 1/μ, with μ ≈ λ/4, as in [2]. The current default recombination weights in CMA-ES are positive and monotonically decreasing up to μ ≈ λ/2. In both cases μ_{eff} ≈ λ/4.
The latter two points are simply deviations from the current default parameter setting of CMA-ES, where the third point is likely of minor relevance.
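The first point can be illustrated with a minimal sketch of the two step-size rules. The formulas below follow the standard textbook/tutorial setting [4]; the function and variable names are ours, and the sketch omits the surrounding sampling and selection loop:

```python
import numpy as np

def mutative_sigma(sigma, n, rng):
    """Mutative self-adaptation (CMSA-ES style): each offspring perturbs
    the parental step-size log-normally before sampling its search step;
    the step-sizes of the selected offspring are then recombined."""
    tau = 1.0 / np.sqrt(2.0 * n)  # a common choice for the learning parameter
    return sigma * np.exp(tau * rng.standard_normal())

def csa_sigma(sigma, p_sigma, n, mu_eff):
    """Cumulative step-size adaptation (CMA-ES): increase sigma if the
    evolution path p_sigma is longer than expected under random
    selection, decrease it otherwise."""
    c_sigma = (mu_eff + 2.0) / (n + mu_eff + 5.0)
    d_sigma = 1.0 + 2.0 * max(0.0, np.sqrt((mu_eff - 1.0) / (n + 1.0)) - 1.0) + c_sigma
    chi_n = np.sqrt(n) * (1.0 - 1.0 / (4.0 * n) + 1.0 / (21.0 * n ** 2))  # E||N(0,I)||
    return sigma * np.exp((c_sigma / d_sigma) * (np.linalg.norm(p_sigma) / chi_n - 1.0))
```

Note that the damping d_sigma contains a term roughly proportional to (μ_{eff}/n)^{1/2}, so the CSA step-size change per iteration fades for very large populations.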
In [1], experimental results compare CMSA-ES with CMA-ES with population sizes λ = 8, 4n, and 4n^{2} on the following four functions:
- Sphere
- Schwefel's ellipsoid
- Rosenbrock
- Rastrigin (only for λ = 4n^{2})
The following relevant effects on the number of function evaluations needed to reach a small target function value can be observed.

- With λ = 8 and λ = 4n and n > 10, CMSA becomes worse than CMA on the Rosenbrock function by a factor of approximately ten.

- With λ = 4n^{2} and n > 10, CMSA becomes better than CMA on all functions but the Rosenbrock function. The effect increases with the dimension n and reaches about a factor of 15 in 80-D (with population size 25600).
The presented results suggest that on functions where covariance matrix adaptation does not play a significant role, CMSA-ES remarkably outperforms CMA-ES when the population size λ is at least about 1000 and λ ≫ n > 10. However, this conclusion does not hold.
Discussion
Changes to the algorithm

- Mutative step-size control (σ-self-adaptation), as used in CMSA-ES, yields (almost) invariably a smaller step-size than cumulative step-size adaptation as used in CMA-ES. This means we expect CMSA-ES to improve on rather simple functions, where fast convergence is the major concern, but to become worse on more difficult functions, where too-small step-sizes are a common problem. For very large population size λ, the effect of cumulative step-size adaptation in CMA-ES fades out (the damping has a component roughly proportional to (μ_{eff}/n)^{1/2}); however, summing a large number of steps can counteract this effect, and significant step-size increments can still be observed.

- Changing the default recombination weights is expected to have a small effect. The default setting is typically about 15% faster.

- Omitting the rank-one update (setting c_{1} = 0) has a severe effect with small population size: it impairs the scaling (of the number of needed f-evaluations) of CMA-ES on the class of cigar and smooth ridge-like functions from linear in the search-space dimension to subquadratic (see, e.g., Principled Design..., Sect. 7.2). Furthermore, on functions where the full covariance matrix needs to be adapted, the typical loss factor for setting c_{1} = 0 is 2 to 4. For very large population size, the relevance of the rank-one update is expected to diminish, because by default c_{1} → 0 for λ → ∞ (however, summing a larger number of steps might counteract this effect).
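The latter two points can be made concrete with a small sketch computing the default recombination weights, μ_{eff}, and the rank-one learning rate c_{1}. The formulas follow the tutorial [4] (positive weights only); the function names are ours:

```python
import numpy as np

def default_weights(lam):
    """Default positive CMA-ES recombination weights: log-decreasing
    over the best mu = lam // 2 offspring (cf. the tutorial [4])."""
    mu = lam // 2
    w = np.log((lam + 1) / 2.0) - np.log(np.arange(1, mu + 1))
    return w / w.sum()

def mu_eff(w):
    """Variance effective selection mass, 1 / sum(w_i^2)."""
    return 1.0 / np.sum(w ** 2)

def c1(n, mueff):
    """Default learning rate of the rank-one update; it vanishes
    as mu_eff (and hence lambda) grows."""
    return 2.0 / ((n + 1.3) ** 2 + mueff)
```

For λ = 320 the resulting μ_{eff} is close to λ/4, i.e., close to the equal-weights setting of CMSA-ES, which is one reason the change of weights matters little; and c1(n, μ_{eff}) shrinks with growing λ, which is the default c_{1} → 0 behavior mentioned above.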
Selection of test functions
With small λ, covariance matrix adaptation has a significant impact on the performance only on the Rosenbrock function, where it speeds up an isotropic ES by a factor of about 10 to 30 (the effect is more pronounced in smaller dimensions). On Schwefel's ellipsoid, the known speed-up is about a factor of two. This effect is small compared to the effect on actually ill-conditioned functions: the speed-up on the "classic ill-conditioned ellipsoid" (with condition number 10^{6} and log-uniform factors) is between a factor of 100 and 1000; on the cigar function the factor is 50000 (sic!) and independent of the dimension.
In order to solve the Rastrigin function, a large population size is needed. On the Rastrigin function, the distribution shape does not need to be adapted, but with a large population size covariance matrix adaptation becomes increasingly important for the convergence process.
On the tested functions, the shaping of the sample distribution is of remarkable benefit only on the Rosenbrock function.
Results
The following tables show the number of function evaluations to reach the target value 10^{−10} for dimension n = 80, taken from [1] and from [2] (HybrCMA, which uses the same equal recombination weights as CMSA-ES and an odd setting of the parameters for cumulative step-size adaptation), together with simulation results of CMA-ES with default settings according to [3] (where the defect in [2] was addressed by adjusting step-size parameters for large population size and by introducing H_{σ}, which indirectly modulates c_{1}) and [4] (with the significant adjustment of c_{c} = (4 + μ_{eff}/n) / (4 + n + 2μ_{eff}/n), Hansen 2009, for large population size, i.e., large μ_{eff}, analogous to the adjustment of c_{σ} in [3]).
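As a side note, the adjusted c_{c} formula quoted above can be compared with the older default in a short sketch (the value μ_{eff} ≈ λ/4 is taken from the settings under discussion; function names are ours):

```python
def cc_old(n):
    """Older default for the backward time horizon constant of the
    evolution path of the rank-one update, independent of mu_eff."""
    return 4.0 / (n + 4.0)

def cc_adjusted(n, mu_eff):
    """Adjusted c_c (Hansen 2009, used in [4]): approaches the old value
    for small mu_eff and grows toward 1/2 for mu_eff >> n^2, i.e., the
    path is updated faster when the population is very large."""
    return (4.0 + mu_eff / n) / (4.0 + n + 2.0 * mu_eff / n)

n = 80
for lam in (320, 25600):
    print(lam, round(cc_old(n), 4), round(cc_adjusted(n, lam / 4.0), 4))
```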
Table 1. Population Size λ = 4n = 320
function evaluations [× 10000] to reach the target function value 10^{−10} for n = 80

                   sphere   Schwefel   Rosenbrock
CMSA from [1]         6        37        1100
CMA from [1]         10        19         150
HybrCMA from [2]     20       N/A         200
CMA as in [3]        10        19         130
CMA as in [4]        10        19         130
Table 2. Population Size λ = 4n^{2} = 25600
function evaluations [× 25600] to reach the target function value 10^{−10} for n = 80

                   sphere   Schwefel   Rastrigin   Rosenbrock
CMSA from [1]        90       100         180         6000
CMA from [1]       1400      1500        3000         8000
HybrCMA from [2]    400       N/A         N/A         6000
CMA as in [3]       205       227         324         2250
CMA as in [4]       130       136         229         1033
Table 2 shows the setting (n = 80 and λ = 4n^{2}) where CMSA-ES seemed to excel the most (Table 1 serves as a cross-check for moderately large population size). The standard deviations for the simulated results (CMA as in [3] and [4]) are about ±1%. Results from the literature are read from graphs and can have an error of about ±10%.
The results for CMA-ES as presented in [1] for λ = 4n^{2} cannot be reproduced, even with the odd parameter settings from [2]. A review of the source code used in [1] reveals that the results are due to a bug: the update of the sample distribution from the covariance matrix was postponed λ times longer than advised for the original algorithm.
On the sphere, Schwefel's function, and Rastrigin's function, performance with very large population size is to a large extent determined by approaching the optimum fast (on the sphere function this is virtually the only concern). Self-adaptation of the global step-size in CMSA-ES reduces the step-size faster than cumulative step-size adaptation in CMA-ES; however, the performance difference hardly exceeds a factor of two. In CMA-ES, for very large population size, the major part of the variance reduction is due to covariance matrix adaptation rather than step-size control (because d_{σ} is large). This also explains why the bug strongly affects the performance on the sphere function.
The parameter correction for very large population size in [4] seems to be reasonably effective.
Why CMA-ES is considerably faster on the Rosenbrock function even for very large population size is not immediately obvious. Additional experiments reveal that the rank-one update in [4] is irrelevant for this result (in [3] it leads to the observed impairment) and that weighted recombination has only a minor effect (indeed, about 15%). Consequently, replacing cumulative step-size adaptation with σ-self-adaptation must be the reason for the performance decline of CMSA with λ = 4n^{2} on the Rosenbrock function. Without step-size adaptation, CMA-ES with λ = 4n^{2} fails to solve the Rosenbrock function (it still works with λ ≤ 4n). We find that also with λ = 4n, step-size adaptation is the main reason for the performance decline of CMSA-ES on the Rosenbrock function.
Conclusion
The simulation results in [1], which led to the conclusion that CMSA-ES is superior with large population size, are due to a bug in the source code used for CMA-ES. When comparing CMSA-ES with a correct implementation of CMA-ES, or with results from the literature, the most pronounced advantage does not exceed a factor of two. This advantage is obtained when a rapid decrease of the step-size is desirable. It can be explained by known algorithm properties: self-adaptation of the global step-size, as used in CMSA-ES, adapts smaller step-sizes and decreases the step-size more rapidly.
When adaptation of the covariance matrix is decisive, CMA-ES outperforms CMSA-ES, where the difference is less pronounced with larger population size and with smaller dimension. Above, we observed a factor of up to 8 on the Rosenbrock function. With population size λ ≥ 4n, self-adaptation of the global step-size has been identified as the main reason for the performance decline of CMSA-ES on the Rosenbrock function. Here, unlike on the sphere function, the smaller adapted step-size becomes a disadvantage.
References
[1] Beyer, H.-G. and B. Sendhoff (2008). Covariance Matrix Adaptation Revisited – the CMSA Evolution Strategy. In Rudolph et al. (eds.), Parallel Problem Solving from Nature, PPSN X, Proceedings, pp. 123–132, Springer.
[2] Hansen, N., S.D. Müller and P. Koumoutsakos (2003). Reducing the Time Complexity of the Derandomized Evolution Strategy with Covariance Matrix Adaptation (CMA-ES). Evolutionary Computation, 11(1), pp. 1–18.
[3] Hansen, N. and S. Kern (2004). Evaluating the CMA Evolution Strategy on Multimodal Test Functions. In Eighth International Conference on Parallel Problem Solving from Nature, PPSN VIII, Proceedings, pp. 282–291, Springer.
[4] Hansen, N. (2011). The CMA Evolution Strategy: A Tutorial. Continuously updated technical report, June 28, 2011.
last change: 2014-06-14 13:19:16 +0200 (Sat, 14 Jun 2014)