Efficient Sampling and Handling of Variance in Tuning Data Mining Models

Patrick Koch and Wolfgang Konen

Department of Computer Science, Cologne University of Applied Sciences, 51643 Gummersbach, Germany
patrick.koch@fh-koeln.de, wolfgang.konen@fh-koeln.de

Abstract. Computational Intelligence (CI) provides good and robust solutions for global optimization. CI is especially suited to difficult parameter optimization tasks in which the fitness function is noisy. Such fitness landscapes frequently arise in real-world applications like Data Mining (DM). Unfortunately, parameter tuning in DM is computationally expensive, and CI-based methods often require many function evaluations before they converge to good solutions. Earlier studies have shown that surrogate models can reduce the number of real function evaluations. However, each function evaluation remains time-consuming. In this paper we investigate whether and how the fitness landscape of the parameter space changes when fewer observations are used for model training during tuning. A representative study on seven DM tasks shows that the results are nevertheless competitive. On all these tasks, a fraction of 10-15% of the training data is sufficient. With this, the computation time can be reduced by a factor of 6-10.

Keywords: Machine learning, parameter tuning, sampling, SVM, sequential parameter optimization

LNCS 7491, p. 195 ff.
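As a rough illustration of the idea in the abstract, the following sketch tunes a model parameter on a random 15% subsample of the training data and compares the chosen value with the one found on the full training set. It is not the authors' setup: a k-nearest-neighbour classifier and a plain grid search stand in for the SVM and sequential parameter optimization named in the keywords, the data are synthetic, and all function names (`make_data`, `subsample`, `knn_error`, `tune_k`) are hypothetical.

```python
import random


def make_data(n, rng):
    """Two overlapping Gaussian classes in 2D (synthetic stand-in data)."""
    data = []
    for _ in range(n):
        label = rng.randrange(2)
        centre = 1.0 if label else -1.0
        data.append(((rng.gauss(centre, 1.5), rng.gauss(centre, 1.5)), label))
    return data


def subsample(train, fraction, rng):
    """Draw a random fraction of the training records without replacement."""
    size = max(1, int(round(fraction * len(train))))
    return rng.sample(train, size)


def knn_error(train, valid, k):
    """Error rate of a k-nearest-neighbour classifier on the records in valid."""
    errors = 0
    for (x, y), label in valid:
        # Squared Euclidean distance to every training record, nearest first.
        dists = sorted(((x - a) ** 2 + (y - b) ** 2, lab) for (a, b), lab in train)
        votes = sum(lab for _, lab in dists[:k])
        predicted = 1 if 2 * votes > k else 0
        errors += predicted != label
    return errors / len(valid)


def tune_k(train, valid, candidates):
    """Grid search: return the candidate k with the lowest validation error."""
    return min(candidates, key=lambda k: knn_error(train, valid, k))


rng = random.Random(0)
train = make_data(600, rng)
valid = make_data(150, rng)
test = make_data(150, rng)
candidates = (1, 5, 15, 31)  # odd values avoid tied votes

# Tune once on a 15% subsample and once on the full training set.
small = subsample(train, 0.15, rng)
k_small = tune_k(small, valid, candidates)
k_full = tune_k(train, valid, candidates)

# Assumption of this sketch: the final model uses all training records,
# and only the tuning step sees the reduced sample.
err_small = knn_error(train, test, k_small)
err_full = knn_error(train, test, k_full)
print("k tuned on 15% sample:", k_small, "test error:", err_small)
print("k tuned on full data: ", k_full, "test error:", err_full)
```

Each evaluation in the subsampled search touches only 15% of the training records, so the search is correspondingly cheaper. Whether the selected parameter matches the full-data choice depends on the task and the noise level, so the printed values are one run of a toy example and not a reproduction of the reported results.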