PPSN XII - LNCS 7491-7492 CD-ROM

Efficient Sampling and Handling of Variance in Tuning Data Mining Models

Patrick Koch and Wolfgang Konen

Department of Computer Science, Cologne University of Applied Sciences, 51643, Gummersbach, Germany
patrick.koch@fh-koeln.de
wolfgang.konen@fh-koeln.de

Abstract. Computational Intelligence (CI) provides robust solutions for global optimization. CI is especially well suited to difficult parameter optimization tasks in which the fitness function is noisy. Such situations and fitness landscapes frequently arise in real-world applications like Data Mining (DM). Unfortunately, parameter tuning in DM is computationally expensive, and CI-based methods often require many function evaluations before they converge to good solutions. Earlier studies have shown that surrogate models can reduce the number of real function evaluations. However, each function evaluation remains time-consuming. In this paper we investigate whether and how the fitness landscape of the parameter space changes when fewer observations are used for model training during tuning. A representative study on seven DM tasks shows that the results are nevertheless competitive. On all these tasks, a fraction of 10-15% of the training data is sufficient. This reduces the computation time by a factor of 6-10.
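
The idea of tuning on a fraction of the training data can be illustrated with a minimal sketch. It is not the authors' setup: instead of an SVM tuned by sequential parameter optimization, it uses a plain grid search over the neighbourhood size k of a toy 1-D k-nearest-neighbour classifier, and all names (subsample, tune_k, make_data) and the synthetic data are assumptions made only for illustration.

```python
import random


def subsample(rows, fraction, seed=0):
    """Return a random subset holding about `fraction` of `rows`."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)
    n = max(1, round(fraction * len(rows)))
    return rng.sample(rows, n)


def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x (1-D feature)."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0


def error_rate(train, valid, k):
    """Fraction of validation points misclassified by k-NN fitted on `train`."""
    wrong = sum(knn_predict(train, x, k) != y for x, y in valid)
    return wrong / len(valid)


def tune_k(train, valid, candidates, fraction, seed=0):
    """Grid search over k, fitting every candidate on a subsample only."""
    small = subsample(train, fraction, seed)
    scores = {k: error_rate(small, valid, k) for k in candidates}
    best = min(scores, key=scores.get)
    return best, scores


def make_data(n, seed):
    """Hypothetical noisy data: label is 1 when x > 0, flipped 10% of the time."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        y = int(x > 0)
        if rng.random() < 0.1:  # label noise makes the fitness function noisy
            y = 1 - y
        rows.append((x, y))
    return rows


train = make_data(2000, seed=1)
valid = make_data(300, seed=2)

# Tune on 15% of the training data, the upper end of the range in the abstract.
best_k, scores = tune_k(train, valid, candidates=[1, 5, 15], fraction=0.15)
```

Each candidate is scored on the same validation set, so only the training cost shrinks with the fraction; repeating the call with different seed values gives a rough picture of the variance that subsampling introduces.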

Keywords: Machine learning, parameter tuning, sampling, SVM, sequential parameter optimization

LNCS 7491, p. 195 ff.
