Purpose This study explores the performance of various stacking models by applying them to two datasets with very different characteristics.
Methods The base models are decision tree, random forest, Naive Bayes, and logistic regression, while a support vector machine is adopted as the meta model. The two datasets are the ‘hmeq’ data and the ‘bankrupt’ data. Performance is measured by accuracy, sensitivity, specificity, false positive rate, and false negative rate.
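The stacking architecture described above can be sketched as follows. This is a minimal illustration using scikit-learn, not the authors' exact pipeline; the synthetic dataset and all hyperparameters are assumptions standing in for the real ‘hmeq’ and ‘bankrupt’ data.

```python
# Sketch only: four base learners feeding an SVM meta model, evaluated
# with the metrics named in the abstract. Dataset is synthetic.
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("dt", DecisionTreeClassifier(random_state=0)),
        ("rf", RandomForestClassifier(random_state=0)),
        ("nb", GaussianNB()),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    final_estimator=SVC(),  # support vector machine as the meta model
)
stack.fit(X_tr, y_tr)

# Evaluation metrics: accuracy, sensitivity, specificity, FPR, FNR
tn, fp, fn, tp = confusion_matrix(y_te, stack.predict(X_te)).ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)   # true positive rate
specificity = tn / (tn + fp)   # true negative rate
fpr = fp / (fp + tn)           # false positive rate
fnr = fn / (fn + tp)           # false negative rate
```

Note that the false positive rate is the complement of specificity and the false negative rate is the complement of sensitivity, so the five metrics carry three independent quantities.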
Results For the ‘hmeq’ data, which are well refined, random forest achieves superior performance that none of the stacking models exceeds. For the ‘bankrupt’ data, which are raw and contain much noise, the stacking models generally perform better than the individual base models; stacking model 5 performs best in particular.
Conclusion The empirical results indicate that when data are highly refined and have a limited number of input variables, the stacking approach is not a good strategy; choosing a single model with highly tunable hyperparameters would be a better choice. On the other hand, when data are nearly raw with many input variables, stacking can outperform individual models by integrating each model’s ability to capture underlying patterns.