Topics
Here we list a selection of possible topics, along with some basic literature that you can use to get familiar with each topic.
Basic literature
The following resources discuss model comparison (or certain issues) from a broad perspective
Bürkner, P.-C., Scholz, M., & Radev, S. T. (2023). Some models are useful, but how do we know which ones? Towards a unified Bayesian model taxonomy. Statistics Surveys, 17, 216–310.
Breiman, L. (2001). Statistical modeling: The two cultures (with comments and a rejoinder by the author). Statistical Science, 16(3), 199–231.
Dubova, M., Chandramouli, S., Gigerenzer, G., Grünwald, P., Holmes, W., Lombrozo, T., Marelli, M., Musslick, S., Nicenboim, B., Ross, L., Shiffrin, R., White, M., Wagenmakers, E.-J., Bürkner, P.-C., & Sloman, S. J. (2024). Is Occam’s razor losing its edge? New perspectives on the principle of model parsimony. https://doi.org/10.31222/osf.io/bs5xe
McElreath, R. (2023). Statistical Rethinking 2023 - 07 - Fitting Over & Under. Richard McElreath channel on YouTube. https://youtu.be/1VgYIsANQck?si=dsRgGkRlyCAcB0xG
Navarro, D. J. (2019). Between the devil and the deep blue sea: Tensions between scientific judgement and statistical model selection. Computational Brain & Behavior, 2(1), 28–34. https://doi.org/10.1007/s42113-018-0019-z
Shmueli, G. (2010). To Explain or to Predict? Statistical Science, 25(3), 289–310. https://doi.org/10.1214/10-STS330
You can find helpful reviews of different techniques in
Ding, J., Tarokh, V., & Yang, Y. (2018). Model selection techniques: An overview. IEEE Signal Processing Magazine, 35(6), 16–34. https://arxiv.org/abs/1810.09583
Hastie, T., Tibshirani, R., & Friedman, J. (2009). Model assessment and selection. In The elements of statistical learning: Data mining, inference, and prediction (pp. 219–259). Springer Series in Statistics. Springer, New York, NY. https://link.springer.com/chapter/10.1007/978-0-387-21606-5_7#preview
Individual methods
Below is a list of model comparison techniques. This list is by no means exhaustive; you are welcome to suggest other techniques that you would like to cover.
Cross validation
Hastie, T., Tibshirani, R., & Friedman, J. (2009). Model assessment and selection. In The elements of statistical learning: Data mining, inference, and prediction (pp. 219–259). Springer Series in Statistics. Springer, New York, NY. https://link.springer.com/chapter/10.1007/978-0-387-21606-5_7#preview
Bootstrapping
Hastie, T., Tibshirani, R., & Friedman, J. (2009). Model assessment and selection. In The elements of statistical learning: Data mining, inference, and prediction (pp. 219–259). Springer Series in Statistics. Springer, New York, NY. https://link.springer.com/chapter/10.1007/978-0-387-21606-5_7#preview
AIC
Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6), 716–723.
Hastie, T., Tibshirani, R., & Friedman, J. (2009). Model assessment and selection. In The elements of statistical learning: Data mining, inference, and prediction (pp. 219–259). Springer Series in Statistics. Springer, New York, NY. https://link.springer.com/chapter/10.1007/978-0-387-21606-5_7#preview
Shmueli, G. (2010). To Explain or to Predict? Statistical Science, 25(3), 289–310. https://doi.org/10.1214/10-STS330
BIC
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 461–464.
Hastie, T., Tibshirani, R., & Friedman, J. (2009). Model assessment and selection. In The elements of statistical learning: Data mining, inference, and prediction (pp. 219–259). Springer Series in Statistics. Springer, New York, NY. https://link.springer.com/chapter/10.1007/978-0-387-21606-5_7#preview
Shmueli, G. (2010). To Explain or to Predict? Statistical Science, 25(3), 289–310. https://doi.org/10.1214/10-STS330
Methods for time-series
Bergmeir, C., Hyndman, R. J., & Koo, B. (2018). A note on the validity of cross-validation for evaluating autoregressive time series prediction. Computational Statistics & Data Analysis, 120, 70–83.
Bürkner, P.-C., Gabry, J., & Vehtari, A. (2020). Approximate leave-future-out cross-validation for Bayesian time series models. Journal of Statistical Computation and Simulation, 90(14), 2499–2523.
Hyndman, R., & Athanasopoulos, G. (2021). Forecasting: Principles and practice (3rd ed.). OTexts, Melbourne, Australia. https://otexts.com/fpp3/
The following topics are somewhat advanced and will require more work (or prior familiarity with Bayesian statistics) to cover well. On the other hand, they are arguably also more fun.
DIC
Spiegelhalter, D. J., Best, N. G., Carlin, B. P., & Van Der Linde, A. (2002). Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society Series B: Statistical Methodology, 64(4), 583–639.
Spiegelhalter, D. J., Best, N. G., Carlin, B. P., & Van Der Linde, A. (2014). The deviance information criterion: 12 years on. Journal of the Royal Statistical Society Series B: Statistical Methodology, 76(3), 485–493.
WAIC (+WBIC)
Watanabe, S. (2010). Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory. Journal of Machine Learning Research, 11(12).
Watanabe, S. (2013). A widely applicable Bayesian information criterion. The Journal of Machine Learning Research, 14(1), 867–897.
Vehtari, A., Gelman, A., & Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing, 27, 1413–1432.
PSIS-LOO
Vehtari, A., Gelman, A., & Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing, 27, 1413–1432.
Bayes factors
Kass, R. E., & Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association, 90(430), 773–795.
Berger, J., & Pericchi, L. (2014). Bayes factors. Wiley StatsRef: Statistics Reference Online, 1–14.
Amortized methods
Radev, S. T., D’Alessandro, M., Mertens, U. K., Voss, A., Köthe, U., & Bürkner, P.-C. (2021). Amortized Bayesian model comparison with evidential deep learning. IEEE Transactions on Neural Networks and Learning Systems, 34(8), 4903–4917.
Elsemüller, L., Schnuerch, M., Bürkner, P.-C., & Radev, S. T. (2023). A deep learning method for comparing Bayesian hierarchical models. arXiv:2301.11873. https://arxiv.org/abs/2301.11873
Minimum description length
Grünwald, P. (2000). Model selection based on minimum description length. Journal of Mathematical Psychology, 44(1), 133–152.
Grünwald, P. (2005). Minimum description length tutorial. Advances in Minimum Description Length: Theory and Applications, 5, 1–80.
Grünwald, P., & Roos, T. (2019). Minimum description length revisited. International Journal of Mathematics for Industry, 11(01), 1930001.
Shiffrin, R. M., Chandramouli, S. H., & Grünwald, P. D. (2016). Bayes factors, relations to minimum description length, and overlapping model classes. Journal of Mathematical Psychology, 72, 56–77.
Model averaging
Hinne, M., Gronau, Q. F., van den Bergh, D., & Wagenmakers, E.-J. (2020). A conceptual introduction to Bayesian model averaging. Advances in Methods and Practices in Psychological Science, 3(2), 200–215.
Hoeting, J. A., Madigan, D., Raftery, A. E., & Volinsky, C. T. (1999). Bayesian model averaging: A tutorial. Statistical Science, 14(4), 382–417.
Model stacking
Yao, Y., Vehtari, A., Simpson, D., & Gelman, A. (2018). Using stacking to average Bayesian predictive distributions. Bayesian Analysis, 13(3), 917–1007. https://projecteuclid.org/euclid.ba/1516093227
Comparison of different approaches
Instead of covering a single method for model comparison, you can also pick two (or three) methods and discuss how they differ, in which contexts one should be chosen over another, and so on.
This may seem like more work (“why cover two topics if I can cover one?”), but just as when trying to understand the behavior of statistical models, comparing several alternatives often makes it easier to see the benefits and drawbacks of each one individually.
Covering a wider topic also makes it easier to keep the big picture in view and leaves less room to dive deep into a single method, allowing you to focus more on the practical aspects rather than the theoretical justification of each method.
For example:
- AIC vs BIC: What are their respective goals? How do they differ in terms of penalizing model complexity? When should one use AIC and when BIC?
- BIC vs Bayes factors: Is BIC really Bayesian? Why or why not? How does BIC relate to Bayes factors? When would they give different answers?
- MDL vs Bayes factors: How does minimum description length relate to Bayes factors?
- Model averaging vs model stacking vs model selection: What are the differences between these approaches? When would you use one approach over another?
- AIC vs Cross-validation: How does AIC relate to LOO-CV? Can you use them interchangeably? Why or why not? When would you choose one or the other? (A small numerical sketch follows this list.)
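The following sketch is a purely illustrative aid for the AIC vs BIC and AIC vs cross-validation questions above: it fits two polynomial regression models to simulated data and computes AIC, BIC, and exact leave-one-out CV by hand. The simulated data, the Gaussian linear models, the polynomial degrees, and all variable names are assumptions made for this illustration only; they are not taken from any of the referenced papers.

```python
# A minimal sketch (all data simulated, Gaussian linear models assumed) comparing
# how AIC, BIC, and exact leave-one-out cross-validation score two candidate models.
import numpy as np

rng = np.random.default_rng(1)
n = 50
x = np.linspace(-2, 2, n)
y = 0.5 * x + rng.normal(scale=1.0, size=n)    # the true relationship is linear


def design(x, degree):
    """Polynomial design matrix with an intercept column."""
    return np.vander(x, degree + 1, increasing=True)


def gaussian_loglik(y, yhat, sigma2):
    """Log-likelihood of y under a Gaussian with mean yhat and variance sigma2."""
    return -0.5 * np.sum(np.log(2 * np.pi * sigma2) + (y - yhat) ** 2 / sigma2)


def fit_and_score(x, y, degree):
    X = design(x, degree)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    yhat = X @ beta
    sigma2 = np.mean((y - yhat) ** 2)          # ML estimate of the noise variance
    k = X.shape[1] + 1                         # regression coefficients + variance
    loglik = gaussian_loglik(y, yhat, sigma2)
    aic = 2 * k - 2 * loglik                   # penalty grows linearly in k
    bic = k * np.log(len(y)) - 2 * loglik      # penalty also grows with log(n)

    # Exact leave-one-out CV: refit with one observation held out at a time and
    # sum the log predictive density of each held-out point.
    loo = 0.0
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        b, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
        s2 = np.mean((y[mask] - X[mask] @ b) ** 2)
        loo += gaussian_loglik(y[i:i + 1], X[i:i + 1] @ b, s2)
    return aic, bic, -2 * loo                  # report LOO on the deviance scale


for degree in (1, 4):
    aic, bic, loo_dev = fit_and_score(x, y, degree)
    print(f"degree {degree}: AIC = {aic:.1f}, BIC = {bic:.1f}, LOO deviance = {loo_dev:.1f}")
```

Because BIC's penalty grows with log(n) while AIC's does not, rerunning the sketch with a larger n tends to make BIC favor the smaller model more strongly than AIC; the LOO deviance gives a directly predictive point of comparison for the AIC vs cross-validation question.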
For literature, you can start by looking into the resources listed under the relevant subsections in Section 2.