Model Comparison Seminar

TU Dortmund, Summer semester 2024

Today

  1. A change to the schedule - see Course schedule
  2. Discussion of Hastie et al. (2009)
  3. Plan for the next week:
    1. Submit topic preferences
    2. Read Shmueli (2010)

Discussion of Hastie et al. (2009)

Hastie et al. (2009)

  • Who has not read the paper?
  • Who has read the paper?

Train and test error

  • How does the paper define generalization?
  • How is the training error defined?
    • What happens to it as the model becomes more complex?
  • How is the test error defined?
    • What happens to it as the model becomes more complex? (See the notation recap below.)
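
For reference during the discussion, a minimal notation recap (a loss function L, a model fitted on a training set of size N), written here in standard form rather than quoted verbatim from the chapter:

```latex
% Training error: average loss over the training sample
\overline{\mathrm{err}} = \frac{1}{N}\sum_{i=1}^{N} L\bigl(y_i, \hat{f}(x_i)\bigr)

% Test (generalization) error: expected loss on new data given the training set,
% and its expectation over training sets
\mathrm{Err}_{\mathcal{T}} = \mathrm{E}\bigl[L\bigl(Y, \hat{f}(X)\bigr) \mid \mathcal{T}\bigr],
\qquad
\mathrm{Err} = \mathrm{E}\bigl[\mathrm{Err}_{\mathcal{T}}\bigr]
```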

Goals and data split

  • What are the two goals when we examine the training and test error of models?
  • What are the purposes of a training set, a validation set, and a test set?
    • When is it appropriate to use such a split? (A toy split is sketched below.)
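
As a concrete illustration of the three-way split, a minimal Python sketch on simulated data; the 50%/25%/25% proportions are only one common choice, not a rule from the chapter, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data purely for illustration: N observations, p features.
N, p = 200, 5
X = rng.normal(size=(N, p))
y = X[:, 0] + rng.normal(size=N)

# Shuffle once, then split roughly 50% / 25% / 25%.
idx = rng.permutation(N)
n_train, n_val = N // 2, N // 4
train, val, test = idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

X_train, y_train = X[train], y[train]  # used to fit the candidate models
X_val, y_val = X[val], y[val]          # used to compare models / choose complexity
X_test, y_test = X[test], y[test]      # touched only once, to assess the final model
```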

Optimism

  • Why is the training error typically smaller than the test error?
  • How does the article define the in-sample error?
  • What is the optimism?
    • What happens to it as the model becomes more complex?
    • What happens to it as the sample size increases? (Definitions are recapped below.)
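
A brief recap of the quantities behind these questions, written in standard form; Y_i^0 denotes a fresh response drawn at the observed input x_i:

```latex
% In-sample error: expected loss at the observed inputs, with new responses Y^0
\mathrm{Err}_{\mathrm{in}} = \frac{1}{N}\sum_{i=1}^{N}
  \mathrm{E}_{Y^0}\!\bigl[L\bigl(Y_i^0, \hat{f}(x_i)\bigr) \mid \mathcal{T}\bigr]

% Optimism of the training error, and its average over the training responses
\mathrm{op} \equiv \mathrm{Err}_{\mathrm{in}} - \overline{\mathrm{err}},
\qquad
\omega \equiv \mathrm{E}_{\mathbf{y}}(\mathrm{op})
       = \frac{2}{N}\sum_{i=1}^{N}\mathrm{Cov}(\hat{y}_i, y_i)
```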

Methods

  • Which methods estimate the in-sample error?
  • Which methods estimate the extra-sample error?
  • Why is the in-sample error often considered instead of the extra-sample error?

AIC

  • How is the AIC motivated (what is its general rationale)?
  • Why does the AIC underestimate the test error in Figure 7.4 (left) for the most complex model?
  • The article says the AIC “does a reasonable job” for the 0-1 loss in Figure 7.5 (right), yet the AIC consistently overestimates the test error there. Why do the authors nevertheless call the results “reasonable”? (The formulas are recapped below.)
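
For reference, the two AIC-type estimates in standard form, with d the (effective) number of parameters, loglik the maximized log-likelihood, and an estimated noise variance for the squared-error case:

```latex
% C_p: estimate of the in-sample error under squared-error loss
C_p = \overline{\mathrm{err}} + 2\,\frac{d}{N}\,\hat{\sigma}_{\varepsilon}^{2}

% AIC for a log-likelihood loss
\mathrm{AIC} = -\frac{2}{N}\,\mathrm{loglik} + 2\,\frac{d}{N}
```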

BIC

  • How is the BIC motivated?
  • How does the BIC relate to the Bayes factor?
  • How can we assess the relative merit of models using the BIC? (See the recap below.)
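
A recap in standard form, with d parameters, N observations, and M candidate models; the second line is the approximate posterior model probability (assuming equal prior model probabilities) that links the BIC to the Bayes factor:

```latex
% BIC for a log-likelihood loss
\mathrm{BIC} = -2\,\mathrm{loglik} + (\log N)\,d

% Approximate posterior probability of model m among M candidates
\Pr(\mathcal{M}_m \mid \mathbf{Z}) \approx
  \frac{e^{-\frac{1}{2}\mathrm{BIC}_m}}{\sum_{\ell=1}^{M} e^{-\frac{1}{2}\mathrm{BIC}_\ell}}
```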

Cross-validation

  • What is the ideal scenario for CV?
  • Why is it not always possible/feasible?
  • How can we get around it?
  • What is the trade-off when choosing between a large and a small K? (A minimal K-fold sketch follows below.)
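
A minimal K-fold cross-validation sketch in Python, using simulated data and plain least squares; the function and data are illustrative, not taken from the article.

```python
import numpy as np

def kfold_cv_error(X, y, K=10, seed=0):
    """K-fold CV estimate of prediction error for least squares (squared-error loss)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), K)
    losses = []
    for k in range(K):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(K) if j != k])
        # Fit on the K-1 remaining folds, evaluate on the held-out fold.
        beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        losses.append((y[test] - X[test] @ beta) ** 2)
    return np.concatenate(losses).mean()

# Toy example: 100 observations, intercept plus 5 predictors.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(100), rng.normal(size=(100, 5))])
y = X @ np.array([1.0, 2.0, 0.0, 0.0, -1.0, 0.5]) + rng.normal(size=100)
print(kfold_cv_error(X, y, K=10))   # compare, e.g., K=5 vs. K=len(y) for the trade-off question
```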

Bootstrap

  • What is the basic idea behind the bootstrap?
  • How can we apply it to assessing model performance?
  • Why is the naive estimate in (7.48) not ideal?
  • What is the alternative?
  • What is the idea behind the ‘.632’ estimator? (A sketch of the leave-one-out bootstrap and the ‘.632’ combination follows below.)
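
A minimal sketch of the leave-one-out bootstrap error and the ‘.632’ combination, again with least squares on simulated data; the 0.368/0.632 weights are the ones that define the ‘.632’ estimator, everything else (names, B, the toy data) is illustrative.

```python
import numpy as np

def bootstrap_632_error(X, y, B=200, seed=0):
    """Leave-one-out bootstrap error and the '.632' estimator for least squares."""
    rng = np.random.default_rng(seed)
    N = len(y)
    per_obs = [[] for _ in range(N)]           # losses for obs i from fits that did not use i
    for _ in range(B):
        idx = rng.integers(0, N, size=N)       # bootstrap sample, drawn with replacement
        beta, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        out = np.setdiff1d(np.arange(N), idx)  # observations left out of this sample
        for i, loss in zip(out, (y[out] - X[out] @ beta) ** 2):
            per_obs[i].append(loss)
    err_loo = np.mean([np.mean(v) for v in per_obs if v])   # leave-one-out bootstrap error
    beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)
    err_train = np.mean((y - X @ beta_full) ** 2)            # ordinary training error
    return 0.368 * err_train + 0.632 * err_loo               # the '.632' combination

# Toy example, same simulated setup as in the CV sketch.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(100), rng.normal(size=(100, 5))])
y = X @ np.array([1.0, 2.0, 0.0, 0.0, -1.0, 0.5]) + rng.normal(size=100)
print(bootstrap_632_error(X, y))
```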

Questions

Next week…

Submit topic preferences

  • Please see the updated list of topics on the website
  • No later than Friday 26.04.2024
    • If you don’t have a unique topic and a partner yet
      • Submit your topic preferences from the list on the website
    • If you already have a unique topic and a partner
      • Send it to me by e-mail for approval

Plan for the next week

  • Read Shmueli (2010) before the next seminar for discussion
  • Confirmation of topics and groups

References

Hastie, T., Tibshirani, R., & Friedman, J. (2009). Model assessment and selection. In The elements of statistical learning: Data mining, inference, and prediction (pp. 219–259). Springer Series in Statistics. Springer, New York, NY. https://link.springer.com/chapter/10.1007/978-0-387-21606-5_7
Shmueli, G. (2010). To Explain or to Predict? Statistical Science, 25(3), 289–310. https://doi.org/10.1214/10-STS330