Model Comparison Seminar

TU Dortmund, Summer semester 2024

Today

  1. Review Hastie et al. (2009)
  2. Discussion of Shmueli (2010)
  3. Plan for the next week:
    1. Topic assignments, presentation dates
    2. Read Navarro (2019)

Review Hastie et al. (2009)

Hastie et al. (2009)

  • Who has not read the chapter by now?

  • Any additional questions about the chapter?

Discussion of Shmueli (2010)

Shmueli (2010)

  • Who has read the paper?
  • Who has not read the paper?

Goals

  • What are the differences between explanatory and predictive modeling?
  • What arguments does Shmueli (2010) provide to highlight the importance of prediction in statistical modeling?
  • Can you explain the tension between prediction and explanation in terms of:
    • Causation-Association
    • Theory-Data
    • Retrospective-Prospective
    • Bias-Variance
      • Discuss the example from the appendix
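
To make the bias-variance point concrete, here is a minimal simulation sketch. It assumes a setup in the spirit of the appendix example: data come from a two-predictor linear model, and we compare the correctly specified model with an underspecified one that uses only the first predictor. All numeric settings (coefficients, noise level, correlation, sample sizes) and the function name `simulate` are hypothetical choices for illustration, not values taken from the paper.

```python
import numpy as np


def simulate(beta2, n_train=20, n_test=200, sigma=2.0, rho=0.9,
             n_reps=2000, seed=0):
    """Average out-of-sample MSE of the full and the underspecified model.

    True model (assumed for illustration): y = 1.0*x1 + beta2*x2 + noise.
    Full model regresses y on (x1, x2); underspecified model on x1 only.
    Neither model has an intercept, since the predictors have mean zero.
    """
    rng = np.random.default_rng(seed)
    beta = np.array([1.0, beta2])
    cov = np.array([[1.0, rho], [rho, 1.0]])  # correlated predictors
    err_full, err_under = 0.0, 0.0

    for _ in range(n_reps):
        # Small, noisy training sample
        X = rng.multivariate_normal(np.zeros(2), cov, size=n_train)
        y = X @ beta + rng.normal(0.0, sigma, size=n_train)

        # Fresh data from the same process for evaluating predictions
        X_new = rng.multivariate_normal(np.zeros(2), cov, size=n_test)
        y_new = X_new @ beta + rng.normal(0.0, sigma, size=n_test)

        # Ordinary least squares fits
        b_full, *_ = np.linalg.lstsq(X, y, rcond=None)
        b_under, *_ = np.linalg.lstsq(X[:, :1], y, rcond=None)

        err_full += np.mean((y_new - X_new @ b_full) ** 2)
        err_under += np.mean((y_new - X_new[:, :1] @ b_under) ** 2)

    return err_full / n_reps, err_under / n_reps


if __name__ == "__main__":
    for b2 in (0.1, 2.0):
        full, under = simulate(beta2=b2)
        print(f"beta2={b2}: full MSE={full:.3f}, underspecified MSE={under:.3f}")
```

With a small second coefficient, high noise, strongly correlated predictors, and few observations, the underspecified model trades a little bias for a larger reduction in variance. Raising `beta2` makes the omitted-variable bias dominate, and the correctly specified model predicts better again.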

Consequences of the distinction

  • What are the consequences of taking a predictive vs. an explanatory view for:
    1. Study design and data collection
    2. Data preparation
    3. Exploratory data analysis (EDA)
    4. Choice of variables
    5. Choice of methods

Model assessment

  • The distinction between prediction/explanation is especially visible in the model selection literature
    • Why do you think that is the case?
  • How does prediction/explanation inform practices in model validation and model selection?
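
As a starting point for this discussion, the sketch below contrasts an in-sample fit measure with an out-of-sample one across nested models. It is an illustration under assumed settings, not a procedure taken from Shmueli (2010): the quadratic data-generating process, the sample sizes, and the helper names `r_squared` and `compare_degrees` are all hypothetical.

```python
import numpy as np


def r_squared(y, y_hat):
    """Coefficient of determination of predictions y_hat for outcomes y."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot


def compare_degrees(max_degree=6, n_train=30, n_test=1000, sigma=1.0, seed=1):
    """Fit nested polynomial regressions; return (degree, in-sample R^2, holdout R^2)."""
    rng = np.random.default_rng(seed)

    def draw(n):
        # Assumed data-generating process: quadratic signal plus Gaussian noise
        x = rng.uniform(-1.0, 1.0, size=n)
        y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(0.0, sigma, size=n)
        return x, y

    x_tr, y_tr = draw(n_train)
    x_te, y_te = draw(n_test)

    results = []
    for d in range(1, max_degree + 1):
        # Design matrices with columns 1, x, ..., x^d
        A_tr = np.vander(x_tr, d + 1, increasing=True)
        A_te = np.vander(x_te, d + 1, increasing=True)
        coef, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
        results.append((d,
                        r_squared(y_tr, A_tr @ coef),
                        r_squared(y_te, A_te @ coef)))
    return results


if __name__ == "__main__":
    for degree, r2_in, r2_out in compare_degrees():
        print(f"degree={degree}: in-sample R^2={r2_in:.3f}, holdout R^2={r2_out:.3f}")
```

Because the models are nested, in-sample R^2 can never decrease as the degree grows, so it cannot by itself flag overfitting. The holdout R^2 carries no such guarantee, which is one reason assessment on new data is central to the predictive view.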

Use in the literature

  • Do you think Shmueli (2010) considers one of the two approaches overused in the academic literature?
    • Can you find examples?
    • Do you agree?
    • What are the consequences of omitting the underused approach, according to Shmueli (2010)?

Tension or opposition

  • Does Shmueli (2010) think that explanation and prediction are mutually exclusive?
    • Do you agree?
    • How does it affect modeling procedures? What recommendations did Shmueli (2010) give regarding reporting practices?

Implications for the field of statistics

  • Shmueli (2010) claims that the field of machine learning filled a void in the literature
    • What void?
    • What are the arguments for that claim?
    • Do you agree?
    • Do you think this situation has changed since 2010?

Suggestions by Shmueli (2010)

  • Think about the two suggestions for statisticians provided by Shmueli (2010)
  • What do you think about these suggestions?

Discussion and questions

  • What did you think about this article?

  • Do you think that Shmueli (2010) has strong arguments for the claims presented in the article?

  • Has reading about this distinction helped you think about modeling in general terms?

  • Did it help you think about possible choices when comparing statistical models?

  • Any questions?

Next week

Topic assignments

  • List of assignments: https://moodle.tu-dortmund.de/mod/data/view.php?id=1674369
    • Find yourself in the list of topics and contact your partner in the group
  • You have until Friday 3.5.2024 to let me know if this arrangement does not work for you
    • Date-wise: You cannot do that specific date
    • Topic-wise: I made an error and assigned you to a topic you did not want
    • Group-wise: You could not find your assigned partner

Plan for the next week

  • Read Navarro (2019) before the next seminar for discussion
  • Final confirmation of presentation dates

References

Hastie, T., Tibshirani, R., & Friedman, J. (2009). Model assessment and selection. In The elements of statistical learning: Data mining, inference, and prediction (pp. 219–259). Springer Series in Statistics. Springer, New York, NY. https://link.springer.com/chapter/10.1007/978-0-387-21606-5_7#preview
Navarro, D. J. (2019). Between the devil and the deep blue sea: Tensions between scientific judgement and statistical model selection. Computational Brain & Behavior, 2(1), 28–34. https://doi.org/10.1007/s42113-018-0019-z
Shmueli, G. (2010). To explain or to predict? Statistical Science, 25(3), 289–310. https://doi.org/10.1214/10-STS330