Essential Readings in Data Science

What is considered canon?

Data Science Literature Review

I saw an intriguing question posed on Twitter and some of the responses were illuminating.


Here’s another variant of the question:


Although data science has a long history, it is still considered a relatively young field.

This space will be used to document recommended reading for newcomers to the field:

  1. Downey, Allen (2016) There is only one test. source

  2. Wickham, H. (2014) Tidy Data. Journal of Statistical Software, vol 59(10). original, update

  3. James, G., Witten, D., Hastie, T. & Tibshirani, R. (2014) An Introduction to Statistical Learning with Applications in R. source

  4. Shmueli, G. (2010) To explain or to predict? Statistical Science, 25(3), 289-310. source

  5. Hernán, M.A., Hsu, J. & Healy, B. (2019) A second chance to get causal inference right: A classification of Data Science tasks. Chance, vol 32(1). source

  6. Gelman, A., Pasarica, C. & Dodhia, R. (2002) Let’s practice what we preach: Turning tables into graphs. The American Statistician, vol 56(2). source

  7. Fortmann-Roe, S. (June 2012) Understanding the Bias-Variance Tradeoff. source

  8. Donoho, D. (2017) 50 Years of Data Science. Journal of Computational and Graphical Statistics, vol 26(4). source

  9. Wilson, G., Bryan, J., Cranston, K., Kitzes, J., Nederbragt, L. & Teal, T.K. (2017) Good enough practices in scientific computing. PLOS Computational Biology, vol 13(6). source

  10. Markham, K. (2019) 100 pandas tricks to save you time and energy. source

  11. Albon, C. Code snippets. source

  12. Howard, J. & Gugger, S. (Aug 4, 2020) Deep Learning for Coders with fastai and PyTorch: AI Applications Without a PhD, 1st ed. O’Reilly. source

  13. Rohrer, B. (Jan 2020) End-to-End Machine Learning: Complete Course Catalog. source; second source

  14. Rauser, J. (Dec 2016) How Humans See Data. youtube

  15. Broman, K.W. & Woo, K.H. (2018) Data Organization in Spreadsheets. The American Statistician, vol 72(1). source

  16. Sculley, D., Holt, G., Golovin, D., Davydov, E., Phillips, T., Ebner, D., Chaudhary, V. & Young, M. (2014) Machine Learning: The High-Interest Credit Card of Technical Debt. SE4ML: Software Engineering for Machine Learning (NIPS 2014 Workshop). source

  17. Sanderson, G. (3Blue1Brown) Essence of Linear Algebra. youtube

  18. Bryan, J. STAT 545: Data Wrangling, Exploration, and Analysis with R. source

Paul Apivat
web3 data

My interests include data science, machine learning and R/Python programming.