Getting Started with Data Science
updated: February 2021
Everyone in the world has a “how to” guide to data science… well, maybe not everyone - but there are a lot of “guides” out there. People frequently ask me how to get started, so I thought I would do my best to pull together the resources that have helped me learn the most.
MY STORY
Personally, I learned statistics by getting my Master’s in Applied Statistics at Villanova University - it took 2.5 years. I got my introduction to R by working through the Johns Hopkins University Data Science Specialization on Coursera. Similarly, I got my introduction to Python through an online course. While I’d no longer recommend the course I took, there are many out there (like this one from EdX).
This was all bolstered by using these tools at work and in side projects. The repetition of working with them every day has made me more fluent.
Here are some resources that I’ve used or know of; I’ve tried to outline and group them to the best of my ability. There are many more out there, and you may find some better or worse depending on your learning style.
LEARNING DATA PROGRAMMING
- Johns Hopkins University Data Science Specialization on Coursera: As mentioned above, this course gave me my start with R, RStudio, and git.
- Kaggle: If you are as competitive as I am, this site should get you going. The interactive kernels and social aspects of the site make it a great place to see other people’s data science in action. Plagiarism is the greatest form of flattery (and the easiest way to learn - thanks, Stack Overflow).
- EdX - R Programming: I haven’t used EdX much, but there is a wealth of MOOCs here.
- EdX - Python Programming: This course from UCSD teaches Python, Jupyter notebooks, and data visualization.
LEARNING STATISTICS & OTHER IMPORTANT MATH
- Khan Academy - Statistics: I have used Khan Academy on multiple occasions for refreshers in statistics and linear algebra. The classes are interactive, manageable, and self-paced.
- Khan Academy - Linear Algebra
- Coursera - Statistics with R
- EdX - Data Analytics & Statistics courses
- And, of course, higher education.
BOOKS ON ALL FACETS OF DATA SCIENCE
ETHICS & ALGORITHM BOOKS
- Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy - Cathy O’Neil: Cathy O’Neil does a great job of outlining how data algorithms can have unintended negative consequences. Anyone who builds a machine learning algorithm should read it.
- The Signal and the Noise: Why So Many Predictions Fail–but Some Don’t - Nate Silver: Nate Silver is [in]famous for predicting elections. This book gets into the details of how he approaches forecasting, in politics and well beyond. Super interesting for a guy increasingly interested in politics.
- Algorithms of Oppression: How Search Engines Reinforce Racism - Safiya Umoja Noble: Algorithms often perpetuate the worst of human nature, and Google search provides many examples of this happening. A cautionary tale for data scientists.
- Ethics and Data Science - Mike Loukides, Hilary Mason, DJ Patil: A short (and, I believe, free) read on ethics as it pertains to data science.
DATA VIZ BOOKS
- The Wall Street Journal Guide to Information Graphics: The Dos and Don’ts of Presenting Data, Facts, and Figures - Dona M. Wong: I keep this book on my desk as a reference. It’s a quick read filled with easy-to-understand rules and objectives for creating data visualizations. Analyzing data is hard; this book offers tips for building clear, informative visualizations that don’t distract from the message.
THINKING BOOKS
- How Not to Be Wrong: The Power of Mathematical Thinking - Jordan Ellenberg: Critical thinking is crucial in data science and analytics. This book gives some great tips on how to approach “facts” with the right mindset.
- Thinking, Fast and Slow - Daniel Kahneman: Knowing how people think and consume information is SO IMPORTANT to telling a compelling story.
- Superforecasting: The Art and Science of Prediction - Philip E. Tetlock, Dan Gardner: Tetlock studies the ordinary people who turn out to be unusually good at forecasting and breaks down the habits that set them apart. Knowing how predictions are made, and where they go wrong, is important for everyone; it is even more important for the people building the models.
- Factfulness: Ten Reasons We’re Wrong About the World–and Why Things Are Better Than You Think - Hans Rosling, Anna Rosling Rönnlund, Ola Rosling: Hans Rosling famously co-founded Gapminder, which uses beautiful data visualization to inform people about the state of the world. This book dives deep into the state of the world and how you have to change your frame of reference in order to get an accurate picture.
- The Black Swan: The Impact of the Highly Improbable - Nassim Nicholas Taleb: A book about how people routinely misjudge the probability of rare, high-impact events, and how to avoid falling into that trap.
- The Great Mental Models Volume 1: General Thinking Concepts - Shane Parrish, Rhiannon Beaubien: Literally a book on how to think. I picked up a ton of strategies for processing information better and not fooling myself.
- The Joy of Game Theory: An Introduction to Strategic Thinking - Presh Talwalkar: I love game theory and found this book exceptionally accessible. Yet another resource for learning how to think.
BOOKS ABOUT MATH AND STATISTICS
Here is a list of fun books about math and statistics that I’ve enjoyed:
- The Grapes of Math: How Life Reflects Numbers and Numbers Reflect Life - Alex Bellos
- Naked Statistics: Stripping the Dread from the Data - Charles Wheelan
- The Magic of Math: Solving for x and Figuring Out Why - Arthur Benjamin
PODCASTS
- Hidden Brain: An NPR podcast about the unconscious patterns that drive human behavior. I find it super interesting. While not strictly data-related, it frequently covers topics that are tangentially important to being a good data scientist.
- Exponential View: Not primarily focused on data, but it very frequently covers artificial intelligence and machine learning. I recommend the newsletter that goes along with this podcast (see below).
- Not So Standard Deviations: Roger Peng and Hilary Parker host a podcast on all things data science.
- The Data Lab Podcast: A local [to Philly] data podcast interviewing local data scientists. I find it reassuring to hear that my habits are often in line with theirs, plus I’ve picked up many great tidbits (like the Exponential View newsletter).
- O’Reilly Data Show: I have attended O’Reilly’s Strata Data Conference, and much like the conference, this podcast covers a wide range of relevant data topics.
- Data Skeptic: Another podcast covering a broad range of data science, statistics, and machine learning topics.
BLOGS & NEWSLETTERS
- Exponential View: Billed as a weekly “wondermissive”, Azeem Azhar’s newsletter covers many topics relevant to data and the broader technology economy. I truly look forward to getting it every Sunday morning.
- Farnam Street: A weekly newsletter (and blog) about decision making. I frequently find golden tips on how to think and how to frame problems. A must-read.
- Twitter: I follow many great data people on Twitter and get a lot of my data news there.