I've been asked by more than two people what my "journey to the center of data science" was and, well, one of my core values is automation. So here's my response to those people. Next, I'll write a bot that will just auto-reply with this as appropriate, heh.
Usual warning/caveat: your mileage may vary.
My path
This is in chronological order, from 2014 to ~now.
Glossary
- * = how much each resource is still applicable to my day-to-day, where more = better
- 💵 = whether I paid any money for it (as opposed to, e.g., having it reimbursed)
- 🐝 = taken as part of Georgia Tech's OMSCS program (this costs 💵 for degree credit, but is freely available on Udacity if you don't want to take it for credit)
I've also put the approximate amount of time spent ingesting (not digesting!) the resource.
Background: BS in economics and mathematics, MPhil in economics, plus several years working as a research manager and economist in university research labs and the non-profit sector.
- 💵 Interactive Data Visualization for the Web by Scott Murray * (~1 month)
- Udacity - Data Visualization and D3.js * (~3-4 weeks)
- Udacity - Programming Foundations with Python *** (~2 weeks + several months working on first project)
- Udacity - How to Use Git and GitHub *** (on and off, maybe 2 weeks)
- Coursera - Machine Learning Foundations: A Case Study Approach ** (~4 weeks)
- RailsBridge * (1 weekend)
- ComputeFest 2016 ** (2 days)
- 💵 Harvard Extension School - CS171: Visualization * (~240 hours AKA 15 hours/week for a semester)
- Boston Python User Group - Talking to other computers with Python *** (1 day)
- OpenVisConf 2016 *** (3 days)
- Udacity - Intro to Relational Databases *** (~2 weeks)
Got first data science job
- Udacity - Intro to Machine Learning ** (~3 weeks, unfinished!)
- 💵 Harvard Extension School - CS109A: Intro to Data Science *** (~320 hours AKA full semester, 20 hours/week)
- 💵 Harvard Extension School - CS109B: Advanced topics in data science *** (~320 hours AKA full semester, 20 hours/week)
- DataCamp - Deep Learning in Python *** (~4 whirlwind hours, applied to CS109B final project)
- Boston Python User Group - Data testing tutorial *** (1 day)
- 💵 Doing Data Science by Rachel Schutt, Cathy O'Neil ** (~10 hours)
- Boston Python User Group - Network analysis made simple *** (1 day)
- ClojureBridge ** (1 day + a loooong time working on the crypto puzzle)
- Bayesian Methods for Hackers: Probabilistic Programming and Bayesian Inference by Cameron Davidson-Pilon *** (~10 hours)
- 💵 Mastering Regular Expressions by Jeffrey E.F. Friedl *** (~10 hours)
- 💵 Learning the bash Shell by Cameron Newham, Bill Rosenblatt *** (~10 hours)
Got second data science job
- Clojure/conj 2017 * (~2 days)
- EdX - CS50: Introduction to Computer Science *** (~60 hours, see the journey here!)
- TomTomFest - Applied Machine Learning Conference 2018 ** (1 day)
Admitted to Georgia Tech's OMSCS
- Lynda - Advanced SQL for Data Scientists *** (~2 hours)
- 🐝 Georgia Tech/Udacity - Knowledge-Based AI * (120 hours, AKA full semester at ~10 hours/week)
- Lynda - Polyglot Web Development * (~2 hours)
- PyData DC 2018 *** (3 days)
- Destroy All Software - A compiler from scratch *** (2 days)
- Lynda - Ruby Essential Training *** (~10 hours)
- Lynda - Git Intermediate Techniques ** (~3 hours)
- Lynda - DevOps for Data Scientists * (~1 hour)
- 💵 Destroy All Software - A test runner from scratch *** (2 days)
- Lynda - Kafka Essential Training * (2 days)
- Lynda - Parallel and Concurrent Programming with Python I *** (2 days)
- Lynda - Learning Flask * (1 day)
- Lynda - Ruby on Rails 5 Essential Training *** (only did half of it, many hours)
- Thoughtbot - Onramp to Vim *** (~1 week)
You can see what I'm currently doing on LinkedIn.
General resources
Those are things I actually went through, top to bottom. In addition, here are some reference resources I use a lot (and find very helpful):
- YouTube - Data School: Great for getting a handle on pandas's syntax.
- Chris Albon - Data wrangling tutorials: More pandas.
- YouTube - MIT 6.034 Artificial Intelligence, Fall 2010: Clear, precise and deep explanations of SVMs, random forests, neural nets and more. I need to go through this top to bottom.
- An Introduction to Statistical Learning by James et al.: The main textbook, with examples in R (unfortunately...).
- Elements of Statistical Learning by Hastie et al.: More main textbook.
- Artificial Intelligence by Winston: Winston is great.
- Khan Academy - Linear algebra: Refreshing the fundamentals!
- Khan Academy - Statistics and probability: More excellent fundamentals!
- YouTube - mathematicalmonk: Very good explanations of more advanced math and probability topics - his video on Markov Chain Monte Carlo (MCMC) sampling was very helpful.
- YouTube - 3Blue1Brown: Enthusiastic math.
- YouTube - Vihart: More enthusiastic math.
- regexr: A beautiful and well-designed site for debugging regular expressions.
- Regex Golf: Think long and hard and don't touch that keyboard.
- Anki: Spaced repetition flashcards app. This changed a lot for me. I'll post a longer blog post about it sometime, but, for now, read Michael Nielsen's post, Augmenting long term memory.
- YouTube/ThePrimeagen: Helpful Vim tutorials.
- VimGolf: Extreme efficiency Vim! Requires a Twitter login, oof.
- HackerRank: Coding puzzles; helpfully includes SQL (which is rare-ish). The one that slows down my browser and overheats my CPU (why?!).
- LeetCode: More coding puzzles. I prefer the UI on this one. But it's more algorithms + data structures-oriented. The broey one.
- Exercism: Yet more coding puzzles. Also a nice way to learn a new language. The friendly one.
And, of course, Google, Stack Overflow, and YouTube.
It never ends
I consider this a start, and I still have a lot of stuff I want to learn. It's a big topic, and the TOLEARN list is only growing...!