data science portfolio

Most of my professional data science work is closed source. However, you can find my educational and personal projects below.


The shape of text

python (nltk), nlp, javascript, d3.js, html/css

Final project for CS171 (Visualization) at Harvard. I scraped 800+ science fiction short stories from Strange Horizons, conducted some basic natural language processing, and visualized the results.

How do we price risk?

python (pandas, sklearn), supervised machine learning (random forests)

Final project for CS109A (Introduction to Data Science) at Harvard. Collaborated with my friend, Quinn. We took peer-to-peer lending data and tried to predict the interest rates on individual loans.

Predicting movie genres

image processing, deep learning (convolutional neural nets)

Final project for CS109B (Advanced Topics in Data Science) at Harvard. Collaborated with Quinn (again), Pranav, and Johanna. We scraped movie posters from The Movie Database's API, processed those images with numpy, and predicted their likely genres (as a multi-label problem) using a convolutional neural net (in keras).
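
For readers unfamiliar with the term, "multi-label" means a poster can belong to several genres at once, so each genre gets its own yes/no slot instead of the model picking a single winner. The snippet below is only a minimal sketch of that idea in plain Python. The genre list, the example scores, and the 0.5 threshold are assumptions for illustration and are not taken from the project's code.

```python
# Hypothetical genre vocabulary; the project's real label set is not listed here.
GENRES = ["Action", "Comedy", "Drama", "Horror"]


def encode_genres(movie_genres, vocabulary=GENRES):
    """Turn a list of genre names into a multi-hot vector (one 0/1 slot per genre)."""
    present = set(movie_genres)
    return [1 if genre in present else 0 for genre in vocabulary]


def decode_predictions(probabilities, vocabulary=GENRES, threshold=0.5):
    """Keep every genre whose independent score clears the (assumed) threshold."""
    return [genre for genre, p in zip(vocabulary, probabilities) if p >= threshold]


# A movie tagged with two genres becomes a vector with two 1s.
target = encode_genres(["Comedy", "Drama"])

# Made-up per-genre scores, standing in for a network's sigmoid outputs.
predicted = decode_predictions([0.10, 0.80, 0.65, 0.05])
```

Because each slot is scored independently, the scores do not need to sum to one, which is what separates this setup from ordinary single-label classification.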

blog/tag/data-science

statistical theory, bayesian stats, linear algebra doodles, and so much more!

Since November 2017, I've been blogging about what inspires and amuses me in data science, statistics, math, art, and programming. This is also where I'll post smaller personal projects.

triestin

compilers, programming languages, computer science, regex

Final project for CS50 (Introduction to Computer Science) (audited). This is a programming language written mostly in Italian, with a bit of the Triestin dialect. It compiles to Python.
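
As a rough illustration of how regex can turn one language's keywords into Python, here is a minimal sketch. The Italian keyword table is a hypothetical stand-in and not the actual triestin vocabulary, and the whole-word substitution shown is an assumed, simplified approach, not the project's real compiler.

```python
import re

# Hypothetical keyword table: illustrative guesses only,
# not the real vocabulary of the triestin language.
KEYWORDS = {
    "se": "if",
    "altrimenti": "else",
    "mentre": "while",
    "stampa": "print",
    "ritorna": "return",
}

# One pattern that matches any keyword, but only as a whole word.
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, KEYWORDS)) + r")\b")


def translate(source):
    """Replace each whole-word keyword with its Python equivalent.

    Naive on purpose: keywords inside string literals or comments
    are rewritten too, which a real compiler would need to avoid.
    """
    return _PATTERN.sub(lambda match: KEYWORDS[match.group(1)], source)


program = "x = 3\nse x > 2:\n    stampa(x)\naltrimenti:\n    stampa(0)\n"
python_source = translate(program)
```

The translated text can then be handed to Python's built-in `exec` or written out to a `.py` file. A sturdier design would tokenize the source first, so that strings and comments are left untouched.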

Journey to the center of data science

learning out loud

I've been keeping track of every learning resource (course, YouTube tutorial, book) that I've been using since I began my career transition from economics to data science. This is building into a useful library, both for my own reference and for sharing with people interested in taking a similar journey.