applied mathematician/data scientist/writer

this github page describes some of my statistical, data analytic and coding interests and practices. for a more comprehensive view of what i do, follow the links below:

i'm an applied mathematician, freelance writer, hacker & educator currently working at datacamp, an online, interactive education platform for all things data science. you should check it out! my job there is to build out the python curriculum: if you love python & are passionate about education, get in touch, because i'd love to discuss creating a course with you! you can also find many of my research interests here and much of my work as a freelance writer here.

statistics, data analysis, data science

until recently, i worked in basic science research for a number of years, and much of my work on campuses (yale university; max planck institute for cell biology & genetics, dresden) involved consulting with experimentalists on best statistical practices & methodologies. in this work, i aimed to give experimentalists a statistical and data analytic framework in which statistical inference is not merely a necessary evil that occurs between doing experiments and publishing results, but an intuitive way of thinking that can help with every step of the scientific process, from exploratory research to posing well-formed scientific questions to experimental design. straight off the bat, experimentalists can think about questions such as ‘how many data points will i need to find a statistically significant result?’ and ‘what type of experimental set-up is required to achieve the signal-to-noise ratio necessary to validate anticipated results?’
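to make the first of those questions concrete, here is a minimal sketch of a sample-size calculation in python. it assumes a two-sided comparison of two group means and uses the standard normal approximation; the function name `samples_per_group` and the default values for `alpha` and `power` are illustrative choices, not something taken from the workshops.

```python
import math
from statistics import NormalDist


def samples_per_group(effect_size, alpha=0.05, power=0.8):
    """Approximate sample size per group for comparing two means.

    Assumptions (illustrative, not from the original page):
    - two-sided test at significance level `alpha`
    - equal group sizes and equal variances
    - `effect_size` is the standardized difference in means (Cohen's d)
    - normal approximation, so results run slightly below an exact t-test calculation
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # critical value of the test
    z_power = standard_normal.inv_cdf(power)          # quantile for the desired power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


if __name__ == "__main__":
    for d in (0.2, 0.5, 1.0):
        print(f"effect size {d}: about {samples_per_group(d)} data points per group")
```

halving the effect size roughly quadruples the number of data points required, which is why it pays to ask this question before the experiment rather than after.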

practical statistics workshops

to this end, i have taught formal workshops on statistical inference for experimentalists at the max planck institute for cell biology and genetics, dresden, and at yale university, new haven, along with more informal workshops conducted in locations such as the marine biological laboratory, woods hole. in fall 2015, i taught a ‘practical statistics for experimentalists’ workshop in the molecular biophysics & biochemistry department (mb&b) at yale university to 15 graduate students and postdocs from both mb&b and yale medical school. there were 4 workshops in total (click on links for workshop notes): the 1st on plotting data, summary statistics and the central limit theorem, the 2nd on statistical hypothesis testing, the 3rd on model fitting and model selection (mle, linear regression, nonlinear least squares) and the 4th on bayesian inference. the emphasis was on developing the experimentalists’ practical statistics toolkit in terms of i) knowledge of statistical tools & ii) practical ability to implement them, using the programming language r. i will teach a similar workshop at yale in early 2016.

(data) scientific reproducibility

statistical literacy is a huge challenge for both the natural & social sciences. another massive challenge is reproducibility. ioannidis’ 2005 paper ‘why most published research findings are false’ gave a precise formalism to a serious, endemic problem facing scientific communities in all fields. i am interested in and committed to the development of robust statistical frameworks that will contribute to the reproducibility of scientific results as a whole, in contrast to the messy toolbox of statistics that is currently used in high- and low-impact journals alike. for those interested, papers such as ioannidis’ cited above & cumming’s 2013 ‘the new statistics: why and how’ are a great place to start thinking about potential avenues. this issue of reproducibility is also one that will plague data science and big data, and so is worth investing in at all levels of society, not merely the academic.
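as a small illustration of the formalism in ioannidis’ paper, the sketch below computes the positive predictive value (ppv) of a claimed finding: the probability that a statistically significant result reflects a true relationship. it uses the paper’s bias-free expression, ppv = (1 - β)R / (R - βR + α), where R is the pre-study odds that a tested relationship is true, 1 - β is the power and α is the significance level. the function name and the example numbers are my own illustrative choices.

```python
def positive_predictive_value(prior_odds, power=0.8, alpha=0.05):
    """Probability that a significant finding is true, ignoring bias.

    Bias-free form of the expression in Ioannidis (2005):
        PPV = (1 - beta) * R / (R - beta * R + alpha)
    which simplifies to power * R / (power * R + alpha), with power = 1 - beta.

    `prior_odds` is R, the pre-study odds of true to false relationships among
    those tested. The default power and alpha are illustrative assumptions.
    """
    if prior_odds <= 0:
        raise ValueError("prior_odds must be positive")
    true_positives = power * prior_odds  # relative rate of true findings
    false_positives = alpha              # relative rate of false findings
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    # hypothetical scenarios: (pre-study odds R, power)
    for odds, power in [(1.0, 0.8), (0.1, 0.8), (0.1, 0.2)]:
        ppv = positive_predictive_value(odds, power=power)
        print(f"R = {odds}, power = {power}: ppv = {ppv:.2f}")
```

in this simplified form, a claimed finding is more likely false than true whenever power × R falls below α, which is the regime of underpowered studies testing long-shot hypotheses.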

coding and computations

i code primarily in python, r and matlab. you can find some of my code here on github. check out my repo for pattern formation and dynamical systems, much of which my boss and i use to teach a dynamical systems course at yale university. see here for a repo in which i am honing my machine learning chops using both scikit-learn and caret. comments & correspondence are encouraged & repos are works in progress.

hackathons

i live in nyc, where there’s a vibrant hackathon and data science meetup environment. they are both great places to meet like-minded, interesting people. click below to read a write-up i did on some work we completed at a recent ‘science against slavery’ hackathon.

hackathon blog


conference surprise

hit the button below if you’re attending a data science conference and want to do something fun.

surprise conference button