statistics, data analysis, data science
until recently, i worked in basic science research for a number of years, and much of my work on campuses (yale university; max planck institute for cell biology & genetics, dresden) involved consulting with experimentalists on best statistical practices & methodologies. in this work, i aimed to give experimentalists a statistical and data analytic framework in which statistical inference is not merely a necessary evil that occurs between doing experiments and publishing results, but an intuitive way of thinking that can help at every step of the scientific process, from exploratory research to posing well-formed scientific questions to experimental design. right off the bat, experimentalists can think about questions such as 'how many data points will i need to find a statistically significant result?' and 'what type of experimental set-up is required to achieve the signal-to-noise ratio necessary to validate anticipated results?'
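the first of those questions is a power calculation, and it can be sketched in a few lines. the workshops described below used R, but here is the same idea in python, using the standard normal approximation to a two-sample comparison; the effect size, significance level and power below are illustrative choices, not values from any particular experiment:

```python
# rough sample-size planning: how many points per group are needed to
# detect a standardized effect size d in a two-sided, two-sample test?
# uses the normal approximation n ≈ 2 * ((z_alpha + z_power) / d)^2.
from math import ceil
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.8):
    """approximate sample size per group (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value, two-sided test
    z_power = z.inv_cdf(power)          # quantile for the desired power
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)

print(n_per_group(0.5))  # a 'medium' effect: roughly 63-64 points per group
```

smaller effects demand sharply more data: halving d roughly quadruples the required n, which is exactly the kind of intuition worth having before running the experiment.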
practical statistics workshops
to this end, i have taught formal workshops on statistical inference for experimentalists at the max planck institute for cell biology and genetics, dresden, and at yale university, new haven, along with more informal workshops in locations such as the marine biological laboratory, woods hole. in fall 2015, i taught a 'practical statistics for experimentalists' workshop in the molecular biophysics & biochemistry department (mb&b) at yale university to 15 graduate students and postdocs from both mb&b and yale medical school. there were four workshops in total (click on the links for workshop notes): the first covered plotting data, summary statistics and the central limit theorem; the second, statistical hypothesis testing; the third, model fitting and model selection (mle, linear regression, nonlinear least squares); and the fourth, bayesian inference. the emphasis was on developing the experimentalists' practical statistics toolkit in terms of (i) knowledge of statistical tools and (ii) practical ability to implement them, using the programming language R. i will teach a similar workshop at yale in early 2016.
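the central limit theorem material from the first workshop lends itself to a short simulation: sample means drawn from even a heavily skewed distribution pile up in an approximately normal bell around the population mean. the workshops themselves used R; the sketch below is in python, and the sample size and replicate count are arbitrary illustrative choices:

```python
# central limit theorem by simulation: means of samples from a skewed
# (exponential) distribution are approximately normal, centred on the
# population mean 1 with spread near sigma / sqrt(n).
import random
from statistics import mean, stdev

random.seed(42)
n = 30       # points per simulated sample
reps = 2000  # number of simulated samples

sample_means = [mean(random.expovariate(1.0) for _ in range(n))
                for _ in range(reps)]

print(round(mean(sample_means), 2))   # close to the population mean, 1.0
print(round(stdev(sample_means), 2))  # close to 1/sqrt(30) ≈ 0.18
```

plotting a histogram of `sample_means` next to a histogram of the raw exponential draws makes the point visually, which is how the workshop material presents it.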
(data) scientific reproducibility
statistical literacy is a huge challenge for both the natural & social sciences. another massive challenge is reproducibility. ioannidis' 2005 paper
'why most published research findings are false' gave a precise formalism to a serious, endemic problem facing scientific communities in all fields. i am interested in and committed to the development of robust statistical frameworks that will contribute to the reproducibility of scientific results as a whole, in contrast to the messy statistical toolbox currently used in high- and low-impact journals alike. for those interested, papers such as ioannidis' cited above & cumming's 2013
'the new statistics: why and how'
are a great place to start thinking about potential avenues. this issue of reproducibility will also plague data science and big data, and so is worth investing in at all levels of society, not merely the academic.
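ioannidis' formalism is simple enough to compute directly: the post-study probability that a claimed finding is true (the positive predictive value, PPV) depends only on the pre-study odds R that the probed relationship is true, the significance level α, and the power 1−β. a sketch, with illustrative parameter values of my own choosing:

```python
# positive predictive value of a claimed finding, following ioannidis (2005):
# PPV = (1 - beta) * R / ((1 - beta) * R + alpha),
# where R is the pre-study odds that the probed relationship is true.
def ppv(R, alpha=0.05, power=0.8):
    return power * R / (power * R + alpha)

# illustrative values, not drawn from any particular field:
print(round(ppv(1.0), 3))   # even odds a priori: most positive findings true
print(round(ppv(0.05), 3))  # exploratory setting, long odds: most are false
```

the second case is the sobering one: when only a small fraction of probed hypotheses are true, a 'significant' result at α = 0.05 is more likely false than true, even at respectable power.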