Machine Learning in R @ Cold Spring Harbor Laboratory, Summer 2018

This workshop is an introduction to machine learning in R, in which you’ll learn the basics of unsupervised learning for *pattern recognition* and supervised learning for *prediction*. By the end of the workshop, we hope that you’ll:

- appreciate the importance of performing exploratory data analysis (or EDA) before starting to model your data.
- understand the basics of *unsupervised learning* and know the examples of principal component analysis (PCA) and k-means clustering.
- understand the basics of *supervised learning* for prediction and the differences between *classification* and *regression*.
- understand modern machine learning techniques and principles, such as the train/test split, k-fold cross-validation and regularization.
- be able to write code to implement the above techniques and methodologies using R, caret and glmnet.
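
To give a flavour of the techniques listed above, here is a minimal sketch in base R. It is an illustration only, not the workshop's own code: it assumes the built-in `iris` data set (the lesson's data may differ), picks `k = 3` and a 75/25 split arbitrarily, and uses `prcomp()` and `kmeans()` from the `stats` package rather than caret or glmnet.

```r
# Illustrative only: iris ships with R; the workshop's own data set may differ.
set.seed(42)                       # make the random split and k-means starts reproducible

features <- iris[, 1:4]            # the four numeric measurements

# Unsupervised learning: PCA on centred and scaled features
pca <- prcomp(features, center = TRUE, scale. = TRUE)
var_explained <- summary(pca)$importance[2, ]   # proportion of variance per component
print(var_explained)

# Unsupervised learning: k-means clustering (k = 3 is an assumption for this sketch)
km <- kmeans(scale(features), centers = 3, nstart = 20)
print(table(cluster = km$cluster, species = iris$Species))

# Train/test split: hold out roughly 25% of the rows for evaluation
n <- nrow(iris)
train_idx <- sample(n, size = floor(0.75 * n))
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]
```

Comparing the cluster labels with the known species in the table is a quick sanity check of the kind that exploratory data analysis encourages before any modelling.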

The mathematical foundation for each section is not contained in these pages, as the instructor will explain and elaborate on the whiteboard.

This material has come from many conversations, workshops and online courses over the years, most notably the work that I have done at DataCamp. Some of it is similar to material that I developed for DataCamp’s *Supervised Learning with scikit-learn* course, on which I collaborated with Andreas Müller and Yashas Roy, and to community articles that I have written, such as *Kaggle Tutorial: EDA & Machine Learning* and *Experts’ Favorite Data Science Techniques*. Finally, I was able to develop this material thanks to the 20% community time that I have at DataCamp, and I am indebted to them for this.

## Prerequisites

Basic command of R

| Time | Episode | Key questions |
|-------|---------|---------------|
| | Setup | Download files required for the lesson |
| 00:00 | 1. Loading and exploring data | What is Exploratory Data Analysis (EDA) and why is it useful? How can I do EDA in R? |
| 00:30 | 2. Unsupervised Learning | What is principal component analysis (PCA)? How can I perform PCA in R? What is clustering? |
| 01:30 | 3. Supervised Learning I: classification | How can I apply supervised learning to a data set? |
| 02:00 | 4. Supervised Learning II: regression | What if the target variable is numerical rather than categorical? |
| 02:40 | Finish | |

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.