Machine Learning in R: Glossary

Key Points

Loading and exploring data
  • Plotting your data is a useful first step for getting to know it.

  • Center and scale your numerical variables using the caret package.
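
The two points above can be sketched as follows. This is a minimal illustration, not the lesson's own code: the built-in iris data set and the object names (pre, iris_scaled) are assumptions chosen for the example. It uses preProcess() from the caret package to center and scale the numerical columns.

```r
# Sketch only: the iris data set and all object names are assumptions.
library(caret)

data(iris)

# Quick visual overview of the numerical variables, colored by species.
pairs(iris[, 1:4], col = as.integer(iris$Species))

# Estimate the centering and scaling parameters, then apply them.
pre <- preProcess(iris[, 1:4], method = c("center", "scale"))
iris_scaled <- predict(pre, iris[, 1:4])

# After the transformation each column should have a mean close to 0
# and a standard deviation of 1.
print(round(colMeans(iris_scaled), 10))
print(apply(iris_scaled, 2, sd))
```

Keeping the fitted pre object around means the same transformation can later be applied to new data with predict(), rather than re-estimating the means and standard deviations.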

Unsupervised Learning
  • Supervised learning predicts a known target variable from labelled data, while unsupervised learning looks for structure in data that has no labels.

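  • Principal component analysis (PCA) can simplify data analysis by reducing many variables to a few components that capture most of the variance.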

  • Clustering may reveal hidden patterns or groupings in the data.

  • A cross table compares an algorithm's output (for example, cluster assignments) against known labels, which allows us to measure the algorithm's performance.
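
As a rough illustration of PCA, clustering, and a cross table working together, the sketch below uses only base R. The iris data set, the choice of three clusters, and the object names are assumptions made for this example.

```r
# Sketch only: data set, number of clusters, and names are assumptions.
data(iris)
features <- iris[, 1:4]

# PCA on centered and scaled features; summary() reports the
# proportion of variance explained by each component.
pca <- prcomp(features, center = TRUE, scale. = TRUE)
print(summary(pca))

# k-means clustering on the scaled features. The seed makes the random
# starting points reproducible.
set.seed(42)
clusters <- kmeans(scale(features), centers = 3, nstart = 20)

# Cross table of cluster assignments against the known species labels.
cross_tab <- table(cluster = clusters$cluster, species = iris$Species)
print(cross_tab)
```

Cluster numbers are arbitrary, so the cross table should be read row by row: a good clustering puts most members of each species into a single row, whichever number that row happens to have.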

Supervised Learning I: classification
  • The target variable is the variable of interest, while the rest of the variables are known as features or predictor variables.

  • Split your data set into training and test sets so that you can detect overfitting by evaluating the model on data it has not seen.

  • Logistic regression and random forests can be used to predict categorical variables.
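
The workflow described above might look like the following base R sketch. The two-class subset of iris, the 70/30 split, the single predictor, and the 0.5 threshold are all assumptions made for illustration. Only logistic regression is shown here.

```r
# Sketch only: data set, split ratio, predictor, and threshold are assumptions.
data(iris)

# Logistic regression needs a binary target, so keep two species only.
two_class <- droplevels(subset(iris, Species != "setosa"))

# Random 70/30 split into training and test sets.
set.seed(42)
n <- nrow(two_class)
train_idx <- sample(seq_len(n), size = round(0.7 * n))
train <- two_class[train_idx, ]
test <- two_class[-train_idx, ]

# Species is the target variable; Petal.Length is the feature.
model <- glm(Species ~ Petal.Length, data = train, family = binomial)

# Predicted probability of the second factor level ("virginica").
prob <- predict(model, newdata = test, type = "response")
pred <- factor(ifelse(prob > 0.5, "virginica", "versicolor"),
               levels = levels(test$Species))

# Evaluate on the test set only.
conf_mat <- table(predicted = pred, actual = test$Species)
accuracy <- mean(pred == test$Species)
print(conf_mat)
print(accuracy)
```

Because the model never sees the test rows during fitting, a large gap between training and test accuracy would be a sign of overfitting.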

Supervised Learning II: regression
  • Regression is a useful tool to predict numerical variables.

  • Use the root mean squared error (RMSE) to measure a regression model’s performance.

  • Lasso regression can be used to identify key variables, because it shrinks the coefficients of less important variables to zero.
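
RMSE is the square root of the mean squared difference between observed and predicted values, so it is expressed in the same units as the target variable. The base R sketch below computes it for a linear regression; the mtcars data set, the chosen predictors, and the helper name rmse are assumptions made for this example.

```r
# Sketch only: data set, predictors, and the rmse() helper are assumptions.
data(mtcars)

# Hold out 8 of the 32 cars as a test set.
set.seed(42)
train_idx <- sample(seq_len(nrow(mtcars)), size = 24)
train <- mtcars[train_idx, ]
test <- mtcars[-train_idx, ]

# Predict fuel consumption (mpg) from weight and horsepower.
fit <- lm(mpg ~ wt + hp, data = train)

# Root mean squared error between observed and predicted values.
rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

test_rmse <- rmse(test$mpg, predict(fit, newdata = test))
print(test_rmse)
```

A lower RMSE means predictions are closer to the observed values on average; comparing the test RMSE of different models is one way to choose between them.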

Glossary

FIXME