Loading and exploring data
|
|
Unsupervised Learning
|
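Supervised and unsupervised learning are machine learning techniques used for different purposes: supervised learning learns to predict a known target variable from labeled examples, while unsupervised learning looks for structure in data that has no target variable.
Principal component analysis (PCA) can simplify data analysis by replacing many correlated variables with a few principal components that capture most of the variation in the data.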
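As a rough illustration, here is a minimal plain-Python sketch that finds the first principal component of a small data set. It centres the data, builds the sample covariance matrix, and uses power iteration to find the direction of greatest variance. The function names `pca_first_component` and `project` are hypothetical; in practice you would normally call a library routine instead.

```python
import math

def pca_first_component(points, iterations=200):
    """Return (mean, unit direction, explained-variance ratio) of the first principal component."""
    n = len(points)
    dim = len(points[0])
    # Centre the data so every variable has mean zero.
    mean = [sum(p[j] for p in points) / n for j in range(dim)]
    centred = [[p[j] - mean[j] for j in range(dim)] for p in points]
    # Sample covariance matrix (dim x dim).
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(dim)]
           for i in range(dim)]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    v = [1.0] * dim
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient: the variance of the data along direction v.
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim))
    total_variance = sum(cov[i][i] for i in range(dim))
    return mean, v, eigenvalue / total_variance

def project(points, mean, direction):
    """Score of each point on the component: one number per point instead of dim numbers."""
    return [sum((p[j] - mean[j]) * direction[j] for j in range(len(p))) for p in points]
```

On nearly collinear data, such as points lying close to the line y = 2x, the first component carries almost all of the variance, so its single score per point can stand in for both original variables.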
Clustering may reveal hidden patterns or groupings in the data.
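The outline does not name a clustering method; the sketch below assumes k-means (Lloyd's algorithm), one common choice, written in plain Python under the hypothetical name `kmeans`. It alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points, stopping once the centroids no longer move.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's k-means on a list of equal-length tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # an empty cluster keeps its old centroid
        if new_centroids == centroids:
            break  # converged: the assignments can no longer change
        centroids = new_centroids
    return centroids, labels
```

The number of clusters `k` must be chosen up front, and different seeds can give different results, so it is common to try several starting points.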
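A cross table (contingency table) counts how often each group or class produced by an algorithm coincides with each known label, which lets us measure how well the algorithm performed.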
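A cross table needs nothing more than counting pairs. This minimal sketch (the function name `cross_table` is hypothetical) returns the row labels, the column labels, and a matrix of counts; a good algorithm concentrates each row's counts in a single column.

```python
from collections import Counter

def cross_table(true_labels, predicted_labels):
    """Count how often each (true label, predicted label) pair occurs."""
    counts = Counter(zip(true_labels, predicted_labels))
    rows = sorted(set(true_labels))       # one row per known label
    cols = sorted(set(predicted_labels))  # one column per predicted group or class
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table
```

For example, with known labels `["a", "a", "b", "b", "b"]` and cluster assignments `[0, 0, 1, 1, 0]`, label `"a"` falls entirely in cluster 0 while one of the three `"b"` points is misplaced.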
|
Supervised Learning I: classification
|
The target variable is the variable of interest, while the rest of the variables are known as features or predictor variables.
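To make the distinction concrete, here is a small sketch that splits records into features and target. It assumes each record is a dictionary of column values; the function name `split_target` and the column names in the example are hypothetical.

```python
def split_target(rows, target):
    """Split dict records into feature dicts (X) and a list of target values (y)."""
    # Every column except the target is treated as a feature (predictor).
    X = [{name: value for name, value in row.items() if name != target} for row in rows]
    y = [row[target] for row in rows]
    return X, y
```

For instance, with records containing `age`, `income`, and `bought`, calling `split_target(rows, "bought")` keeps `age` and `income` as features and `bought` as the variable to predict.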
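Separate your data set into training and test sets so you can detect, and then avoid, overfitting: a model that performs much better on the training set than on the unseen test set has overfit.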
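A minimal sketch of a random train/test split in plain Python. The name `split_train_test` and the default 20% test fraction are assumptions for illustration (this is not scikit-learn's `train_test_split`); the fixed seed makes the split reproducible.

```python
import random

def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and return (train, test)."""
    shuffled = list(rows)                 # copy so the caller's data is not reordered
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```

The model is fitted on `train` only; scoring it on both `train` and `test` then reveals whether it has merely memorised the training data.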
Logistic regression and random forests can be used to predict categorical variables.
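As an illustration of the first of these models, here is a minimal logistic regression fitted by batch gradient descent on the log-loss, in plain Python. The names `fit_logistic` and `predict_class`, the learning rate, and the epoch count are assumptions; random forests are not sketched here.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the log-loss; X is a list of feature lists, y holds 0/1 labels."""
    n, n_features = len(X), len(X[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Predicted probability minus true label is the log-loss gradient factor.
            error = sigmoid(sum(w * x for w, x in zip(weights, xi)) + bias) - yi
            for j in range(n_features):
                grad_w[j] += error * xi[j]
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def predict_class(X, weights, bias, threshold=0.5):
    """Label a row 1 when its predicted probability reaches the threshold."""
    return [int(sigmoid(sum(w * x for w, x in zip(weights, xi)) + bias) >= threshold)
            for xi in X]
```

Gradient descent converges faster when features are on similar scales, so standardising them first is usually worthwhile.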
|
Supervised Learning II: regression
|
Regression is a useful tool to predict numerical variables.
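The simplest case is a straight line fitted by ordinary least squares with one predictor. This sketch (hypothetical name `fit_simple_linear`) uses the closed-form solution: the slope is the covariance of x and y divided by the variance of x, and the line passes through the two means.

```python
def fit_simple_linear(x, y):
    """Ordinary least squares for y = intercept + slope * x with a single predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the fitted line passes through (mean_x, mean_y)
    return slope, intercept
```

With several predictors the same least-squares idea applies, but the fit is normally computed with a library rather than by hand.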
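Use the root mean squared error (RMSE) to measure the regression’s performance; it is expressed in the same units as the target, and lower values mean the predictions lie closer to the observed values.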
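RMSE is the square root of the mean of the squared prediction errors. A minimal sketch in plain Python:

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between observed and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))
```

For example, observed values `[3, 5, 7]` and predictions `[2, 5, 9]` give errors of 1, 0 and -2, so the RMSE is the square root of 5/3, about 1.29. Because errors are squared, a few large misses raise the RMSE more than many small ones.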
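Lasso regression can be used to identify key variables, because its penalty on the absolute size of the coefficients shrinks the coefficients of less useful variables to exactly zero.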
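To see why coefficients become exactly zero, here is a minimal lasso fitted by coordinate descent, a common fitting method, in plain Python. It minimises (1/2n)·Σ(y − b − Xw)² + alpha·Σ|w| and leaves the intercept unpenalised; the names `fit_lasso` and `soft_threshold` and the iteration count are assumptions. The soft-thresholding step sets a coefficient to zero whenever its correlation with the remaining residual is smaller than `alpha`.

```python
def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold; return exactly 0.0 inside [-threshold, threshold]."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def fit_lasso(X, y, alpha, iterations=500):
    """Coordinate descent for (1/2n)*||y - b - Xw||^2 + alpha*||w||_1 (intercept b not penalised)."""
    n, p = len(X), len(X[0])
    # Centre the columns and the target so the intercept can be recovered afterwards.
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    w = [0.0] * p
    for _ in range(iterations):
        for j in range(p):
            # Correlation of feature j with the residual left by all the other features.
            rho = sum(Xc[i][j] * (yc[i] - sum(Xc[i][k] * w[k] for k in range(p) if k != j))
                      for i in range(n)) / n
            z = sum(Xc[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, alpha) / z if z > 0 else 0.0
    intercept = y_mean - sum(m * wj for m, wj in zip(x_means, w))
    return w, intercept
```

In the test data, the target depends only on the first feature, so with `alpha=0.5` the second coefficient is driven to exactly zero while the first is only shrunk slightly; the variables that keep non-zero coefficients are the ones the lasso identifies as key.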
|