# Statistical / Machine Learning

The 'big data' revolution has required us to rethink not only how we conceive of and collect data, but also the methods we use to analyze it. In this workshop, we will place the parametric statistical models with which we are all familiar (e.g., OLS, logit, probit) in the context of this larger discussion about statistical learning. These ubiquitous models are simple cases of statistical learning algorithms that can be extended in many interesting directions.

The course covers both supervised learning (models that have a dependent variable, such as those for regression and classification) and unsupervised learning (models without a dependent variable, such as principal components analysis and clustering). We will begin with a brief treatment of OLS regression and GLMs as simple regression and classification tools. We will then turn to regularized extensions of these models (ridge regression and the lasso), which shrink coefficient estimates toward zero; the lasso can set some coefficients exactly to zero and so also performs automatic variable selection. Next, we will branch out to semi- and non-parametric alternatives, such as generalized additive models (GAMs), kernel regularized least squares, multivariate adaptive regression splines, and tree-based regression models. Finally, we will cover clustering, principal components analysis, and some extensions such as finite mixture models.

The course has both theoretical and applied aspects, though it focuses more on applications. All of the models mentioned above have straightforward implementations in R, which we will use throughout the course. The course will have a lecture component, structured labs, and more flexible time where you can try out what you're learning on your own data.