Adding lab .Rmd to correspond to YouTube Labs #69

Open · wants to merge 5 commits into master
74 changes: 74 additions & 0 deletions ch10/lab.Rmd
Principal Components
======================

We will use the `USArrests` data (which is built into R):

```{r}
dimnames(USArrests)
apply(USArrests, 2L, mean)
apply(USArrests, 2L, var)
```

We see that `Assault` has a much larger variance than the other variables. It would dominate the principal components, so we choose to standardize the variables when we perform PCA:

```{r}
pca.out = prcomp(USArrests, scale = TRUE)
pca.out
names(pca.out)
biplot(pca.out, scale = 0)
```
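
`prcomp` returns the standard deviations of the components in `sdev`, so as a quick addition (not part of the original lab) we can compute the proportion of variance explained by each component:

```{r}
# Proportion of variance explained by each principal component.
pve = pca.out$sdev^2 / sum(pca.out$sdev^2)
pve
plot(pve, type = "b", xlab = "Principal Component",
     ylab = "Proportion of Variance Explained", ylim = c(0, 1))
```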

K-Means Clustering
====================

K-means works in any dimension, but is most fun to demonstrate in two, because we can plot pictures. Let's make some data with clusters. We do this by shifting the means of the points around.

```{r}
set.seed(101)
x = matrix(rnorm(100L * 2L), 100L, 2L)
xmean = matrix(rnorm(8L, sd = 4), 4L, 2L)
which = sample(1L:4L, 100L, replace = TRUE)
x = x + xmean[which, ]
plot(x, col = which, pch = 19L)
```

We know the “true” cluster IDs, but we won't tell that to the `kmeans` algorithm.

```{r}
km.out = kmeans(x, 4L, nstart = 15L)
km.out
plot(x, col = km.out$cluster,
     cex = 2, pch = 1L, lwd = 2L)           # open circles: k-means clusters
points(x, col = which, pch = 19L)           # solid dots: the true cluster IDs
points(x, col = (4:1)[which], pch = 19L)    # permute the colors to match better
```
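
Since we simulated the data, we can also cross-tabulate the `kmeans` assignments against the true cluster IDs (a small addition, not in the original lab). The cluster labels themselves are arbitrary, so we only expect a good match up to a permutation of the labels:

```{r}
# Each row/column should have one dominant count if the clustering is good.
table(km.out$cluster, which)
```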

Hierarchical Clustering
=========================

We will use these same data and use hierarchical clustering

```{r}
hc.complete = hclust(dist(x), method = "complete")
plot(hc.complete)
hc.single = hclust(dist(x), method = "single")
plot(hc.single)
hc.average = hclust(dist(x), method = "average")
plot(hc.average)
```

Let's compare this with the actual clusters in the data. We will use the function `cutree` to cut the tree to give four clusters. This will produce a vector of numbers from 1 to 4, saying which branch each observation is on. You will sometimes see pretty plots where the leaves of the dendrogram are colored. I searched a bit on the web for how to do this, and it's a little too complicated for this demonstration, although a rough sketch is given at the end of this section.

We can use `table` to see how well they match:

```{r}
hc.cut = cutree(hc.complete, 4L)
table(hc.cut, which)
table(hc.cut, km.out$cluster)
```

Or we can use our group membership as labels for the leaves of the dendrogram:

```{r}
plot(hc.complete, labels = which)
```
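
As mentioned above, coloring the leaves of the dendrogram takes a bit more work. Here is a rough base-R sketch (not from the original lab, and only one of several possible approaches) that colors each leaf label by its `cutree` cluster using `dendrapply`; it assumes the leaf labels are the row indices of `x`, which is the case when `hclust` is given unnamed data:

```{r}
# Convert to a dendrogram and color each leaf label by its cluster.
dend = as.dendrogram(hc.complete)
color.leaf = function(node) {
  if (is.leaf(node)) {
    obs = as.integer(attr(node, "label"))   # leaf label = row index of x
    attr(node, "nodePar") = list(lab.col = hc.cut[obs], pch = NA)
  }
  node
}
plot(dendrapply(dend, color.leaf))
```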
266 changes: 266 additions & 0 deletions ch10/lab.html

Large diffs are not rendered by default.

165 changes: 165 additions & 0 deletions ch6/lab.Rmd
Model Selection
================

This is an R Markdown document. Markdown is a simple formatting syntax for authoring web pages, and a very nice way of distributing an analysis. It has some very simple syntax rules.


```{r}
library(ISLR)
summary(Hitters)
```

There are some missing values here, so before we proceed we will remove them:

```{r}
Hitters = na.omit(Hitters)
with(Hitters, sum(is.na(Salary)))
```

Best Subset Regression
========================

We will now use the package `leaps` to evaluate all the best-subset models.

```{r}
library(leaps)
regfit.full = regsubsets(Salary ~ ., data = Hitters)
summary(regfit.full)
```

By default it gives best subsets up to size 8; let's increase that to 19, i.e., all the variables.

```{r}
regfit.full = regsubsets(Salary ~ ., data = Hitters, nvmax = 19L)
reg.summary = summary(regfit.full)
names(reg.summary)
plot(reg.summary$cp, xlab = "Number of Variables", ylab = "Cp")
best.cp = which.min(reg.summary$cp)   # model size with the smallest Cp
best.cp
points(best.cp, reg.summary$cp[best.cp], pch = 20L, col = "red")
```

There is a `plot` method for the `regsubsets` object:

```{r}
plot(regfit.full, scale = "Cp")
coef(regfit.full, 10L)
```

Forward Stepwise Selection
============================

Here we use the `regsubsets` function but specify the `method = "forward"` option:

```{r}
regfit.fwd = regsubsets(Salary ~ ., data = Hitters,
nvmax = 19L, method = "forward")
summary(regfit.fwd)
plot(regfit.fwd, scale = "Cp")
```
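
As an aside (not in the original lab), we can check whether forward stepwise and best subsets agree on, say, the 10-variable model by comparing their coefficients:

```{r}
# Coefficients of the 10-variable models from the two search strategies.
coef(regfit.full, 10L)
coef(regfit.fwd, 10L)
```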

Model Selection Using a Validation Set
========================================

Let's make a training and validation set, so that we can choose a good subset model. We will do it using a slightly different approach from what was done in the book.

```{r}
dim(Hitters)
set.seed(1)
train = sample(263L, 180L, replace = FALSE)
train
regfit.fwd = regsubsets(Salary ~ ., data = Hitters[train, ],
nvmax = 19L, method = "forward")
```

Now we will make predictions on the observations not used for training. We know there are 19 models, so we set up some vectors to record the errors. We have to do a bit of work here, because there is no `predict` method for `regsubsets`.

```{r}
val.errors = numeric(19L)
x.test = model.matrix(Salary ~ ., data = Hitters[-train, ]) # notice the -index!
for (i in 1L:19L) {
coefi = coef(regfit.fwd, id = i)
pred = x.test[ , names(coefi)] %*% coefi
val.errors[i] = mean((Hitters$Salary[-train] - pred)^2)
}
plot(sqrt(val.errors), ylab = "Root MSE",
ylim = c(300, 400), pch = 19L, type = "b")
points(sqrt(regfit.fwd$rss[-1L]/180),
col = "blue", pch = 19L, type = "b")
legend("topright", legend = c("Training", "Validation"),
col = c("blue", "black"), pch = 19L)
```

As we expect, the training error goes down monotonically as the model gets bigger, but not so for the validation error.

This was a little tedious, not having a `predict` method for `regsubsets`, so we will write one!

```{r}
predict.regsubsets = function(object, newdata, id, ...) {
  form = as.formula(object$call[[2L]])   # the formula used in the original fit
  mat = model.matrix(form, newdata)      # model matrix for the new data
  coefi = coef(object, id = id)          # coefficients of the model of size id
  mat[ , names(coefi)] %*% coefi         # predictions
}
```
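
As a quick sanity check (not in the original lab), we can redo the validation errors with this new method and confirm we get the same numbers as the manual loop above:

```{r}
# Recompute the validation MSEs using the new predict method;
# this should match val.errors from the loop above.
val.errors2 = sapply(1L:19L, function(i) {
  pred = predict(regfit.fwd, Hitters[-train, ], id = i)
  mean((Hitters$Salary[-train] - pred)^2)
})
all.equal(val.errors, val.errors2)
```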

Model Selection by Cross-Validation
=====================================

We will do 10-fold cross-validation. It's really easy!

```{r}
set.seed(11)
folds = sample(rep(1L:10L, length = nrow(Hitters)))
folds
table(folds)
cv.errors = matrix(NA, 10L, 19L)
for (k in 1L:10L) {
best.fit = regsubsets(Salary ~ ., data = Hitters[folds != k, ],
nvmax = 19L, method = "forward")
for (i in 1L:19L) {
pred = predict(best.fit, Hitters[folds == k, ], id = i)
cv.errors[k, i] = mean((Hitters$Salary[folds == k] - pred)^2)
}
}
rmse.cv = sqrt(apply(cv.errors, 2L, mean))
plot(rmse.cv, pch = 19L, type = "b")
```
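
Having chosen a model size by cross-validation, a natural final step (not shown in the original lab) is to refit forward stepwise on the full data set and extract the coefficients of the model of that size:

```{r}
# Model size with the smallest cross-validated RMSE, and the
# corresponding coefficients from a fit on all the data.
best.size = which.min(rmse.cv)
best.size
regfit.best = regsubsets(Salary ~ ., data = Hitters,
                         nvmax = 19L, method = "forward")
coef(regfit.best, best.size)
```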

Ridge Regression and the Lasso
================================

We will use the package `glmnet`, which does not use the model formula language, so we will set up an `x` matrix and a `y` vector.

```{r}
library(glmnet)
x = model.matrix(Salary ~ . - 1, data = Hitters)
y = Hitters$Salary
fit.ridge = glmnet(x, y, alpha = 0)
plot(fit.ridge, xvar = "lambda", label = TRUE)
cv.ridge = cv.glmnet(x, y, alpha = 0)
plot(cv.ridge)
```
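
The `cv.glmnet` object stores the value of `lambda` that minimizes the cross-validated error; as a small addition (not in the original lab), we can pull out the ridge coefficients at that value:

```{r}
# Ridge coefficients at the lambda chosen by cross-validation.
cv.ridge$lambda.min
coef(cv.ridge, s = "lambda.min")
```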

Now we fit a lasso model; for this we use the default `alpha = 1`:

```{r}
fit.lasso = glmnet(x, y)
plot(fit.lasso, xvar = "lambda", label = TRUE)
cv.lasso = cv.glmnet(x, y)
plot(cv.lasso)
coef(cv.lasso)
```

Suppose we want to use our earlier train/validation division to select the `lambda` for the lasso. This is easy to do.

```{r}
lasso.tr = glmnet(x[train, ], y[train])
lasso.tr
pred = predict(lasso.tr, x[-train, ])
dim(pred)
rmse = sqrt(apply((y[-train] - pred)^2, 2, mean))
plot(log(lasso.tr$lambda), rmse, type = "b", xlab = "Log(lambda)")
lam.best = lasso.tr$lambda[order(rmse)[1]]
lam.best
coef(lasso.tr, s = lam.best)
```
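
For a rough comparison (not in the original lab, and remembering that `cv.lasso` was fit on all the data rather than just the training set), we can set this validation-set choice of `lambda` against the values selected by `cv.glmnet`:

```{r}
# Validation-set lambda vs. the cross-validation choices.
c(validation = lam.best, cv.min = cv.lasso$lambda.min, cv.1se = cv.lasso$lambda.1se)
```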
597 changes: 597 additions & 0 deletions ch6/lab.html

Large diffs are not rendered by default.

146 changes: 146 additions & 0 deletions ch7/lab.Rmd
Nonlinear Models
==================

Here we explore the use of nonlinear models using some tools in R.

```{r}
library(ISLR)
```

Polynomials
-------------

First we will use polynomials, and focus on a single predictor, `age`:

```{r}
fit = lm(wage ~ poly(age, 4L), data = Wage)
summary(fit)
```

The `poly()` function generates a basis of orthogonal polynomials. Let's make a plot of the fitted function, along with the standard errors of the fit.

```{r fig.width = 7, fig.height = 6}
agelims = range(Wage$age)
age.grid = seq(from = agelims[1L], to = agelims[2L])
preds = predict(fit, newdata = list(age = age.grid), se = TRUE)
se.bands = with(preds, fit + 2 * cbind(se.fit, -se.fit))
with(Wage, plot(age, wage, col = "darkgrey"))
lines(age.grid, preds$fit, lwd = 2, col = "blue")
matlines(age.grid, se.bands, col = "blue", lty = 2L)
```

There are other more direct ways of doing this in R. For example

```{r}
fita = lm(wage ~ age + I(age^2) + I(age^3) + I(age^4), data = Wage)
summary(fita)
```

Here `I()` is a *wrapper* function; we need it because `age^2` means something special in the formula language, while `I(age^2)` is protected and treated as the literal square of `age`. The coefficients are different from those we got before! However, the fits are the same:

```{r}
plot(fitted(fit), fitted(fita))
```

By using orthogonal polynomials in this simple way, it turns out that we can separately test for each coefficient. So if we look at the summary again, we can see that the linear, quadratic, and cubic terms are significant, but the quartic is not.

```{r}
summary(fit)
```
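
To make this concrete (this check is not in the lab itself, though the book discusses it), the squared t-statistics of the orthogonal polynomial terms equal the F-statistics obtained by comparing the nested polynomial fits with `anova()`:

```{r}
# Squared t-statistics for the four polynomial terms in the quartic fit...
coef(summary(fit))[-1L, "t value"]^2
# ...should equal the F-statistics from the sequential comparison of
# the intercept-only model with the fits of degree 1 through 4.
anova(lm(wage ~ 1, data = Wage),
      lm(wage ~ poly(age, 1L), data = Wage),
      lm(wage ~ poly(age, 2L), data = Wage),
      lm(wage ~ poly(age, 3L), data = Wage),
      lm(wage ~ poly(age, 4L), data = Wage))
```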

This only works for linear regression, and only when there is a single predictor. In general we would use `anova()`, as this next example demonstrates.

```{r}
fita = lm(wage ~ education, data = Wage)
fitb = lm(wage ~ education + age, data = Wage)
fitc = lm(wage ~ education + poly(age, 2L), data = Wage)
fitd = lm(wage ~ education + poly(age, 3L), data = Wage)
anova(fita, fitb, fitc, fitd)
```

### Polynomial logistic regression

Now we fit a logistic regression model to a binary response variable, constructed from `wage`. We code the big earners (`>250K`) as 1, else 0.

```{r}
fit = glm(I(wage > 250) ~ poly(age, 3L), data = Wage, family = binomial)
summary(fit)
preds = predict(fit, list(age = age.grid), se = TRUE)
se.bands = with(preds, fit + 2 * cbind(fit = 0, lower = -se.fit, upper = se.fit))
se.bands[1L:5L, ]
```

We have done the computations on the logit scale. To transform we need to apply the inverse logit mapping

$$ p=\frac{e^\eta}{1+e^\eta}.$$

(Here we have used the ability of Markdown to interpret TeX expressions.) We can do this simultaneously for all three columns of `se.bands`:

```{r}
prob.bands = exp(se.bands)/(1 + exp(se.bands))
matplot(age.grid, prob.bands, col = "blue",
lwd = c(2L, 1L, 1L), lty = c(1L, 2L, 2L),
type = "l", ylim = c(0, 0.1))
with(Wage, points(jitter(age), I(wage > 250)/10, pch = "|", cex = 0.5))
```
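
Alternatively (a small addition, not in the original lab), `predict` can return the fitted probabilities directly with `type = "response"`; these match the first column of `prob.bands`, the fitted curve we just transformed by hand:

```{r}
# Fitted probabilities computed directly on the probability scale.
prob.direct = predict(fit, list(age = age.grid), type = "response")
all.equal(as.numeric(prob.direct), as.numeric(prob.bands[, 1L]))
```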

Splines
---------

Splines are more flexible than polynomials, but the idea is rather similar. Here we will explore cubic splines.


```{r plotchunk}
library(splines)
fit = lm(wage ~ bs(age, knots = c(25, 40, 60)), data = Wage)
with(Wage, plot(age, wage, col = "darkgrey"))
lines(age.grid, predict(fit, list(age = age.grid)),
col = "darkgreen", lwd = 2)
abline(v = c(25, 40, 60), lty = 2L, col = "darkgreen")
```

Smoothing splines do not require knot selection, but they do have a smoothing parameter, which can conveniently be specified via the effective degrees of freedom, `df`.

```{r, ref.label = 'plotchunk'}
fit = with(Wage, smooth.spline(age, wage, df = 16))
lines(fit, col = "red", lwd = 2L)
```

Or we can use leave-one-out cross-validation to select the smoothing parameter for us automatically:

```{r, ref.label = 'plotchunk'}
fit = with(Wage, smooth.spline(age, wage, cv = TRUE))
lines(fit, col = "purple", lwd = 2L)
fit
```

Generalized Additive Models
-----------------------------

So far we have focused on fitting models with mostly single nonlinear terms. The `gam` package makes it easier to work with multiple nonlinear terms. In addition it knows how to plot these functions and their standard errors.

```{r}
library(gam)
gam1 = gam(wage ~ s(age, df = 4) + s(year, df = 4) + education, data = Wage)
par(mfrow = c(1L, 3L))
plot(gam1, se = TRUE)
gam2 = gam(I(wage > 250) ~ s(age, df = 4) + s(year, df = 4) + education,
data = Wage, family = binomial)
plot(gam2)
```

Let's see if we need a nonlinear term for `year`:

```{r}
gam2a = gam(I(wage > 250) ~ s(age, df = 4) + year + education,
data = Wage, family = binomial)
anova(gam2a, gam2, test = "Chisq")
```

One nice feature of the `gam` package is that it knows how to plot the functions nicely, even for models fit by `lm` and `glm`.

```{r, fig.width = 10, fig.height = 5}
par(mfrow = c(1L, 3L))
lm1 = lm(wage ~ ns(age, df = 4) + ns(year, df = 4) + education, data = Wage)
plot.gam(lm1, se = TRUE)
```