February 07, 2012

It's Not the Heat that Gets to You, It's the Sustained Conjunction of Heat with Elevated Levels of Atmospheric Pollutants (Advanced Data Analysis from an Elementary Point of View)

In which spline regression becomes a matter of life and death in Chicago.

Assignment

Advanced Data Analysis from an Elementary Point of View

Posted by crshalizi at February 07, 2012 10:31

Splines (Advanced Data Analysis from an Elementary Point of View)

Kernel regression controls the amount of smoothing indirectly, through the bandwidth; why not control the irregularity of the smoothed curve directly? The spline smoothing problem is a penalized least squares problem: minimize the mean squared error, plus a penalty proportional to the integrated squared second derivative of the function (a measure of its average curvature). The solution is always a continuous piecewise cubic polynomial, with pieces joined at the observed x values, and with continuous first and second derivatives. Altering the strength of the penalty moves along a bias-variance trade-off, from pure OLS (a straight line, as the penalty becomes infinite) at one extreme to pure interpolation (zero penalty) at the other. Changing the strength of the penalty is equivalent to minimizing the mean squared error under a constraint on the average curvature. To ensure consistency, the penalty/constraint should weaken as the data grows; its appropriate strength is selected by cross-validation. An example with the data, including confidence bands. Writing splines as basis functions, and fitting as least squares on transformations of the data, plus a regularization term. A brief look at splines in multiple dimensions. Splines versus kernel regression.
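The penalized least squares idea and the bias-variance trade-off can be illustrated with a discrete stand-in. This is a minimal sketch under stated assumptions, not the continuous smoothing spline itself: the x values are assumed to be equally spaced (0, 1, ..., n-1), and the integrated squared second derivative is replaced by the sum of squared second differences of the fitted values (a Whittaker-style discrete smoother). It uses a sum rather than a mean of squared errors, which only rescales the penalty weight `lam`. Setting `lam = 0` reproduces the data exactly, while a very large `lam` drives the second differences to zero and leaves the OLS straight line. Because the fit is a linear smoother, fitted values = H @ y, leave-one-out cross-validation can use the standard shortcut (y_i - m_i) / (1 - H_ii) instead of refitting n times. All function names and the toy data are hypothetical.

```python
# Sketch (assumption): equally spaced x = 0, 1, ..., n-1, with the curvature
# penalty replaced by a sum of squared second differences of the fitted values.


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def second_difference_gram(n):
    """Return D^T D, where row i of D is (1, -2, 1) at columns i, i+1, i+2."""
    g = [[0.0] * n for _ in range(n)]
    for i in range(n - 2):
        coeffs = ((i, 1.0), (i + 1, -2.0), (i + 2, 1.0))
        for j, dj in coeffs:
            for k, dk in coeffs:
                g[j][k] += dj * dk
    return g


def smoother_matrix(n, lam):
    """Hat matrix H = (I + lam * D^T D)^{-1}; fitted values are H @ y."""
    g = second_difference_gram(n)
    a = [[(1.0 if j == k else 0.0) + lam * g[j][k] for k in range(n)]
         for j in range(n)]
    # Column k of H solves a @ h_k = e_k.
    cols = [solve(a, [1.0 if i == k else 0.0 for i in range(n)]) for k in range(n)]
    return [[cols[k][j] for k in range(n)] for j in range(n)]


def fit(y, lam):
    """Minimize sum (y_i - m_i)^2 + lam * sum (second differences of m)^2."""
    h = smoother_matrix(len(y), lam)
    return [sum(hij * yj for hij, yj in zip(row, y)) for row in h]


def loocv(y, lam):
    """Leave-one-out CV score via the linear-smoother shortcut (needs lam > 0)."""
    h = smoother_matrix(len(y), lam)
    fitted = [sum(hij * yj for hij, yj in zip(row, y)) for row in h]
    return sum(((yi - fi) / (1.0 - h[i][i])) ** 2
               for i, (yi, fi) in enumerate(zip(y, fitted))) / len(y)


if __name__ == "__main__":
    # Hypothetical toy data, for illustration only.
    y = [1.0, 2.5, 2.0, 4.5, 4.0, 6.5, 6.0, 8.0]
    grid = [0.01, 0.1, 1.0, 10.0, 100.0]
    best = min(grid, key=lambda lam: loocv(y, lam))
    print("LOOCV-selected lambda:", best)
    print("fitted values:", [round(v, 3) for v in fit(y, best)])
```

The dense solver keeps the sketch dependency-free; since I + lam * D^T D is banded, a real implementation would use a banded or sparse solver, and would recompute `lam` by cross-validation as the sample grows rather than fixing it.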
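The basis-function view can be sketched in the same hedged spirit. The example below assumes the truncated power basis 1, x, x^2, x^3, (x - k)_+^3 for a cubic spline with a few hypothetical knots. In that basis the fitted curve is B @ beta, the curvature penalty becomes a quadratic form beta' Omega beta (Omega holds the integrals of products of the basis functions' second derivatives), and fitting is least squares on the transformed data plus that regularization term: solve (B'B + lam * Omega) beta = B'y. If the knots were placed at every interior data point, the minimizer over this basis would be the smoothing spline itself, since that solution is a cubic spline with exactly those knots; the few knots here just keep the example small. The truncated power basis is chosen for readability; it is numerically poorly conditioned, and B-spline bases are the usual choice in practice. Function names and data are hypothetical.

```python
import math

# Sketch (assumption): truncated power basis and hypothetical knots; the
# penalty is the exact integral of the squared second derivative.


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def basis(x, knots):
    """Truncated power basis for a cubic spline: 1, x, x^2, x^3, (x - k)_+^3."""
    return [1.0, x, x * x, x ** 3] + [max(x - k, 0.0) ** 3 for k in knots]


def basis_second_deriv(x, knots):
    """Second derivatives of the basis functions returned by basis()."""
    return [0.0, 0.0, 2.0, 6.0 * x] + [6.0 * max(x - k, 0.0) for k in knots]


def penalty_matrix(knots, lo, hi):
    """Omega[p][q] = integral over [lo, hi] of b_p''(x) * b_q''(x) dx.

    Each b'' is linear between knots, so each product is quadratic there and
    Simpson's rule on every knot-to-knot piece is exact up to rounding.
    """
    pts = sorted({lo, hi, *(k for k in knots if lo < k < hi)})
    p = 4 + len(knots)
    omega = [[0.0] * p for _ in range(p)]
    for left, right in zip(pts, pts[1:]):
        w = (right - left) / 6.0
        for x, wt in ((left, w), (0.5 * (left + right), 4.0 * w), (right, w)):
            d = basis_second_deriv(x, knots)
            for i in range(p):
                for j in range(p):
                    omega[i][j] += wt * d[i] * d[j]
    return omega


def fit_spline(xs, ys, knots, lam):
    """Minimize sum (y_i - B_i . beta)^2 + lam * beta' Omega beta."""
    rows = [basis(x, knots) for x in xs]  # design matrix B
    p = len(rows[0])
    omega = penalty_matrix(knots, min(xs), max(xs))
    lhs = [[sum(r[i] * r[j] for r in rows) + lam * omega[i][j] for j in range(p)]
           for i in range(p)]
    rhs = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    return solve(lhs, rhs)


def predict(beta, x, knots):
    """Evaluate the fitted spline at x."""
    return sum(b * f for b, f in zip(beta, basis(x, knots)))


def roughness(beta, knots, lo, hi):
    """Integrated squared second derivative of the fitted curve."""
    omega = penalty_matrix(knots, lo, hi)
    p = len(beta)
    return sum(beta[i] * omega[i][j] * beta[j] for i in range(p) for j in range(p))


if __name__ == "__main__":
    # Hypothetical toy data: a sine curve plus an alternating wiggle.
    xs = [i / 19 for i in range(20)]
    ys = [math.sin(2 * math.pi * x) + 0.1 * (-1) ** i for i, x in enumerate(xs)]
    knots = [0.25, 0.5, 0.75]
    for lam in (0.0, 1e-4, 1e-2, 1.0):
        beta = fit_spline(xs, ys, knots, lam)
        print(lam, round(roughness(beta, knots, 0.0, 1.0), 3))
```

With `lam = 0` this is an ordinary regression spline (plain OLS on the basis columns); raising `lam` trades a larger residual sum of squares for a smaller integrated squared second derivative, which is the same trade-off the penalty controls in the continuous problem.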

Reading: Notes, chapter 7; Faraway, section 11.2.


Posted by crshalizi at February 07, 2012 10:30

Three-Toed Sloth: Hosted, but not endorsed, by the Center for the Study of Complex Systems