August 31, 2009

Books to Read While the Algae Grow in Your Fur, August 2009

Nadia Gordon, Murder Alfresco
Bright and amusing murder mystery among Napa Valley foodies. Sequel to Death by the Glass, followed by Lethal Vintage.
Roberto Bolaño, Nazi Literature in the Americas
Capsule literary biographies of thirty imaginary fascist writers, from the US to (of course) Chile and Argentina. It gains from two remarkable achievements on Bolaño's part: first, while everything is made up, nothing is exaggerated (I feel certain I have read books by both J. M. S. Hill and Zach Sodenstern, and as for the Colonia Renacer in "Willy Schürholz", well...); and second, his literary Nazis are not just caricatures, and in some cases (e.g., "Irma Carrasco") actually affecting. Emphatically recommended if you are in the mood for black hilarity.
Thomas Harlan, Land of the Dead
Third volume of his series of Lovecraftian alternate-history space operas. (On which, see here.) More stuff-blowing-up than I remember from the earlier books, but still plenty of ancient extraterrestrial secrets man was not meant to know.
Greg Rucka and Steve Lieber, Whiteout: Melt
More Antarctic crime-fiction, this time in thriller rather than mystery mode.
The Middleman
I think I am demographically compelled to be charmed by this; and I am.
G. Willow Wilson and M. K. Perker, Air: Letters from Lost Countries
Shivers, a.k.a. They Came From Within
Still scary and disturbing, despite decades of intervening horror movies about zombies, parasites crawling inside people, etc. (Ebert's contemporary review is informative without having much spoilerage.) The crisis was effective, and (in a twisted way) romantic.
Observation: lots of attitudes have certainly changed ("she was twelve"); and while the gadgets, clothes, and cars look out-of-date, the kind of life depicted is still very much ours. You could remake this today with hardly a change to the plot at all, except that you'd need to keep everybody's cell-phones from working.
Query: why did Romero's zombies (i.e., the re-animated dead) take over the world, rather than Cronenberg's parasite hosts? Did the latter involve too much sex to fly commercially?
Alexandra Sokoloff, The Harrowing
Pleasingly creepy ghost-story about emotionally-scarred undergrads who really should not have played around with Ouija boards. First novel; good enough that I'll look for her others.
Criminal Minds
When did the networks start broadcasting Shadow Unit fan-fiction? (And is it too late for me to change the grade of the student last year who "accidentally" referred to me as "Dr. Reid"?) — Series fatigue set in for me about half-way through the third season.
ObLink: Gladwell on profiling.
Charles Manski, Identification for Prediction and Decision
Review: Better Roughly Right than Exactly Wrong.
Jacob Kogan, Introduction to Clustering Large and High-Dimensional Data
What it says on the label. A short book (160 pp., excluding math review appendix and problem solutions) exclusively devoted to "hard" or "crisp" non-hierarchical clustering, mostly ignoring statistical issues in favor of computational ones, and emphasizing methods that scale to large problems. This includes ingenious tricks for replacing the difficult optimization problems implicit in most clustering algorithms with tractable smooth approximations. The old standby k-means algorithm plays a bigger role than I'd have guessed. Mostly pretty clear, though not an outstanding piece of exposition. (I think the BIRCH algorithm on pp. 42--43 is wrong as written, since step 2 seems to be redundant!) More useful for those in the field, I think, than for those who just want to cluster some data and get on with their lives.
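To make "tractable smooth approximations" concrete, here is a minimal sketch, not taken from Kogan's book: plain k-means (Lloyd's iteration) next to one common smoothing device, a log-sum-exp "soft-min" in place of the hard minimum over centers. The function names, the toy data, and the smoothing parameter `s` are all assumptions made for illustration; the book's own constructions may well differ.

```python
import math

def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(xs) / len(cluster) for xs in zip(*cluster))

def kmeans(points, centers, iters=20):
    """Lloyd's iteration: assign each point to its nearest center, then re-average."""
    centers = list(centers)
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
            clusters[nearest].append(p)
        # An empty cluster keeps its old center.
        centers = [mean(c) if c else old for c, old in zip(clusters, centers)]
    return centers

def hard_cost(points, centers):
    """The k-means objective: non-smooth in the centers because of the min."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)

def smooth_cost(points, centers, s):
    """Soft-min surrogate: replace min_k d_k with -s * log(sum_k exp(-d_k / s)).

    Computed stably by factoring out the smallest distance first.
    """
    total = 0.0
    for p in points:
        d = [sq_dist(p, c) for c in centers]
        m = min(d)
        total += m - s * math.log(sum(math.exp(-(dk - m) / s) for dk in d))
    return total

# Toy data (hypothetical): two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
```

The surrogate never exceeds the hard cost, and falls short of it by at most `s * log(k)` per point, so shrinking `s` tightens the approximation while keeping the objective differentiable in the centers.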

Books to Read While the Algae Grow in Your Fur; Scientifiction and Fantastica; Enigmas of Chance; Pleasures of Detection; Cthulhiana; The Running Dogs of Reaction; The Commonwealth of Letters; The Dismal Science

Posted by crshalizi at August 31, 2009 23:59

August 20, 2009

"Econometric Shrinkage and Model Averaging" (Week-after-next at the Statistics Seminar)

Attention conservation notice: Irrelevant unless you are (a) interested in combining statistical models and (b) in Pittsburgh.

Week after next at the statistics seminar:

Bruce E. Hansen, "Econometric Shrinkage and Model Averaging"
Abstract: Model uncertainty is pervasive in applied econometrics. The traditional solution of model selection is being supplanted by the concept of model averaging. When there are two nested models, model averaging is equivalent with shrinkage estimation. In econometrics, shrinkage theory has been confined to the linear Gaussian regression model, precluding application to most econometric contexts. In this talk, I show that we can apply the modern theory of statistical shrinkage to parametric econometric estimators, including GMM and MLE. The result is that we can construct shrinkage estimators which globally dominate conventional unrestricted GMM and MLE estimators. I extend the classic theory by allowing for arbitrary estimators and weight matrices, and I show how the methods can be used to separate parameters of interest from nuisance parameters. The reduction in risk from shrinkage can be substantial.
Model averaging generalizes shrinkage to the case where the number of models exceeds two. Non-Bayesian model averaging methods have been developed by the author in previous work. The talk will discuss the development of non-Bayesian model averaging methods for general econometric estimators.
Time and Place: Monday, 31 August 2009, 4--5 pm, Doherty Hall 310
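For readers who want to see why averaging two nested models amounts to shrinkage, here is a minimal sketch. It is not Prof. Hansen's estimator: it uses the classical positive-part James-Stein weight, and it assumes the unrestricted estimate has a known common variance and independent coordinates. All function and variable names are hypothetical.

```python
def average_two_models(unrestricted, restricted, w):
    """Model average: weight w on the restricted fit, 1 - w on the unrestricted fit."""
    return [w * r + (1 - w) * u for u, r in zip(unrestricted, restricted)]

def shrink_toward(unrestricted, restricted, w):
    """Shrinkage: pull the unrestricted estimate a fraction w of the way to the restricted one."""
    return [u - w * (u - r) for u, r in zip(unrestricted, restricted)]

def james_stein_weight(unrestricted, restricted, variance):
    """Positive-part James-Stein weight (assumes known, equal, independent variances)."""
    k = len(unrestricted)
    dist = sum((u - r) ** 2 for u, r in zip(unrestricted, restricted)) / variance
    if dist == 0:
        return 1.0  # the two fits agree, so the weight is immaterial
    return min(1.0, max(0.0, (k - 2) / dist))

# Toy numbers (hypothetical): the restricted model sets every coefficient to zero.
unrestricted = [3.0, 0.0, 4.0, 0.0]
restricted = [0.0, 0.0, 0.0, 0.0]
w = james_stein_weight(unrestricted, restricted, variance=1.0)
averaged = average_two_models(unrestricted, restricted, w)
shrunk = shrink_toward(unrestricted, restricted, w)
```

The two functions are algebraically the same map, which is the sense in which two-model averaging is shrinkage; the data-dependent weight grows as the unrestricted fit gets closer to the restricted one.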

The seminar is free and open to the public. Contact me if you would like to meet with Prof. Hansen during his visit to CMU.

Enigmas of Chance; The Dismal Science

Posted by crshalizi at August 20, 2009 14:55

August 17, 2009

Course Announcement: 36-350, Data Mining, Fall 2009

Since the semester starts in a lamentably small number of days:

Title: 36-350, Statistical Data Mining
Prereqs: One of 36-226, 36-310, 36-625, or consent of instructor. In addition, familiarity with vectors and matrices, and comfort with programming, will be very helpful.
Lectures: MWF 10:30--11:20, Porter Hall 226B. (The on-line class schedule thinks the Friday lecture is a lab; it's wrong.)
Course description:
Data mining is the art of extracting useful patterns from large bodies of data; finding seams of actionable knowledge in the raw ore of information. The rapid growth of computerized data, and the computer power available to analyze it, creates great opportunities for data mining in business, medicine, science, government, etc. The aim of this course is to help you take advantage of these opportunities in a responsible way. After taking the class, when you're faced with a new problem, you should be able to (1) select appropriate methods, and justify their choice, (2) use and program statistical software (i.e., R) to implement them, and (3) critically evaluate the results and communicate them to colleagues in business, science, etc.
Data mining is related to statistics and to machine learning, but has its own aims and scope. Statistics is a mathematical science, studying how reliable inferences can be drawn from imperfect data. Machine learning is a branch of engineering, developing a technology of automated induction. We will freely use tools from statistics and from machine learning, but we will use them as tools, not things to study in their own right. We will do a lot of calculations, but will not prove many theorems, and we will do even more experiments than calculations.

The current topic outline, the grading policy, etc., can all be found on the class webpage. This will mostly be very similar to the 2008 iteration of the class, since it seemed to work, with some modifications in light of that experience. Podcast lectures are probably not going to happen, owing to technical incompetence on my part.

(Oh, and in case you're wondering: I'm behind on answering everyone else's email too, not just yours.)

Corrupting the Young; Enigmas of Chance; Self-Centered

Posted by crshalizi at August 17, 2009 14:17

Three-Toed Sloth: Hosted, but not endorsed, by the Center for the Study of Complex Systems