September 01, 2010

On an Example of Vienneau's

Attention conservation notice: 1600+ dry, pedantic words and multiple equations on how some heterodox economists mis-understand ergodic theory.

Robert Vienneau, at Thoughts on Economics, has posted an example of a stationary but non-ergodic stochastic process. This serves as a reasonable prompt to follow up on my comment, a propos of Yves Smith's book, that the post-Keynesian school of economists seems to be laboring under a number of confusions about "ergodicity".

I hasten to add that there is nothing wrong with Vienneau's example: it is indeed a stationary but non-ergodic process. (In what follows, I have lightly tweaked his notation to suit my own tastes.) Time is indexed in discrete steps, and X_t = Y Z_t, where Z is a sequence of independent, mean-zero, variance-1 Gaussian random variables (i.e., standard discrete-time white noise), and Y is a chi-distributed random variable (i.e., the square root of something which has a chi-squared distribution), drawn once, independently of Z. Z is transparently a stationary process, and Y is constant over time, so X must also be a stationary process. However, Vienneau shows by simulation that the empirical cumulative distribution functions from different realizations of the process do not converge on a common limit.
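For anyone who wants to poke at this, here is a minimal sketch of such a simulation in Python. It is not Vienneau's code: the function names are hypothetical, and the number of degrees of freedom k for Y is my own arbitrary choice (k = 3), since nothing in the argument depends on it. Each realization draws Y once and then scales fresh white noise by it.

```python
import math
import random


def chi_sample(k, rng):
    """Chi variate with k degrees of freedom: sqrt of a sum of k squared standard normals."""
    return math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k)))


def vienneau_path(n, k, rng):
    """One realization of X_t = Y * Z_t, t = 1..n.

    Y is drawn once per realization, independently of Z; the Z_t are
    independent standard Gaussians (white noise). Returns (y, path).
    """
    y = chi_sample(k, rng)
    return y, [y * rng.gauss(0.0, 1.0) for _ in range(n)]


def ecdf(xs, x):
    """Empirical CDF of the sample xs, evaluated at the point x."""
    return sum(1 for v in xs if v <= x) / len(xs)


def gaussian_cdf(x, sd):
    """CDF of a mean-zero Gaussian with standard deviation sd."""
    return 0.5 * (1.0 + math.erf(x / (sd * math.sqrt(2.0))))


if __name__ == "__main__":
    rng = random.Random(2010)
    k = 3  # hypothetical degrees of freedom for Y
    for _ in range(5):
        y, path = vienneau_path(100_000, k, rng)
        # Within a realization the ECDF settles near Phi(1/y); y differs by realization.
        print(f"y = {y:.3f}  ECDF(1.0) = {ecdf(path, 1.0):.3f}  "
              f"Phi(1/y) = {gaussian_cdf(1.0, y):.3f}")
```

Each run's ECDF at 1.0 tracks the Gaussian CDF at 1/y for that run's own y, so different runs settle on different values rather than a common limit.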

In fact, the result can be strengthened considerably. Given Y = y, X is just Gaussian white noise with standard deviation y, so by the Glivenko-Cantelli theorem, the empirical CDF of X converges almost surely on the CDF of that Gaussian. The marginal distribution of X_t for each t, however, is a mixture of Gaussians with different standard deviations, and not a Gaussian. Conditionally on Y, therefore, the empirical CDF converges to the marginal distribution of the stationary process with probability 0. Since this convergence has conditional probability zero for every value of y, it has probability zero unconditionally as well. So Vienneau's process very definitely fails to be ergodic.
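The strengthened claim can be checked numerically as well. The sketch below (again assuming a hypothetical k = 3; the grid and sample sizes are arbitrary) fixes Y = y, draws longer and longer paths, and measures the largest gap over a grid of points between the empirical CDF and (a) the Gaussian CDF with standard deviation y, and (b) a Monte Carlo approximation to the marginal, mixture-of-Gaussians CDF. The first gap should shrink as the path lengthens, as Glivenko-Cantelli says it must; the second should not.

```python
import bisect
import math
import random


def normal_cdf(x, sd):
    """CDF of a mean-zero Gaussian with standard deviation sd."""
    return 0.5 * (1.0 + math.erf(x / (sd * math.sqrt(2.0))))


def ecdf_on_grid(sample, grid):
    """Empirical CDF of `sample` evaluated at each point of `grid`."""
    xs = sorted(sample)
    n = len(xs)
    # bisect_right counts the sample points <= g
    return [bisect.bisect_right(xs, g) / n for g in grid]


def max_gap(sample, cdf, grid):
    """Largest |F_n(x) - F(x)| over the grid: a grid approximation to the
    Kolmogorov-Smirnov distance between the ECDF and the CDF `cdf`."""
    return max(abs(e - cdf(g)) for e, g in zip(ecdf_on_grid(sample, grid), grid))


def mixture_cdf(ys):
    """Equal-weight mixture of mean-zero Gaussians with standard deviations ys:
    a Monte Carlo stand-in for the marginal CDF F(x) = E[Phi(x / Y)]."""
    return lambda x: sum(normal_cdf(x, y) for y in ys) / len(ys)


if __name__ == "__main__":
    rng = random.Random(1)
    k = 3  # hypothetical degrees of freedom for Y
    ys = [math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))) for _ in range(2000)]
    marginal = mixture_cdf(ys)
    grid = [i / 4 for i in range(-16, 17)]  # x from -4 to 4 in steps of 1/4
    y = 1.0  # condition on this value of Y
    for n in (100, 1_000, 10_000, 100_000):
        path = [y * rng.gauss(0.0, 1.0) for _ in range(n)]
        gap_cond = max_gap(path, lambda x: normal_cdf(x, y), grid)
        gap_marg = max_gap(path, marginal, grid)
        print(f"n = {n:>6}  gap to N(0, y^2): {gap_cond:.4f}  gap to marginal: {gap_marg:.4f}")
```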

(Proof of the unconditionality claim: Let C be the indicator variable for the empirical CDF converging to the marginal distribution. Then
\[ 
\mathbf{E}\left[C|Y=y\right] = 0 
 \]
for all y, and so
\[ 
\mathbf{E}\left[C\right] = \mathbf{E}\left[\mathbf{E}\left[C|Y\right]\right] = 0 
 \]
by the law of total expectation.)

Two things, however, are worth noticing. First, Vienneau's X process is a mixture of ergodic processes; second, which mixture component is sampled from is set once, at the beginning, and thereafter each sample path looks like a perfectly well-behaved realization of an ergodic process. These observations generalize. The ergodic decomposition theorem (versions of which go back as far as von Neumann's original work on ergodic theory) states that every stationary process is a mixture of processes which are both stationary and ergodic. Moreover, which ergodic component a sample path is in is an invariant of the motion — there is no mixing of ergodic processes within a realization. It's worth taking a moment, perhaps, to hand-wave about this.

Start with the actual definition of ergodic processes. Ergodicity is a property of the probability distribution for whole infinite sequences X = (X_1, X_2, ..., X_t, ...). As time advances, the dynamics chop off the initial parts of this sequence of random variables. Some sets of sequences are invariant under such "shifts" (constant sequences, for instance, but also many other more complicated sets). A stochastic process is ergodic when every invariant set has probability either zero or one. What this means is that (almost) all trajectories generated by an ergodic process belong to a single invariant set, and they all wander from every part of that set to every other part; they are "metrically transitive". (Because no smaller set with positive probability is invariant.) From this follows Birkhoff's individual ergodic theorem, which is the basic strong law of large numbers for dependent data. If X is an ergodic process, then for any (integrable) function f, the average of f(X_t) along a sample path, the "time average" of f, converges almost surely to a unique value. So with probability 1, time averages converge to values characteristic of the ergodic process.
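Birkhoff's theorem is easy to watch in action. The sketch below uses a stationary Gaussian AR(1) process, which is my choice of example rather than anything discussed above; it is a standard case of a dependent process that is both stationary and ergodic. Different sample paths give time averages of f(x) = x^2 that all approach the same value, the stationary variance 1/(1 - a^2).

```python
import math
import random


def ar1_path(n, a, rng):
    """Stationary Gaussian AR(1) path: X_{t+1} = a X_t + eps_t, eps_t ~ N(0, 1), |a| < 1.

    X_1 is drawn from the stationary law N(0, 1 / (1 - a^2)), so the process is
    stationary from the start, not just asymptotically.
    """
    x = rng.gauss(0.0, 1.0 / math.sqrt(1.0 - a * a))
    path = []
    for _ in range(n):
        path.append(x)
        x = a * x + rng.gauss(0.0, 1.0)
    return path


def time_average(f, path):
    """(1/n) * sum_t f(X_t) along one sample path."""
    return sum(f(x) for x in path) / len(path)


if __name__ == "__main__":
    a = 0.5
    target = 1.0 / (1.0 - a * a)  # E[X_t^2] under the stationary law
    for seed in range(3):
        path = ar1_path(100_000, a, random.Random(seed))
        print(seed, round(time_average(lambda x: x * x, path), 3), round(target, 3))
```

The successive X_t here are correlated, which is the point: the time averages still settle on a single value, path after path.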

Now go beyond a single ergodic probability distribution. Two distributions are called "mutually singular" if one of them gives probability 1 to an event which has probability zero according to the other, and vice versa. Any two ergodic processes are either identical or mutually singular. To see this, note that two distinct distributions must give different expectation values to at least one function; otherwise they would be the same distribution. Pick such a distinguishing function and call it f, with expectation values f_1 and f_2 under the two distributions. Well, the set of sample paths where
\[ 
\frac{1}{n}\sum_{t=1}^{n}{f(X_t)} \rightarrow f_1 
 \]
has probability 1 under the first measure, and probability 0 under the second. Likewise, under the second measure the time average is almost certain to converge on f_2, which almost never happens under the first measure. So any two distinct ergodic measures are mutually singular.
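Here is a toy version of that argument, with two ergodic processes of my own choosing: IID Gaussian noise with standard deviation 1, and with standard deviation 2. The distinguishing function is f(x) = x^2, so f_1 = 1 and f_2 = 4. Guessing which measure generated a path by whether its time average is closer to f_1 or to f_2 gets more and more reliable as the path lengthens; in the limit, the two measures concentrate on disjoint sets of paths.

```python
import random


def time_average_sq(path):
    """Time average of f(x) = x^2 along a sample path."""
    return sum(x * x for x in path) / len(path)


def classify(path, f1=1.0, f2=4.0):
    """Guess which measure generated `path`: 1 if the time average of x^2
    is closer to f1, else 2."""
    m = time_average_sq(path)
    return 1 if abs(m - f1) < abs(m - f2) else 2


def error_rate(n, trials, rng):
    """Fraction of misclassified paths of length n, with the generating
    measure (sd-1 or sd-2 IID Gaussian) chosen by a fair coin each trial."""
    errors = 0
    for _ in range(trials):
        truth = rng.choice((1, 2))
        sd = 1.0 if truth == 1 else 2.0
        path = [rng.gauss(0.0, sd) for _ in range(n)]
        errors += classify(path) != truth
    return errors / trials


if __name__ == "__main__":
    rng = random.Random(0)
    for n in (1, 10, 100, 1000):
        print(n, error_rate(n, 500, rng))
```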

This means that a mixture of two (or more) distinct ergodic processes cannot, itself, be ergodic. But a mixture of stationary processes is stationary. So the stationary ergodic processes are "extremal points" in the set of all stationary processes. The convex hull of these extremal points is the set of processes which can be obtained by mixing stationary and ergodic processes; apart from the extremal points themselves, these are all stationary but non-ergodic. It is less trivial to show that every stationary process belongs to this family, i.e., that it is a mixture of stationary and ergodic processes, but this can indeed be done. (See, for instance, this beautiful paper by Dynkin.) Part of the proof shows that which ergodic component a stationary process's sample path is in does not change over time; ergodic components are themselves invariant sets of trajectories. The general form of Birkhoff's theorem thus has time averages converging to a random limit, which depends on the ergodic component the process started in. This can be shown even at the advanced undergraduate level, as in Grimmett and Stirzaker.
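In Vienneau's example the random limit is easy to write down: the time average of X_t^2 converges to Y^2, which is fixed by the ergodic component drawn at the start, and not to the marginal expectation E[X_t^2] = E[Y^2] (which equals k when Y has k degrees of freedom). A sketch, again assuming a hypothetical k = 3:

```python
import math
import random


def component_and_time_average(n, k, rng):
    """Draw the ergodic component (the value y of Y) once, run X_t = y * Z_t for
    n steps, and return (y**2, time average of X_t**2)."""
    y = math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k)))
    avg = sum((y * rng.gauss(0.0, 1.0)) ** 2 for _ in range(n)) / n
    return y * y, avg


if __name__ == "__main__":
    rng = random.Random(42)
    k = 3  # hypothetical; E[X_t^2] = E[Y^2] = k under the marginal
    for _ in range(5):
        limit, avg = component_and_time_average(50_000, k, rng)
        print(f"y^2 = {limit:.3f}  time average = {avg:.3f}  (marginal E[X^2] = {k})")
```

No single path's time average has any particular reason to land near k; each lands near its own y^2.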

At this point, three notes seem in order.

  1. Many statisticians will be more familiar with a special case of the ergodic decomposition, which is de Finetti's result about how infinite exchangeable random sequences are mixtures of independent and identically-distributed random sequences. The ergodic decomposition is like that, only much cooler, and not tainted by the name of a Fascist. (That said, de Finetti's theorem actually covers Vienneau's example.)
  2. Following tradition, I have stated the ergodic decomposition above for stationary processes. However, and this is very important, that limitation is not essential. The broadest class of processes I know of for which an ergodic decomposition holds is that of the "asymptotically mean-stationary" (AMS) processes. The defining property of such processes is that their probability laws converge in Cesaro mean. In symbols, and writing P_t for the law of the process from t onwards, we must have
    \[ 
\lim_{n\rightarrow\infty}{\frac{1}{n}\sum_{t=1}^{n}{P_t(A)}} = P(A) 
 \]
    for some limiting law P. (I learned to appreciate the importance of AMS processes from Robert Gray's Probability, Random Processes and Ergodic Properties, and stole those ideas shamelessly for Almost None.) This allows for cyclic variation in the process, for asymptotic approach to a stationary distribution, for asymptotic approach to a cyclically varying process, etc. Every AMS process is a mixture of ergodic AMS processes, in exactly the way that every stationary process is a mixture of ergodic stationary processes.

    I actually don't know whether the ergodic decomposition can extend beyond this, but I suspect not, since the defining condition for AMS is very close to a Cesaro-mean decay-of-dependence property which turns out to be equivalent to ergodicity, namely that, for any two sets A and B
    \[ 
\lim_{n\rightarrow\infty}{\frac{1}{n}\sum_{t=0}^{n-1}{P_1(A \cap T^{-t} B)}} = P_1(A) P(B) 
 \]
    where T^{-t} are the powers of the back-shift operator (what time series econometricians usually write L), so that T^{-t}B is the set of all trajectories which will be in B in t time-steps. (See Lemma 6.7.4 in the first, online, edition of Gray, p. 148.) This means that, on average, the far future becomes unpredictable from the present.

  3. In light of the previous note, if dynamical systems people want to read "basin of attraction" for "ergodic component", and "natural invariant measure on the attractor" for "limit measure of an AMS ergodic process", they will not go far wrong.
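To make the AMS idea of note 2 concrete, here is a minimal example of the cyclic case, of my own devising: X_t = m_t + Z_t, where Z is standard Gaussian white noise and the mean m_t alternates between +1 (odd t) and -1 (even t). The law of X_t depends on t, so the process is not stationary, but it is periodic, hence AMS. Taking A to be the event that the first coordinate is positive, P_t(A) = P(X_t > 0) = Phi(m_t) (Phi being the standard Gaussian CDF) alternates between Phi(1) and Phi(-1); its Cesaro mean tends to 1/2, and the time average of the indicator along a single path tends to the same value.

```python
import math
import random


def phi(x):
    """Standard Gaussian CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def mean_at(t):
    """Period-2 mean: +1 at odd t, -1 at even t (t = 1, 2, ...)."""
    return 1.0 if t % 2 == 1 else -1.0


def cyclic_path(n, rng):
    """X_t = m_t + Z_t for t = 1..n, with Z_t IID N(0, 1)."""
    return [mean_at(t) + rng.gauss(0.0, 1.0) for t in range(1, n + 1)]


def cesaro_prob_positive(n):
    """(1/n) * sum_{t=1}^{n} P(X_t > 0), using the exact marginals P(X_t > 0) = Phi(m_t)."""
    return sum(phi(mean_at(t)) for t in range(1, n + 1)) / n


if __name__ == "__main__":
    n = 100_000
    path = cyclic_path(n, random.Random(7))
    print("Cesaro mean of P(X_t > 0):", round(cesaro_prob_positive(n), 4))
    print("time average of 1{X_t > 0}:", round(sum(x > 0 for x in path) / n, 4))
    print("time average of X_t:", round(sum(path) / n, 4))
```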

As the last remark suggests, it is entirely possible for a process to be stationary and ergodic but to have sensitive dependence on initial conditions; this is generally the case for chaotic processes, which is why there are classic articles with titles like "The Ergodic Theory of Chaos and Strange Attractors". Chaotic systems rapidly amplify small perturbations, at least along certain directions, so they are subject to positive destabilizing feedbacks, but they have stable long-run statistical properties.
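A standard illustration, not one discussed above: the logistic map x -> 4x(1 - x) on [0, 1] is chaotic, with orbits from nearby starting points separating exponentially fast, yet it is ergodic with respect to the arcsine density 1/(pi sqrt(x(1 - x))). So the long-run fraction of time an orbit spends below 1/4 should approach (2/pi) arcsin(1/2) = 1/3, and its time-averaged position should approach 1/2, from any typical starting point. One caveat: floating-point orbits only approximate true orbits, so treat this sketch as an illustration, not a proof.

```python
def logistic_orbit(x0, n, r=4.0):
    """First n points of the orbit x_{t+1} = r * x_t * (1 - x_t) starting at x0."""
    xs = []
    x = x0
    for _ in range(n):
        xs.append(x)
        x = r * x * (1.0 - x)
    return xs


def max_separation(x0, eps, n):
    """Largest gap over n steps between orbits started at x0 and x0 + eps."""
    a = logistic_orbit(x0, n)
    b = logistic_orbit(x0 + eps, n)
    return max(abs(u - v) for u, v in zip(a, b))


def fraction_below(orbit, c):
    """Fraction of time the orbit spends below c (time average of an indicator)."""
    return sum(1 for x in orbit if x < c) / len(orbit)


if __name__ == "__main__":
    # Sensitive dependence: a 1e-10 perturbation grows to order 1 within a few dozen steps.
    print("max separation over 100 steps:", round(max_separation(0.123456789, 1e-10, 100), 3))
    # Stable statistics: different starting points give the same time averages.
    for x0 in (0.123456789, 0.6180339887, 0.3141592653):
        orbit = logistic_orbit(x0, 50_000)
        print(x0, round(fraction_below(orbit, 0.25), 4), round(sum(orbit) / len(orbit), 4))
```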

Going further, consider the sort of self-reinforcing urn processes which Brian Arthur and collaborators made famous as models of lock-in and path dependence. (Actually, in the classification of my old boss Scott Page, these models are merely state-dependent, and do not rise to the level of path dependence, or even of phat dependence, but that's another story.) These are non-stationary, but it is easily checked that, so long as the asymptotic response function has only a finite number of stable fixed points, they satisfy the definition of asymptotic mean stationarity given above. (I leave it as an exercise whether this remains true in a case like the original Polya urn model.) Hence they are mixtures of ergodic processes. Moreover, if we have only a single realization — a unique historical trajectory — then we have something which looks just like a sample path of an ergodic process, because it is one. ("[L]imiting sample averages will behave as if they were in fact produced by a stationary and ergodic system" — Gray, p. 235 of 2nd edition.) That this was just one component of a larger, non-ergodic model limits our ability to extrapolate to other components, unless we make strong modeling assumptions about how the components relate to each other, but so what?
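A minimal sketch of the lock-in story, assuming a response function of my own choosing rather than one of Arthur's: q(x) = x^2 / (x^2 + (1 - x)^2) is the probability of adding a red ball when the current fraction of red balls is x. Its fixed points are 0, 1/2 and 1; 0 and 1 are stable, 1/2 is unstable. Each run locks in to one end or the other, and any single run, viewed by itself, settles down just like one well-behaved sample path; it is only across runs that different limits show up.

```python
import random


def response(x):
    """Hypothetical urn response function q(x) = x^2 / (x^2 + (1 - x)^2).
    Fixed points: 0 and 1 (stable) and 1/2 (unstable)."""
    return x * x / (x * x + (1.0 - x) ** 2)


def arthur_urn(n_steps, rng, red=1, blue=1):
    """Run a self-reinforcing urn for n_steps additions and return the final red fraction.
    At each step a red ball is added with probability q(current red fraction),
    otherwise a blue ball."""
    for _ in range(n_steps):
        x = red / (red + blue)
        if rng.random() < response(x):
            red += 1
        else:
            blue += 1
    return red / (red + blue)


if __name__ == "__main__":
    rng = random.Random(1984)
    finals = [arthur_urn(20_000, rng) for _ in range(10)]
    # Each run locks in near 0 or near 1; which one is decided early, by chance.
    print([round(f, 4) for f in finals])
```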

I make a fuss about this because the post-Keynesians seem to have fallen into a number of definite errors here. (One may see these errors in e.g., Crotty's "Are Keynesian Uncertainty and Macrotheory Compatible?" [PDF], which however also has insightful things to say about conventions and institutions as devices for managing uncertainty.) It is not true that non-stationarity is a sufficient condition for non-ergodicity; nor is it a necessary one. It is not true that "positive destabilizing feedback" implies non-ergodicity. It is not true that ergodicity is incompatible with sensitive dependence on initial conditions. It is not true that ergodicity rules out path-dependence, at least not the canonical form of it exhibited by Arthur's models.

Enigmas of Chance; The Dismal Science

Posted by crshalizi at September 01, 2010 11:50 | permanent link

Power Law Swag

The admirable Mason Porter, responding to a universal and critical demand, has started the Power Law Shop, celebrating my very favorite class of probability distributions in all the world. This is certainly the funniest thing to come out of the SAMSI complex networks workshop.

Power Laws; Learned Folly

Posted by crshalizi at September 01, 2010 10:10 | permanent link

War Against the Bookmarks

Attention conservation notice: Clearing out my to-blog folder, limiting myself to stuff which isn't too technical and/or depressing.

The late Charles Tilly was, it appears, working on a world history of cities, states and trust networks when he died. The first chapter is online (open access), and makes me really regret that we'll never see the rest. It includes a truly marvelous depiction of the rise of the Mongol Empire, from Marco Polo:

Some time after the migration of the Tartars to [Karakorum], and about the year of our lord 1162, they proceeded to elect for their king a man who was named Chingis-khan, one of approved integrity, great wisdom, commanding eloquence, and eminent for his valour. He began his reign with so much justice and moderation, that he was beloved and revered as their deity rather than their sovereign; and the fame of his great and good qualities spreading over that part of the world, all the Tartars, however dispersed, placed themselves under his command. Finding himself thus at the head of so many brave men, he became ambitious of emerging from the deserts and wildernesses by which he was surrounded, and gave them orders to equip themselves with bows, and such other weapons as they were expert at using, from the habits of their pastoral life. He then proceeded to render himself master of cities and provinces; and such was the effect produced by his character for justice and other virtues, that wherever he went, he found the people disposed to submit to him, and to esteem themselves happy when admitted to his protection and favour.

John Emerson has a slightly different explanation: the culmination of a thousand years of increasingly sophisticated military rivalry in central Eurasia.

My hypothesis is that, for the last several decades during the twelfth century, northern China, Karakitai, the Silk Road between them, and the Mongolian and Manchurian hinterlands served as a pressure cooker or laboratory where strategy, tactics, and military organization were perfected during a period of constant warfare. The Jin Chinese fought against the Song Chinese and sometimes the Xixia or the Mongols, the Xixia fought against the Jin and the Mongols, the Mongols fought with the other two and with each other, and because they were busy with one another they put little pressure on the Karakitai farther west, who were able to concentrate on maintaining their hegemony in Central Asia.

The states in this zone (and the non-state Mongols) hardened up and improved their discipline, organization and skills during decades of practice wars, so that when Genghis Khan finally united the steppe, subjugated the Xixia, and neutralized the Jin (in part because Jin forces had been deserting to the Mongols), he had essentially won the military championship of the toughest league in the world, so that every army he met from then until the Mamluks in Egypt would be far inferior to his. When Genghis Khan gained control of this military high pressure zone, there was no one who could stop him. Furthermore, once Genghis Khan controlled a plurality of the steppe, there was a snowball effect when most of the remaining steppe peoples not allied to his enemies joined him (semi-voluntarily — the alternative was destruction).

Also from Emerson, a selection of Byzantine anecdotes. They really don't make political slanders like they used to, despite some people's best efforts.

Rajiv Sethi ponders The Astonishing Voice of Albert Hirschman; Steve Laniel reviews Exit, Voice, and Loyalty. As an application, consider the plight of would-be refugees from Facebook.

John Dewey writing on economics, economic policy and the financial collapse in 1932, under the rubric of "The Collapse of a Romance" (cached copy). Here Dewey sounds almost Austrian on the connection between uncertainty and the capitalist process — and accordingly condemns the latter as sheer gambling. (Cf.) This line was particularly nice: "Human imagination had never before conceived anything so fantastic as the idea that every individual is actuated in all his desires by an insight into just what is good for him, and that he is equipped with the sure foresight which will enable him to calculate ahead and get just what he is after."

Relatedly, my friend Chris Wiggins observed struggling to save at-risk youth.

Ken MacLeod on Apophatic atheology.

Fifteenth Century Peasant Romance Comics. (Hark, a Vagrant is generally a treasure.)

Ta-Nehisi Coates schools the Freakonomics crowd in the concept of "sample selection bias".

Kalashnikov wanted to be a poet; but war was interested in him.

"Genji, you skank!"

A visual history of lolcats since the 1800s.

Jordan Ellenberg on math in the age of Romanticism.

Becoming death, destroyer of mosquito worlds. How termites evolved from cockroach-like insects. (Do not read while eating.)

"This is why I'll never be an adult" is scarily perceptive --- "Internet FOREVER!", indeed (via unfogged). While on the subject of moral psychology, how to keep someone with you forever (via Edge of the American West).

Cool data-mining tricks for academic libraries. Via Magistra et Mater, seen elsewhere connected Carolingian texts and social media.

Canadian engineers are much stranger than you'd think.

Oleg Grabar on the history of images of Muhammad in Islamicate culture (via Laila Lalami).

Akhond of Swat on "Ideas of India" and The Reading Life of Gandhi, Ambedkar and Nehru.

Southern literature, objectively defined and measured by Jerry Leath Mills:

My survey of around thirty prominent twentieth-century southern authors has led me to conclude, without fear of refutation, that there is indeed a single, simple, litmus-like test for the quality of southernness in literature, one easily formulated into a question to be asked of any literary text and whose answer may be taken as definitive, delimiting, and final. The test is: Is there a dead mule in it? As we shall see, the presence of one or more specimens of Equus caballus x asinus (defunctus) constitutes the truly catalytic element, the straw that stirs the strong and heady julep of literary tradition in the American South.

Jessa Crispin on the pleasures of reading about polar travel, while nowhere near the poles.

"Having a world unfold in one's head is the fundamental SF experience." (Pretty much everything Jo Walton writes is worth reading.)

Bruce Sterling on zombie romance: "Paranormal Romance is a tremendous, bosom-heaving, Harry-Potter-sized, Twilight-shaped commercial success. It sorta says everything about modern gender relations that the men have to be supernatural. It also says everything about humanity that we're so methodically training ourselves to be intimate partners of entities that aren't human."

The Demon-haunted world, or, the past and future of practical city magic.

Linkage; Writing for Antiquity; The Commonwealth of Letters; Afghanistan and Central Asia; Scientifiction and Fantastica

Posted by crshalizi at September 01, 2010 09:50 | permanent link

August 31, 2010

Books to Read While the Algae Grow in Your Fur, August 2010

John A. Hall, Ernest Gellner: An Intellectual Biography
I've said my piece about Gellner himself elsewhere, and I'd just be repeating myself here if I went into that. Hall's book, appropriately for the intellectual biography of a major thinker, mixes relating the story of Gellner's life with an exposition and (fair) criticism of his ideas, more or less in chronological order. The tone is serious, the research into the historical academic backgrounds of various phases of Gellner's life and thought obviously thorough, but the prose is quite readable --- though Hall wisely doesn't even try to match Gellner's style.
(The biggest surprises for me were learning about Gellner's osteoporosis, and the photos of how handsome he was as a young man, but then I have been a Gellnerian since a chance encounter with The Psychoanalytic Movement led me to spend the summer of '97 reading my way through all his books.)
See also Scott McLemee; thanks to Henry Farrell for letting me know about this book.
Julia Spencer-Fleming, I Shall Not Want
Laurence Gough, Killers
Partha Dasgupta, Economics: A Very Short Introduction
I wanted to like this a lot, and I can see that it does have some very nice features. It emphasizes that economics is about actual processes of production, distribution and exchange, not about abstract optimization theory, and that the goal is to improve the human condition, especially that of the most destitute. It makes clear that markets are one of many different economic institutions, which have important virtues in many circumstances, but aren't the end-all and be-all of the subject. It puts a lot of emphasis on environmental issues (Dasgupta's professional specialty), and is appropriately skeptical of fellow economists pushing "efficiency" as an end in itself. It is clear and (mostly) correct. Someone who doesn't know any economics would in fact learn a lot from it, and be better prepared both to understand economists and to learn more.
I didn't even dislike reading it. I just didn't like doing so at all, and I'm not sure why --- some incompatibility of style, or over-familiarity with the subject on my part, maybe. It's probably worth your checking out, if a brief primer on modern economics sounds interesting.
(The one error I noticed: pace what Dasgupta says on p. 78, Oskar Lange was not a "market socialist" because he argued that an ideal central planner, with perfect information, could be as efficient as an idealized market. [If anything, that is an obvious truth about the neo-classical set-up.] Rather, as is easily checked from Lange's papers [I, II], he wanted the socialist economy to actually use markets (plus a procedure which is basically a sped-up simulation of a Walrasian market), precisely in order to overcome critics like Hayek and von Mises. ["A statue of Professor Mises ought to occupy an honourable place in the great hall of the Ministry of Socialisation or of the Central Planning Board of the socialist state."] In fact, I think it is fair to say that Hayek's two papers on economics and knowledge, while permanent contributions to social science, do not adequately deal with the market socialist idea. But I have written too much about this elsewhere, including the real issues with Lange's proposal, and this is a mere page in Dasgupta's book.)
Shamini Flint, Inspector Singh Investigates: A Most Peculiar Malaysian Murder
Mind-candy. A well-constructed mystery, nice writing, and an adorable detective. Apparently there are at least two other books in the series, published abroad.
Pixu
Collaborative graphic novel about a haunted apartment house. Goes beyond "creepy" into "disturbing".
Don Marquis, Archy and Mehitabel
"There's life in the old dame yet."
Karl Sigmund, The Calculus of Selfishness
I'm reviewing this for American Scientist, so I won't say much right now. To preview: really good, though I am a little dubious about the specific model for public goods games. (It looks like a lot turns on the private utility of the public good going down proportional to the population size, which is not true for many public goods, such as light-houses, sanitation, policing, etc. Also, it's assumed that abstaining from participation is free, but providing, say, a private substitute for the police would be very, very expensive.)
Lauren Willig, The Secret History of the Pink Carnation and The Masque of the Black Tulip
Mind-candy. Fun if you are willing to accept them on their own terms: bucklers are there to be swashed, bodices are there to be ripped, dungeons are there to be escaped from, and graduate students are there to uncover, well, secret histories. (One of these, admittedly, is not like the others.) Query presupposing a mild spoiler for Pink Carnation: Fvapr gur Checyr Tragvna'f vqragvgl orpnzr choyvp, naq Rybvfr jnf fghqlvat uvz vagrafryl, fubhyqa'g fur unir vzzrqvngryl erpbtavmrq gur znvqra anzr bs uvf jvsr, naq xarj jurer gung fgbel jnf urnqrq?
Josh Bazell, Beat the Reaper
Mind-candy; hilarious and gripping crime/medical novel; I read it in one sitting. The climax was a truly remarkable instance of the gun casually set on the mantelpiece in Act I going off at the end.
Hope Larson, Gray Horses
Charming little fable about adventures in the dreamlands, coming of age, and Chicago Onion City.

Books to Read While the Algae Grow in Your Fur; The Pleasures of Detection; Commit a Social Science; The Dismal Science; Biology; Mathematics; Scientifiction and Fantastica

Posted by crshalizi at August 31, 2010 23:59 | permanent link

Annual Call to the Adobe Tower

Once again, the Santa Fe Institute is hiring post-docs. Once again, for sheer concentrated intellectual stimulation — not to mention views like this from your office window — there is no better position for an independent-minded young scientist with interdisciplinary interests. The official announcement follows:

The Omidyar Postdoctoral Fellowship at the Santa Fe Institute offers you:
  • unparalleled intellectual freedom
  • transdisciplinary collaboration with leading researchers worldwide
  • up to three years in residence in Santa Fe, NM
  • discretionary research and collaboration funds
  • individualized mentorship and preparation for your next leadership role
  • an intimate, creative work environment with an expansive sky
The Omidyar Fellowship at the Santa Fe Institute is unique among postdoctoral appointments. The Institute has no formal programs or departments. Research is collaborative and spans the physical, natural, and social sciences. Most research is theoretical and/or computational in nature, although it may include an empirical component. SFI typically has 15 Omidyar Fellows and postdoctoral researchers, 15 resident faculty, 95 external faculty, and 250 visitors per year. Descriptions of the research themes and interests of the faculty and current Fellows can be found at http://www.santafe.edu/research. Requirements:
  • a Ph.D. in any discipline (or expect to receive one by September 2011)
  • an exemplary academic record
  • a proven ability to work independently and collaboratively
  • a demonstrated interest in multidisciplinary research
  • evidence of the ability to think outside traditional paradigms
Applications are welcome from:
  • candidates from any country
  • candidates from any discipline
  • women and minorities, as they are especially encouraged to apply.
The Santa Fe Institute is an Equal Opportunity Employer.

Deadline: 1 November 2010
To apply: www.santafe.edu We accept online applications ONLY.
Inquiries: email to ofellowshipinfo at santafe dot edu

The Santa Fe Institute is a private, independent, multidisciplinary research and education center founded in 1984. Since its founding, SFI has devoted itself to creating a new kind of scientific research community, pursuing emerging synthesis in science. Operating as a visiting institution, SFI seeks to catalyze new collaborative, multidisciplinary research; to break down the barriers between the traditional disciplines; to spread its ideas and methodologies to other institutions; and to encourage the practical application of its results.

The Omidyar Fellowship at the Santa Fe Institute is made possible by a generous gift from Pam and Pierre Omidyar.

Complexity

Posted by crshalizi at August 31, 2010 20:00 | permanent link

August 26, 2010

36-757, Advanced Data Analysis: Teaching Handouts (Fall 2010)

The students are just starting on their projects, so, rather than say anything of substance, I try to extract the rational kernel from the traditional shell of the practices by which our cultural formation strives to reproduce itself. (Background.)

Syllabus and Orientation to the Course
Or, what the statistics department hopes to achieve by making you spend the academic year analyzing real data.
Some Advice on Process
Or, riding the big hairy research project.

Corrupting the Young; Enigmas of Chance

Posted by crshalizi at August 26, 2010 12:08 | permanent link

August 19, 2010

Overcoming the Binary (Next Week at the Statistics Seminar)

Every human relationship is a unique and precious snowflake, but do we treat them that way when we model them mathematically? No. No we do not. Join us next week to hear not just why this is wrong, but what to do instead. As always, the seminar is free and open to the public.

Joe Blitzstein, "Strengths of Ties in Network Modeling and Network Sampling"
Abstract: Measuring and modeling the strengths of ties in a social network has a long history, and an even longer history of being ignored. How much does it matter for inference if the strengths are discarded? Dichotomizing a network may seem to be an appealing simplification, but we show that it comes at a heavy cost, through quantifying the information loss. Closely related issues arise in respondent-driven sampling, a popular method for surveying "hidden" populations. We suggest ways to incorporate strength of tie information in this setting, comparing design-based and model-based estimation approaches in the context of an AIDS study.
Based on joint works with Sergiy Nesterko and Andrew Thomas.
Time and place: Monday, Aug. 23, 2010, 4:00--5:00 PM in Doherty Hall A310

Let me add that Prof. Blitzstein will be visiting us from the Bible college of a prophecy-obsessed, theocratic Puritan cult clinging to the rudiments of civilization in a plague-blasted post-apocalyptic wasteland*, so I expect a good turnout to show him how we do these things around here.

*: No, really.

Manual trackback (!): The Inverse Square

Enigmas of Chance; Networks

Posted by crshalizi at August 19, 2010 16:30 | permanent link

August 17, 2010

Fall 2010 Classes: 36-757 and 36-835

I will not be teaching data mining this fall; 36-350 is being taken over this year by my friend and mentor Chris Genovese. Instead, I will be teaching 36-757 (if you'd be interested, you're already in it*), and co-teaching 36-835 with Rob Kass. Here's the announcement for the latter:

36-835 Seminar on Statistical Modeling
First meeting: Tuesday, 24 August, 1:30 pm in Porter Hall A20A (organizational)
This course will be a weekly journal club on the principles and practice of statistical modeling, organized through the careful reading and group discussion of important recent papers. Readings will be selected by the class from sources such as JASA or Annals of Applied Statistics. Discussion will emphasize the relationship between scientific questions and statistical methods. Each week students will be required to post, in an online discussion group, one cogent question or comment about the reading, and will be required to participate in the discussion. Each student will also be responsible for leading at least one class discussion. The course is intended for graduate students in Statistics or Machine Learning. Others are welcome.

If there's interest, I'll post the reading list. Our first paper will definitely be Breiman's "Statistical Modeling: The Two Cultures" (Statistical Science 16 (2001): 199--231).

Update, 26 August: handouts for 757, which may be of broader interest.

*: This is the first half of "advanced data analysis", a year-long project our doctoral students do on analyzing data provided by an outside investigator, under the supervision of a faculty member. ADA culminates in the student presenting their findings in written and oral form, which serves as one of their three qualifying exams. The goal is to solve genuine scientific questions, not (or not just) to use the most shiny methodological toys. If you have some real-world data which need to be analyzed, and which seem like they might benefit from the attention of a very smart statistics graduate student, please get in touch. (I promise nothing.)

Corrupting the Young; Enigmas of Chance; Self-centered

Posted by crshalizi at August 17, 2010 14:57 | permanent link

July 31, 2010

Books to Read While the Algae Grow in Your Fur, July 2010

Stephen L. Morgan and Christopher Winship, Counterfactuals and Causal Inference: Methods and Principles for Social Research
The first reasonable introductory textbook on modern approaches to causal inference I have seen. (Books like Causation, Prediction and Search, or Pearl's Causality, are not suitable as textbooks.) It alternates between talking about counterfactual random variables and using graphical models (being clear that the latter have at least as much expressive power as the former). After the introduction, which gives a very nice tour of the rest of the book, the first few chapters cover a simple example of the kind of effect estimation we want to do; how to use conditioning and Pearl's back-door condition to control for other variables; matching methods and propensity scores; and regression and why it is problematic. They then turn to methods which might be applicable when adequate conditioning is not, like instrumental variables (about which, soundly, they are very dubious), Pearl's front-door criterion, and longitudinal and regression-discontinuity designs. (Their discussion of the front-door criterion draws very interesting links to the literature on explanation-by-mechanisms, as in Elster, Tilly, or indeed DeLanda, which I need to think about more.) Manski's partial identification approach also gets looked at. The last chapter is a sort of victory lap.
The implied reader of this book is a social scientist who likes quantitative data but is not very interested in, and perhaps not very comfortable with, mathematics; everything has been brought to the level where it can be followed, with a little work, by someone who remembers how to do ordinary least squares regression, but is fuzzy about why $ X^T X $ controls the standard errors of the coefficient estimates*. Readers who know more statistical theory but not causal inference can, I think, just skip the worked numerical examples and generally go faster through the book, but will still learn a lot, and not have to unlearn any of it later. (At no point did I notice any lies-told-to-children.) Non-social-scientists interested in what can be said about causal relationships from observational, non-experimental data will also find it useful.
Disclaimer: Winship is an editor at a journal where I have a paper under review.
*: Because it's the generalization of the sum-of-squares for the independent variable in univariate regression; and the more points you have from a line, and the more widely spaced they are, the better you know the slope of the line.
W. W. Tarn, The Greeks in Bactria and India
Teeth-grindingly Eurocentric, and erects massive conjectures on what seem to me to be the most flimsy evidential foundations (e.g., those Seleucid princesses!), but did a monumental job in surveying the evidence from Greek literature about the Hellenistic presence in what is now Central Asia, Afghanistan, Pakistan and India; and also the coins, as they were known in the 1930s. (He tries to bring in Indian and Chinese literature as well, but doesn't know the languages and is self-conscious about relying on translations.) Someone should really try integrating this with what we now know from archaeology; maybe they have.
Laura E. Reeve, Pathfinder
Mind-candy. Previously: 1, 2.
Sarah Vowell, The Wordy Shipmates
Otto J. Maenchen-Helfen, The World of the Huns: Studies in Their History and Culture
How on Earth do we know that any of these archaeological finds belong to Huns?
Laurence Gough, The Goldfish Bowl
First in the series. As hard-boiled as possible, under the circumstances.
Sauna
Creepy and moody historical horror movie. Not sure if some parts of it would be less weird if I were Finnish.
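To make the footnote about $ X^T X $ in the Morgan and Winship entry concrete, here is a small sketch (Python with NumPy; the design points and the known noise variance are made up for illustration). It computes the sampling variance of the OLS slope from $ \sigma^2 (X^T X)^{-1} $, with an intercept column, and checks it against the univariate form $ \sigma^2 / \sum_i (x_i - \bar{x})^2 $. More points, or more widely spaced points, shrink it.

```python
import numpy as np

def slope_variance(x, sigma2=1.0):
    """Sampling variance of the OLS slope, read off sigma^2 (X^T X)^{-1}.

    x: 1-D array of predictor values; an intercept column is added.
    sigma2: noise variance, assumed known for this illustration.
    """
    X = np.column_stack([np.ones_like(x, dtype=float), x])
    cov = sigma2 * np.linalg.inv(X.T @ X)  # covariance of (intercept, slope)
    return cov[1, 1]

narrow = np.linspace(-1.0, 1.0, 10)   # 10 points, tightly spaced
wide = np.linspace(-5.0, 5.0, 10)     # 10 points, widely spaced
more = np.linspace(-1.0, 1.0, 100)    # 100 points, tightly spaced

for name, x in [("narrow", narrow), ("wide", wide), ("more", more)]:
    # Second column: sigma^2 / (sum of squared deviations of x), the univariate form
    print(name, slope_variance(x), 1.0 / np.sum((x - x.mean()) ** 2))
```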

Books to Read While the Algae Grow in Your Fur; Writing for Antiquity; Afghanistan and Central Asia; Scientifiction and Fantastica; The Pleasures of Detection; The Beloved Republic; Enigmas of Chance

Posted by crshalizi at July 31, 2010 23:59 | permanent link

July 26, 2010

"Generalization Error Bounds for State Space Models: With an Application to Economic Forecasting"

Attention conservation notice: 500 words on a student's thesis proposal, combining all the thrills of macroeconomic forecasting with the stylish vivacity of statistical learning theory. Even if you care, why not check back in a few years when the work is further along?

Daniel McDonald is writing his thesis, under the joint supervision of Mark Schervish and myself. I can use the present participle, because on Thursday he successfully defended his proposal:

"Generalization Error Bounds for State Space Models: With an Application to Economic Forecasting" [PDF]
Abstract: In this thesis, I propose to derive entirely data dependent generalization error bounds for state space models. These results can characterize the out-of-sample accuracy of many types of forecasting methods. The bounds currently available for time series data rely both on a quantity describing the dependence properties of the data generating process known as the mixing rate and on a quantification of the complexity of the model space. I will derive methods for estimating the mixing behavior from data and characterize the complexity of state space models. The resulting risk bounds will be useful for empirical researchers at the forefront of economic forecasting as well as for economic policy makers. The bounds can also be applied in other situations where state space models are employed.

Some of you may prefer the slides (note that Daniel is using DeLong's reduction of DSGEs to D2 normal form), or an even more compact visual summary:

Most macroeconomic forecasting models are, or can be turned into, "state-space models". There's an underlying state variable or variables, which evolves according to a nice Markov process, and then what we actually measure is a noisy function of the state; given the current state, future states and current observations are independent. (Some people like to draw a distinction between "state-space models" and "hidden Markov models", but I've never seen why.) The calculations can be hairy, especially once you allow for nonlinearities, but one can show that maximum likelihood estimation, and various regularized versions of it, have all the nice asymptotic properties one could want.
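For readers who have not met state-space models, here is a minimal sketch of the simplest case (Python with NumPy): a scalar linear-Gaussian model in which a hidden AR(1) state is observed with additive noise, and the Kalman filter computes the conditional mean of the current state given the observations so far. The parameter values `phi`, `q` and `r` are arbitrary assumptions; real macro models are multivariate and often nonlinear, and this is not the method of the thesis.

```python
import numpy as np

def simulate(n, phi=0.9, q=1.0, r=2.0, seed=0):
    """Hidden state S_t = phi*S_{t-1} + N(0, q); observation Y_t = S_t + N(0, r)."""
    rng = np.random.default_rng(seed)
    s = np.zeros(n)
    s[0] = rng.normal(0.0, np.sqrt(q / (1 - phi**2)))  # start in stationary law
    for t in range(1, n):
        s[t] = phi * s[t - 1] + rng.normal(0.0, np.sqrt(q))
    y = s + rng.normal(0.0, np.sqrt(r), size=n)
    return s, y

def kalman_filter(y, phi=0.9, q=1.0, r=2.0):
    """Return E[S_t | Y_1..Y_t] for the scalar model above."""
    mean, var = 0.0, q / (1 - phi**2)  # prior: the stationary distribution
    filtered = np.empty(len(y))
    for t, obs in enumerate(y):
        if t > 0:
            # Predict: push the previous estimate through the state dynamics
            mean, var = phi * mean, phi**2 * var + q
        # Update: move toward the new observation by the Kalman gain
        gain = var / (var + r)
        mean = mean + gain * (obs - mean)
        var = (1 - gain) * var
        filtered[t] = mean
    return filtered

s, y = simulate(2000)
est = kalman_filter(y)
print("MSE of raw observations:", np.mean((y - s) ** 2))
print("MSE of filtered states: ", np.mean((est - s) ** 2))
```

The filtered estimates should track the hidden state noticeably better than the raw observations do, which is the whole point of modeling the state.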

Asymptotic statistical theory is, of course, useless for macroeconomics. Or rather: if our methods weren't consistent even with infinite data, we'd know we should just give up. But if the methods only begin to give usably precise answers when the number of data points gets over $ 10^{24} $, we should give up too. Knowing that things could work with infinite data doesn't help when we really have 252 data points, and serial dependence shrinks the effective sample size to about 12 or 15. The wonderful thing about modern statistical learning theory is that it gives non-asymptotic results, especially risk bounds that hold at finite sample sizes. This is, of course, the reason why ergodic theorems, and the correlation time of US GDP growth rates, have been on my mind recently. In particular, this is why we are thinking about ergodic theorems which give not just finite-sample bounds (like the toy theorem I posted about), but can be made to do so uniformly over whole classes of functions, e.g., the loss functions of different macro forecasting models and their parameterizations.

Anyone wanting to know how to deal with non-stationarity is reminded that Daniel is proposing a dissertation in statistics, and not a solution to the problem of induction.

Enigmas of Chance; The Dismal Science; Incestuous Amplification

Posted by crshalizi at July 26, 2010 15:30 | permanent link

Social Carbon Banditry (Dept. of Modest Proposals for Keeping Civilization from Suffocating In Its Own Waste)

Attention conservation notice: A consideration of social banditry as a tool of climate-change policy. Sadly, this mockery apparently has about as much chance of actually helping as does action by the world's leading democracy.

Only on Unfogged would the comments on a post about visual penis jokes turn to a discussion of what, if anything, civil disobedience could do about climate change; but they did.

One of the goals of classic civil disobedience is to make maintaining an unjust institution costly, though I'm not sure how often it is put in these terms. Ordinarily, those who are disadvantaged or subordinated by a prevailing institution go along with it: they follow its norms and conventions without having to be forced, and whether this is because they accept those norms or because they reasonably fear the retaliation that would come if they flouted them makes little difference. This makes maintaining the injustice a good deal for the oppressors: not only do they get the immediate benefits of the institution, they don't have to expend a lot of effort maintaining it. Mass civil disobedience disrupts this state of affairs. Even if the oppressors can live with seeing that they are, in fact, the kind of people who will engage in brutality to retain their privileges, the time policemen spend working over Sunday-school teachers, etc., is time they do not spend patrolling the streets, catching burglars, etc. Mass civil disobedience, especially if prolonged, raises the cost of perpetuating injustice. The implicit challenge to Pharaoh is: "Are you really willing to pay what it takes to keep us in bondage?"

What does this suggest when it comes to climate change? Burning fossil fuels is not an act with any intrinsic moral significance. The trouble with it is that my burning those fuels inflicts costs on everyone else, and there is no mechanism, yet, for bringing those costs home to me, the burner. The issue is not one of unjust institutions, but of an unpriced externality. The corresponding direct action, therefore, is not making oppressors actually enforce their institutions, but internalizing the externality. I envisage people descending on oil refineries, coal mines, etc., and forcing the operators to hand over sums proportional to the greenhouse-gas contribution of their sales. What happened to the money afterwards would be a secondary consideration at best (though I wouldn't recommend setting it on fire). The situation calls not for civil disobedience but for social carbon banditry.

Of course, to really be effective, the banditry would need to be persistent, universal, and uniform. Which is to say, the banditry has to become a form of government again, if not necessarily a part of the state.

Modest Proposals; The Dismal Science; The Continuing Crises

Posted by crshalizi at July 26, 2010 14:30 | permanent link

July 09, 2010

"Inferring Hierarchical Structure in Networks and Predicting Missing Links" (Next Week at the [Special Summer Bonus] Statistics Seminar)

Attention conservation notice: Only of interest if you are (1) in Pittsburgh next Tuesday, and (2) care about statistical network modeling and community discovery. Also, the guest is a friend, collaborator and mentor; but, despite his undiscriminating taste in acquaintances, an excellent speaker and scientist.

Usually, during the summer the CMU statistics seminar finds a shaded corner and drowses through the heat, with no more activity than an occasional twitch of its tail. Next week, however, it rouses itself for an exceptional visitor:

Cristopher Moore, "Inferring Hierarchical Structure in Networks and Predicting Missing Links"
Abstract: Given the large amounts of data that are now becoming available on social and biological networks, we need automated tools to extract important structural features from this data. Moreover, for many networks, observing their links is a costly and imperfect process: food webs require field work, protein networks require combining pairs of proteins in the laboratory, and so on. Based on the part of the network we have seen so far, we would like to make good guesses about what pairs of vertices are likely to be connected, so we can focus limited resources on those pairs.
I will present a Bayesian approach to this problem, where we try to infer the hierarchical structure of the network, with communities and subcommunities at multiple levels of organization. We start with a rich model of random networks of this type, and then use a Monte Carlo Markov Chain to explore the space of these models. This approach performs quite well on real networks, often outperforming simple heuristics such as assuming that two vertices with neighbors in common are likely to be connected. In particular, it can handle both "assortative" behavior like that seen in many social networks, and "disassortative" behavior as in food webs.
Joint work with Aaron Clauset and Mark Newman.
Place and time: Tuesday, July 13, 2010, 4:00--5:00 p.m. in Porter Hall 125B

As usual, the seminar is free and open to the public.
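The "simple heuristic" the abstract uses as a point of comparison, scoring an unobserved pair of vertices by how many neighbors they share, is easy to sketch (plain Python). The toy edge list is invented for illustration, and this is only that baseline, not the hierarchical Bayesian method of the talk; note that it builds in exactly the "assortative" assumption which fails for food webs.

```python
from itertools import combinations

def common_neighbor_scores(edges):
    """Score every non-adjacent pair of vertices by its number of shared neighbors."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    scores = {}
    for u, v in combinations(sorted(nbrs), 2):
        if v not in nbrs[u]:  # only rank pairs not already observed as linked
            scores[(u, v)] = len(nbrs[u] & nbrs[v])
    # Highest-scoring pairs are the best guesses for missing links
    return sorted(scores.items(), key=lambda kv: -kv[1])

# Hypothetical toy network
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("c", "e"), ("d", "e"), ("a", "d")]
for pair, score in common_neighbor_scores(edges):
    print(pair, score)
```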

Networks; Enigmas of Chance; Incestuous Amplification

Posted by crshalizi at July 09, 2010 14:33 | permanent link

July 03, 2010

Variations on a Patriotic Theme

"They'd ask me, 'Raf, what about this Revolution of yours? What kind of world are you really trying to give us?' I've had a long time to consider that question."

"And?"

"Did you ever hear the Jimi Hendrix Rendition of 'The Star-Spangled Banner'?"

Starlitz blinked. "Are you kidding? That cut still moves major product off the back catalog."

"Next time, really listen to that piece of music. Try to imagine a country where that music truly was the national anthem. Not weird, not far-out, not hip, not a parody, not a protest against some war, not for young Yankees stoned on some stupid farm in New York. Where music like that was social reality. That is how I want people to live...."

[Bruce Sterling, A Good Old-Fashioned Future, pp. 104--105]

"I wasn't born in America. In point of fact, I wasn't even born. But I work for our government because I believe in America. I happen to believe that this is a unique society. We have a unique role in the world."

Oscar whacked the lab table with an open hand. "We invented the future! We built it! And if they could design or market it a little better than we could, then we just invented something else more amazing yet. If it took imagination, we always had that. If it took enterprise, we always had it. If it took daring and even ruthlessness, we had it — we not only built the atomic bomb, we used it! We're not some crowd of pious, sniveling, red-green Europeans trying to make the world safe for boutiques! We're not some swarm of Confucian social engineers who would love to watch the masses chop cotton for the next two millennia! We are a nation of hands-on cosmic mechanics!"

"And yet we're broke," Greta said.

[Bruce Sterling, Distraction, p. 90]

The Beloved Republic

Posted by crshalizi at July 03, 2010 22:30 | permanent link

July 02, 2010

The World's Simplest Ergodic Theorem

Attention conservation notice: Equation-filled attempt at a teaching note on some theorems in mathematical probability and their statistical application. (Plus an oblique swipe at macroeconomists.)

The "law of large numbers" says that averages of measurements calculated over increasingly large random samples converge on the averages calculated over the whole probability distribution; since that's a vague statement, there are actually several laws of large numbers, from the various ways of making this precise. As traditionally stated, they assume that the measurements are all independent of each other. Successive observations from a dynamical system or stochastic process are generally dependent on each other, so the laws of large numbers don't, strictly, apply, but they have analogs, called "ergodic theorems". (Blame Boltzmann.) Laws of large numbers and ergodic theorems are the foundations of statistics; they say that sufficiently large samples are representative of the underlying process, and so let us generalize from training data to future or currently-unobserved occurrences.

Here is the simplest route I know to such a theorem; I can't remember if I learned it from Prof. A. V. Chubukov's statistical mechanics class, or from Uriel Frisch's marvellous Turbulence. Start with a sequence of random variables $ X_1, X_2, \ldots X_n $. Assume that they all have the same (finite) mean m and the same (finite) variance v; also assume that the covariance, $ \mathbf{E}[X_t X_{t+h}] - \mathbf{E}[X_t] \mathbf{E}[X_{t+h}] $, depends only on the difference in times h and not on the starting time t. (These assumptions together comprise "second-order" or "weak" or "wide-sense" stationarity. Stationarity is not actually needed for ergodic theorems; one can get away with what's called "asymptotic mean stationarity", but stationarity simplifies the presentation here.) Call this covariance $ c_h $. We contemplate the arithmetic mean of the first n values in X, called the "time average":

\[ 
A_n = \frac{1}{n}\sum_{t=1}^{n}{X_t} 
 \]

What is the expectation value of the time average? Taking expectations is a linear operator, so

\[ 
\mathbf{E}[A_n] = \frac{1}{n}\sum_{t=1}^{n}{\mathbf{E}[X_t]} = \frac{n}{n}m = m 
 \]
which is re-assuring: the expectation of the time average is the common expectation. What we need for an ergodic theorem is to show that as n grows, $ A_n $ tends, in some sense, to get closer and closer to its expectation value.

The most obvious sense we could try is for the variance of $ A_n $ to shrink as n grows. Let's work out that variance, remembering that for any random variable Y, $ \mathrm{Var}[Y] = \mathbf{E}[Y^2] - (\mathbf{E}[Y])^2 $.


\begin{eqnarray*} 
\mathrm{Var}[A_n] & = & \mathbf{E}[A_n^2] - m^2\\ 
& = & \frac{1}{n^2}\mathbf{E}\left[{\left(\sum_{t=1}^{n}{X_t}\right)}^2\right] - m^2\\ 
& = & \frac{1}{n^2}\mathbf{E}\left[\sum_{t=1}^{n}{\sum_{s=1}^{n}{X_t X_s}}\right] - m^2\\ 
& = & \frac{1}{n^2}\sum_{t=1}^{n}{\sum_{s=1}^{n}{\mathbf{E}\left[X_t X_s\right]}} - m^2\\ 
& = & \frac{1}{n^2}\sum_{t=1}^{n}{\sum_{s=1}^{n}{ c_{s-t} + m^2}} - m^2\\ 
& = & \frac{1}{n^2}\sum_{t=1}^{n}{\sum_{s=1}^{n}{ c_{s-t}}}\\ 
& = & \frac{1}{n^2}\sum_{t=1}^{n}{\sum_{h=1-t}^{n-t}{ c_h}} 
\end{eqnarray*}

This used the linearity of expectations, and the definition of the covariances $ c_h $. Imagine that we write out all the covariances in an n*n matrix, and average them together; that's the variance of $ A_n $. The entries on the diagonal of the matrix are all $ c_0 = v $, and the off-diagonal entries are symmetric, because (check this!) $ c_{-h} = c_h $. So the sum over the whole matrix is the sum on the diagonal, plus twice the sum of what's above the diagonal.

\[ 
\mathrm{Var}[A_n] = \frac{v}{n} + \frac{2}{n^2}\sum_{t=1}^{n-1}{\sum_{h=1}^{n-t}{c_{h}}} 
 \]
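The bookkeeping in the last two displays can be checked numerically. This sketch (Python with NumPy) takes an autocovariance sequence `c`, with `c[0]` playing the role of v, averages the full n*n matrix of $ c_{s-t} $ (using $ c_{-h} = c_h $), and compares the result with the diagonal-plus-upper-triangle form. The covariances $ c_h = v\phi^h $ are just an assumed test case.

```python
import numpy as np

def var_full_matrix(c, n):
    """(1/n^2) * sum over all t, s of c_{|s-t|}."""
    lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    return c[lags].sum() / n**2

def var_triangle(c, n):
    """v/n + (2/n^2) * sum_{t=1}^{n-1} sum_{h=1}^{n-t} c_h."""
    upper = sum(c[h] for t in range(1, n) for h in range(1, n - t + 1))
    return c[0] / n + 2.0 * upper / n**2

v, phi, n = 1.0, 0.7, 50
c = v * phi ** np.arange(n)  # assumed test covariances c_h = v * phi^h
print(var_full_matrix(c, n), var_triangle(c, n))
```

Feeding in the constant sequence $ c_h = v $ (e.g. `np.full(n, v)`) returns v for every n, which is the sequence-of-identical-copies case discussed next.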

If the $ X_t $ were uncorrelated, we'd have $ c_h = 0 $ for all h > 0, so the variance of the time average would be $ O(n^{-1}) $. Since independent random variables are necessarily uncorrelated (but not vice versa), we have just recovered a form of the law of large numbers for independent data. How can we make the remaining part, the sum over the upper triangle of the covariance matrix, go to zero as well?

We need to recognize that it won't automatically do so. The assumptions we've made so far are compatible with a process where $ X_1 $ is chosen randomly, and then all subsequent observations are copies of it, so that then the variance of the time average is v, no matter how long the time series; this is the famous problem of checking a newspaper story by reading another copy of the same paper. (More formally, in this situation $ c_h = v $ for all h, and you can check that plugging this in to the equations above gives v for the variance of $ A_n $ for all n.) So if we want an ergodic theorem, we will have to impose some assumption on the covariances, one weaker than "they are all zero" but strong enough to exclude the sequence of identical copies.
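A quick simulation makes the contrast vivid. This sketch (Python with NumPy; the sample sizes and replicate count are arbitrary) estimates the variance of $ A_n $ across many independent replicates, once for IID standard Gaussians and once for the sequence of identical copies of $ X_1 $. The first shrinks like $ v/n $; the second stays near $ v = 1 $ however large n gets.

```python
import numpy as np

rng = np.random.default_rng(42)
reps = 5000  # number of independent replicate series

def var_of_time_average(n, copies):
    """Monte Carlo estimate of Var[A_n] across replicates."""
    if copies:
        # X_1 drawn once per replicate, then repeated: X_t = X_1 for all t
        x = np.repeat(rng.normal(size=(reps, 1)), n, axis=1)
    else:
        x = rng.normal(size=(reps, n))  # independent, mean 0, variance 1
    return x.mean(axis=1).var()

for n in (10, 100, 500):
    print(n, var_of_time_average(n, copies=False), var_of_time_average(n, copies=True))
```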

Using two inequalities to put upper bounds on the variance of the time average suggests a natural and useful assumption which will give us our ergodic theorem.


\begin{eqnarray*} 
\sum_{t=1}^{n-1}{\sum_{h=1}^{n-t}{c_{h}}} & \leq & \sum_{t=1}^{n-1}{\sum_{h=1}^{n-t}{|c_h|}}\\ 
& \leq & \sum_{t=1}^{n-1}{\sum_{h=1}^{\infty}{|c_h|}} 
\end{eqnarray*}
Covariances can be negative, so we upper-bound the sum of the actual covariances by the sum of their magnitudes. (There is no approximation here if all covariances are positive.) Then we extend the inner sum so it covers all lags. This might of course be infinite, and would be for the sequence-of-identical-copies. Our assumption then is
\[ 
\sum_{h=1}^{\infty}{|c_h|} < \infty 
 \]
This is a sum of covariances over time, so let's write it in a way which reflects those units: $ \sum_{h=1}^{\infty}{|c_h|} = v T $ , where T is called the "(auto)covariance time", "integrated (auto)covariance time" or "(auto)correlation time". We are assuming a finite correlation time. (Exercise: Suppose that $ c_h = v e^{-h \tau} $ , as would be the case for a first-order linear autoregressive model, and find T. This confirms, by the way, that the assumption of finite correlation time can be satisfied by processes with non-zero correlations.)
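For readers who want to check their answer to the exercise: with $ c_h = v e^{-h\tau} $ and $ \tau > 0 $, the sum is geometric, so
\[ 
T = \frac{1}{v}\sum_{h=1}^{\infty}{v e^{-h\tau}} = \frac{e^{-\tau}}{1-e^{-\tau}} = \frac{1}{e^{\tau}-1} 
 \]
Writing $ \phi = e^{-\tau} $ for the autoregressive coefficient, this is $ T = \phi/(1-\phi) $, which is finite for every $ 0 < \phi < 1 $ even though no $ c_h $ is zero.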

Returning to the variance of the time average,


\begin{eqnarray*} 
\mathrm{Var}[A_n] & = & \frac{v}{n} + \frac{2}{n^2}\sum_{t=1}^{n-1}{\sum_{h=1}^{n-t}{c_{h}}}\\ 
& \leq & \frac{v}{n} + \frac{2}{n^2}\sum_{t=1}^{n-1}{v T}\\ 
& = & \frac{v}{n} + \frac{2(n-1) vT}{n^2}\\ 
& \leq & \frac{v}{n} + \frac{2 vT}{n}\\ 
& = & \frac{v}{n}(1+ 2T) 
\end{eqnarray*}
So, if we can assume the correlation time is finite, the variance of the time average is $ O(n^{-1}) $, just as if the data were independent. However, the convergence is slower than for independent data by an over-all factor which depends only on T. As T shrinks to zero, we recover the result for uncorrelated data, an indication that our approximations were not too crude.
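As a sanity check on the chain of inequalities, this sketch (Python with NumPy) compares the exact $ \mathrm{Var}[A_n] $, computed from the double sum over the covariance matrix, with the bound $ v(1+2T)/n $, for assumed AR(1)-type covariances $ c_h = v\phi^h $, whose correlation time is $ T = \phi/(1-\phi) $.

```python
import numpy as np

def exact_var(c, n):
    """Exact Var[A_n] = (1/n^2) * sum_{t,s} c_{|s-t|}."""
    lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    return c[lags].sum() / n**2

v, phi = 1.0, 0.8
T = phi / (1 - phi)  # correlation time for c_h = v * phi^h
for n in (10, 100, 1000):
    c = v * phi ** np.arange(n)
    print(n, exact_var(c, n), v * (1 + 2 * T) / n)
```

The bound always holds here, and for large n the exact variance approaches it, since the only slack left is the truncation of the inner sums at lag n - t.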

From knowing the variance, we can get rather tight bounds on the probability of $ A_n $'s deviations from m if we assume that the fluctuations are Gaussian. Unfortunately, none of our assumptions so far entitle us to assume that. For independent data, we get Gaussian fluctuations of averages via the central limit theorem, and these results, too, can be extended to dependent data. But the assumptions needed for dependent central limit theorems are much stronger than merely a finite correlation time. What needs to happen, roughly speaking, is that if I take (nearly) arbitrary functions f and g, the correlation between $ f(X_t) $ and $ g(X_{t+h}) $ must go to zero as h grows. (This idea is quantified as "mixing" or "weak dependence".)

However, even without the Gaussian assumption, we can put some bounds on deviation probabilities by bounding the variance (as we have) and using Chebyshev's inequality:

\[ 
\mathrm{Pr}\left(|A_n - m| > \epsilon\right) \leq \frac{\mathrm{Var}[A_n]}{\epsilon^2} \leq \frac{v}{\epsilon^2} \frac{2T+1}{n} 
 \]
which goes to zero as n grows. So we have just proved convergence "in mean square" and "in probability" of time averages on their stationary expectation values, i.e., the mean square and weak ergodic theorems, under the assumptions that the data are weakly stationary and the correlation time is finite. There were a couple of steps in our argument where we used not very tight inequalities, and it turns out we can weaken the assumption of finite correlation time. The necessary and sufficient condition for the mean-square ergodic theorem turns out to be that, as one might hope,
\[ 
\lim_{n\rightarrow \infty}{\frac{1}{n}\sum_{h=1}^{n}{c_h}} = 0 
 \]
though I don't know of any way of proving it rigorously without using Fourier analysis (which is linked to the autocovariance via the Wiener-Khinchin theorem; see chapters 19 and 21 of Almost None of the Theory of Stochastic Processes).
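To see how conservative Chebyshev's inequality is in practice, this sketch (Python with NumPy) simulates a stationary Gaussian AR(1) process $ X_t = \phi X_{t-1} + \sqrt{1-\phi^2}\,Z_t $, so that $ m = 0 $, $ v = 1 $, $ c_h = \phi^h $ and $ T = \phi/(1-\phi) $, and compares the empirical frequency of $ |A_n - m| > \epsilon $ with the bound $ v(2T+1)/(n\epsilon^2) $. All parameter values are arbitrary choices for illustration.

```python
import numpy as np

def ar1_paths(reps, n, phi, rng):
    """Simulate `reps` stationary AR(1) paths with mean 0 and variance 1."""
    x = np.empty((reps, n))
    x[:, 0] = rng.normal(size=reps)  # start in the stationary distribution
    noise_sd = np.sqrt(1 - phi**2)   # keeps the marginal variance at 1
    for t in range(1, n):
        x[:, t] = phi * x[:, t - 1] + noise_sd * rng.normal(size=reps)
    return x

rng = np.random.default_rng(1)
phi, eps, reps = 0.5, 0.2, 4000
T = phi / (1 - phi)
results = []
for n in (100, 400, 1600):
    a_n = ar1_paths(reps, n, phi, rng).mean(axis=1)  # time averages A_n
    freq = np.mean(np.abs(a_n) > eps)                # empirical deviation frequency
    bound = (2 * T + 1) / (n * eps**2)               # Chebyshev bound
    results.append((n, freq, bound))
    print(n, freq, bound)
```

Both columns go to zero as n grows, but the empirical frequency falls much faster than the bound; this is the price of using only the variance, and it is where Gaussian or mixing assumptions would buy tighter results.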

Reverting to the case of finite correlation time T, observe that the variance from n dependent samples is at most what we would get from n/(1+2T) independent ones. One way to think of this is that the dependence shrinks the effective sample size by a factor of 2T+1. Another, which is related to the name "correlation time", is to imagine dividing the time series up into blocks of length 2T+1, i.e., a central point and its T neighbors in either direction, and using only the central points in our averages. Those are, in a sense, effectively uncorrelated. Non-trivial correlations extend about T time-steps in either direction. Knowing T can be very important in figuring out how much actual information is contained in your data set.

To give an illustration not entirely at random, quantitative macroeconomic modeling is usually based on official statistics, like GDP, which come out quarterly. For the US, which is the main but not exclusive focus of these efforts, the data effectively start in 1947, as what national income accounts exist before then are generally thought too noisy to use. Taking the GDP growth rate series from 1947 to the beginning of 2010, 252 quarters in all, and de-trending it, I calculate a correlation time of just over ten quarters. (This is granting the economists their usual, but absurd, assumption that economic fluctuations are stationary.) So macroeconomic modelers effectively have 11 or 12 independent data points to argue over.
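The text does not say exactly how that correlation time was estimated, so the following is only one plausible recipe, sketched in Python with NumPy: compute sample autocovariances, sum their magnitudes out to a cutoff lag, divide by the variance to get $ \hat{T} $, and report $ n/(1+2\hat{T}) $ as the effective sample size. The cutoff `max_lag` is an assumption (summing noisy sample covariances out to very long lags mostly adds noise), and the series here is a simulated AR(1) of the same length, 252, not GDP data.

```python
import numpy as np

def autocovariances(x, max_lag):
    """Sample autocovariances c_0..c_max_lag (mean removed, divisor n)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    return np.array([np.dot(x[: n - h], x[h:]) / n for h in range(max_lag + 1)])

def correlation_time(x, max_lag):
    """Estimate T = sum_{h>=1} |c_h| / c_0, truncated at max_lag."""
    c = autocovariances(x, max_lag)
    return np.sum(np.abs(c[1:])) / c[0]

def effective_sample_size(x, max_lag):
    return len(x) / (1 + 2 * correlation_time(x, max_lag))

# Illustration on a simulated AR(1) series (phi = 0.8, so the true T is 4)
rng = np.random.default_rng(7)
phi, n = 0.8, 252
x = np.empty(n)
x[0] = rng.normal()
for t in range(1, n):
    x[t] = phi * x[t - 1] + np.sqrt(1 - phi**2) * rng.normal()
T_hat = correlation_time(x, max_lag=20)
print(T_hat, effective_sample_size(x, max_lag=20))
```

With the figure quoted above, a correlation time of about ten quarters, the arithmetic is $ 252/(1 + 2 \times 10) = 12 $.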

Constructively, this idea leads to the mathematical trick of "blocking". To extend a result about independent random sequences to dependent ones, divide the dependent sequence up into contiguous blocks, but with gaps between them, long enough that the blocks are nearly independent of each other. One then has the IID result for the blocks, plus a correction which depends on how much residual dependence remains despite the filler. Picking an appropriate combination of block length and spacing between blocks keeps the correction small, or at least controllable. This idea is used extensively in ergodic theory (including the simplest possible proof of the strong ergodic theorem) and information theory (see Almost None again), in proving convergence results for weakly dependent processes, in bootstrapping time series, and in statistical learning theory under dependence.
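The blocking construction itself takes only a few lines (Python with NumPy). The helpers below are hypothetical illustrations, not a standard library routine: one cuts a series into contiguous blocks of length `block_len` separated by discarded gaps of length `gap`, and the other treats the block means as if they were IID. Choosing `block_len` and `gap` is exactly the trade-off described above, and is left to the user.

```python
import numpy as np

def blocks_with_gaps(x, block_len, gap):
    """Split x into contiguous blocks of length block_len, skipping `gap`
    points between consecutive blocks; leftover points at the end are dropped."""
    x = np.asarray(x)
    stride = block_len + gap
    starts = range(0, len(x) - block_len + 1, stride)
    return np.array([x[s : s + block_len] for s in starts])

def blocked_mean_and_se(x, block_len, gap):
    """Mean of the block means, with a standard error that treats the
    block means as independent (the 'IID result for the blocks')."""
    means = blocks_with_gaps(x, block_len, gap).mean(axis=1)
    return means.mean(), means.std(ddof=1) / np.sqrt(len(means))

x = np.arange(20)
print(blocks_with_gaps(x, block_len=4, gap=2))
print(blocked_mean_and_se(x, block_len=4, gap=2))
```

The residual-dependence correction mentioned above is not computed here; it depends on how fast the process's dependence decays across a gap, which is the part the mathematics has to supply.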

Manual trackback: An Ergodic Walk (fittingly enough); Thoughts on Economics

Update, 7 August: Fixed typos in equations.

Enigmas of Chance

Posted by crshalizi at July 02, 2010 13:40 | permanent link

July 01, 2010

Posed While the Algae Grow in Their Fur

The last post was really negative; to cleanse the palate, look at the Sloth Sanctuary of Costa Rica, dedicated to rescuing orphaned and imperiled sloths.

(Via Environmental Graffiti, via Matthew Berryman, and with thanks to John Emerson)

Linkage

Posted by crshalizi at July 01, 2010 14:40 | permanent link

June 30, 2010

Books to Read While the Algae Grow in Your Fur, June 2010

Yves Smith, Econned: How Unenlightened Self Interest Undermined Democracy and Corrupted Capitalism
I found this a bit of a frustrating read, actually, but I still recommend it overall. When it comes to the details of how financial markets work, and for whom, and how that has changed over the years, it's very good. The criticisms of the economic profession are a mixed bag. On the moral point, that the economists have managed to secure a uniquely influential and privileged position among the social sciences (arguably among all the sciences), and have not risen to this by uniquely valuable and correct advice, or even by taking seriously and learning from their failures, she's correct. On the sheer insanity of a lot of neo-classical economics and its pretensions, especially as applied to finance, she is correct. Her most technical attacks fail, but I think those are not really needed for the arguments she wants to make. (More below.) When she discusses policy and the Obama Administration, there is something about her tone which I do not care for, though I think most of her actual positive suggestions are pretty good ideas. I suspect I would have liked this book more had I read less in the area beforehand.
(Smith complains about the neo-classicists' reliance on assumptions of "ergodicity". But when she uses the term, she runs together (i) actual ergodicity, (ii) stationarity, (iii) homogeneity [as of a Markov process], (iv) [lack of] sensitive dependence on initial conditions, (v) the existence of a unique and rapidly attracting static equilibrium, (vi) [lack of] path dependence, (vii) [lack of] state dependence, (viii) [lack of] positive feedbacks, (ix) mixing or decay of correlations and (x) the existence of a generating probability distribution, of which the actual historical trajectory of the economy is a realization. Those of us who work in the area have separated these concepts because they are in fact distinct, with complicated inter-relations, and if I take what she says about these matters literally it is a tissue of fallacies and equivocations. But Smith is merely being misled by her authorities, the so-called post-Keynesian political economists, who seem to have originated these errors. To repeat, I think these parts of the book could have been cut without any loss to the important messages.)
Amitav Ghosh, The Hungry Tide
About the tide country of Bengal; being an American innocent abroad; being an ineffectual left-wing Bengali intellectual; being a self-centered member of the modern Indian upper-middle class; being at the mercy of the elements. Also a very well-turned work of scientist-fiction. (I listened to the audio book while exercising; it was read well.)
Shirley Jackson, Novels and Stories
Shirley Jackson now has a Library of America edition, and I am well-pleased. Contents: The Lottery and Other Stories; The Haunting of Hill House; We Have Always Lived in the Castle; and some miscellaneous short stories. Those are almost certainly her two best novels — The Haunting of Hill House is flat-out one of the greatest pieces of fantastic literature ever — but, since I am greedy, I am a bit disappointed that they didn't fit in more of her novels (The Bird's Nest, say, or especially The Sundial). Still: Shirley Jackson now has a Library of America edition, and I am well-pleased.
Colin Martindale, The Clockwork Muse: The Predictability of Artistic Change
No purchase link because this is an anti-recommendation: life is short, ignore this. It's got some of the worst data analysis I have ever seen, and the argument rests entirely on those analyses. And yet people who know even less evidently take it seriously, perhaps because Martindale didn't realize he had no idea what he was doing and so presents his howlers as obviously correct. This book alone seems (if I can trust Google Scholar) to have over 200 citations. Why oh why can't we have a better republic of scholars?
(Thanks, of a sort, to Carlos Yu for finally getting me to read this.)
An elaboration of this snarl of contempt.

Books to Read While the Algae Grow in Your Fur; Scientifiction and Fantastica; Minds, Brains, and Neurons; The Continuing Crises; Writing for Antiquity; The Dismal Science; The Commonwealth of Letters; Learned Folly

Posted by crshalizi at June 30, 2010 23:59 | permanent link

In which Dunning-Kruger meets Slutsky-Yule, and they make music together

Attention conservation notice: Over 2500 words on how a psychologist who claimed to revolutionize aesthetics and art history would have failed undergrad statistics. With graphs, equations, heavy sarcasm, and long quotations from works of intellectual history. Are there no poems you could be reading, no music you could be listening to?

I feel I should elaborate my dismissal of Martindale's The Clockwork Muse beyond a mere contemptuous snarl.

The core of Martindale's theory is this. Artists, and still more consumers of art, demand novelty; they don't just want the same old thing. (They have the same old thing.) Yet there is also a demand, or a requirement, to stay within the bounds of a style. Combining this with a notion that coming up with novel ideas and images requires "regressing" to "primordial" modes of thought, he concludes

Each artist or poet must regress further in search of usable combinations of ideas or images not already used by his or her predecessors. We should expect the increasing remoteness or strangeness of similes, metaphors, images, and so on to be accompanied by content reflecting the increasingly deeper regression toward primordial cognition required to produce them. Across the time a given style is in effect, we should expect works of art to have content that becomes increasingly more and more dreamlike, unrealistic, and bizarre.

Eventually, a turning point to this movement toward primordial thought during inspiration will be reached. At that time, increases in novelty would be more profitably attained by decreasing elaboration — by loosening the stylistic rules that govern the production of art works — than by attempts at deeper regression. This turning point corresponds to a major stylistic change. ... Thus, amount of primordial content should decline when stylistic change occurs. [pp. 61--64, his emphasis; the big gap corresponds to some pages of illustrations, and not me leaving out a lot of qualifying text]

Reference to actual work in cognitive science on creativity, both theoretical and experimental (see, e.g., Boden's review contemporary with Martindale's work), is conspicuously absent. But who knows, maybe his uncritical acceptance of these sub-Freudian notions has led in some productive direction; let us judge them by their fruits.

Here is Martindale's Figure 9.1 (p. 288), supposedly showing the amount of "primordial content" in Beethoven's musical compositions from 1795 through 1826, or rather a two-year moving average of this.

Let us leave to one side the very difficult questions of how to measure "primordial content"; Martindale, like too many psychologists, is a slave to quite confused ideas about "construct validity". The dots are the moving averages, the solid black line is a guide to the eye, and the dashed line is a parabola fit to the moving averages. In the main text, Martindale combines the parabolic trend with a second order autoregression, getting the fitted model (p. 289)
$$PC_t = -1.59 + 0.23\,t - 0.01\,t^2 + 0.58\,PC_{t-1} - 0.55\,PC_{t-2}$$
which, he says, has an R² of 50%. Primordial content is supposed to go up as an artist (or artistic community) "works out the possibilities of a style", but go down with a switch to a new, fresh style. Martindale tries (p. 289) to match up his peaks and troughs with what the critics say about the development of Beethoven's style, and succeeds to his own satisfaction, at least "in broad outline".

Now, here is the figure which was, so help me, the second run of some R code I wrote.

Here, however, instead of having people try to figure out how much primordial content there was in Beethoven's music, I simply took Gaussian white noise, with mean zero and variance 1, with one random number per year, and treated that exactly the same way that Martindale did: two-year moving averages, a quadratic fit over time (displayed), and a quadratic-plus-AR(2) over-all model, which kept 45% of the variance. My final fitted model was
$$PC_t = -0.61 + 0.15\,t - 0.004\,t^2 + 0.63\,PC_{t-1} - 0.51\,PC_{t-2}$$
Was this a fluke? No. When I repeat this 1000 times, the median R² is 43%, and 28% of the runs have an R² greater than what Martindale got. His fit is no better than one would expect if his measurements are pure noise.
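For readers who want to try the white-noise experiment themselves, here is a rough Python sketch of it. It is not the R code mentioned above: the series length (32 annual values, 1795 through 1826), the alignment of the moving average, indexing time from 1, and plain least squares are all assumptions, so the numbers it prints should land in the same neighborhood as those quoted, without matching them exactly.

```python
import numpy as np


def martindale_fit(noise):
    """Apply the procedure to one series: two-year moving average, then
    least squares of the smoothed series on an intercept, t, t^2, and its
    own first two lags. Returns (coefficients, R^2)."""
    pc = (noise[1:] + noise[:-1]) / 2.0         # two-year moving average
    t = np.arange(1, len(pc) + 1, dtype=float)  # time index (assumed to start at 1)
    y = pc[2:]                                  # PC_t
    X = np.column_stack([
        np.ones_like(y),  # intercept
        t[2:],            # linear trend
        t[2:] ** 2,       # quadratic trend
        pc[1:-1],         # PC_{t-1}
        pc[:-2],          # PC_{t-2}
    ])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)
    return beta, r2


def simulate(n_years=32, n_runs=1000, seed=0):
    """Repeat the fit on fresh Gaussian white noise (mean 0, variance 1)."""
    rng = np.random.default_rng(seed)
    r2s = np.empty(n_runs)
    ar_coefs = np.empty((n_runs, 2))
    for i in range(n_runs):
        beta, r2 = martindale_fit(rng.standard_normal(n_years))
        r2s[i] = r2
        ar_coefs[i] = beta[3:]  # coefficients on PC_{t-1} and PC_{t-2}
    return r2s, ar_coefs


if __name__ == "__main__":
    r2s, ar_coefs = simulate()
    print("median R^2:", np.median(r2s))
    print("fraction of runs with R^2 > 0.50:", np.mean(r2s > 0.50))
    print("mean AR coefficients:", ar_coefs.mean(axis=0))
```

Note that R² is computed on the smoothed series, as in the quoted analysis, so the autocorrelation that the moving average itself creates gets counted as "explained" variance.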

What is going on here? All of the apparent structure revealed in Martindale's analysis is actually coming from his having smoothed his data, from having taken the two-year moving average. Remarkably enough, he realized that this could lead to artifacts, but brushed the concern aside:

One has to be careful in dealing with smoothed data. The smoothing by its very nature introduces some autocorrelation because the score for one year is in part composed of the score for the prior year. However, autocorrelations introduced by smoothing are positive and decline regularly with increasing lags. That is not at all what we find in the case of Beethoven — or in other cases where I have used smoothed data. The smoothing is not creating correlations where none existed; it is magnifying patterns already in the data. [p. 289]

What this passage reveals is that Martindale did not understand the difference between the autocorrelation function of a time series, and the coefficients of an autoregressive model fit to that time series. (Indeed I suspect he did not understand the difference between correlation and regression coefficients in general.) The autoregressive coefficients correspond, much more nearly, to the partial autocorrelation function, and the partial autocorrelations which result from applying a moving average to white noise have alternating signs — just like Martindale's do. In fact, the coefficients he got are entirely typical of what happens when his procedure is applied to white noise:


Small dots: Autoregressive coefficients from 1000 runs of Martindale's analysis applied to white noise. Large X: his estimated coefficients for Beethoven.
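The alternating signs can be checked exactly. If Z is white noise and PC_t = (Z_t + Z_{t-1})/2, the autocorrelation of PC is 1/2 at lag one and zero at every longer lag. Feeding those autocorrelations through the Durbin-Levinson recursion (a standard way to turn autocorrelations into partial autocorrelations; the choice of this recursion is mine, not Martindale's) gives partial autocorrelations 1/2, -1/3, 1/4, -1/5, ..., and population AR(2) coefficients of 2/3 and -1/3. A minimal sketch with exact rational arithmetic:

```python
from fractions import Fraction


def durbin_levinson(rho, max_lag):
    """Partial autocorrelations phi_kk (k = 1..max_lag) from autocorrelations
    rho[0..max_lag], with rho[0] == 1. Also returns the AR(max_lag)
    coefficients phi_{max_lag,1}, ..., phi_{max_lag,max_lag}."""
    pacf = []
    phi = []  # phi[j] holds phi_{k-1, j+1} at the start of step k
    for k in range(1, max_lag + 1):
        num = rho[k] - sum(phi[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1 - sum(phi[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi = [phi[j] - phi_kk * phi[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf, phi


# Autocorrelations of PC_t = (Z_t + Z_{t-1}) / 2 for white noise Z:
# 1/2 at lag 1, zero beyond.
rho = [Fraction(1), Fraction(1, 2)] + [Fraction(0)] * 5

pacf, _ = durbin_levinson(rho, 5)
print([str(p) for p in pacf])  # ['1/2', '-1/3', '1/4', '-1/5', '1/6']

_, ar2 = durbin_levinson(rho, 2)
print([str(c) for c in ar2])   # ['2/3', '-1/3']
```

These population values share the sign pattern of both Martindale's coefficients and the white-noise fit above; with a quadratic trend and only about thirty years of data, the estimates scatter widely around them.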

I could go on about what has gone wrong in just the four pages Martindale devotes to Beethoven's style, but I hope my point is made. I won't say that he makes every conceivable mistake in his analysis, because my experience as a teacher of statistics is that there are always more possible errors than you would ever have suspected. But I will say that the errors he's making — creating correlations by averaging, confusing regression and correlation coefficients, etc. — are the sort of things which get covered in the first few lessons of a good course on time series. The fact that averaging white noise produces serial correlations, and a particular pattern of autoregressive coefficients, is in particular famous as the Yule-Slutsky effect, after its two early-20th-century discoverers. (Slutsky, interestingly, appears to have thought of this as an actual explanation for many apparent cycles, particularly of macroeconomic fluctuations under capitalism, though how he proposed to reconcile this with Marx I don't know.) I am not exaggerating for polemical effect when I say that I would fail Martindale from any class I taught on data analysis; or that every single one of the undergraduate students who took 490 this spring has demonstrated more skill at applied statistics than he does in this book.

Martindale's book has about 200 citations in Google Scholar. (I haven't tried to sort out duplicates, citation variants, and self-citations.) Most of these do not appear to be "please don't confuse us with that rubbish" citations. Some of them are from intelligent scholars, like Bill Benzon, who, through no fault of their own, are unable to evaluate Martindale's statistics, and so take his competence on trust. (Similarly with Dutton, who I would not describe as an "intelligent scholar".) This trust has probably been amplified by Martindale's rhetorical projection of confidence in his statistical prowess. (Look at that quote above.) — Oh, let's not mince words here: Martindale fashions himself as someone bringing the gospel of quantitative science to the innumerate heathen of the humanities, complete with the expectation that they'll be too stupid to appreciate the gift. For many readers, those who project such intellectual arrogance are not just more intimidating but also more credible, though rationally, of course, they shouldn't be. (If you want to suggest that I exploit this myself, well, you'd have a point.)

Could there be something to the idea of an intrinsic style cycle, of the sort Martindale (like many others) advocates? I actually wouldn't be surprised if there were situations when some such mechanism (shorn of the unbearably silly psychoanalytic bits) applies. In fact, the idea of this mechanism is much older than Martindale. For example, here is a passage from Marshall G. S. Hodgson's The Venture of Islam, which I happen to have been re-reading recently:

After the death of [the critic] Ibn-Qutaybah [in 889], however, a certain systematizing of critical standards set in, especially among his disciples, the "school of Baghdad". ... Finally the doctrine of the pre-eminence of the older classics prevailed. So far as concerned poetry in the standard Mudâi Arabic, which was after all, not spoken, puristic literary standards were perhaps inevitable: an artificial medium called for artificial norms. That critics should impose some limits was necessary, given the definition of shi`r poetry in terms of imposed limitations. With the divorce between the spoken language of passion and the formal language of composition, they had a good opportunity to exalt a congenially narrow interpretation of those limits. Among adîbs who so often put poetry to purposes of decoration or even display, the critics' word was law. Generations of poets afterwards strove to reproduce the desert qasîdah ode in their more serious work so as to win the critics' acclaim.

Some poets were able to respond with considerable skill to the critics' demands. Abû-Tammâm (d. c. 845) both collected and edited the older poetry and also produced imitations himself of great merit. But work such as his, however admirable, could not be duplicated indefinitely. In any case, it could appear insipid. A living tradition could not simply mark time; it had to explore whatever openings there might be for working through all possible variations on its themes, even the grotesque. Hence in the course of subsequent generations, taste came to favor an ever more elaborate style both in verse and in prose. Within the forms which had been accepted, the only recourse for novelty (which was always demanded) was in the direction of more far-fetched similes, more obscure references to educated erudition, more subtle connections of fancy.

The peak of such a tendency was reached in the proud poet al-Mutanabbi', "the would-be prophet" (915--965 — nicknamed so for a youthful episode of religious propagandizing, in which his enemies said he claimed to be a prophet among the Bedouin), who travelled whenever he did not meet, where he was, with sufficient honor for his taste. He himself consciously exemplified, it is said, something of the independent spirit of the ancient poets. Though he lived by writing panegyrics, he long preferred, to Baghdad, the semi-Bedouin court of the Hamdânid Sayf-al-dawlah at Aleppo; and on his travels he died rather than belie his valiant verses, when Bedouin attacked the caravan and he defended himself rather than escape. His verse has been ranked as the best in Arabic on the ground that his play of words showed the widest range of ingenuity, his images held the tension between fantasy and actuality at the tautest possible without falling into absurdity.

After him, indeed, his heirs, bound to push yet further on the path, were often trapped in artificial straining for effect; and sometimes they appear simply absurd. In any case, poetry in literary Arabic after the High Caliphal Period soon became undistinguished. Poets strove to meet the critics' norms, but one of the critics' demands was naturally for novelty within the proper forms. But such novelty could be had only on the basis of over-elaboration. This the critics, disciplined by the high, simple standards of the old poetry, properly rejected too. Within the received style of shi`r, good further work was almost ruled out by the effectively high standards of the `Abbâsî critics. [volume I, pp. 463--464, omitting some diacritical marks which I don't know how to make in HTML]

Now, it does not matter here what the formal requirements of such poetry were, still less those of the qasidah; nor is it relevant whether Hodgson's aesthetic judgments were correct. I quote this because he points to the very same mechanism — demand for novelty plus restrictions of a style leading to certain kinds of elaboration and content — decades before Martindale (Hodgson died, with this part of his book complete, in 1968), and with no pretense that he was making an original argument, as opposed to rehearsing a familiar one.

But there are obvious problems with turning this mechanism into the Universal Scientific Law of Artistic Change, as Martindale wants to do. Or rather problems which should be obvious, many of which were well put by Joseph (Abu Thomas) Levenson in Confucian China and Its Modern Fate:

Historians of the arts have sometimes led their subjects out of the world of men into a world of their own, where the principles of change seem interior to the art rather than governed by decisions of the artist. Thus, we have been assured that seventeenth-century Dutch landscape bears no resemblance to Breughel because by the seventeenth century Breughel's tradition of mannerist landscape had been exhausted. Or we are treated to tautologies, according to which art is "doomed to become moribund" when it "reaches the limit of its idiom", and in "yielding its final flowers" shows that "nothing more can be done with it" — hence the passing of the grand manner of the eighteenth century in Europe and the romantic movement of the nineteenth.

How do aesthetic values really come to be superseded? This sort of thing, purporting to be a revelation of cause, an answer to a question, leaves the question still to be asked. For Chinese painting, well before the middle of the Ch'ing period, with its enshrinement of eclectic virtuosi and connoisseurs, had, by any "internal" criteria, reached the limit of its idiom and yielded its final flowers. And yet the values of the past persisted for generations, and the fear of imitation, the feeling that creativity demanded freshness in the artist's purposes, remained unfamiliar to Chinese minds. Wang Hui was happy to write on a landscape he painted in 1692 that it was a copy of a copy of a Sung original; while his colleague, Yün Shou-p'ing, the flower-painter, was described approvingly by a Ch'ing compiler as having gone back to the "boneless" painting of Hsü Ch'ung-ssu, of the eleventh century, and made his work one with it. (Yün had often, in fact, inscribed "Hsü Ch'ung-ssu boneless flower picture" on his own productions.) And Tsou I-kuei, another flower-painter, committed to finding a traditional sanction for his art, began a treatise with the following apologia:

When the ancients discussed painting they treated landscape in detail but slighted flowering plants. This does not imply a comparison of their merits. Flower painting flourished in the northern Sung, but Hsü [Hsi] and Huang [Ch'üan] could not express themselves theoretically, and therefore their methods were not transmitted.

The lesson taught by this Chinese experience is that an art-form is "exhausted" when its practitioners think it is. And a circular explanation will not hold — they think so not when some hypothetically objective exhaustion occurs in the art itself, but when outer circumstances, beyond the realm of purely aesthetic content, has changed their subjective criteria; otherwise, how account for the varying lengths of time it takes for different publics to leave behind their worked-out forms? [pp. 40–41]

Martindale seems to be completely innocent of such considerations. What he brings to this long-running discussion is, supposedly, quantitative evidence, and skill in its analysis. But this is precisely what he lacks. I have only gone over one of his analyses here, but I claim that the level of incompetence displayed here is actually entirely typical of the rest of the book.

Manual trackback: Evolving Thoughts; bottlerocketscience

Minds, Brains, and Neurons; Writing for Antiquity; The Commonwealth of Letters; Learned Folly; Enigmas of Chance

Posted by crshalizi at June 30, 2010 15:00 | permanent link

June 28, 2010

Reminder: The Link Distribution of Weblogs Is Not a Power Law

For some reason, Clay Shirky's 2003 essay "Power Laws, Weblogs, and Inequality" seems to be making the rounds again. Allow me to remind the world that, at least as of 2004, the distribution of links to weblogs was definitely not a power law. Whether this matters to Shirky's broader arguments about the development of new media is a different question; perhaps all that's needed is for the distribution to be right skewed and heavy tailed. But the actual essay stresses the power law business, which is wrong.

If you have more recent data and would like an updated analysis, you can use our tools and do it yourself.
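As a minimal sketch of the very first step, assuming a continuous power law above a known cutoff x_min: the maximum-likelihood estimate of the exponent is α̂ = 1 + n / Σ ln(x_i / x_min), and the Kolmogorov-Smirnov distance between the tail data and the fitted distribution measures how far off the fit is. This is not the tools referred to above, and the function names are hypothetical; it leaves out choosing x_min, the discreteness of link counts, goodness-of-fit testing, and comparison with alternatives such as the log-normal, all of which matter for real link data.

```python
import numpy as np


def fit_power_law(x, xmin):
    """MLE of alpha for a continuous power law p(x) ~ x^(-alpha), x >= xmin,
    plus the KS distance between the tail data and the fitted CDF."""
    tail = np.sort(np.asarray(x, dtype=float))
    tail = tail[tail >= xmin]
    n = len(tail)
    alpha = 1.0 + n / np.sum(np.log(tail / xmin))
    fitted_cdf = 1.0 - (tail / xmin) ** (1.0 - alpha)
    ecdf_after = np.arange(1, n + 1) / n   # empirical CDF just after each point
    ecdf_before = np.arange(0, n) / n      # empirical CDF just before each point
    ks = max(np.max(ecdf_after - fitted_cdf), np.max(fitted_cdf - ecdf_before))
    return alpha, ks


def sample_power_law(alpha, xmin, n, rng):
    """Inverse-transform draws from the continuous power law."""
    u = rng.random(n)
    return xmin * (1.0 - u) ** (-1.0 / (alpha - 1.0))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = sample_power_law(2.5, 1.0, 50_000, rng)
    print(fit_power_law(x, 1.0))  # alpha should come out close to 2.5
```

A small KS distance by itself does not establish a power law; heavy-tailed alternatives can fit nearly as well, which is why comparison against them is the step that actually matters.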

Enigmas of Chance; Networks; Power Laws

Posted by crshalizi at June 28, 2010 09:31 | permanent link

June 26, 2010

Praxis and Ideology in Bayesian Data Analysis

Attention conservation notice: 750+ self-promoting words about a new preprint on Bayesian statistics and the philosophy of science. Even if you like watching me ride those hobby-horses, why not check back in a few months and see if peer review has exposed it as a mass of trivialities, errors, and trivial errors?

I seem to have a new pre-print:

Andrew Gelman and CRS, "Philosophy and the Practice of Bayesian Statistics", arxiv:1006.3868
Abstract: A substantial school in the philosophy of science identifies Bayesian inference with inductive inference and even rationality as such, and seems to be strengthened by the rise and practical success of Bayesian statistics. We argue that the most successful forms of Bayesian statistics do not actually support that particular philosophy but rather accord much better with sophisticated forms of hypothetico-deductivism. We examine the actual role played by prior distributions in Bayesian models, and the crucial aspects of model checking and model revision, which fall outside the scope of Bayesian confirmation theory. We draw on the literature on the consistency of Bayesian updating and also on our experience of applied work in social science.
Clarity about these matters should benefit not just philosophy of science, but also statistical practice. At best, the inductivist view has encouraged researchers to fit and compare models without checking them; at worst, theorists have actively discouraged practitioners from performing model checking because it does not fit into their framework.

As the two or three people who still read this blog may recall, I have long had a Thing about Bayesianism, or more exactly the presentation of Bayesianism as the sum total of rationality, and the key to all methodologies. (Cf.) In particular, the pretense that all a scientist really wants, or should want, is to know the posterior probability of their theories — the pretense that Bayesianism is a solution to the problem of induction — bugs me intensely. This is the more or less explicit ideology of a lot of presentations of Bayesian statistics (especially among philosophers, economists* and machine-learners). Not only is this crazy as methodology — not only does it lead to the astoundingly bass-ackwards mistake of thinking that using a prior is a way of "overcoming bias", and to myths about Bayesian super-intelligences — but it doesn't even agree with what good Bayesian data analysts actually do.

If you take a good Bayesian practitioner and ask them "why are you using a hierarchical linear model with Gaussian noise and conjugate priors?", or even "why are you using that Gaussian process as your prior distribution over regression curves?", if they have any honesty and self-awareness they will never reply "After offering myself a detailed series of hypothetical bets, the stakes carefully gauged to assure risk-neutrality, I elicited it as my prior, and got the same results regardless of how I framed the bets" — which is the official story about operationalizing prior knowledge and degrees of belief. (And looking for "objective" priors is hopeless.) Rather, data analysts will point to some mixture of tradition, mathematical convenience, computational tractability, and qualitative scientific knowledge and/or guesswork. Our actual degree of belief in our models is zero, or nearly so. Our hope is that they are good enough approximations for the inferences we need to make. For such a purpose, Bayesian smoothing may well be harmless. But you need to test the adequacy of your model, including the prior.

Admittedly, checking your model involves going outside the formalism of Bayesian updating, but so what? Asking a Bayesian data analyst not just whether but how their model is mis-specified is not, pace Brad DeLong, tantamount to violating the Geneva Convention. Instead, it is recognizing them as a fellow member of the community of rational inquirers, rather than a dumb numerical integration subroutine. In practice, good Bayesian data analysts do this anyway. The ideology serves only to give them a guilty conscience about doing good statistics, or to waste time in apologetics and sophistry. Our modest hope is to help bring an end to these ideological mystifications.
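One concrete form such a check can take is a posterior predictive check. The sketch below is in the spirit of the model checking defended in the paper, not code taken from it. It assumes a normal model with unknown mean and variance and the standard noninformative prior p(μ, σ²) ∝ 1/σ², under which σ² given the data is (n-1)s²/χ²_{n-1} and μ given σ² is N(ȳ, σ²/n). The test statistic, the largest absolute deviation from the mean in units of the sample standard deviation, is an illustrative choice; a p-value near 0 or 1 says the fitted model fails to reproduce that feature of the data.

```python
import numpy as np


def max_abs_z(y):
    """Test statistic: largest |y_i - mean(y)| in units of the sample s.d."""
    return np.max(np.abs(y - y.mean())) / y.std(ddof=1)


def posterior_predictive_pvalue(y, n_draws=2000, seed=0):
    """Fraction of replicated data sets whose statistic is at least as
    extreme as the observed one, under the normal model described above."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    n, ybar, s2 = len(y), y.mean(), y.var(ddof=1)
    t_obs = max_abs_z(y)
    exceed = 0
    for _ in range(n_draws):
        sigma2 = (n - 1) * s2 / rng.chisquare(n - 1)      # draw sigma^2 | y
        mu = rng.normal(ybar, np.sqrt(sigma2 / n))        # draw mu | sigma^2, y
        y_rep = rng.normal(mu, np.sqrt(sigma2), size=n)   # replicated data
        exceed += int(max_abs_z(y_rep) >= t_obs)
    return exceed / n_draws


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    clean = rng.standard_normal(100)
    contaminated = np.append(rng.standard_normal(99), 50.0)  # one gross outlier
    print(posterior_predictive_pvalue(clean))
    print(posterior_predictive_pvalue(contaminated))  # essentially zero
```

Nothing in the Bayesian updating itself flags the outlier; only stepping outside it and comparing replicated data to the real data does.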

The division of labor on this paper was very simple: Andy supplied all the worthwhile parts, and I supplied everything mistaken and/or offensive. (Also, Andy did not approve this post.)

*: Interestingly, even when economists insist that rationality is co-extensive with being a Bayesian agent, none of them actually treat their data that way. Even when they do Bayesian econometrics, they are willing to consider that the truth might be outside the support of the prior, which to a Real Bayesian is just crazy talk. (Real Bayesians enlarge their priors until they embrace everything which might be true.) Edward Prescott forms a noteworthy exception: under the rubric of "calibration", he has elevated his conviction that his prior guesses are never wrong into a new principle of statistical estimation.

Manual trackback: Andrew Gelman; Build on the Void; The Statistical Mechanic; A Fine Theorem; Evolving Thoughts; Making Sense with Facilitated Systems; Vukutu; EconTech; Gravity's Rainbow; Nuit Blanche; Smooth; Andrew Gelman again (incorporating interesting comments from Richard Berk); J.J. Hayes's Amazing Antifolk Explicator and Philosophic Analyzer; Manuel "Moe" G.

Bayes, anti-Bayes; Enigmas of Chance; Philosophy; Self-Centered

Posted by crshalizi at June 26, 2010 15:58 | permanent link

June 24, 2010

The Old Country, Back in the Day

In the late 1950s, my grandfather, Abdussattar Shalizi, was the president of the planning office in Afghanistan's ministry of planning; back then Afghanistan had a planning office and a ministry of planning which were not just jokes. During that time he wrote a book called Afghanistan: Ancient Land with Modern Ways, mostly consisting of his photographs of the signs of the country's progress. This was, as you might guess, a propaganda piece, but I can testify that it was an utterly sincere propaganda piece. So far as I know my grandfather did not erect any Potemkin factories, schools, houses, irrigation works, record stores, Girl Scout troops, or secure roads for his photographs. Re-reading the book now fills me with pity and, to be honest, anger.

But it is important to remember, when people ignorantly mutter about a country stuck in the 12th century, not just that the 12th century meant something very different there than it did in Scotland, but that 1960 in Afghanistan actually happened. So I am very pleased to see, via my brother, a photo essay in Foreign Policy, by Mohammad Qayoumi, consisting of scanned photos from my grandfather's book, with his original captions and Qayoumi's commentary. Go look.

(My plan to post something positive at least once a week was a total failure. I am contemplating requiring every merely-critical post to be paired with a positive one.)

Manual trackback: Gaddeswarup

Afghanistan and Central Asia

Posted by crshalizi at June 24, 2010 21:30 | permanent link

Confounded Divorce ("Why Oh Why Can't We Have a Better Press Corps?" Dept.)

Attention conservation notice: 1000+ words about how I am irritated by journalists being foolish, and about attempts at causal inference on social networks. As novel as a cat meowing or a car salesman scamming.

I have long thought that most opinion writers could be replaced, to the advantage of all concerned, by stochastic context-free grammars. Their readers would be no less well-informed about how the world is and what should be done about it, would receive no less surprise and delight at the play of words and ideas, and the erstwhile writers would be free to pursue some other trade, which did not so corrode their souls. One reason I feel this way is that these writers habitually make stuff up because it sounds good to them, even when actual knowledge is attainable. They have, as a rule, no intellectual conscience. Yesterday, therefore, if you had told me that one of their number actually sought out some social science research, I would have applauded this as a modest step towards a better press corps.

Today, alas, I am reminded that looking at research is not helpful, unless you have the skills and skepticism to evaluate the research. Exhibit A is Ross "Chunky Reese Witherspoon Lookalike" Douthat, who stumbled upon this paper from McDermott, Christakis, and Fowler, documenting an association between people getting divorced and those close to them in the social network also getting divorced. Douthat spun this into the claim that "If your friends or neighbors or relatives get divorced, you're more likely to get divorced --- even if it's only on the margins --- no matter what kind of shape your marriage is in." It should come as no surprise that McDermott et al. did not, in any way whatsoever, try to measure what shape people's marriages were in.

Ezra Klein, responding to Douthat, suggests that the causal channel isn't making people who are happy in their marriages divorce, but leading people to re-evaluate whether they are really happily married, by making it clear that there is an alternative to staying married. "The prevalence of divorce doesn't change the shape your marriage is in. It changes your willingness to face up to the shape your marriage is in." (In other words, Klein is suggesting that many people call their marriages "happy" only through the mechanism of adaptive preferences, a.k.a. sour grapes.) Klein has, deservedly, a reputation for being more clueful than his peers, and his response shows a modicum of critical thought, but he is still relying on Ross Douthat to do causal inference, which is a sobering thought.

Both of these gentlemen are assuming that this association between network neighbors' divorces must be due to some kind of contagion — Douthat is going for some sort of imitation of divorce as such, Klein is looking to more of a social learning process about alternatives and their costs. Both of them ignore the possibility that there is no contagion here at all. Remember homophily: People tend to be friends with those who are like them. I can predict your divorce from your friends' divorces, because seeing them divorce tells me what kind of people they are, which tells me about what kind of person you are. From the sort of observational data used in this study, it is definitely impossible to say how much of the association is due to homophily and how much to contagion. (The edge-reversal test they employ does not work.) It seems to be impossible to even say whether there is any contagion at all.*
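A toy simulation makes the homophily point concrete. It is a sketch under made-up assumptions, not a model of the data in the paper: each person has a latent trait, friends are paired so that their traits are correlated, and each person's chance of divorce depends only on their own trait. There is no contagion anywhere, yet a friend's divorce still predicts one's own.

```python
import numpy as np


def homophily_only(n_pairs=100_000, similarity=0.8, seed=0):
    """Return P(divorce | friend divorced) and P(divorce | friend did not),
    in a world where divorce depends only on one's own latent trait."""
    rng = np.random.default_rng(seed)
    trait = rng.standard_normal(n_pairs)
    # Homophily: the friend's trait has correlation `similarity` with ego's.
    friend_trait = (similarity * trait
                    + np.sqrt(1 - similarity ** 2) * rng.standard_normal(n_pairs))
    # Divorce probabilities use each person's own trait only: no influence.
    p_ego = 1 / (1 + np.exp(-(trait - 1.0)))
    p_friend = 1 / (1 + np.exp(-(friend_trait - 1.0)))
    ego = rng.random(n_pairs) < p_ego
    friend = rng.random(n_pairs) < p_friend
    return ego[friend].mean(), ego[~friend].mean()


if __name__ == "__main__":
    print(homophily_only())                # clearly different rates
    print(homophily_only(similarity=0.0))  # no homophily: about the same
```

Swapping in a contagion mechanism with no homophily could produce the same kind of gap, which is exactly why observational data of this sort cannot tell the two apart.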

To be clear, I am not castigating columnists for not reading my pre-prints; on balance I'm probably happier that they don't. But the logical issue of running together influence from friends and inference from the kind of friends you have is clear and well known. (Our contribution was to show that you can't escape the logic through technical trickery.) One would hope it would have occurred to people to ponder it before calling for over-turning family law, or saying, in effect, "You should stay together, for the sake of your neighbors' kids". I also have no problem with McDermott et al. investigating this. It's a shame that their data is unable to answer the causal questions, but without their hard work in analyzing that data we wouldn't know there was a phenomenon to be explained.

I hope it's obvious that I don't object to people pontificating about whatever they like; certainly I do enough of it. If people can get paying jobs doing it, more power to them. I can even make out a case why ideologically committed opinionators have a role to play in the social life of the mind, like so. It's a big complicated world full of lots of things which might, conceivably, matter, and it's hard to keep track of them all, and figure out how one's principles apply** — it takes time and effort, and those are always in short supply. Communicating ideas takes more time and effort and skill. People who can supply the time, effort and skill to the rest of us, starting from more or less similar principles, thereby do us a service. But only if they are actually trustworthy — actually reasoning and writing in good faith — and know what they are talking about.

(Thanks, of a kind, to Steve Laniel for bringing this to my attention.)

*: Arbitrarily strong predictive associations of the kind reported here can be produced by either mechanism alone, in the absence of the other. We are still working on whether there are any patterns of associations which could not be produced by homophily alone, or contagion alone. So far the answer seems to be "no", which is disappointing.

**: And sometimes you reach conclusions so strange or even repugnant that the principles they followed from come into doubt themselves. And sometimes what had seemed to be a principle proves, on reflection, to be more like a general rule, adapted to particular circumstances. And sometimes one can't articulate principles at all. All of this, too, could and should be part of our public conversation; but let me speak briefly in the main text.

(Typos corrected, 26 June)

Manual trackback: The Monkey Cage.

Networks; The Running Dogs of Reaction

Posted by crshalizi at June 24, 2010 20:45 | permanent link

June 01, 2010

Brush Your Teeth!

Attention conservation notice: Combines quibbles about what's in an academic paper on tooth-brushing with more quibbles about the right way to do causal inference.

Chris Blattman finds a new paper which claims not brushing your teeth is associated with higher risk of heart disease, and is unimpressed:

Toothbrushing is associated with cardiovascular disease, even after adjustment for age, sex, socioeconomic group, smoking, visits to dentist, BMI, family history of cardiovascular disease, hypertension, and diagnosis of diabetes.

...participants who brushed their teeth less often had a 70% increased risk of a cardiovascular disease event in fully adjusted models.

The idea is that inflamed gums lead to certain chemicals or clot risks.

In the past five days I've seen this study reported in five newspapers, half a dozen radio news shows, and several blogs. These researchers know how to use a PR firm.

Sounds convincing. What could be wrong there?

OH WAIT. MAYBE PEOPLE WHO BRUSH THEIR TEETH TWICE A DAY GENERALLY TAKE BETTER CARE OF THEMSELVES AND WATCH WHAT THEY EAT.

I'm consistently blown away by what passes for causal analysis in medical journals.

Now, I am generally of one mind with Blattman about the awfulness of causal inference in medicine — I must write up the "neutral model of epidemiology" sometime soon — but here, I think, he's being a bit unfair. (I have not read or listened to any of the press coverage, but I presume it's awful, because it always is.) If you read the actual paper, which seems to be open access, one of the covariates is actually a fairly fine-grained set of measures of physical activity, albeit self-reported. (I'm not sure why they didn't list it in the abstract.) It would be nice to have information about diet, and of course self-reports are always extra dubious for moralized behaviors like exercise. Still, it's not right to say, IN ALL CAPS, that the authors of the paper did nothing about this.

In fact, the real weakness of the paper is that they have a reasonably clear mechanism in mind, and enough information to test it, but didn't do so. As Blattman says, the idea is that not brushing your teeth causes tooth and gum disease, tooth and gum disease cause inflammatory responses, and inflammation causes heart disease. Because of this, the authors measured the levels of two chemical markers of inflammation, and found that they were positively predicted by not brushing, even adjusting for their other variables (including physical activity). So far so good. Following the logic of Pearl's front-door criterion, what they should have done next, but did not, was see whether conditioning on the levels of these chemical markers substantially reduced the dependence of heart disease on tooth brushing. (The dependence should be eliminated by conditioning on the complete set of chemicals mediating the inflammatory response.) This is what one would expect if that mechanism I mentioned actually works, but not if the association comes down to not brushing being a sign that one's an unhealthy slob.
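The check can be illustrated with a linear toy model. Everything in it is an assumption for illustration: Gaussian variables, linear effects, and ordinary least squares standing in for the survival models a real analysis would use. In the "mechanism" world, not brushing raises an inflammation marker, which raises heart risk. In the "slob" world, an unmeasured unhealthiness drives both not brushing and heart risk, while the marker responds to brushing but does nothing to the heart. Adjusting for the marker wipes out the brushing coefficient only in the first world.

```python
import numpy as np


def ols_coefs(y, *covariates):
    """Least-squares slopes of y on the covariates (intercept fitted, not returned)."""
    X = np.column_stack([np.ones_like(y), *covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1:]


def brushing_coefficients(world, n=100_000, seed=0):
    """Coefficient of heart risk on not-brushing, without and with
    adjustment for the inflammation marker."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal
    if world == "mechanism":
        no_brush = z(n)
        marker = no_brush + z(n)   # not brushing -> inflammation
        heart = marker + z(n)      # inflammation -> heart disease
    elif world == "slob":
        slob = z(n)
        no_brush = slob + z(n)     # unhealthiness -> not brushing
        marker = no_brush + z(n)   # marker responds, but is inert
        heart = slob + z(n)        # unhealthiness -> heart disease
    else:
        raise ValueError(f"unknown world: {world}")
    unadjusted = ols_coefs(heart, no_brush)[0]
    adjusted = ols_coefs(heart, no_brush, marker)[0]
    return unadjusted, adjusted


if __name__ == "__main__":
    print("mechanism:", brushing_coefficients("mechanism"))  # about (1, 0)
    print("slob:     ", brushing_coefficients("slob"))       # about (0.5, 0.5)
```

Both worlds show an unadjusted association between not brushing and heart risk; only the behavior of the coefficient after conditioning on the marker separates them.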

The moral is: brush your teeth, for pity's sake, unless you want to end up like this poor soul.

Enigmas of Chance; The Natural Science of the Human Species

Posted by crshalizi at June 01, 2010 13:25 | permanent link

May 31, 2010

Books to Read While the Algae Grow in Your Fur, May 2010

Bruce Sterling, designed by Lorraine Wild, Shaping Things
I'll let the introduction speak for itself. — The graphic design is actually really very nice, though it doesn't call attention to itself.
John Stuart Mill, On Bentham and Coleridge
Finally read the old paperback volume of these two essays which I appear to have bought at a friends-of-the-library sale in 1997. (That had an introduction by F. R. Leavis, which I ignored.) Enjoyable, but it really did not succeed in convincing me to take Coleridge seriously. And of course when he wrote these essays Mill made his living as a functionary of the British Empire (in the form of the East India Company), and so had a personal stake in anti-democratic arguments — you can see him straining to find some way to evade the force of the idea that those who hold power ought to be accountable to those over whom it is wielded...
I realize that I shouldn't have laughed out loud reading Mill's contrast between the national characters of the Germans and the Italians, but come on.
S. N. Lahiri, Resampling Methods for Dependent Data
This is a thorough discussion of variants of the bootstrap for time series (chapters 2--11) and spatial random fields (chapter 12). Lahiri does not presume any previous knowledge of the bootstrap, though that would help; familiarity with theoretical statistics at the level of, say, Lehmann or Schervish is essential. A few proofs are referred to Lahiri's papers, otherwise this is self-contained, with an appendix reminding readers of the most essential results on stochastic processes.
Chapter 1 is an introduction to the idea of bootstrapping and a tour of the book. Chapter 2 discusses the forms of bootstrapping for time series, which are all based on the idea that rather than resampling individual observations, one needs to resample whole blocks of consecutive data points. This preserves the dependence structure within each block, but messes it up at the transition between blocks; one therefore wants to let the length of the blocks grow as one gets more data. There are various ways of resampling blocks, with mostly but not entirely identical properties. Chapter 3 in particular discusses the estimation of sample means using these various block bootstraps, assuming various sorts of strong mixing (most such results would carry over to cases with mere weak dependence). Chapter 4 shows how to reduce other problems to ones of sample means for transformations of the data. (I hadn't realized that a lot of the techniques for smooth functionals in statistics go back to von Mises.) Chapter 5 compares the first-order properties of the different bootstraps. Chapter 6 uses Edgeworth expansions to get at second-order properties; personally I found this chapter (unlike the others) nearly unreadable. (Since Edgeworth expansions are about series expansions for generating functions obeying certain combinatorial rules, it feels like it should be possible somehow to express them as Feynman diagrams, which would be a lot easier to grasp. If someone has done this, though, I can't find it.) Chapter 7 is about estimating how big the blocks should be, by more resampling. Chapters 8 and 9 are about alternatives to block bootstraps: either bootstrapping from parametric models (just linear ARMA models, chapter 8), or from the Fourier transform (chapter 9).
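Here is a minimal sketch of the simplest scheme, the moving block bootstrap, used to estimate the standard error of a sample mean. It assumes a simulated Gaussian AR(1) series with coefficient 0.7 and a block length of 50; both are arbitrary choices for illustration. Setting the block length to 1 recovers the ordinary i.i.d. bootstrap, which ignores the positive correlation and so understates the standard error by roughly a factor of two here.

```python
import random
import statistics

def ar1(n, phi, rng):
    """Simulate a stationary Gaussian AR(1) series with unit innovation variance."""
    x = [rng.gauss(0, 1) / (1 - phi ** 2) ** 0.5]   # start in the stationary law
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0, 1))
    return x

def moving_block_bootstrap_means(x, block_len, n_boot, rng):
    """Means of n_boot moving-block-bootstrap resamples of the series x."""
    n = len(x)
    starts = range(n - block_len + 1)   # every run of block_len consecutive points
    n_blocks = -(-n // block_len)       # ceiling division
    means = []
    for _ in range(n_boot):
        resample = []
        for _ in range(n_blocks):
            s = rng.choice(starts)
            resample.extend(x[s:s + block_len])
        means.append(sum(resample[:n]) / n)   # trim back to the original length
    return means

rng = random.Random(42)
x = ar1(2000, 0.7, rng)
iid_se = statistics.stdev(moving_block_bootstrap_means(x, 1, 400, rng))
block_se = statistics.stdev(moving_block_bootstrap_means(x, 50, 400, rng))
print(iid_se, block_se)
```

For these settings the true standard error of the mean is about sqrt((1/0.3^2)/2000), roughly 0.075. The block estimate tends to come in a bit low, because the joins between blocks break up some of the dependence.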
The previous theoretical results all rely on comparatively rapid decay of correlations and on the control of higher moments. Chapter 10 gives results on how to bootstrap when there is long-range dependence, and chapter 11 considers modifications for heavy-tailed data. (Also for maxima and minima, since the extreme values drawn from even light-tailed distributions tend to look heavy-tailed.) Interestingly, for both long-range dependence and heavy tails, one really needs the surrogate data to be, in an appropriate sense, smaller than the original --- trying to produce something just as large as the original time series turns out to lead to inconsistent estimates.
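A standard illustration of why the full-size resample fails for extremes uses the sample maximum (the choice of example is mine, not Lahiri's). When m points are drawn with replacement from n distinct values, the chance that the resample contains the sample maximum is 1 - (1 - 1/n)^m. With m = n this tends to 1 - 1/e, about 0.632, however large n is. So the bootstrap distribution of the maximum keeps a big atom that the true sampling distribution lacks. With m much smaller than n the atom fades away; this is the m-out-of-n bootstrap in its crudest form (the asymptotics also want m to grow, just more slowly than n).

```python
import random

def atom_probability(n, m):
    """Chance that m draws with replacement from n distinct values include the
    largest, i.e. that the bootstrap maximum equals the sample maximum."""
    return 1 - (1 - 1 / n) ** m

def simulated_atom(data, m, reps, rng):
    """Monte Carlo estimate of the same probability for a given sample."""
    top = max(data)
    hits = sum(max(rng.choices(data, k=m)) == top for _ in range(reps))
    return hits / reps

rng = random.Random(0)
data = [rng.random() for _ in range(1000)]   # continuous, so values are distinct
for m in (1000, 100, 30):                    # m = n is the ordinary bootstrap
    print(m, round(atom_probability(1000, m), 3), simulated_atom(data, m, 1000, rng))
```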
Finally, chapter 12 gives block bootstraps for spatial data, either on regular grids (in the limit of growing sampling regions of fixed shape) or irregularly spaced. The former is pretty straightforward, aside from annoyances about how to cover the edges of the sampling region. The latter is considerably more involved, but can handle both growing regions and increasingly dense sampling from a fixed region.
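For the regular-grid case, here is a bare-bones sketch. It tiles the field with k-by-k blocks, each copied from a uniformly chosen position where a whole block fits inside the observed region. To keep it short, this hypothetical helper assumes a square field whose side is a multiple of k. That assumption side-steps exactly the edge-covering annoyances mentioned above.

```python
import random

def grid_block_bootstrap(field, k, rng):
    """Resample a square n-by-n field (list of lists) by tiling it with k-by-k
    blocks, each copied from a uniformly chosen position where a whole block
    fits. Hypothetical helper; assumes n is a multiple of k."""
    n = len(field)
    if n % k != 0:
        raise ValueError("this sketch assumes the block side k divides n")
    out = [[None] * n for _ in range(n)]
    for top in range(0, n, k):
        for left in range(0, n, k):
            r, c = rng.randrange(n - k + 1), rng.randrange(n - k + 1)
            for di in range(k):
                # copy one row of the chosen block into place
                out[top + di][left:left + k] = field[r + di][c:c + k]
    return out

rng = random.Random(1)
field = [[rng.gauss(0, 1) for _ in range(12)] for _ in range(12)]
boot = grid_block_bootstrap(field, 4, rng)
```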
I am glad I read this, but I recommend it only for those with a serious interest in the theory of the bootstrap. (Even for them, chapter 6, oy.)
Laurence Gough, Silent Knives = Death on a No. 8 Hook and Hot Shots
Mind candy. Procedural series mystery set in Vancouver. I'd picked up a much later book in the series years ago (Heartbreaker) and loved it, but hadn't been able to lay hands on any more until I ran across this one, which I devoured instantly.
Quatermass and the Pit
Mind candy. I saw the 1950s BBC TV serial, rather than the 1967 movie remake that the link points to. This is very Lovecraftian science fiction: ancient alien strangeness implicated, in the most horrible way, in the oldest history of humanity. (Think "The Rats in the Walls" and At the Mountains of Madness.) There is also, I think, some influence from Stapledon's great Last and First Men (ROT-13'd: fcrpvsvpnyyl, gur jnl uvf yngre fcrpvrf bs uhzna npuvrir gryrcnguvp cbjref guebhtu uloevqvmvat jvgu Znegvnaf).
The Objective and Red Sands
Mind candy. These are the only American movies (except arguably Iron Man) I've run across about the American experience in Afghanistan since 2001; both are horror movies, which probably means something. Some people would deduce from these that, by venturing into Afghanistan, we fear we are entangling ourselves with something very old and dangerous and better left alone; but I think that kind of criticism is B.S., and anyway that ship sailed long ago. (Pointers to other movies, especially ones not in this genre, would be appreciated.) ROT-13'd spoilers: Gur Bowrpgvir fhssref sebz gur snpg gung vs V gnxr frevbhfyl gur yrnq npgbe'f nccrnenapr va aneengvir-2002, ur jbhyq unir whfg orra fgnegvat gb funir jura gur PVN jnf jbexvat jvgu gur zhwnuvqrra. Zber shaqnzragnyyl, cybg fhssref sebz vgf vafvfgrapr ba fraqvat n zna gb qb n HNI'f wbo, gb fnl abguvat bs gur fho-iba-Qnavxra qrhf rk gevnathyb ng gur raq. Erq Fnaqf jnf na nygbtrgure fhcrevbe ubeebe zbivr, ohg znqr gur zvfgnxr bs fubjvat gur zbafgre ng gur raq; cergraq gung qvqa'g unccra.
Ta-Nehisi Coates, The Beautiful Struggle: A Father, Two Sons, and an Unlikely Road to Manhood
Coates grew up a year younger than me, and forty-odd miles north; this is a wonderfully-written memoir of what it was like to grow up a dreamy, head-in-the-clouds boy into fantasy stories and role-playing games in Maryland in the '80s, and it makes me remember that life very well. (Even some of what he says about his father struck a chord.) But in so, so many ways he and I might as well have grown up in different worlds, and that makes me angry on his behalf.
It's a beautiful book; read it.
Charles Tilly, Explaining Social Processes
A collection of Tilly's papers on the methods and objects of the social sciences and social history. I find them agreeable and sensible: Tilly stresses the importance of causal explanation of social phenomena by concatenating robust mechanisms; of path dependence; of networks, relations, and the aggregation of recurrent interactions into durable social structures; and the uselessness of thinking about "societies", or some kind of invariant pattern of "social change". His substantive research — on migration chains, "contentious" politics, and the development of the modern state — appears repeatedly, but appropriately, as examples. The essays are unedited and so overlap with each other (and even cite each other in the journal versions), but not so badly that I felt put upon.
That said, I was a bit disappointed in this collection, because, I guess, I was expecting a more systematic statement of Tilly's views on how social phenomena are put together and how they should be investigated. (He rightly insists that the two questions are linked.) In particular, methodological individualism seems to me to have a lot more going for it than Tilly allowed; even explanation by invariants is in better shape. As every school-child knows, when you put agents with invariant decision-making mechanisms in an environment which largely consists of other agents and let them go, their macroscopic behavior is generally path-dependent, and many models of small-scale behavior will only work as local approximations. (See, for instance.) Would Tilly say that this was nonetheless missing something? That assuming an individualistic and invariant infrastructure to explain relational and transient phenomena violates Occam's Razor? Distinguish this from what he meant by "methodological individualism" and "invariance"? I wish I knew; sadly, there will be no chance to find out.
— Many, but not all, of the essays reprinted in this book are available online.
Jack Vance, The Dying Earth
Re-read for the first time in more than a decade, it's still just as good as it always was. (Actually, I got this on CD to listen to while exercising, which worked surprisingly well, and then re-read it immediately after finishing the disc.)
Thomas Barfield, Afghanistan: A Cultural and Political History
The best synoptic history I know of (at least in English), running from the 17th century to the present. Barfield is an anthropologist who did ethnographic fieldwork in Afghanistan in the 1970s, so "culture" here really means social organization and widely-shared ideas about political legitimacy, rather than literature, music, etc.; pleasingly, one of his touchstones is ibn Khaldun. A full review will appear Any Day Now. In the meanwhile: strongly, strongly recommended if you have any reason to care about Afghanistan.
Kathleen George, Afterimage
Mind candy. Police procedural with really good characters, plus local color for Pittsburgh. (I think I can identify every restaurant, except I can't think of anywhere in Shadyside where someone could be watched from the second floor and get a gourmet pizza.) Third (?) book in a series; I'll be tracking down the others.
Jack Campbell, Victorious
Mind candy. A fitting and triumphal conclusion to the long anabasis.

Books to Read While the Algae Grow in Your Fur; Scientifiction and Fantastica; Pleasures of Detection, Portraits of Crime; Writing for Antiquity; Progressive Forces; Enigmas of Chance; Afghanistan and Central Asia; Commit a Social Science; The Continuing Crises; Philosophy; Cthulhiana

Posted by crshalizi at May 31, 2010 23:59

May 15, 2010

"I Don't Feel Like I Gotta Do Nothing"

And why not listen to Eilen Jewell while doing your nothing? (Also: allow me to recommend the Thunderbird Cafe for all your smoky blues-bar needs in Pittsburgh.)

Postcards

Posted by crshalizi at May 15, 2010 23:10

May 14, 2010

Special Demi-Issues on Network Data Analysis in Annals of Applied Statistics

The Annals of Applied Statistics is running a special issue on "modeling and analysis of network data", or rather is spreading it over the current issue and the next. Go look, starting with Steve Fienberg's introduction. You need to subscribe, but then you or your institution should subscribe to AoAS. (Alternately, you could wait about six months for them to show up on arxiv.org.)

Disclaimer: I am an associate editor of AoAS, and helped handle many of the papers for this section.

Networks; Enigmas of Chance; Incestuous Amplification

Posted by crshalizi at May 14, 2010 13:00

May 09, 2010

The Atlantic's Observance of Confederate History Month

Continuing, or in some cases reviving, long-standing but utterly unwelcome customs, several southern states declared April "Confederate History Month". The occasion redeemed itself by provoking a long series of posts from Ta-Nehisi Coates at The Atlantic, each of which "observe[s] some aspect of the Confederacy—but through a lens darkly". These begin with one whose peroration is worthy of Mencken,

This is who they are—the proud and ignorant. If you believe that if we still had segregation we wouldn't "have had all these problems," this is the movement for you. If you believe that your president is a Muslim sleeper agent, this is the movement for you. If you honor a flag raised explicitly to destroy this country then this is the movement for you. If you flirt with secession, even now, then this movement is for you. If you are a "Real American" with no demonstrable interest in "Real America" then, by God, this movement of alchemists and creationists, of anti-science and hair tonic, is for you.
The whole of it is a moving, empathic, and thereby all the more devastating meditation on memory, pride, shame, racism, heroism, moral courage, myths, the great personalities of the Civil War, and the enduring legacy of one of America's two great founding sins; on just how it is that we can be a country where a month set aside to remember a heritage of treason in defense of slavery is intended as a time of celebration and not of soul-searching.

(Owing to the folly of that venerable magazine's web design, there doesn't seem to be a single page collecting them, but I think this is the entire sequence: 1; 2; 3; 4; 5; 6; 7; 8; 9; 10; 11; 12; 13; 14; 15; 16; 17; 18; 19; 20; 21.)

(Incidentally, last week Coates asked his readers to explain financial derivatives to him, and this week he's moved on to nuclear weapons. I speculate that if enough people buy his book, he is certain to not try out the business plan "1. Take a big position in the end-of-the-world trade; 2. Enrich uranium; 3. Profit!")

The Beloved Republic; Writing for Antiquity

Posted by crshalizi at May 09, 2010 09:00

April 30, 2010

Books to Read While the Algae Grow in Your Fur, April 2010

Jean-Guy Prévost, A Total Science: Statistics in Liberal and Fascist Italy
A history of the Italian statistical community (or, as he prefers, "field") from around 1900 through the fall of Fascism, with a brief glance at the immediate post-war era. This is not about the history of statistical technique, but about the development of statistics as an autonomous academic discipline, with pretensions of in fact being the key discipline for all empirical investigation, especially into social and biological matters. So we get a lot about university positions, internal disputes (as Prévost says, one mark of a field is precisely that there are recurring internal arguments, with well-worn positions), how methodology came to be seen as more important than either mathematical theory or applications, the conflict with political economy, etc. Naturally, this extends to looking at how the statistical establishment eagerly sought to serve the Fascist state, proposals for "corporatist" and "totalitarian" statistics, and the elaboration of Fascist ideology by leading statisticians, relying on their self-presentation as polymaths. (Tukey's line about how "The best thing about being a statistician is that you get to play in everyone else's backyard" assumes a new significance when you imagine it being uttered by a blackshirt.) In all this, including the last, Gini is the central figure; quite honestly he should have been purged after the war, but somehow escaped justice.
Some familiarity with the history of Fascism and with the intellectual content of statistics is presupposed. If you have that, and are willing to tolerate a minimal (almost homeopathic) dose of Bourdieu, this provides a lot of interesting, if unhappy, food for thought.
(Prévost makes it clear how Gini's work on measuring inequality was in the tradition launched by Pareto's laws of income and wealth distribution. Cantelli was a friend and collaborator of Gini's, and the Glivenko-Cantelli theorem is the kind of result which would guarantee non-parametric consistency of estimated Gini coefficients from sample data. Was this what motivated Cantelli?)
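The plug-in idea can be spelled out as follows. The Gini coefficient is a functional of the distribution: the mean absolute difference between two independent draws, divided by twice the mean. The natural estimator simply evaluates that functional at the empirical distribution. Uniform convergence of the empirical CDF is the sort of ingredient a consistency argument would rest on, though a finite mean is needed too. Here is a minimal Python sketch that uses the sorted-data identity to avoid summing over all pairs:

```python
import random

def gini(xs):
    """Plug-in Gini coefficient: mean |x_i - x_j| over all ordered pairs,
    divided by twice the sample mean, computed via the sorted-data identity
    G = 2 * sum(i * x_(i)) / (n * sum(x)) - (n + 1) / n, i = 1..n."""
    xs = sorted(xs)
    n = len(xs)
    total = sum(xs)
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2 * weighted / (n * total) - (n + 1) / n

print(gini([1, 1, 1, 1]))  # 0.0 (perfect equality)
print(gini([0, 0, 0, 1]))  # 0.75
rng = random.Random(7)
sample = [rng.expovariate(1.0) for _ in range(50000)]
print(gini(sample))        # the exponential distribution's Gini coefficient is 1/2
```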
Robert B. Reich, Supercapitalism: The Transformation of Business, Democracy, and Everyday Life
This is aiming to be something like The Affluent Society or The New Industrial State for modern times; it does a pretty good job. Basically, his argument is that Galbraith was more or less right about how the economy worked during the post-WWII golden age of capitalism: large, autonomous, oligopolistic firms more interested in continued steady growth, exploiting economies of scale, than anything else. JKG's mistake was in thinking this regime would continue. Reich sees Galbraithian capitalism as being upset not so much through deliberate political action in the 1980s as through new technologies in the 1970s, especially improvements in logistics, communications and information technology, which made it possible and efficient to replace the vertically-integrated firm with global supply networks, and to replace investment financed out of retained earnings with global financial markets. (As Reich points out in some detail, all of the key technologies, from container shipping through microelectronics and the Internet, were devised by the military-industrial-university complex to fight the Cold War; sowing the dragon's teeth, as it were.) Deregulation, to Reich's way of thinking, was more a consequence than a cause — the legal superstructure accommodating changes in the forces of production, though he doesn't use such language. The result, he says, is a system more responsive to consumer demand and to investors, but where most of the population sees no gains from economic growth, inequality soars, countervailing power evaporates, security is steadily eroded, and the primary check on the political influence of corporations is the opposing commercial interests of other corporations. (He also has an ingenious argument as to why decreasing regulation led to increasing lobbying.) This he calls "supercapitalism"; I dislike the term and will avoid it.
The way the system is set up, he says, the people running corporations simply have no choice but to do whatever they can to maximize profit in the short term; if they won't, they will shortly be replaced by those who will. Calls for corporate social responsibility, let alone attempts to shame or pressure individual corporations, therefore miss the point. The goal, rather, has to be to change the laws under which all corporations must act: ultimately, to neuter corporations politically, and to create a non-corporate social safety net. (The idea that health insurance, for instance, should be provided by one's employer is just nuts.) Something he does not adequately address, though, is that laws and regulations must be enforced, which is hard to do when one of the two parties regards them as necessarily illegitimate...
So, criticisms: (1) As I hinted, I think Reich underplays the role of ideology and political action, in favor of technological developments and market forces. It would be interesting to try to synthesize this with Krugman's take in Conscience of a Liberal. (2) There are some bits where the economics is a bit odd. For instance, economies of scale are certainly important in information production, just as in making steel. (Cf.) Arguably, though, the sheer magnitude of the fixed costs, and the time-scale, have shrunk, and that would be enough for Reich's argument. Also, profits decline as industries become more competitive, falling to the cost of capital plus the cost of the entrepreneur's time.
(Picked up after someone, I forget who, pointed me at Lessig's review.)
Nunzio DeFilippis, Christina Weir and Christopher J. Mitten, Past Lies
Brandon Graham, King City
Michael Alan Nelson, Emma Rios and Cris Peter, Hexed
Mark Waid and Minck Oosterveer, The Unknown
Brian Michael Bendis and Michael Avon Oeming, Powers: 1 (Who Killed Retro Girl?), 2 (Roleplay)
Robert M. Solow, Monopolistic Competition and Macroeconomic Theory
"Monopolistic competition" is the slightly oxymoronic name for the situation where there are a number of goods which are all more or less close substitutes for each other, but each good has a monopoly producer. It can arise in a number of ways, from legal restrictions (e.g., copyright on particular pieces of software) or from increasing returns to scale. (Successful branding convinces consumers that basically identical commodities are really different, and so creates monopolistic competition.) In monopolistic competition, firms have some control over their prices, but to maximize profits they need to forecast quantitative demand. The theory is quite well-established microeconomics, having begun its real development in the 1930s with Chamberlin and Robinson, and is a standard part of industrial organization (Cabral's textbook has an especially nice treatment).
This extremely short (88 pages including the index) book consists of Solow pointing out that once you admit monopolistic competition is not just possible but actually common, a lot of the conclusions of macroeconomic models which rest on the idea of perfect competition in all markets evaporate, and one is led to Keynesian conclusions, even if one assumes that everyone in the economy is a perfectly foresightful utility maximizer. In particular, the way is opened for the existence of multiple equilibria: low-output equilibria in which everyone correctly forecasts that there will not be a lot of demand, so they produce little, pay little, and buy little, and high-output equilibria in which everyone correctly forecasts high demand, produces a lot, pays a lot and buys a lot. Everyone prefers the high-level equilibrium to the low, but that doesn't mean they'll manage to coordinate on it. Solow takes this insight, and related ones, and does what he does best, namely build and solve elegant little models of the resulting macroeconomy. He is quite open about these being toy models, and that in some places he has to stipulate some macro-level relations which he doesn't directly derive from the micro assumptions. (But, though he doesn't mention this, the same is true of the usual representative-agent macro models which purport to be aggregations of perfect competition.) The results are not strictly in line with every detail of the General Theory, but are clearly closely related, and make a lot of sense.
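Here is a toy illustration of the multiple-equilibria point. It is emphatically not one of Solow's models; the S-shaped function and its parameters are invented. Suppose realized aggregate demand depends on the output level everyone expects, through a hypothetical function `demand`. Any output level with `demand(y) == y` is then a self-fulfilling forecast. This function has three such levels: a low stable one, a high stable one, and an unstable one in between at y = 5. Where the economy ends up therefore depends on where expectations start.

```python
import math

def demand(expected_output):
    """Hypothetical S-shaped aggregate demand: realized demand as a function of
    the output level everyone expects (and so produces, pays for, and spends)."""
    return 1 + 8 / (1 + math.exp(-2 * (expected_output - 5)))

def settle(y, steps=200):
    """Iterate y -> demand(y), i.e. let forecasts adjust to realized demand."""
    for _ in range(steps):
        y = demand(y)
    return y

low, high = settle(4.0), settle(6.0)   # pessimistic vs. optimistic starting forecasts
print(low, high)
```

Everyone would prefer the high equilibrium, but nothing in the iteration pushes a pessimistic economy toward it.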
This book is very enjoyable, if you have any taste for elegant economic modeling, though alas the price of the actual physical artifact (twenty-six cents per page in paperback) is insane. But I've ranted about this before.
Richard A. Berk, Statistical Learning from a Regression Perspective
A gentle introduction to modern nonparametric regression and classification, for people who are comfortable with running linear and logistic regressions, and curious about data mining and/or machine learning. After a brief review of regression (following the lines laid down in his earlier book), Berk covers smoothing (especially with splines), additive models, classification and regression trees, bagging, random forests, boosting, and support vector machines. There are many real-data examples and exercises, all done in R, and all of them I think from the social sciences, with a certain emphasis on his own field of criminology.
Berk relies very heavily on The Elements of Statistical Learning as an authority, and one might think of this as a simplified presentation of the key parts of that book, for social scientists, or advanced undergraduates in statistics — I used it as a supplementary text in my data mining class last fall, and would happily do so again.

Books to Read While the Algae Grow in Your Fur; Scientifiction and Fantastica; Pleasures of Detection, Portraits of Crime; The Dismal Science; Enigmas of Chance; Writing for Antiquity; The Continuing Crises; The Running-Dogs of Reaction

Posted by crshalizi at April 30, 2010 23:59

April 29, 2010

The Republic Hath Need of Thee

Carlos Yu on Facebook yesterday: "What this country really needs is William Tecumseh Sherman." He went on:

... leaving a ten-mile wide trail of burned-out mobile homes and meth labs behind him, Sherman paused in his March to the Tea to regroup his forces. Water was always an issue for Sherman's armies, campaigning as they did in the dusty steppes surrounding Bakersfield, in the deserts of Arizona, and throughout the drought-stricken former Confederacy. Nowhere was their lack of water worse than among the abandoned exurban developments of central Florida, where the water table had been permanently damaged...
This, I think, sums up everything admirably.

The Beloved Republic; The Continuing Crises; Modest Proposals

Posted by crshalizi at April 29, 2010 14:57

April 28, 2010

Return of "Homophily, Contagion, Confounding: Pick Any Three", or, The Adventures of Irene and Joey Along the Back-Door Paths

Attention conservation notice: 2700 words on a new paper on causal inference in social networks, and why it is hard. Instills an attitude of nihilistic skepticism and despair over a technical enterprise you never knew existed, much less cared about, which a few feeble attempts at jokes and a half-hearted constructive suggestion at the end fail to relieve. If any of this matters to you, you can always check back later and see if it survived peer review.

Well, we decided for a more sedate title for the actual paper, as opposed to the talk:

CRS and Andrew C. Thomas, "Homophily and Contagion Are Generically Confounded in Observational Social Network Studies", arxiv:1004.4704, submitted to Sociological Methods and Research
Abstract: We consider processes on social networks that can potentially involve three phenomena: homophily, or the formation of social ties due to matching individual traits; social contagion, also known as social influence; and the causal effect of an individual's covariates on their behavior or other measurable responses. We show that, generically, all of these are confounded with each other. Distinguishing them from one another requires strong assumptions on the parametrization of the social process or on the adequacy of the covariates used (or both). In particular we demonstrate, with simple examples, that asymmetries in regression coefficients cannot identify causal effects, and that very simple models of imitation (a form of social contagion) can produce substantial correlations between an individual's enduring traits and their choices, even when there is no intrinsic affinity between them. We also suggest some possible constructive responses to these results.
R code for our simulations

The basic problem here is as follows. (I am afraid this will spoil some of the jokes in the paper.) Consider the venerable parental question: "If your friend Joey jumped off a bridge, would you jump too?" The fact of the matter is that the answer is "yes"; but why does Joey's jumping off a bridge mean that Joey's friend Irene is more likely to jump off one too?

  1. Influence or social contagion: Because they are friends, Joey's example inspires Irene to jump. Or, more subtly: seeing Joey jump re-calibrates Irene's tolerance for risky behavior, which makes jumping seem like a better idea.
  2. Biological contagion: Joey is infected with a parasite which suppresses the fear of heights and/or falling, and, because they are friends, Joey passes it on to Irene.
  3. Manifest homophily: Joey and Irene are friends because they both like to jump off bridges (hopefully with bungee cords attached).
  4. Latent homophily: Joey and Irene are friends because they are both hopeless adrenaline junkies, and met through a roller-coaster club; their common addiction leads both of them to take up bridge-jumping.
  5. External causation: Sometimes, jumping off a bridge is the only sane thing to do.

For Irene's parents, there is a big difference between (1) and (2) and the other explanations. The former suggest that it would be a good idea to keep Irene away from Joey, or at least to keep Joey from jumping off the bridge; with the others, however, that's irrelevant. In the case of (3) and (4), in fact, knowing that Irene is friends with Joey is just a clue as to what Irene is really like; the damage was already done, and they can hang out together as much as they want. The difference between these accounts is one of causal mechanisms. (Of course there can be mixed cases.)

What the statistician or social scientist sees is that bridge-jumping is correlated across the social network. In this it resembles many, many, many behaviors and conditions, such as prescribing new antibiotics (one of the classic examples), adopting other new products, adopting political ideologies, attaching tags to pictures on flickr, attaching mis-spelled jokes to pictures of cats, smoking, drinking, using other drugs, suicide, literary tastes, coming down with infectious diseases, becoming obese, and having bad acne or being tall for your age. For almost all of these conditions or behaviors, our data is purely observational, meaning we cannot, for one reason or another, just push Joey off the bridge and see how Irene reacts. Can we nonetheless tell whether bridge-jumping spreads by (some form of) contagion, or rather is due to homophily, or, if it is both, say how much each mechanism contributes?

A lot of people have thought so, and have tried to come at it in the usual way, by doing regression. Most readers can probably guess what I think about that, so I will just say: don't you wish. More sophisticated ideas, like propensity score matching, have also been tried, but people have pretty much assumed that it was possible to do this sort of decomposition. What Andrew and I showed is that in fact it isn't, unless you are willing to make very strong, and generally untestable, assumptions.

This becomes clear as soon as you draw the relevant graphical model, with nodes for each person's latent traits, their friendship, and their behavior on successive days.

Here i stands for Irene and j for Joey. Y(i,t) is 1 if Irene jumps off the bridge on day t and 0 otherwise; likewise Y(j,t-1) is whether Joey jumped off the bridge yesterday. We want to know whether the latter variable influences the former. A(i,j) is how we represent the social network --- it's 1 if Irene regards Joey as a friend, 0 otherwise. Lurking in the background are the various traits which might affect whether or not Irene and Joey are friends, and whether or not they like to jump off bridges, collectively X. Suppose that, all else equal, being more similar makes it more likely that people become friends.

Now it's easy to see where the trouble lies. If we learn that Joey jumped off a bridge yesterday, that tells us something about what kind of person Joey is, X(j). If Joey and Irene are friends, that tells us something about what kind of person Irene is, X(i), and so about whether Irene will jump off a bridge today. And this is so whether or not there is any direct influence of Joey's behavior on Irene's, whether or not there is contagion. The chain of inferences (from Joey's behavior to Joey's latent traits, and then over the social link to Irene's traits and thus to Irene's behavior) constitutes what Judea Pearl strikingly called a "back-door path" connecting the variables at either end. When such paths exist, as here, Y(i,t) will be at least somewhat predictable from Y(j,t-1), and sufficiently clever regressions will detect this, but they cannot distinguish how much of the predictability is due to the back-door path and how much to direct influence. If this sounds hand-wavy to you, and you suspect that with some fancy adjustments you can duck and weave through it, read the paper.
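The argument can be checked with a small simulation. This is a toy model with invented parameters, not the model or R code from the paper. Each person has a latent trait x, friendships are more likely between people with similar x, and behavior on each day depends only on one's own x plus noise, so there is no contagion at all. Even so, Joey's behavior yesterday still predicts Irene's today when they are friends. It does not when they are a random pair of people.

```python
import math
import random

def homophily_without_contagion(n_people=500, seed=3):
    """Return the 'lift' (defined in lift() below) for friend pairs and for
    random pairs, in a toy model with latent homophily and no contagion."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n_people)]   # latent traits X
    # homophily: friendship likelier the closer two people's traits are
    friends = [(i, j)
               for i in range(n_people) for j in range(i + 1, n_people)
               if rng.random() < 0.3 * math.exp(-2 * (x[i] - x[j]) ** 2)]
    # behavior yesterday and today depends only on one's own trait: no contagion
    y_prev = [int(xi + rng.gauss(0, 1) > 0) for xi in x]
    y_now = [int(xi + rng.gauss(0, 1) > 0) for xi in x]

    def lift(pairs):
        # P(Irene jumps today | Joey jumped yesterday)
        #   - P(Irene jumps today | Joey did not), over (Irene, Joey) pairs
        count = {0: 0, 1: 0}
        jumps = {0: 0, 1: 0}
        for irene, joey in pairs:
            count[y_prev[joey]] += 1
            jumps[y_prev[joey]] += y_now[irene]
        return jumps[1] / count[1] - jumps[0] / count[0]

    # friendship is treated as symmetric here, so use both orderings of each pair
    ordered = friends + [(j, i) for i, j in friends]
    strangers = []
    while len(strangers) < len(ordered):
        i, j = rng.randrange(n_people), rng.randrange(n_people)
        if i != j:
            strangers.append((i, j))
    return lift(ordered), lift(strangers)

friends_lift, strangers_lift = homophily_without_contagion()
print(friends_lift, strangers_lift)
```

Under these settings the population value of the friends' lift works out to about 0.18 (roughly 0.59 versus 0.41), and all of it flows through the back-door path.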

To switch examples to something a little more serious than jumping off bridges, let's take it as a given that (as Christakis and Fowler famously reported), if Joey became obese last year, the odds of Irene becoming obese this year go up substantially. They interpreted this as a form of social contagion, and one can imagine various influences through which it might work (changing Irene's perception of what normal weight is, changing Irene's perception of what normal food consumption is, changes in happiness leading to changes in comfort food and/or comfort alcohol consumption, etc.). Now suppose that there is some factor X which affects both whether Joey and Irene become friends, and whether and when they become obese.

So long as we cannot measure X, the back-door path linking Joey and Irene remains open, and our inferences about contagion are confounded. It would be enough to measure the aspect of X which influences link formation, or the aspect which influences obesity; but without that, there will always be many ways of combining homophily and contagion to produce any given pattern of association between Joey's obesity status last year and Irene's this year. And it's not a matter of being unable to decide among some causal alternatives due to limited data; the different causal alternatives all produce the same observable outcomes. (More on this notion of "identification".)

Christakis and Fowler made an interesting suggestion in their obesity paper, however, which was actually one of the most challenging things for us to deal with. They noticed that friendships are sometimes not reciprocated, that Irene thinks of Joey as a friend, but Joey doesn't think of Irene that way — or, more cautiously, Irene reports Joey as a friend, but Joey doesn't name Irene. For these asymmetric pairs in their data, Christakis and Fowler note, it's easier to predict the person who named a friend from the behavior of the nominee than vice versa. This is certainly compatible with contagion, in the form of being influenced by those you regard as your friends, but is there any other way to explain it?

As it happens, yes. One need only suppose that being a certain kind of person (having certain values of the latent trait X) makes you more likely to be, or be named as, a friend. Suppose that there is just a one-dimensional trait, like your location on the left-right political axis, or perhaps some scale of tastes. (Perhaps Irene and Joey are neo-conservative intellectuals, and the trait in question is just how violent they like their Norwegian black metal music.) Having similar values of the trait makes you more likely to be friends (that's homophily), but there is always an extra tendency to be friends with those who are closer to the median of the distribution, or at least to say that those are your friends. (Wherever neo-conservatives really are on the black metal spectrum, they tend to say, on Straussian grounds, that their friends are those who prefer only the median amount of church-burning with their music.) If Irene thinks of Joey as a friend, but Joey does not, this is a sign that Irene has a more extreme value of the trait than Joey does, which changes how well each one's behavior predicts the other's. Putting together a very basic model of this sort shows that it robustly generates the kind of asymmetry Christakis and Fowler found, even when there is really no contagion.

To be short about it, unless you actually know, and appropriately control for, the things which really lead people to form connections, you really have no way of distinguishing between contagion and homophily.

All of this can be turned around, however. Suppose that you want to know whether, or how strongly, some trait of people influences their choices. Following a long tradition with many illustrious exponents, for instance, people are very convinced that social class influences political choices, and there is indeed a predictive relationship here, though many people are totally wrong about what that relationship is. The natural supposition is that this predictive relationship reflects causation. But suppose that there is contagion, that you can catch ideology or even just choices from your friends. Social class is definitely a homophilous trait; this means that an opinion or attitude or choice can become entrenched among one social class, and not another, simply through diffusion, even if there is no intrinsic connection between them. And there's nothing special about class here; it could be any trait or combination of traits which leads to homophily.

Here, for example, is a simple simulation done using Andrew's ElectroGraph package.

To explain: Each individual has a social type or trait, which takes one of two values and stays fixed — think of this as social class, if you like. People are more likely to form links with those of the same type, so when we plot the graph in a way which brings linked nodes closer to each other, we get a nice separation into two sub-communities, with all the upper-class individuals in the one on top and all the lower-class individuals in the one below. Also, each individual makes a "choice" which can change over time, which again is binary, here "red" or "blue". Initially, choices are completely independent of traits, so there's just as much red among the high-class individuals as among the low.

Now let the choices evolve according to the simplest possible rule: at each point in time, a random individual picks one of their neighbors, again at random, and copies their opinion. After a few hundred such updates, the lower class has turned red, and the upper class has turned blue:

And this isn't just a fluke; the pattern of color separation repeats quite reliably, though which color goes with which class is random. If you wanted to be more quantitative about it, you could, say, run a logistic regression, and discover that in the homophilous network, statistically-significant prediction of choice from trait is possible, but not in an otherwise-matched network without homophily; you can see those results in the paper. A bit more abstractly, when I learned cellular automata from David Griffeath, one of the topics was something called the "voter model", which is just the rule I gave above for copying choices. On a regular two-dimensional grid, the voter model self-organizes from random noise into blobs of homogeneous color with smooth boundaries; this is just the corresponding behavior on a graph. As I have said several times before, I think this phenomenon (correlating traits and choices by homophily plus contagion) seriously complicates a lot of what people want to do in the social sciences and even the humanities, but since I have gone on about that already, I won't re-rant today.
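For readers who want to play along, here is a rough Python stand-in for the experiment. It is not Andrew's ElectroGraph code, and the network is an assumption of mine chosen to make homophily extreme: two classes of 20 people, every same-class pair linked, and a single cross-class tie, compared against a control with the same number of ties placed at random. The dynamics are the voter model described above. The names `build_network`, `class_color_gap` and `mean_gap` are hypothetical.

```python
import random


def build_network(n_per_class, homophilous, rng):
    """Return (class labels, adjacency lists) for 2 * n_per_class people."""
    n = 2 * n_per_class
    cls = [0] * n_per_class + [1] * n_per_class
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    same_class = [(i, j) for i, j in pairs if cls[i] == cls[j]]
    if homophilous:
        # Every same-class pair is linked, plus one cross-class tie.
        edges = same_class + [(0, n_per_class)]
    else:
        # Matched control: the same number of ties, placed without regard to class.
        edges = rng.sample(pairs, len(same_class) + 1)
    nbrs = [[] for _ in range(n)]
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    return cls, nbrs


def class_color_gap(homophilous, n_per_class=20, steps=3000, seed=0):
    """|fraction red in class 0 - fraction red in class 1| after voter-model updates."""
    rng = random.Random(seed)
    cls, nbrs = build_network(n_per_class, homophilous, rng)
    # Initial choices (1 = red, 0 = blue) are independent of class.
    color = [rng.randrange(2) for _ in cls]
    for _ in range(steps):
        i = rng.randrange(len(cls))
        if nbrs[i]:
            # Voter model: a random person copies a randomly chosen neighbor.
            color[i] = color[rng.choice(nbrs[i])]
    red = [0, 0]
    for c, k in zip(color, cls):
        red[k] += c
    return abs(red[0] - red[1]) / n_per_class


def mean_gap(homophilous, runs=200):
    """Average class-color gap over runs with different random seeds."""
    return sum(class_color_gap(homophilous, seed=s) for s in range(runs)) / runs
```

In the homophilous network a sizable share of runs end with each class internally unanimous but on opposite sides, while in the control network the colors sort themselves out without regard to class, so `mean_gap(True)` should be much larger than `mean_gap(False)`. (On a finite connected graph the voter model eventually reaches global consensus, which erases the gap; with only one cross-class tie, that takes much longer than settling within each class.)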

In their own way, each of the two models in our paper is sheer elegance in its simplicity, and I have been known to question the relevance of such models for actual social science. I don't think I'm guilty of violating my own strictures, however, because I'm not saying that the process of, say, spreading political opinions really follows a voter model. (The reality is much more complicated.) The models make vivid what was already proved, and show that the conditions needed to produce the phenomena are not actually very extreme.

My motto as a writer might as well be "the urge to destroy is also a creative urge", but in this paper we do hold out some hope, which is that even if the causal effects of contagion and/or homophily cannot be identified, they might be bounded, following the approach pioneered by Manski for other unidentifiable quantities. Even if observable associations would never let us say exactly how strong contagion is, for instance, they might let us say that it has to lie inside some range, and if that range excludes zero, we know that contagion must be at work. (Or, if the association is stronger than contagion can produce, something else must be at work.) I suspect (with no proof) that one way to get useful bounds would be to use the pattern of ties in the network to divide it into sub-networks or, as we say in the trade, communities, and use the estimated communities as proxies for the homophilous trait. That is, if people tend to become friends because they are similar to each other, then the social network will tend to become a set of clumps of similar people, as in the figures above. So rather than just looking at the tie between Joey and Irene, we look at who else they are friends with, and who their friends are friends with, and so on, until we figure out how the network is divided into communities and that (say) Irene and Joey are in the same community, and therefore likely have similar values of X, whatever it is. Adjusting for community might then approach actually adjusting for X, though it couldn't be quite the same. Right now, though, this idea is just a conjecture we're pursuing.

Manual trackback: The Monkey Cage; Citation Needed; Healthy Algorithms; Siris; Gravity's Rainbow; Orgtheory

Networks; Enigmas of Chance; Complexity; Commit a Social Science; Self-Centered

Posted by crshalizi at April 28, 2010 18:00 | permanent link

April 21, 2010

Outsourced Heavy Flagella Blogging

I was going to blog about this paper

Adrián López García de Lomana, Qasim K. Beg, G. de Fabritiis and Jordi Villà-Freixa, "Statistical Analysis of Global Connectivity and Activity Distributions in Cellular Networks", Journal of Computational Biology forthcoming (2010), arxiv:1004.3138
Abstract: Various molecular interaction networks have been claimed to follow power-law decay for their global connectivity distribution. It has been proposed that there may be underlying generative models that explain this heavy-tailed behavior by self-reinforcement processes such as classical or hierarchical scale-free network models. Here we analyze a comprehensive data set of protein-protein and transcriptional regulatory interaction networks in yeast, an E. coli metabolic network, and gene activity profiles for different metabolic states in both organisms. We show that in all cases the networks have a heavy-tailed distribution, but most of them present significant differences from a power-law model according to a stringent statistical test. Those few data sets that have a statistically significant fit with a power-law model follow other distributions equally well. Thus, while our analysis supports that both global connectivity interaction networks and activity distributions are heavy-tailed, they are not generally described by any specific distribution model, leaving space for further inferences on generative models.
since they are very definitely not making the baby Gauss cry, but Aaron beat me to it, so you should just go read him.

(Study of the scholarly misconstruction of reality suggests that this will lead to at most a marginal reduction in the number of claims that biochemical networks follow power laws.)

Power Laws; Biology; Networks

Posted by crshalizi at April 21, 2010 18:30 | permanent link

On Eyjafjallajökull

It evidently takes a week to find a priest and a nubile virgin in Europe.

Update: "On the other hand", as J.B. told me as soon as I posted this, "find one and you're not far from the other."

Posted by crshalizi at April 21, 2010 08:24 | permanent link

April 20, 2010

"Inference for Unlabelled Graphs" (This Week at the Statistics Seminar)

Attention conservation notice: Only of interest if you (1) care about the community discovery problem for networks and (2) will be in Pittsburgh on Friday.

I've talked about the community discovery problem here before, and even contributed to it; if you want a state-of-the-field you should read Aaron. This week, the CMU statistics seminar delivers a very distinguished statistician's take:

Peter Bickel, "Inference for Unlabelled Graphs"
Abstract: A great deal of attention has recently been paid to determining subcommunities on the basis of relations, corresponding to edges, between individuals, corresponding to vertices of an unlabelled graph (Newman, SIAM Review 2003; Airoldi et al, JMLR 2008; Leskovec, Kleinberg et al, SIGKDD 2005). We have developed a nonparametric framework for probabilistic ergodic models of infinite unlabelled graphs (PNAS 2009) and made some connections with modularities arising in the physics literature and community models in the social sciences. A fundamental difficulty in implementing these procedures is computational complexity. We show how a method of moments approach can partially bypass these difficulties.
This is joint work with Aiyou Chen and Liza Levina.
Place and time: Giant Eagle Auditorium, Baker Hall A51, 4:30--5:30 PM on Friday, April 23, 2010

As always, seminars are free and open to the public.

(This might motivate me to finally finish my post on Bickel and Chen's paper...)

Networks; Enigmas of Chance

Posted by crshalizi at April 20, 2010 16:57 | permanent link

April 19, 2010

The Bootstrap

My "Computing Science" column for American Scientist, "The Bootstrap", is now available for your reading pleasure. Hopefully, this will assuage your curiosity about how to use the same data set not just to fit a statistical model but also to say how much uncertainty there is in the fit. (Hence my recent musings about the cost of bootstrapping.) And then the rest of the May-June issue looks pretty good, too.
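The basic resampling idea can be sketched in a few lines of Python. This is a generic nonparametric bootstrap of the textbook kind, not code from the column; the function name `bootstrap_se` and its defaults are my own choices.

```python
import random
import statistics


def bootstrap_se(data, stat, n_boot=2000, seed=0):
    """Nonparametric bootstrap estimate of the standard error of stat(data)."""
    rng = random.Random(seed)
    n = len(data)
    replicates = []
    for _ in range(n_boot):
        # Resample n points with replacement from the observed data.
        resample = rng.choices(data, k=n)
        replicates.append(stat(resample))
    # The spread of the statistic across resamples stands in for its sampling spread.
    return statistics.stdev(replicates)
```

For example, `bootstrap_se(data, statistics.median)` gives a standard error for a sample median, where the analytical answer would depend on the unknown density at the median.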

I have been reading American Scientist since I started graduate school, lo these many years ago, and throughout that time one of the highlights for me has been the "Computing Science" column by Brian Hayes; it was quite thrilling to be asked about being one of the substitutes while he's on sabbatical, and I hope I've come close to his standard.

After-notes to the column itself:

Enigmas of Chance; Self-Centered

Posted by crshalizi at April 19, 2010 08:45 | permanent link

April 15, 2010

Dept. of "I Told You So"

Me, going on three years ago: "It is a further sign of our intellectual depravity that people take Bryan Caplan seriously, even when he is obviously a cheap imitation of The Onion."

Today: Holbo (and again), Warring, Henley, DeLong.

The Running Dogs of Reaction

Posted by crshalizi at April 15, 2010 23:13 | permanent link

Got Plenty of Time (The Porosity of the Avant-Garde)

Empirically, the time needed for something to seep from self-consciously advanced subcultures to complete innocuousness really is about one generation. (Second link via Tapped.)

Linkage

Posted by crshalizi at April 15, 2010 22:30 | permanent link

April 08, 2010

"A Two-scale Framework for Variable Selection with Ultrahigh-dimensionality" (Next Week at the Statistics Seminar)

Attention conservation notice: Only of interest if you (1) have a vast number of variables you could use in your statistical models and want to reliably learn which ones matter, and (2) are in Pittsburgh on Monday.

As always, the seminar is free and open to the public:

Jianqing Fan, "A Two-scale Framework for Variable Selection with Ultrahigh-dimensionality"
Abstract: Ultrahigh-dimensionality characterizes many contemporary statistical problems from genomics and genetics to finance and economics. We outline a unified framework to ultrahigh dimensional variable selection problems: Iterative applications of vast-scale screening followed by moderate-scale variable selection. The framework is widely applicable to many statistical contexts: from multiple regression, generalized linear models, survival analysis to machine learning and compressed sensing.
The fundamental building blocks are marginal variable screening and penalized likelihood methods. How high dimensionality can such methods handle? How large can false positive and negative be with marginal screening methods? What is the role of penalty functions? This talk will provide some fundamental insights into these problems. The focus will be on the sure screening property, false selection size, the model selection consistency and oracle properties. The advantages of using folded-concave over convex penalty will be clearly demonstrated. The methods will be convincingly illustrated by carefully designed simulation studies and the empirical studies on disease classifications using microarray data and forecast home price indexes at zip level.
Place and time: 4--5 pm on Monday, 12 April, in Porter Hall 125C, CMU

Let me add that Fan and Yao's book on time series is one of the best available.

Enigmas of Chance

Posted by crshalizi at April 08, 2010 13:45 | permanent link

March 31, 2010

Books to Read While the Algae Grow in Your Fur, March 2010

Dylan Meconis, Bite Me! A Vampire Farce
Funny comic book satirizing, simultaneously, vampires a la Anne Rice and the French Revolution. Meconis apparently wrote and drew most of it, online, while in high school; it's people like her what cause unrest.
I came to this by way of Meconis's current web-serial, Family Man, which has superior drawing and a more serious plot, but a similar sensibility. (If the idea of a comic about Spinozism and lycanthropy in eighteenth-century central Europe sounds the least bit interesting, you really need to read Family Man.)
E. M. Butler, The Tyranny of Greece over Germany
More exactly: how Winckelmann invented an ideal of ancient Greek life and art, and how that ideal influenced Lessing, Herder, Goethe, Schiller, Holderlin and Heine, followed by a sort of appendix on Nietzsche, Stefan George and (of all people) Heinrich Schliemann. This is a very curious book of a sort that I think humanists have mostly abandoned. Butler is not just relentlessly biographical (readers are expected to have Goethe's sexual history memorized), but very free with her speculations about the inner-most drives and natures of her heroes, and even about what they should have done to be "saved", or reconciled with their hypostatized "genius". Worse, she presents these guesses as just as certain as the prosaic facts of their biographies, sometimes to unintentionally comic effect: Nietzsche's mind was not, after all, "rent asunder by ecstatic worship of the god Dionysus", but by syphilis; he needed penicillin, not a convincing modern mythology. (Likewise Holderlin's "reason was destroyed" by schizophrenia, as Butler herself admits, and calling this "homesickness for the land of the gods" is unilluminating.) No comparison is attempted to imitations or admiration of the ancient Greeks in other times and places, or to contemporary German attitudes to other ancient and foreign cultures (except for some stray remarks about Herder), so it's hard to pick out what was particular to this tradition, as opposed to more general antiquarianism/primitivism and exoticism. Still, it is an interesting tradition...
Clark Glymour, Theory and Evidence
I'm not sure how much of this even Clark would still argue for (it was published in 1980!), so I won't belabor it, but I also think the most fundamental point is sound. Namely: it's possible to use parts of a theory, plus empirical evidence, to test other parts of the theory, or even (using different pieces of evidence) the same parts of the theory. (For instance, many theories include hypotheses which say that certain quantities must be constants, and provide multiple routes to estimating those constants; the estimates need to agree.) This means that theories which make the same predictions are not necessarily equally tested by those predictions, and that the Quine-Duhem problem of not being able to assign credit or blame to parts of theories is soluble. I think the account of what makes something a severe test in Error is superior, at least for statistical theories, but clearly this was pointing in the same direction.
(Insert the usual disclaimers here.)
Lucy A. Snyder, Spellbent
Mind-candy contemporary fantasy, set in Columbus, Ohio and adjacent hells. As good as one might expect from the author of the brilliant "Installing Linux on a Dead Badger", but much grimmer.
Carrie Vaughn, Kitty's House of Horrors
Mind-candy. The continuing adventures of a werewolf named "Kitty". What could go wrong with volunteering for a reality show to be filmed in middle-of-nowhere Montana? — One of the nice features of Vaughn's stories is that the supernatural is announcing its presence in a world otherwise much like ours, and people are reacting in ways that seem plausible, ranging from scientific research through media sensationalism... (Previously: 1, 2, 3, 4, 5, 6; but they're not necessary to read this.)
A. C. Davison and D. V. Hinkley, Bootstrap Methods and Their Applications
One of the most useful textbooks on the bootstrap that I've read. They are good at combining just enough theory to make it clear why some things work and others don't with lots of carefully-chosen examples and advice on practicalities. Background familiarity with statistical inference at the level of, e.g., All of Statistics is required, but no more. The code, in S, forms the basis of the R package boot; most of the examples I re-tried ran without any modification. Recommended without reservation for self-study (do the exercises!); it would also make for an excellent text for a computationally-oriented course for beginning graduate students, or even (selecting chapters) advanced undergraduates.
Davison's page on the book has errata and reviews.
Leann Sweeney, Pick Your Poison, A Wedding to Die For, Dead Giveaway, Shoot from the Lip
Mind-candy. Amiable series mystery centering around adoption.
Philip Palmer, Redclaw
Mind-candy. I rather liked the first two hundred pages or so, but the last half dragged on too long for my taste. (It would've been better at, say, 50 pages.) Recommended for those who enjoy scientifictional Lord of the Flies scenarios more than I do.
Stephen S. Cohen and J. Bradford DeLong, The End of Influence: What Happens When Other Countries Have the Money
I admit I bought this out of a certain sense of obligation: DeLong's website, in its various incarnations, has been entertaining and informing me since the mid-1990s, and it seemed only fair to reciprocate somehow. But it's actually a good (if very short and somewhat repetitive) book, which is really about guessing what might be coming next in international political economy, now that the "neo-liberal dream" is, or ought to be, thoroughly discredited by events.
Since my reaction to the book is largely positive, but I find it hard to convey that except by writing a summary, I will follow academic/Internet tradition and dwell on annoyances. First, they're not, obviously, arguing that the US will become an uninfluential country; even if we gave up spending more than most of the rest of the world put together on our military, etc., we'd still have 5% of the world's population, in an extremely advanced, diversified and prosperous economy, and a state which, whatever its frustrations, is highly effective. Cohen and DeLong know this; a better title might've been something like The End of Supremacy. For that matter they never clearly say what they mean by "other countries having the money", or what it meant for the US to "have the money"; something like "be a major net lender to other countries" seems to be what they have in mind, but it's unclear. And the suggestion that becoming a net debtor nation will undermine US cultural and intellectual influence is seriously, seriously under-argued.
Diana Rowland, Blood of the Demon
Mind-candy. Continuing contemporary fantasy/police procedural series. A bit more angsty this time; still fun. Cries out for sequels.
Call of Cthulhu
Mind-candy. Silent movie of the short story made a few years ago by the H. P. Lovecraft Historical Society. Nice Expressionist-influenced sets for R'lyeh, and the worst of the creepy racist bits thoughtfully elided. Worth 45 minutes of your Netflix-streaming time if you're into Cthulhiana.
Brotherhood of the Wolf
Mind-candy. Am I wrong to suspect that only in France could you make a big silly monster action movie centered on the struggle between les philosophes and the reactionary elements of the Church? Pairs well with a suspension of the critical faculties and a few glasses of Côtes du Rhône.
Jen Van Meter et al., Hopeless Savages vols. 2 and 3
More adorable first-family-of-punk mind candy. Sadly, this seems to be the end of the series.
James H. Schmitz, The Demon Breed (a.k.a. The Tuvela)
Mind-candy. Intensely enjoyable lone-human-and-her-otters-versus-alien-invaders-in-a-floating-jungle novel from 1968. (Update: the original cover image, which I just ran across, via.) Re-read in connection with donating, back in January, several hundred books my parents had been storing for me for over a dozen years. This was as fun as I remembered it, though very short by modern standards. (Though I must say it boggles the mind that when one of an advanced, technological civilization's domestic animals acquires both language and tool-use by apparent macromutation, the response is "huh, aren't they cute?", as opposed to a massive research effort. The old SF writers were often really lazy at thinking through their conceits...) (The completely superfluous mentions of psychic powers at the beginning and end are in a different category, namely placating Schmitz's editor at Analog, the crankish and credulous but talented John W. Campbell.)
Relatedly, I finally got around to reading an earlier book by Schmitz I'd owned since c. 1995, Legacy, which didn't work nearly as well, because the early-1960s-vintage gender politics were inseparable from the story, while entirely absent from Demon Breed. It seems doubtful that Schmitz had his consciousness raised between 1962 and 1968, so I guess he simply improved his craft...

Books to Read While the Algae Grow in Your Fur; Scientifiction and Fantastica; Pleasures of Detection, Portraits of Crime; Enigmas of Chance; Writing for Antiquity; The Commonwealth of Letters; Philosophy; The Dismal Science; The Continuing Crises; Cthulhiana

Posted by crshalizi at March 31, 2010 23:59 | permanent link

March 30, 2010

One Must Imagine Liberman Happy

Back in the day, when the blogs were young, one of the gods decided to travel the world incognito as an incoherent mumbler. A certain phonologist regarded this as an imposition, and devised a scheme whereby mortals would never have to worship incomprehensibilities. This angered the gods, who cursed the professor to spend eternity rolling a stone uphill only to keep having it fall back down patiently debunking reactionary appropriations of neuroscience as carefully as though they were actual attempts to advance human knowledge, and not meretricious myth-making. (An incomplete sampling of episodes, in no particular order except for the first being the most recent: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40). But one must imagine Liberman happy; the alternative is too terrible to contemplate.

Minds, Brains and Neurons; The Natural Science of the Human Species; Learned Folly

Posted by crshalizi at March 30, 2010 11:45 | permanent link

March 21, 2010

The Visual Display of Morally Obligatory Consequences

Via paperpools.

Enigmas of Chance; Learned Folly

Posted by crshalizi at March 21, 2010 21:20 | permanent link

Recognition from Alma Mater

Yes, I've seen this. Yes, those are (so far as I can recall) accurate quotes. No, I really don't track page-views, so I honestly don't know what the most-viewed things I've written are. Yes, it's an entirely undeserved honor to be named in such company. Yes, I do wish my writing was more positive and constructive, and less negative and critical. Yes, I realize it's easily within my power to change that. No, I do not seem to be doing too well on that front.

Self-Centered; Linkage

Posted by crshalizi at March 21, 2010 17:00 | permanent link

Learning Your Way to Maximum Power

Attention conservation notice: 2300 words about a paper other people wrote on learning theory and hypothesis testing. Mostly written last year as part of a never-used handout for 350, and rescued from the drafts folder as an exercise in structured procrastination so as to avoid a complete hiatus while I work on my own manuscripts.

P.S. Nauroz mubarak.

In a previous installment, we recalled the Neyman-Pearson lemma of statistical hypothesis testing: If we are trying to discriminate between signal and noise, and know the distribution of our data (x) both for when a signal is present (q) and when there is just noise (p), then the optimal test says "signal" when the likelihood ratio q(x)/p(x) exceeds a certain threshold, and "noise" otherwise. This is optimal in that, for any given probability of thinking noise is signal ("size"), it maximizes the power, the probability of detecting a signal when there is one.
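As a concrete case (an illustration of my own, not from the previous installment), take noise to be N(0,1) and signal N(μ,1) with μ > 0. The likelihood ratio is then increasing in x, so the Neyman-Pearson test just says "signal" when x exceeds a cutoff chosen to give size α. A minimal Python sketch using the standard library's `statistics.NormalDist`; the function names are hypothetical:

```python
from statistics import NormalDist


def np_threshold_test(alpha, mu):
    """Most powerful size-alpha test of N(0,1) noise against N(mu,1) signal, mu > 0."""
    noise = NormalDist(0.0, 1.0)    # p: distribution of x when there is only noise
    signal = NormalDist(mu, 1.0)    # q: distribution of x when a signal is present
    # q(x)/p(x) = exp(mu * x - mu**2 / 2) is increasing in x when mu > 0, so
    # "likelihood ratio above a threshold" is the same as "x above some cutoff c".
    c = noise.inv_cdf(1.0 - alpha)
    size = 1.0 - noise.cdf(c)       # P(say "signal" | noise), equal to alpha
    power = 1.0 - signal.cdf(c)     # Q(say "signal" | signal)
    return c, size, power


def two_sided_power(alpha, mu):
    """Power of the size-alpha rule 'signal iff |x| > c', for comparison."""
    noise, signal = NormalDist(0.0, 1.0), NormalDist(mu, 1.0)
    c = noise.inv_cdf(1.0 - alpha / 2.0)
    return (1.0 - signal.cdf(c)) + signal.cdf(-c)
```

With α = 0.05 and μ = 2, the cutoff is about 1.645 and the power about 0.64, while the two-sided rule of the same size manages only about 0.52, as the lemma guarantees: no rule of the same size can beat the likelihood-ratio test.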

The problem with just applying the Neyman-Pearson lemma directly to problems of interest is the bit about knowing the exact distributions of signal and noise. We should, forgive the expression, be so lucky. The traditional approach in theoretical statistics, going back to Neyman and Pearson themselves, has been to look for circumstances where we can get a single test of good power against a whole range of alternatives, no matter what they are. The assumptions needed for this are often rather special, and teaching this material means leading students through some of the more arid sections of books like these; the survivors are generally close to insensible by the time they reach the oases of confidence regions.

At the other extreme, a large part of modern statistics, machine learning and data mining is about classification problems, where we take feature-vectors x and assign them to one of a finite number of classes. Generally, we want to do this in a way which matches a given set of examples, which are presumed to be classified correctly. (This is obviously a massive assumption, but let it pass.) When there are only two classes, however, this is exactly the situation Neyman and Pearson contemplated; a binary classification rule is just a hypothesis test by another name. Indeed, this is really the situation Neyman discussed in his later work (like his First Course in Probability and Statistics [1950]), where he advocated dropping the notion of "inductive inference" in favor of that of "inductive behavior", asking, in effect, what rule of conduct a learning agent should adopt so as to act well in the future.

The traditional approach in data-mining is to say that one should either (i) minimize the total probability of mis-classification, or (ii) assign some costs to false positives (noise taken for signal) and false negatives (signal taken for noise) and minimize the expected cost. Certainly I've made these recommendations plenty of times in my teaching. But this is not what Neyman and Pearson would suggest. After all, the mis-classification rate, or any weighted combination of the error rates, depends on what proportions of the data we look at actually are signal and noise, so which decision rule minimizes the chance of error depends on the actual proportion of instances of "signal" to those of "noise". If that ratio changes, a formerly optimal decision rule can become arbitrarily bad. (To give a simple but extreme example, suppose that 99% of all cases used to be noise. Then a decision rule which always said "noise" would be right 99% of the time, and the minimum-error rule would be very close to "always say 'noise'". If the proportion of signal should then increase, that formerly-optimal rule could do very badly indeed: if 90% of cases become signal, "always say 'noise'" is wrong 90% of the time. The same is true, mutatis mutandis, of a decision rule which minimizes some weighted cost of mis-classifications.) But a Neyman-Pearson rule, which maximizes power subject to a constraint on the probability of false positives, is immune to changes in the proportions of the two classes, since it only cares about the distribution of the observables given the classes. But (and this is where we came in) the Neyman-Pearson rule depends on knowing the exact distribution of observables for the two classes...
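The arithmetic behind the base-rate argument fits in a few lines. For a fixed rule, the overall error rate is (1 - π) × size + π × (1 - power), where π is the fraction of cases that are signal. The second rule below, with size 0.05 and power 0.80, is a hypothetical example; the numbers are mine, not from any data.

```python
def error_rate(size, power, signal_fraction):
    """Overall mis-classification probability of a fixed rule under given class proportions."""
    false_pos = (1.0 - signal_fraction) * size      # noise taken for signal
    false_neg = signal_fraction * (1.0 - power)     # signal taken for noise
    return false_pos + false_neg


ALWAYS_NOISE = (0.0, 0.0)   # never says "signal": size 0, power 0
NP_RULE = (0.05, 0.80)      # hypothetical rule with size 0.05 and power 0.80

for pi in (0.01, 0.5, 0.9):
    print(pi, error_rate(*ALWAYS_NOISE, pi), error_rate(*NP_RULE, pi))
```

At π = 0.01 the always-noise rule errs 1% of the time against 5.15% for the hypothetical rule; at π = 0.9 the ranking flips, 90% against 18.5%. The second rule's size and power stay the same throughout, because they are conditional on the class.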

This brings us to tonight's reading.

Clayton Scott and Robert Nowak, "A Neyman-Pearson Approach to Statistical Learning", IEEE Transactions on Information Theory 51 (2005): 3806--3819 [PDF reprint via Prof. Scott, PDF preprint via Prof. Nowak]
Abstract: The Neyman-Pearson (NP) approach to hypothesis testing is useful in situations where different types of error have different consequences or a priori probabilities are unknown. For any α>0, the NP lemma specifies the most powerful test of size α, but assumes the distributions for each hypothesis are known or (in some cases) the likelihood ratio is monotonic in an unknown parameter. This paper investigates an extension of NP theory to situations in which one has no knowledge of the underlying distributions except for a collection of independent and identically distributed (i.i.d.) training examples from each hypothesis. Building on a "fundamental lemma" of Cannon et al., we demonstrate that several concepts from statistical learning theory have counterparts in the NP context. Specifically, we consider constrained versions of empirical risk minimization (NP-ERM) and structural risk minimization (NP-SRM), and prove performance guarantees for both. General conditions are given under which NP-SRM leads to strong universal consistency. We also apply NP-SRM to (dyadic) decision trees to derive rates of convergence. Finally, we present explicit algorithms to implement NP-SRM for histograms and dyadic decision trees.

Statistical learning methods take in data and give back predictors --- here, classifiers. Showing that a learning method works generally means first showing that one can estimate the performance of any individual candidate predictor (with enough data), and then extending that to showing that the method will pick a good candidate.

The first step is an appeal to some sort of stochastic limit theorem, like the law of large numbers or the ergodic theorem: the data-generating process is sufficiently nice that if we fix any one prediction rule, its performance on a sufficiently large sample shows how it will perform in the future. (More exactly: by taking the sample arbitrarily large, we can have arbitrarily high confidence that in-sample behavior is arbitrarily close to the expected future behavior.) Here we can represent every classifier by the region R of x values where it says "signal". P(R) is the true false positive rate, or size, of the classifier, and Q(R) is the power. If we fix R in advance of looking at the data, then we can apply the law of large numbers separately to the "signal" and "noise" training samples, and conclude that, with high P-probability, the fraction of "noise" data points falling into R is close to P(R), and likewise with high Q-probability the fraction of "signal" points in R is about Q(R). In fact, we can use results like Hoeffding's inequality to say that, after n samples (from the appropriate source), the probability that either of these empirical relative frequencies differs from their true probabilities by as much as ±h is at most \(2e^{-2nh^2}\). The important point is that the probability of an error of fixed size goes down exponentially in the number of samples.
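To see the exponential decay concretely, here is a small Python sketch. It assumes, purely for illustration, a fixed region R whose true false positive rate is P(R) = 0.1, so each "noise" observation lands in R independently with probability 0.1; it compares the simulated chance of an error of size h with the Hoeffding bound \(2e^{-2nh^2}\).

```python
import math
import random


def hoeffding_bound(n, h):
    """Two-sided Hoeffding bound on P(|empirical frequency - true probability| >= h)."""
    return 2 * math.exp(-2 * n * h * h)


def deviation_rate(p, n, h, trials, rng):
    """Monte Carlo estimate of P(|fraction of n draws landing in R - p| >= h),
    where each draw lands in R independently with probability p."""
    bad = 0
    for _ in range(trials):
        hits = sum(rng.random() < p for _ in range(n))
        if abs(hits / n - p) >= h:
            bad += 1
    return bad / trials


rng = random.Random(42)
p_R, h = 0.1, 0.05  # assumed true size of the region, and the error tolerance
for n in (50, 200, 800):
    print(n, deviation_rate(p_R, n, h, 1000, rng), hoeffding_bound(n, h))
```

For small n the bound exceeds 1 and says nothing; as n grows it shrinks exponentially, and the simulated error rate stays below it.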

(Except for the finite-sample bound, this is all classical probability theory of the sort familiar to Neyman and Pearson, or for that matter Laplace. Neyman might well have known Bernstein's inequality, which gives similar though weaker bounds here than Hoeffding's; and even Laplace wouldn't've been surprised at the form of the result.)

Now suppose that we have a finite collection of classifier rules, or equivalently of "say 'signal'" regions \(R_1, R_2, \ldots, R_m\). The training samples labeled "noise" give us estimates of the \(P(R_i)\), the false positive rates, and we just saw above that the probability of any one of these estimates being very far from the truth is exponentially small; call this error probability c. By the union bound, the probability that even one of the estimates is badly off is at most \(mc\). So we take our sample data and throw out all the classifiers whose empirical false positive rate exceeds α (plus a small, shrinking fudge factor), and with probability at least \(1-mc\) all the rules we're left with really do obey the size constraint. Having cut down the hypothesis space, we then estimate the true positive rates or powers \(Q(R_i)\) from the training samples labeled "signal". Once again, the probability that any one of these estimates is far from the truth is low, say d, and by the union bound again the probability that any of them are badly wrong is at most \(md\). This means that the sample maximum has to be close to the true maximum, and picking the \(R_i\) with the highest empirical true positive rate is then (probabilistically) guaranteed to give us a classifier with close to the maximum attainable power. This is the basic strategy they call "NP empirical risk minimization". Its success is surprising: I would have guessed that in adapting the NP approach we'd need to actually estimate the distributions, or at least the likelihood ratio as a function of x, but Scott and Nowak show that's not true, that all we need to learn is the region R. So long as m is finite and fixed, the probability of making a mistake (of any given magnitude ±h) shrinks to zero exponentially (because c and d do), so by the Borel-Cantelli lemma we will only ever make finitely many mistakes.
In fact, we could even let the number of classifiers or regions we consider grow with the number of samples, so long as it grows sub-exponentially, and still come to the same conclusion.
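The finite-collection version of NP-ERM can be sketched in Python. Everything concrete here is an assumption made for illustration, not Scott and Nowak's setup: noise is N(0, 1), signal is N(1, 1), the m candidate regions are half-lines {x > t} on a grid of thresholds, and the fudge factor is the Hoeffding-plus-union-bound slack \(\epsilon = \sqrt{\log(2m/\delta)/(2n)}\), chosen so that all m empirical sizes are within ε of the truth with probability at least 1 - δ.

```python
import math
import random
from statistics import NormalDist


def union_slack(m, n, delta):
    """Slack eps with 2 * m * exp(-2 * n * eps**2) = delta (Hoeffding plus union bound)."""
    return math.sqrt(math.log(2 * m / delta) / (2 * n))


def frac_above(sample, t):
    """Empirical probability of the region R_t = {x > t}."""
    return sum(x > t for x in sample) / len(sample)


def np_erm(noise, signal, thresholds, alpha, eps):
    """NP-ERM over the finite family of regions {x > t}, t in thresholds."""
    # Step 1: keep only regions whose empirical size is at most alpha + eps.
    feasible = [t for t in thresholds if frac_above(noise, t) <= alpha + eps]
    if not feasible:
        return None
    # Step 2: among those, pick the region with the highest empirical power.
    return max(feasible, key=lambda t: frac_above(signal, t))


rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(5000)]   # assumed P = N(0, 1)
signal = [rng.gauss(1.0, 1.0) for _ in range(5000)]  # assumed Q = N(1, 1)
thresholds = [i / 20 for i in range(-40, 61)]        # m = 101 candidate regions
alpha = 0.05
eps = union_slack(len(thresholds), len(noise), delta=0.05)
t_hat = np_erm(noise, signal, thresholds, alpha, eps)
std = NormalDist()
# Selected threshold, its true size P(x > t_hat), and its true power Q(x > t_hat).
print(t_hat, 1 - std.cdf(t_hat), 1 - std.cdf(t_hat - 1.0))
```

With these settings the selected threshold should land somewhat below the exact NP threshold of about 1.645, because the slack loosens the size constraint; that looseness is the price of not knowing P, and it shrinks as n grows.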

Notice that we've gone from a result which holds universally over the objects in some collection to one which holds uniformly over the collection. Think of it as a game between me and the Adversary, in which the Adversary gets to name regions R and I try to bound their performance; convergence means I can always find a bound. But it matters who goes first. Universal convergence means the Adversary picks the region first, and then I can tailor my convergence claim to the region. Uniform convergence means I need to state my convergence claim first, and then the Adversary is free to pick the region to try to break my bound. What the last paragraph showed is that for finite collections which don't grow too fast, I can always turn a strategy for winning at universal convergence into one for winning at uniform convergence. [1]

Nobody, however, wants to use just a finite collection of classifier rules. The real action is in somehow getting uniform convergence over infinite collections, for which the simple union bound won't do. There are lots of ways of turning this trick, but they all involve restricting the class of rules we're using, so that their outputs are constrained to be more or less similar, and we can get uniform convergence by approximating the whole collection with a finite number of representatives. Basically, we need to count not how many rules there are (infinity), but how many rules we can distinguish based on their output (at most \(2^n\)). As we get more data, we can distinguish more rules. Either this number keeps growing exponentially, in which case we're in trouble, or it ends up growing only polynomially, with the exponent being called the "Vapnik-Chervonenkis dimension". As any good book on the subject will explain, this is not the same as the number of adjustable parameters.
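To make the counting concrete, here is a Python sketch using a class chosen purely for illustration (it is not one from the paper): regions that are open intervals on the real line. On n distinct points an interval can only pick out a contiguous run, so the number of distinguishable rules is 1 + n(n+1)/2 rather than \(2^n\). Two points can be shattered but three cannot, so the VC dimension is 2, and the count grows like \(n^2\), matching Sauer's bound for dimension 2 exactly.

```python
def interval_labelings(points):
    """All distinct labelings of distinct `points` produced by open intervals (lo, hi)."""
    xs = sorted(points)
    # Only where the endpoints fall relative to the data matters, so it is enough to try
    # cut points below all the data, between neighbouring points, and above all the data.
    cuts = [xs[0] - 1.0] + [(a + b) / 2 for a, b in zip(xs, xs[1:])] + [xs[-1] + 1.0]
    labelings = set()
    for i, lo in enumerate(cuts):
        for hi in cuts[i:]:
            labelings.add(tuple(lo < x < hi for x in xs))
    return labelings


for n in range(1, 9):
    pts = list(range(n))
    print(n, len(interval_labelings(pts)), 2 ** n)
```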

So, to recap, here's the NP-ERM strategy. We have a collection of classifier rules, which are equivalent to regions R, and this class is of known, finite VC dimension. One of these regions or classifiers is the best available approximation to the Neyman-Pearson classifier, because it maximizes power at fixed size. We get some data which we know is noise, and use it to weed out all the regions whose empirical size (false positive rate) is too big. We then use data which we know is signal to pick the region/classifier whose empirical power (true positive rate) is maximal. Even though we are optimizing over infinite spaces, we can guarantee that, with high probability, the size and power of the resulting classifier will come arbitrarily close to those of the best rule, and even put quantitative bounds on the approximation error given the amount of data and our confidence level. These bounds loosen as the VC dimension grows. Scott and Nowak also show that you can pull the structural risk minimization trick here: maximize the in-sample true positive rate, less a VC-theory bound on the over-fitting, and you still get predictive consistency, even if you let the capacity of the set of classifiers you'll use grow with the amount of data you have.
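The structural-risk-minimization variant can be sketched as well. This is a hypothetical Python illustration under assumptions made only for the sketch: the classes are histograms with k equal cells on [0, 1) (Scott and Nowak do treat histograms, but the penalties here are plain Hoeffding-plus-union-bound slacks with confidence budget \(\delta 2^{-k}\) for class k, not their constants), noise is uniform, and signal has density 2x. For each k it enforces the empirical size constraint with that class's slack, penalizes empirical power by the same kind of slack, and keeps the best penalized score over all k.

```python
import math
import random
from itertools import product


def cell_frac(sample, region):
    """Fraction of points in [0, 1) falling in the cells marked True in `region`,
    a tuple of k booleans over k equal-width histogram cells."""
    k = len(region)
    return sum(region[min(int(x * k), k - 1)] for x in sample) / len(sample)


def slack(log_class_size, n, delta):
    """Hoeffding + union-bound slack for a finite class of exp(log_class_size) regions."""
    return math.sqrt((log_class_size + math.log(2 / delta)) / (2 * n))


def np_srm(noise, signal, alpha, delta, max_k):
    """Penalized NP selection over histogram classes k = 1..max_k.

    Class k holds all 2**k unions of its k cells; its confidence budget is
    delta * 2**-k, so the budgets over all k sum to less than delta.
    """
    best = None
    for k in range(1, max_k + 1):
        delta_k = delta * 2.0 ** (-k)
        log_size = k * math.log(2)
        eps0 = slack(log_size, len(noise), delta_k)   # slack on the size constraint
        eps1 = slack(log_size, len(signal), delta_k)  # penalty on empirical power
        for region in product((False, True), repeat=k):
            if cell_frac(noise, region) > alpha + eps0:
                continue  # fails the empirical size constraint
            score = cell_frac(signal, region) - eps1  # penalized empirical power
            if best is None or score > best[0]:
                best = (score, k, region)
    return best


rng = random.Random(7)
noise = [rng.random() for _ in range(1000)]            # assumed P: uniform on [0, 1)
signal = [rng.betavariate(2, 1) for _ in range(1000)]  # assumed Q: density 2x on [0, 1)
score, k, region = np_srm(noise, signal, alpha=0.1, delta=0.05, max_k=8)
print(k, region, score)
```

Finer histograms can carve out better regions but pay a larger penalty; trading these off is how the capacity of the class is allowed to grow with the data.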

What's cool here is that this is a strategy for learning classifiers which gives us some protection against changes in the distribution, specifically against changes in the proportion of classes, and we can do this without having to learn the two probability density functions p and q; one just learns R. Such density estimation is certainly possible, but densities are much more complicated and delicate objects than mere sets, and the demands for data are correspondingly more extreme. (An interesting question, to which I don't know the answer, is how much we can work out about the ratio q(x)/p(x) by looking at the estimated maximum power as we vary the size α.) While Scott and Nowak work out detailed algorithms for some very particular families of classifier rules, their idea isn't tied to them, and you could certainly use it with, say, support vector machines.

[1] I learned this trick of thinking about quantifiers as games with the Adversary from Hintikka's Principles of Mathematics Revisited, but don't remember whether it was original to him or he'd borrowed it in turn. — Gustavo tells me that game semantics for logic began with Paul Lorenzen.

Enigmas of Chance

Posted by crshalizi at March 21, 2010 15:00 | permanent link

March 17, 2010

How the Social Scientists Got Their *s

Somewhere in the vastness of the scholarly literature there exists a sound, if not complete, history of the reception of statistical inference, especially regression, across the social sciences in the 20th century. I have not found it and would appreciate pointers, though I can only offer acknowledgments in return. If the history ends neither with "thus did our fathers raise fertile gardens of rigor in the sterile deserts of anecdata" nor "thus did a dark age of cruel scientism overwhelm all, save a few lonely bastions of humanity", so much the better.

(I specifically mean the 20th century and not the 19th, and statistical inference and not "statistics" in the sense of aggregated numerical data. Erich Lehmann's "Some Standard Statistical Models" is in the right direction, but too focused inwards on statistics.)

Enigmas of Chance; Commit a Social Science

Posted by crshalizi at March 17, 2010 13:30 | permanent link

March 04, 2010

The True Price of Models Pulling Themselves Up by Their Bootstraps

For a project I just finished, I produced this figure:

I don't want to give away too much about the project (update, 19 April: it's now public), but the black curve is a smoothing spline which is trying to predict the random variable Rt+1 from Rt; the thin blue lines are 800 additional splines, fit to 800 bootstrap resamplings of the original data; and the thicker blue lines are the resulting 95% confidence bands for the regression curve [1]. (The tick marks on the horizontal axis show the actual data values.) Making this took about ten minutes on my laptop, using the boot and mgcv packages in R.
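The figure was made with smoothing splines via R's boot and mgcv packages. The following Python sketch shows only the general recipe, with a swapped-in Gaussian-kernel (Nadaraya-Watson) smoother and simulated data, neither of which was used for the figure: resample the (R_t, R_{t+1}) pairs with replacement, refit the smoother, and read simple pointwise percentile bands off the replicates. Resampling pairs independently ignores any dependence in the time series; that choice, the bandwidth, and the data are all assumptions of the sketch.

```python
import math
import random


def kernel_smooth(xs, ys, grid, bandwidth):
    """Nadaraya-Watson regression with a Gaussian kernel, evaluated at each grid point."""
    fitted = []
    for g in grid:
        weights = [math.exp(-0.5 * ((x - g) / bandwidth) ** 2) for x in xs]
        fitted.append(sum(w * y for w, y in zip(weights, ys)) / sum(weights))
    return fitted


def bootstrap_band(xs, ys, grid, bandwidth, n_boot, level, rng):
    """Pointwise percentile bootstrap band from resampling (x, y) pairs."""
    n = len(xs)
    replicates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        replicates.append(
            kernel_smooth([xs[i] for i in idx], [ys[i] for i in idx], grid, bandwidth)
        )
    lo_q, hi_q = (1 - level) / 2, (1 + level) / 2
    lower, upper = [], []
    for j in range(len(grid)):
        values = sorted(rep[j] for rep in replicates)
        lower.append(values[int(lo_q * (n_boot - 1))])
        upper.append(values[int(hi_q * (n_boot - 1))])
    return lower, upper


rng = random.Random(3)
xs = [rng.uniform(-2, 2) for _ in range(200)]       # stand-in for R_t
ys = [math.sin(x) + rng.gauss(0, 0.3) for x in xs]  # stand-in for R_{t+1}
grid = [-1.5 + 0.15 * i for i in range(21)]
fit = kernel_smooth(xs, ys, grid, bandwidth=0.3)
lower, upper = bootstrap_band(xs, ys, grid, bandwidth=0.3, n_boot=200, level=0.95, rng=rng)
```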

The project gave me an excuse to finally read Efron's original paper on the bootstrap, where my eye was caught by "Remark A" on p. 19 (my linkage):

Method 2, the straightforward calculation of the bootstrap distribution by repeated Monte Carlo sampling, is remarkably easy to implement on the computer. Given the original algorithm for computing R, only minor modifications are necessary to produce bootstrap replications R*1, R*2, ..., R*N. The amount of computer time required is just about N times that for the original computations. For the discriminant analysis problem reported in Table 2, each trial of N = 100 replications, [sample size] m = n = 20, took about 0.15 seconds and cost about 40 cents on Stanford's 370/168 computer. For a single real data set with m = n = 20, we might have taken N=1000, at a cost of $4.00.

My bootstrapping used N = 800, n = 2527. Ignoring the differences between fitting Efron's linear classifier and my smoothing spline, creating my figure would have cost $404.32 in 1977, or $1436.90 in today's dollars (using the consumer price index). But I just paid about $2400 for my laptop, which will have a useful life of (conservatively) three years, a ten-minute pro rata share of which comes to 1.5 cents.
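The arithmetic can be checked with a few lines of Python. The assumptions are that cost scales linearly in both the number of replicates and the sample size, with Efron's sample size taken as 20 (this is what reproduces the $404.32), that three years means three 365-day years, and that the inflation-adjusted figure of $1436.90 is taken from the text rather than recomputed.

```python
# Efron's reported cost: N = 100 replications at sample size m = n = 20 cost $0.40.
EFRON_COST = 0.40
EFRON_N, EFRON_SIZE = 100, 20
# The figure: N = 800 bootstrap replicates on n = 2527 data points.
MY_N, MY_SIZE = 800, 2527

# Assume cost scales linearly in both the number of replicates and the sample size.
cost_1977 = EFRON_COST * (MY_N / EFRON_N) * (MY_SIZE / EFRON_SIZE)

# Laptop: $2400 spread over three 365-day years, charged for ten minutes of use.
minutes_of_life = 3 * 365 * 24 * 60
cost_now = 2400 * 10 / minutes_of_life

# Ratio of the inflation-adjusted 1977 cost (the post's CPI figure) to today's cost.
ratio = 1436.90 / cost_now
print(f"1977: ${cost_1977:.2f}; now: {100 * cost_now:.1f} cents; ratio: {ratio:,.0f}")
```

The ratio comes out at about 94,000, which the text rounds to 100,000.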

The inexorable economic logic of the price mechanism forces me to conclude that bootstrapping is about 100,000 times less valuable for me now than it was for Efron in 1977.

Update: Thanks to D.R. for catching a typo.

[1]: Yes, yes, unless the real regression function is a smooth piecewise cubic there's some approximation bias from using splines, so this is really a confidence band for the optimal spline approximation to the true regression curve. I hope you are as scrupulous when people talk about confidence bands for "the" slope of their linear regression models. (Added 7 March to placate quibblers.)

Enigmas of Chance

Posted by crshalizi at March 04, 2010 13:35 | permanent link

March 03, 2010

Rhetorical Autognosis

The way I usually prepare for a lecture or a seminar is to spend a couple of hours poring over my notes and references, writing and re-writing a few pages of arcane formulas, until I have the whole thing crammed into my head. When I actually speak I don't look at the notes at all. Fifteen minutes after I'm done speaking, I retain only the haziest outline of anything.

Which is to say, having finally realized that I've unconsciously modeled the way I teach and give talks on the magicians in Jack Vance, I really need to come up with better titles.

Self-Centered

Posted by crshalizi at March 03, 2010 10:15 | permanent link

March 02, 2010

36-490, Undergraduate Research, Spring 2010

What I've been doing instead of blogging. (I am particularly fond of the re-written factor analysis notes; and watch for the forthcoming notes on Markov models and point processes.) Fortunately for the kids, one of us knows what he's doing.

Enigmas of Chance; Corrupting the Young; Self-Centered

Posted by crshalizi at March 02, 2010 10:00 | permanent link

February 28, 2010

Books to Read While the Algae Grow in Your Fur, February 2010

Eric D. Kolaczyk, Statistical Analysis of Network Data: Methods and Models
This is the best available textbook on the subject. (I say this with all due respect for Wasserman and Faust, which was published sixteen years ago.)
Chapter one gives examples of networks, emphasizing that many non-social assemblages are networks, or have networks embedded in them, and can be profitably studied as such; this is story-telling and pretty pictures. Chapter two is background, divided into graph theory and graph algorithms (aimed at statisticians), and the essentials of probability and statistical inference (aimed at computer scientists). Chapter 3 deals with data collection (what do we measure? how do we gather the data? how do we organize it?) and visualization (how do we make those pretty pictures?). Chapter 4 covers descriptive statistics for networks, including ideas about partitioning networks into more-or-less distinct components, a.k.a. "community discovery". Both chapters 3 and 4 have terminal sections on what to do with time-varying networks; these are much less detailed than the rest, because we don't really know what to do with time-varying networks yet.
Chapter 5 deals with the fact that we generally do not have access to complete networks, but rather to samples of them. Inference from samples to larger assemblages (here, the complete network) is a fundamental statistical problem; depending on how the sample was collected, direct extrapolation from the sample to the whole can be quite accurate or highly misleading. Kolaczyk properly begins by reviewing the techniques used for sample inference in population surveys, such as Horvitz-Thompson estimation, which try to compensate for the biases introduced by the sampling scheme; he then turns to the most common sorts of network sampling methods, and gives some examples of how to incorporate the sampling into inferences. This is an area where much more needs to be done, but it's absolutely fundamental, and I'm extremely pleased to see it handled here.
Chapter 6 considers probabilistic models of network structure and their statistical inference, mostly through the method of maximum likelihood. It begins with the classical Erdos-Renyi (-Rappoport-Solomonoff) random graph model and some of its immediate generalizations; the theory here is exceedingly pretty, but of course it never fits anything in the real world. It then turns to small-world (Watts-Strogatz) models, and to preferential-attachment and duplication models (introduced by Price, re-introduced by Barabasi and Albert owing to ignorance of the literature), including the particular duplication model due to Wiuf et al. which can be estimated by maximum likelihood (as we've seen). The last part of the chapter discusses exponential-family random graph models, which are a fascinating topic I will post more about soon. Chapter 7 is on inferring network structure from partial measurements, including link prediction, inference of phylogenetic trees, and inference of flow- or message- passing networks from traffic measurements ("network tomography"). There could have been a bit more integration between these two chapters, but there could stand to be more integration in the literature, too.
Chapter 8 looks at processes taking place on networks, divided between predicting random fields on networks, and modeling dynamical processes on them. For the first, Kolaczyk emphasizes Markov random fields (including the Hammersley-Clifford-[Griffeath-Grimmett-Preston-et-alii] theorem) and kernel regression. The only kind of dynamic process on networks treated in any detail is epidemic modeling; as usual, this is because much, much more remains to be done. Chapter 9 looks at statistical models of traffic on networks, some of them going back more than half a century in the economic geography literature. Finally chapter 10 is really more of an appendix, sketching the basic formalism of graphical models, and indicating how it connects to both Markov random fields and to exponential-family random graphs.
The material is up-to-date, the explanations are clear, the graphics are good, and the examples are interesting, covering social networks, biochemistry and molecular biology, neuroscience and telecommunications with about equal comfort. I would have no hesitation at all in using this for a class of first- or second- year graduate students, plan to use parts of it next time I teach 462, and can warmly recommend it for self-study. It should become a standard work.
(Amusingly, Powell's currently recommends that people who buy Kolaczyk also get Jenny Davidson's Breeding [which I'm still reading], and vice versa. This tells me that (i) not many people other than me have bought either book from them, and (ii) they need to make their data-mining algorithms a bit more outlier-resistant.)
Dog Soldiers
J. Random British Army squad vs. werewolves in deepest, darkest Scotland. Recommended by Carrie Vaughn.
Intelligence, season 2
I like where they took the story (though I have special reasons to be amused by the involvement of Caribbean financiers), and am sad the series got canceled.
The Last Winter
Decent horror movie about Arctic isolation and global warming. Suffers towards the end from showing too much of the bogey. (ROT-13'd spoilers: Fcrpgeny pnevobh whfg nera'g gung fpnel; naq V xrcg guvaxvat bs Nhqra, gubhtu gung'f cebonoyl vqvbflapengvp.)
Dexter 3
Few things are quite so restorative when facing the winter blahs as a well-made TV show that understands the true meaning and importance of friendship and family ties.
Marshall G. S. Hodgson, Rethinking World History: Essays on Europe, Islam and World History
Hodgson was a historian of Islam at the University of Chicago, best known for his monumental and fantastic Venture of Islam (I, II, III), which was an attempt to tell the story of "conscience and history in a world civilization". Both the "world" and the "civilization" part are important: Hodgson was one of those historians who break the world into civilizations, but he didn't think of them as distinct organisms or similar weirdness; rather, as complexes of very broadly-distributed but also constantly evolving literate traditions. Moreover, the "world" part mattered a lot too: he constantly kept in view the fact that civilizations were never isolated from each other, and their interactions were vital to how they developed, particularly to "Islamicate" civilization, which for a long time occupied the central position in the "Afro-Eurasian Oecumene". The whole of it was an effort to see the history of Islam as part of world history, and to see world history itself objectively. He also tried very hard to inhabit and convey the moral universe of the people he wrote about; this was partly about historical understanding and partly about his own earnest Quaker conscience.
Hodgson spent many, many years working on a world history, which was left in an even more fragmentary state than The Venture of Islam at the time of his death; an unpublishable mess. Rethinking World History is a compilation of fragments of this manuscript and selections from The Venture, along with some journal papers and letters. The product is an excellent epitome of Hodgson's more general and theoretical ideas about history and historiography: the central role of Islam in world history and the broad course of Islamicate civilization; the nature of tradition and the very broad, diffuse complexes of traditions that constitute civilizations, and the way all traditions constantly change; the errors of then-conventional "orientalist" scholarship; the sheer unprecedented weirdness of the modern "technical" age; the need to crush Eurocentrism if we're to understand history (and in particular the "optical illusion" which makes us think there's a "western civilization" going from ancient Greece through Rome to medieval western Europe and modern European states and their off-shoots); and finally the fundamental unity of human history, and how that manifested itself over time.
There is also an introduction by the editor, one Edmund Burke III, which is partly helpful, but also oddly dismissive of Hodgson. However, this dismissal just amounts to saying that Hodgson is "culturalist" and doesn't acknowledge Immanuel Wallerstein (of all people!) and the more dodgy sort of Marxist; Burke doesn't even mention a single material error or omission these supposed flaws lead Hodgson into. While I appreciate Burke's work in pulling together the book, I wish he'd thought harder when writing his introduction.

Books to Read While the Algae Grow in Your Fur; Scientifiction and Fantastica; Pleasures of Detection, Portraits of Crime; Enigmas of Chance; Networks; Writing for Antiquity; Islam; The Great Transformation

Posted by crshalizi at February 28, 2010 23:59 | permanent link

February 12, 2010

More Output

My review of Susan Hough's Predicting the Unpredictable: The Tumultuous Science of Earthquake Prediction is out, here and at American Scientist.

If you are in Paris on Monday, you can hear Andrew Gelman talk about our joint paper on the real philosophical foundations of Bayesian data analysis.

Enigmas of Chance; Self-Centered; Philosophy; Incestuous Amplification

Posted by crshalizi at February 12, 2010 13:20 | permanent link

February 04, 2010

Upcoming Gigs: Bristol

I am giving two talks in Bristol next week about (not so coincidentally) my two latest papers.

"The Computational Structure of Spike Trains"
Bristol Centre for Complexity Sciences, SM2 in the School of Mathematics, 2 pm on Tuesday 9 February
Abstract: Neurons perform computations, and convey the results of those computations through the statistical structure of their output spike trains. Here we present a practical method, grounded in the information-theoretic analysis of prediction, for inferring a minimal representation of that structure and for characterizing its complexity. Starting from spike trains, our approach finds their causal state models (CSMs), the minimal hidden Markov models or stochastic automata capable of generating statistically identical time series. We then use these CSMs to objectively quantify both the generalizable structure and the idiosyncratic randomness of the spike train. Specifically, we show that the expected algorithmic information content (the information needed to describe the spike train exactly) can be split into three parts describing (1) the time-invariant structure (complexity) of the minimal spike-generating process, which describes the spike train statistically; (2) the randomness (internal entropy rate) of the minimal spike-generating process; and (3) a residual pure noise term not described by the minimal spike-generating process. We use CSMs to approximate each of these quantities. The CSMs are inferred nonparametrically from the data, making only mild regularity assumptions, via the causal state splitting reconstruction algorithm. The methods presented here complement more traditional spike train analyses by describing not only spiking probability and spike train entropy, but also the complexity of a spike train's structure. We demonstrate our approach using both simulated spike trains and experimental data recorded in rat barrel cortex during vibrissa stimulation.
Joint work with Rob Haslinger and Kristina Lisa Klinkner.
"Dynamics of Bayesian updating with dependent data and misspecified models"
Statistics seminar, Department of Mathematics, Seminar Room SM3, 2:15pm on Friday 20 February
Abstract: Much is now known about the consistency of Bayesian non-parametrics with independent or Markovian data. Necessary conditions for consistency include the prior putting enough weight on the right neighborhoods of the true distribution; various sufficient conditions further restrict the prior in ways analogous to capacity control in frequentist nonparametrics. The asymptotics of Bayesian updating with mis-specified models or priors, or non-Markovian data, are far less well explored. Here I establish sufficient conditions for posterior convergence when all hypotheses are wrong, and the data have complex dependencies. The main dynamical assumption is the asymptotic equipartition (Shannon-McMillan-Breiman) property of information theory. This, plus some basic measure theory, lets me build a sieve-like structure for the prior. The main statistical assumption concerns the compatibility of the prior and the data-generating process, bounding the fluctuations in the log-likelihood when averaged over the sieve-like sets. In addition to posterior convergence, I derive a kind of large deviations principle for the posterior measure, extending in some cases to rates of convergence, and discuss the advantages of predicting using a combination of models known to be wrong.
(More on this paper)

I'll also be lecturing about prediction, self-organization and filtering to the BCCS students.

I presume that I will not spend the whole week talking about statistics, or working on the next round of papers and lectures; is there, I don't know, someplace in Bristol to hear music or something?

Update, 8 February: canceled at the last minute, unfortunately; with some hope of rescheduling.

Self-centered; Enigmas of Chance; Complexity; Minds, Brains, and Neurons

Posted by crshalizi at February 04, 2010 13:48 | permanent link

January 31, 2010

Books to Read While the Algae Grow in Your Fur, January 2010

Virginia Swift, Hello, Stranger
Enjoyable mystery with eccentric academics, God-botherers and gentrification in present-day Laramie. Nth book in a series; I'll keep an eye out for the others.
Intelligence
Smart crime/spook drama set in one of the most attractive cities in the world (Vancouver), which could only be improved if it didn't end in the WORST CLIFFHANGER EVER. (Ahem.) Not, of course, as good as The Wire, but then nothing is.
Daniel Waley, The Italian City-Republics
Short, readable political-institutional history of the communes of northern and central Italy. He begins with the communes starting to take form in the towns and wrest control from their bishops, say around 1000, and ends by about 1400, by which point the towns had almost all, except for Venice, descended into some form of monarchy, generally under the domination of the local feudal land/war-lords. (Waley says little about Venice, which in retrospect seems odd, though it didn't strike me while reading it.) While Waley is good at describing this historical trajectory, he says little about why so many Italian cities followed it. I'd think it'd be natural to compare the Italian case to contemporary cities elsewhere, but I think there is exactly one sentence on them. (I imagine all kinds of interesting comparative work could be or has been done.) But within those limits, it's a nice book. Waley has also written studies on Siena and Orvieto, which sound interesting.
Terry Pratchett, Nation
You don't really need me to recommend Terry Pratchett to you, especially when he's writing about how people find ways to go on when their world has been pointlessly destroyed.
Richard Hofstadter, Anti-Intellectualism in American Life
Astonishingly, this still feels like it fits after a lapse of half a century. The whole "tax-raising, latte-drinking, sushi-eating, Volvo-driving, New-York-Times-reading, body-piercing, Hollywood-loving, left-wing freak-show" nonsense of the last thirty years now makes a lot more sense; and the chapters about the history of American education were frankly a revelation to me. (The chapter on Dewey and his pedagogical influence seems like a model of being respectfully but unrelentingly critical.) No doubt for real historians, this is all painfully outdated, and whatever's actually sound has long since been incorporated into other works, which don't provide such unintentional moments of amusement as when, listing the unfair accusations heaped on Jefferson, Hofstadter includes keeping a slave mistress and having children by her. (For that matter I don't care for the Beats very much, but they certainly contributed more to our literature than he thought they would.) Still: the man could write.
ObLinkage: Steve Laniel on AIiAL.
D. N. MacKenzie (trans.), Poems from the Divan of Khushâl Khân Khattak
The first significant body of poetry in Pashto; Khushal was a 17th century warlord in what is now the Northwest Frontier, owing his position to a combination of tribal authority and appointment by the Mughals. This seems to be the most recent translation of a selection from his poetry in English, dating from 1965. It is arranged on no particular principles (some Pashto editions are, following tradition, arranged alphabetically by the first letter of the poem), which produces a rather odd effect, that I might summarize as follows: Khushal is happily in love: wow is the beloved a hottie. Khushal is unhappily in love: separation is awful, especially if it's because the beloved doesn't want to see Khushal. Khushal is a fierce warrior who is also a keen hunter; falconry rules. Khushal has a remarkable capacity for drink. (Go ahead, try and tell me that's allegorical.) Aurangzeb sucks, especially in comparison to his father. (Well, he did, and sticking Khushal in jail can't have won him any points.) The Afghans should rally to Khushal and defeat Aurangzeb! Men are treacherous, false-faced bastards, but Afghans are really worse than the rest. (To be fair, having one of your own sons wage war on you in the name of Aurangzeb has got to be pretty embittering.) Khushal will withdraw from the sinful world and spend his days in pious penance. Khushal glorifies God. Repeat.
My grandfather's extemporized translations were better English poetry, but I will never hear those again.
Moez Draief and Laurent Massoulié, Epidemics and Rumors in Complex Networks
A nice short (< 120 pp.) account of the connections among stochastic network models, branching processes, and epidemic models, of the "susceptible-infectious-susceptible" or "susceptible-infectious-recovered" type, including epidemics on networks. ("Rumors" are assumed to fall under such models.)
They begin with the basic Galton-Watson branching process model, where each member of a population produces a random number of descendants (possibly zero), independently of everyone else, and this distribution is constant both within and across generations. Following over a century of tradition, they look at whether the population survives forever or goes extinct, how large it gets, how long it takes to go extinct if it does, etc. This then gets turned into a simple epidemic model ("member of population" = infected individual). It also maps on to the Erdos-Renyi network model, with "has an edge with" taking the place of "is a descendant of": pick your favorite node, and connect it to a random selection of other nodes, the number following a binomial distribution; connect each of them in turn to more random nodes. The size of the branching process's population corresponds to the size of the connected component in the graph. The mapping really only works in the limit of low-density graphs (the size of the component is roughly a sum of independent quantities when there are no loops), but it's enough to study the emergence of a giant component and the behavior of the diameter of the graph. As a prelude to more sophisticated models, they then prove a form of Kurtz's Theorem on the convergence of Markov chains to ordinary differential equations in the large-population limit. The second half of the book rehearses Watts-Strogatz small-world and Barabási-Albert scale-free networks (including mention of Yule but not, oddly, of Herbert Simon), before wrapping up with epidemic models on graphs, and the "viral marketing" problem of deciding where, on a known and fixed network, to start an epidemic for maximum impact.
Of course, since it's a mathematics book, the problem of how to link these models to data isn't even dismissed.
This isn't a ground-breaking work, but it's nice to have all this in a single book, and one a bit more accessible than, say, Durrett's Random Graph Dynamics (though by the same token less comprehensive). The implied reader is comfortable with stochastic processes at the level of something like Grimmett and Stirzaker; measure-theoretic issues are avoided, even when discussing Kurtz's Theorem. (Their version is thus much less precise and powerful than his, but vastly easier to understand.) Anyone comfortable with that level of probability could read it without much trouble, and I'd happily use it in a class.
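Roughly, Kurtz's Theorem says that for density-dependent Markov chains the fraction of the population in each state follows an ODE as the population grows, with fluctuations shrinking like one over the square root of the population size. A minimal sketch for the susceptible-infectious-susceptible epidemic, assuming a well-mixed population with infection rate beta*i*(1-i) and recovery rate gamma*i per capita; the parameter values and function names are illustrative assumptions, not taken from the book.

```python
import random


def sis_gillespie(n, beta, gamma, i0, t_end, seed=0):
    """Exact (Gillespie) simulation of a stochastic SIS epidemic among n individuals.

    Returns the fraction infected at time t_end, or 0 if the epidemic dies out first.
    """
    rng = random.Random(seed)
    infected, t = i0, 0.0
    while infected > 0:
        rate_infect = beta * infected * (n - infected) / n  # S -> I transitions
        rate_recover = gamma * infected                     # I -> S transitions
        total = rate_infect + rate_recover
        t += rng.expovariate(total)  # exponential waiting time to the next event
        if t > t_end:
            break
        if rng.random() * total < rate_infect:
            infected += 1
        else:
            infected -= 1
    return infected / n


def sis_ode(beta, gamma, i_start, t_end, dt=1e-3):
    """Euler integration of the limiting ODE di/dt = beta*i*(1-i) - gamma*i."""
    i = i_start
    for _ in range(int(round(t_end / dt))):
        i += dt * (beta * i * (1.0 - i) - gamma * i)
    return i


# Illustrative parameters (assumed): the ODE's stable fixed point is 1 - gamma/beta = 0.5
beta, gamma, t_end = 2.0, 1.0, 20.0
print("ODE:", round(sis_ode(beta, gamma, 0.1, t_end), 4))
for n in (50, 500, 5000):
    print(n, sis_gillespie(n, beta, gamma, n // 10, t_end))
```

As n grows the simulated fractions cluster ever more tightly around the ODE value; in small populations the epidemic can also die out by chance, which the ODE never does.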
Disclaimer: I read a draft of the manuscript for the publisher in 2007, and they sent me a free copy of the book, but I have no stake in its success.
Joseph L. Graves, Jr., The Emperor's New Clothes: Biological Theories of Race at the Millennium
There are places where he lapses into biological jargon, and others where I think lay readers would have benefited from more detailed rebuttals of the common counter-arguments, but over-all I recommend this very strongly. (Thanks to I.B. for lending me her copy.)
Pascal Massart, Concentration Inequalities and Model Selection
Using empirical process theory, and more specifically concentration of measure, to get finite-sample, i.e., non-asymptotic, risk bounds for various forms of model selection. The basic strategy is to find conditions under which every model in a reasonable class will, with high probability, perform about as well on sample data as it can be expected to on new data; this involves constraining the richness or flexibility of the model class. A little extra work, and the addition of suitable penalties to the fit, gets bounds that extend over multiple classes of model, even over a countable infinity of classes. Among other highlights, Massart shows why the famous AIC heuristic is often definitely sub-optimal, and how to correct it; the book also offers corrections to Vapnik's (much better) structural risk minimization, and a nice treatment of data-set splitting (= 1-fold cross-validation). All of this is for IID data, so the usual caveats apply. Formally self-contained, but realistically some previous exposure to empirical processes (at the level of Pollard's notes if not higher) will be needed. Available for free as a large PDF preprint, but I found it much more convenient to read a dead-tree copy.
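The data-splitting idea itself is simple enough to sketch. This is not Massart's penalized procedure, just a plain hold-out comparison, assuming IID data and using k-nearest-neighbor regression with a few candidate values of k purely as an illustrative model family; the 50/50 split and the function names are arbitrary choices.

```python
import math
import random


def knn_predict(train, x, k):
    # Average the responses of the k training points whose inputs are nearest x
    nearest = sorted(train, key=lambda pt: abs(pt[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def holdout_select(data, candidate_ks, train_frac=0.5, seed=0):
    """Fit on one random part of the data, score on the rest, keep the best k."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    n_train = int(train_frac * len(shuffled))
    train, valid = shuffled[:n_train], shuffled[n_train:]
    errors = {}
    for k in candidate_ks:
        residuals = [(y - knn_predict(train, x, k)) ** 2 for x, y in valid]
        errors[k] = sum(residuals) / len(residuals)  # validation mean squared error
    best = min(errors, key=errors.get)
    return best, errors


# Simulated IID data (assumed for illustration): y = sin(x) + Gaussian noise
rng = random.Random(1)
xs = [rng.uniform(0.0, 6.0) for _ in range(200)]
data = [(x, math.sin(x) + rng.gauss(0.0, 0.3)) for x in xs]
best_k, errs = holdout_select(data, [1, 2, 5, 10, 20, 50, 100])
print(best_k, {k: round(e, 3) for k, e in errs.items()})
```

Because the validation half is untouched by fitting, its error is an honest estimate of each candidate's risk on new data; the price is that only half the sample goes into fitting.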
Elizabeth Bear, New Amsterdam
Alternate-history fantasy mystery stories. Owing something, perhaps, to Randall Garrett's "Lord Darcy" stories (the name of the heroine is distinctly suspicious), but without their complacency about the benevolence of the powers that be.
David Hand, Heikki Mannila and Padhraic Smyth, Principles of Data Mining
I've used this three times now in teaching 36-350, with about 75 students total over the years. I keep using it because it's the best textbook on data-mining I know. It covers the whole process, soup to nuts: data collection (and the importance of understanding what the data actually mean, if anything), cleaning, databases, model construction, model evaluation, optimization, visualization, etc. All of this is organized around four crucial questions: what kind of pattern are we looking for in the data, and how do we represent those patterns? how do we score representations against each other? how do we search for good representations? what do we need to do to implement that search efficiently? All of the basic methods (and many not so basic ones) are in here, all seen as different answers to these questions. I find its explanations extremely clear, and my students seem to as well. I regard it as a strength that it is not tied to pre-canned software, which would only encourage dependency and thoughtlessness.
The only real competition, to my mind, is Hastie, Tibshirani and Friedman. But the Stanford book is distinctly more about statistics, and has more statistical theory and math (though not, from my point of view, a lot of either), whereas this one is distinctly focused on data-mining and on computation. It would be nice if Hand &c. had material on support vector machines, and more on ensemble methods; perhaps it's time for a second edition?
Disclaimer: I almost took a post-doc under Smyth rather than coming to CMU, back in 2004; also, the MIT Press sent me a free review copy of this book (in 2001).

Books to Read While the Algae Grow in Your Fur; Pleasures of Detection, Portraits of Crime; Enigmas of Chance; Scientifiction and Fantastica; Writing for Antiquity; Afghanistan and Central Asia; The Natural Science of the Human Species; Networks; The Beloved Republic; The Commonwealth of Letters; Learned Folly

Posted by crshalizi at January 31, 2010 23:59 | permanent link

January 19, 2010

The Work of Art in the Age of Mechanical Reproduction

Attention conservation notice: 800+ words of inconclusive art/technological/economic-historical musings.

This thread over at Unfogged reminds me of something that's puzzled me for years, ever since reading this: why didn't prints displace paintings the same way that printed books displaced manuscript codices? Why didn't it become expected that visual artists, like writers, would primarily produce works for reproduction? (No doubt, in that branch of the wave-function*, obsessive fans still want to get the original drawings, but obsessive fans also collect writers' manuscripts, or even their typewriters, as well as their mass-produced books.) 16th-century engraving technology was strong enough that it could implement powerful works of art (vide), so that can't be it. And by the 18th century at least, writers could make a living (however precarious) from writing for the mass public, so why didn't visual artists (for the most part) do likewise? (Again, it's manifestly not as though technology has regressed.) Why is it still the case that a real, high-class visual artist is someone who makes one-offs? I know that reproductions have been important since at least the late 1800s, but only for works and artists who first made their reputations with unique, hand-made objects, which is as though the only books which got sent to the printing press were ones which had already circulated to acclaim in manuscript.

Some possibilities I don't buy:

  1. Aesthetic limitations. There are valuable effects which can be achieved with a big original painting which prints just can't match. Response: there are effects you can achieve with an illuminated, calligraphic manuscript which you can't match with movable type, either. Those weren't valuable enough to keep printed books from taking over. Why the difference? Why not a focus on what can be done through prints, which is quite a lot? (Witness the experience of the 20th century and later, when most art lovers know most works of art they enjoy through reproductions.)
  2. Color. A real limitation; even today, getting color done well in mass visual media is not entirely trivial (cf.), and early modern Europe certainly couldn't do it at all. Response: What makes color so important? We know that some great art was made without its benefit, and we don't really know how much better it could have gotten had prints been the medium of choice. Even if color was all that, it just pushes the shift to the late 19th century.
  3. Artists too expensive. Whether you are producing one painting or a thousand prints, there is a considerable fixed cost to the artist's time and training. (The first print is very expensive.) Individual patrons could afford this; the mass public could not. Response: The same argument would apply to books. Besides, high fixed costs usually drive towards seeking a wider market, so that the fixed costs are distributed over a larger number of people. The argument would have to be one of failure of demand — that where there was one man willing to pay 100 guilders (or whatever) for a painting, there were not, say, 120 people willing to pay 1 guilder for prints. Why not?
  4. Paintings too cheap. There have always been too many people wanting to be visual artists for them to all make a living as original artists. One of the things they could do instead was paint copies. Response: The economy of scale problem still applies.
  5. States too weak. In a competitive market, market prices equal marginal costs. The marginal cost of producing another copy of a print is very, very low, so low that the fixed costs of drawing and designing it in the first place aren't recouped. As usual, then, competitive markets fail massively at producing informational goods. The modern solution is to institute and vigorously enforce intellectual property rights. These are monopoly privileges which the state grants to certain individuals; if anyone tries to compete with these favorites of the powers that be, then "goons with guns" (as my libertarian friends like to say) come to stop them. Doing this requires a really massively powerful and intrusive state, which is a relatively recent phenomenon, and not to be lightly deployed on behalf of artists, of all people. Artists who tried to go the mass-production route would've been even more starvation-prone than those who didn't attempt it. Response: An exactly parallel argument would explain why writers didn't embrace printing.
  6. The revolution has happened. The overwhelming majority of visual artists do aim their work at reproduction; it's just a small minority which continues to produce one-offs. This minority has, however, a lot more cultural prestige. Response: There's some merit to this, but it's bizarre and anomalous; it's not as though our really high-class literature was still illuminated or calligraphic manuscripts, and printing was reserved for déclassé "commercial" work.
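The break-even arithmetic behind points 3 and 5 can be made explicit. As a rough illustration, assume a design cost F paid once, a constant cost c per additional copy, and n copies sold at price p (symbols introduced here for illustration), and take the 100-guilder painting price from point 3 as a stand-in for F:

```latex
\begin{aligned}
\pi &= n\,(p - c) - F,
  &&\text{so prints pay only if } p \ge c + F/n;\\
p = c &\;\Rightarrow\; \pi = -F < 0,
  &&\text{the competitive outcome of point 5;}\\
120\,(1 - c) \ge 100 &\iff c \le \tfrac{1}{6},
  &&\text{point 3 with } F = 100,\ n = 120,\ p = 1.
\end{aligned}
```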
The most convincing argument I've been able to come up with has to do with how visual artworks were and are used. Even in manuscript, books were for reading: private consumption, or near enough. European culture, however, provided a steady stream of demand for works of visual art for public display, which is rather different. If it were just a matter of pictures you'd like to look at for your own enjoyment, perhaps prints would serve. But if it's about decorating the church/guildhall/imposing estate, then you need a unique painting of St. Jerome/the burgomasters/the master of the house. The main point is that the owner has the resources to command their very own artwork, not the work's intrinsic aesthetic properties (which good reproductions would share). But even then, why not develop a second stream of reproducible artwork for private rather than conspicuous consumption? And indeed why not try to achieve similar effects in print, thereby broadcasting the message?

Updates, 31 January 2010: In correspondence, Elihu Gerson points to an interesting-looking book relevant to the social-use explanation.

Also, it seems I should clarify that I am not asking why (as Vukutu puts it) "people desire original works of visual art rather than printed reproductions". If you are going to paint in oils on canvas, then of course making a flat print of the result is going to lose some details of the physical object, and those details might contribute in important ways to people's experience of the object; there might be a real esthetic loss to looking at a reproduction of a painting. What I am asking is why we do not then produce artworks which are designed for reproduction. Or rather, we do produce lots of such art, but it's not seen as very valuable, and generally not even as real art in the honorific sense. "Printed reproductions of physical paintings lose valuable details" does not answer "Why did our visual arts continue to focus on making one-off works?", unless perhaps you add some extra premises, like (i) that no print-reproducible image could be as esthetically valuable as a three-dimensional painting, and (ii) that this difference in intrinsic quality was extremely important to the people who consumed art. I am very dubious about both of these.

Finally, I don't think it's sufficient to point to "tradition", since traditions change all the time. That deserves another argument, but another time. In lieu of which, I'll just offer a quotation from a favorite book, Joseph (Abu Thomas) Levenson's Confucian China and Its Modern Fate; he is writing about ideas, but as he makes clear, what he says applies just as much to aesthetic or practical choices as to intellectual ones.

With the passing of time, ideas change. This statement is ambiguous, and less banal than it seems. It refers to thinkers in a given society, and it refers to thought. With the former shade of meaning, it seems almost a truism: men may change their minds or, at the very least, make a change from the mind of their fathers. Ideas at last lose currency, and new ideas achieve it. If we see an iconoclastic Chinese rejection, in the nineteenth and twentieth centuries, of traditional Chinese beliefs, we say that we see ideas changing.

But an idea changes not only when some thinkers believe it to be outworn but when other thinkers continue to hold it. An idea changes in its persistence as well as in its rejection, changes "in itself" and not merely in its appeal to the mind. While iconoclasts relegate traditional ideas to the past, traditionalists, at the same time, transform traditional ideas in the present.

This apparently paradoxical transformation-with-preservation of a traditional idea arises from a change in its world, a change in the thinker's alternatives. For (in a Taoist manner of speaking) a thought includes what its thinker eliminates; an idea has its particular quality from the fact that other ideas, expressed in other quarters, are demonstrably alternatives. An idea is always grasped in relative association, never in absolute isolation, and no idea, in history, keeps a changeless self-identity. An audience which appreciates that Mozart is not Wagner will never hear the eighteenth-century Don Giovanni. The mind of a nostalgic European medievalist, though it may follow its model in the most intimate, accurate detail, is scarcely the mirror of a medieval mind; there is sophisticated protest where simple affirmation is meant to be. And a harried Chinese Confucianist among modern Chinese iconoclasts, however scrupulously he respects the past and conforms to the letter of tradition, has left his complacent Confucian ancestors hopelessly far behind him...

An idea, then, is a denial of alternatives and an answer to a question. What a man really means cannot be gathered solely from what he asserts; what he asks and what other men assert invest his ideas with meaning. In no idea does meaning simply inhere, governed only by its degree of correspondence with some unchanging objective reality, without regard to the problems of its thinker. [pp. xxvii--xxviii; for context, this passage was first published in 1958]

*: With apologies to the blogger formerly known as "the blogger formerly known as 'The Statistical Mechanic' ".

Manual trackback: Mostly Hoofless; 3 Quarks Daily; Cliopatria (!); Vukutu.

Writing for Antiquity

Posted by crshalizi at January 19, 2010 22:01 | permanent link

Three-Toed Sloth:   Hosted, but not endorsed, by the Center for the Study of Complex Systems