January 31, 2009

Books to Read While the Algae Grow in Your Fur, January 2009

Charles Tilly, Trust and Rule
Tilly's principal book on "trust networks", how they sustain themselves or fail to do so, and how they relate to states and other forms of political power. Trust networks "consist of ramified interpersonal connections, consisting mainly of strong ties, within which people set valued, consequential, long-term resources and enterprises at risk to the malfeasance, mistakes, or failures of others" (p. 12). This defines trust as a relationship, one of exposing oneself to serious risk through another's malice or mistakes, a definition which is pointedly silent about the feelings accompanying the relationship. I think this nails it.
Tilly goes on (p. 13, italics his):
Most networks support little or no trust. We will sometimes recognize segments of networks that qualify as trust-connected cliques. But the networks of drug use, blood distribution, and sexual contact through which HIV spreads, the networks through which routine political information flows, and the networks established by shared membership in voluntary associations mostly do not qualify. More generally, single-stranded networks containing few triads and sustaining little intimacy among their nodes rarely or never become trust networks.

Characteristic enterprises in which trust networks figure importantly include cohabitation, procreation, provision for children, transmission of property, communication with supernatural forces, joint control of agricultural resources, long-distance trade, protection from [human] predators, maintenance of health, and collective response to disaster. With marked variation from setting to setting, trust networks often take the forms of religious sects and solidarities, lineages, trade diasporas, patron-client chains, credit networks, mutual aid societies, age grades, and local communities.

This is all very good, and I also like that Tilly does not romanticize trust networks, being explicit that terrorist cells, pirates, Russian mobsters, etc., all qualify, and that (as these examples suggest) the risky undertaking enabled by a trust network can be preying on others. Even without that: "Powerful figures within trust networks sometimes tyrannize their members: instill strange beliefs in them, put them through painful initiations, force youngsters into distasteful careers, require shows of respect for unworthy elders, murder young women who challenge their sexual or marital prescriptions. By no means does membership in a trust network guarantee happiness, much less freedom." Still, he convinces the reader, or at least me, that trust networks are enduring and important parts of society. He also offers some intriguing generalizations, like the one about the importance of triads in the network graph. (I think, but I don't recall him ever quite saying this explicitly, this is because triads make monitoring and reputation possible.) There are, as usual, many excellently-presented historical cases, spanning the globe and the centuries.
Nonetheless, I find myself less than fully satisfied. (1) Nobody except Tilly talks about "trust networks", at least not in this sense, and we rarely have historical cases where we can identify them with any precision. So there is a lot of guesswork here. (2) Tilly's stories about the kinds of mechanisms at work sound plausible, as usual, but I despair of ever being able to test them. (3) He offers no guidance about when we should expect different mechanisms to be engaged. Perhaps, to be fair, no guidance can be offered at this level; perhaps it always depends on high-precision historical details. (More minorly: [4] Tilly sometimes, as on p. 81, insists that the ties linking members of a trust network must be a relationship for which the participants have a name: why? I don't even think all his examples meet this criterion. [5] This already-brief book would have been even shorter if he didn't repeat his definitions of terms over and over.) I can't help but think that Tilly's dislike of game theory may have been a liability.
It's interesting to think about science in terms of trust networks. Scientific collaboration is placing "valued, consequential, long-term resources and enterprises" — viz., the scientist's reputation and career — "at risk to the malfeasance, mistakes, or failures" of the scientist's collaborators. Might this go some way towards explaining features of scientific collaboration networks, like the very high density of triads, and persistent collaborative cliques? As for "strange beliefs", "painful initiations", "distasteful careers", and "deference to unworthy elders" (but fortunately not murder), the jokes draw themselves.
Dave Lapham, Silverfish
Noir crime-fiction in comic-book form, with teenage protagonists. Well-told and well-drawn.
Lois McMaster Bujold, The Sharing Knife, vol. 3, Passage
Mathukumalli Vidyasagar, Learning and Generalization: With Applications to Neural Networks
A very nice textbook on statistical learning theory (à la Vapnik) which, properly, treats it as a branch or extension of empirical process theory, and emphasizes function learning (instead of just classifier learning). Among the many nice features here is the recognition that data in the real world are dependent, and a discussion of conditions under which learning procedures designed for independent data will still work with dependent data, albeit with an efficiency penalty reflecting how quickly correlations decay. (Beta-mixing, for example, is sufficient but not necessary; an interesting open question is what the necessary and sufficient conditions on a mixing process are for probably-approximately-correct learning to remain possible.) Vidyasagar is also good at building connections to computational learning theory, which introduces considerations of time and sample complexity.
No prior knowledge of learning theory or even of measure-theoretic probability is required; all the necessary mathematical material is built here. Basic mathematical maturity, of the kind one would expect from graduate students in statistics, computer science, electrical engineering, physics, economics, etc., is essential.
The last two chapters consider, respectively, neural networks and problems in control theory. (Despite the back-cover blurb, support vector machines are discussed on only one page.) The neural network chapter is fairly self-contained, but the control chapter will be largely incomprehensible to those without previous exposure to the subject. This is a shame, since it contains material about randomized algorithms for probably-approximately-correct solutions to intractable problems, and about systems identification, which would be of interest to readers whose eyes will glaze over (to say the least) at the sight of transfer functions. This is, however, at most a minor flaw.
If I were teaching a class on statistical learning theory, I would definitely consider using this book.
(Thanks to Dr. Vidyasagar for some interesting correspondence, which prompted me to read his book.)
James K. Galbraith, The Predator State: How Conservatives Abandoned the Free Market and Why Liberals Should Too
I really have nothing important to add to Aaron Swartz's summary, other than to say: read this book. (I am buying copies for friends and relatives.)
(On an unimportant note, I think it is unfair but inevitable to compare this J. K. Galbraith writing about economics and public purposes to the other one, who happens to have been his father. [Or it's at least inevitable that I'd make the comparison, since the elder Galbraith is one of my heroes.] This book is in some ways an act of filial piety, losing few opportunities to point out places where the senior Galbraith has been vindicated by events. More broadly, its great theme is the collapse of what JKG I called "countervailing power", the thing which made American capitalism tolerable and even progressive; or, more exactly, the deliberate destruction of such countervailing power. For the most part this Galbraith avoids his father's style (smooth as silk, and as hard to produce) in favor of more workmanlike prose; there are a few places, but only a few, where he is positively infelicitous, in ways his father would never have allowed into print.)
Dan Simmons, The Terror
Historical horror fiction, based on the Franklin expedition of the 1840s in search of the Northwest Passage. Some recurring Simmons themes appear, e.g., the characters with the unlikely fondness for classical Greek (whom he should not have had discuss natural selection; I can't decide whether this is a greater offense against historical plausibility or against sheer narrative flow), and the contrast between adapted indigenous cultures (here, the "Esqimaux") and blundering, greedy, self-defeating westerners, though he doesn't hit the reader over the head with that last quite as bluntly as in his master-work Hyperion. (Oddly, non-western high civilizations come off very poorly in Simmons's fiction, as in the brilliant and terrifying yet borderline-racist Song of Kali.) Creepy and intensely compelling.
Steven Johnson, The Invention of Air: A History of Science, Faith, Revolution, and the Birth of America
Popular biography of Joseph Priestley. Johnson tries hard to keep in view both Priestley's individual biography and the larger networks and movements he participated in, so in part this is a bit of a ramble through the 18th century English-speaking house of intellect, which is not a bad thing.
There is, as the subtitle indicates, special emphasis (more than perhaps is truly warranted) on his American connections. This is because the book is very much an attempt by Johnson to claim Priestley as part of a usable American past of Enlightenment progressivism, in which there is no tension between rational religion and scientific advance. There is nothing wrong with this (quite the reverse; this is a part of our national traditions, and we should emphasize it), but at the same time it leads Johnson into some choices in his writing which feel like they make this book more transient than it needed to be.
Annoyances: (1) the formulaic opening scene. (2) the idea that the physiological effects of coffee sparked the Age of Reason was something I tossed off as a joke, complete with counter-examples, several years ago. I am displeased to see this same conceit here (pp. 54ff), being taken seriously not just by Johnson but evidently by others — not, to be clear, because I think I deserve credit (I'm sure it's not original), but because it is stupid.
Errata: p. 20, for "1850s" read "1750s"; p. 22, for "mid-seventeenth" read "mid-eighteenth".
H. P. Lovecraft, The Tomb, and Other Tales
Someone has already expressed my sentiments in lolcat form. (To say the least, Red Hook's changed.) But despite that there is a certain power to the stories.
Jeffrey Alford and Naomi Duguid, Mangoes and Curry Leaves: Culinary Travels through the Great Subcontinent
Very good recipes and nice travel writing, plus lickable-looking photographs.
Hiroshi Kondo, The Book of Saké (a.k.a. Saké: A Drinker's Guide)
Social history, lore and etiquette, a lovingly geeky description of the fermentation process (complete with graphs!), and specific recommendations. (Many thanks to K. for the gift, and for the Yuki no Bosha junmai ginjo.)

Books to Read While the Algae Grow in Your Fur; Scientifiction and Fantastica; The Progressive Forces; Networks; The Dismal Science; Enigmas of Chance; Writing for Antiquity; The Great Transformation; Food; The Running Dogs of Reaction; The Beloved Republic

Posted by crshalizi at January 31, 2009 23:59 | permanent link

January 30, 2009

Bayes < Darwin-Wallace

ALL YOUR BAYES ARE BELONG TO US
we share with him
Attention Conservation Notice: jargony, self-promotional ramblings about a new paper on the frequentist properties of non-parametric Bayesian procedures, which is exactly as statistics-geeky as it sounds. Plus a weird analogy to mathematical models of evolution. Even if this sounds vaguely interesting, you could always check back later and see if peer review exposed it as a tissue of fallacies.

Here's the new preprint:

CRS, "Dynamics of Bayesian Updating with Dependent Data and Misspecified Models", arxiv:0901.1342
Abstract: Recent work on the convergence of posterior distributions under Bayesian updating has established conditions under which the posterior will concentrate on the truth, if the latter has a perfect representation within the support of the prior, and under various dynamical assumptions, such as the data being independent and identically distributed or Markovian. Here I establish sufficient conditions for the convergence of the posterior distribution in non-parametric problems even when all of the hypotheses are wrong, and the data-generating process has a complicated dependence structure. The main dynamical assumption is the generalized asymptotic equipartition (or "Shannon-McMillan-Breiman") property of information theory. I derive a kind of large deviations principle for the posterior measure, and discuss the advantages of predicting using a combination of models known to be wrong. An appendix sketches connections between the present results and the "replicator dynamics" of evolutionary theory.

There are people out there who see Bayes's Rule as the key to all methodologies, something essential to rationality. I find this view thoroughly misguided and not even a regulative ideal. But I've written about that at great length elsewhere and won't repeat myself here.

While there are certainly plenty of statisticians who embrace the Bayesian way in all its Savagery (some of them will be on my tenure committee), I think it's fair to say that most of the time, Bayesian methods are not used in contemporary statistics and machine learning to impose coherence on statisticians', or computers', personal beliefs. When people like Andy Gelman, Aleks Jakulin &c. estimate logistic regressions with a particular default prior, what they are doing is really regularization, which is something that a frequentist like me can understand.

Actually, Bayesian inference is a regularization-by-smoothing technique, only in the space of probability distributions instead of the sample space. In ordinary least-squares regression, one asks "what straight line comes closest (in the Euclidean sense) to all the data points?" In subtler methods, one has a whole menagerie of possible curves, and asks which superposition of curves comes closest to the data points. To avoid over-fitting, one needs to keep the weight given to very wiggly basis curves under control, e.g., by simply ignoring possible terms which are not sufficiently smooth. (Through the magic of Lagrange multipliers, this is equivalent to penalizing wiggliness.) As one gets more and more data, finer and finer features of the regression curve can be reliably discerned, so the amount of imposed smoothing should be relaxed. For estimating regression curves, this is all thoroughly understood and even automated (making the persistence of linear regression a mystery; but another time).
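
To spell out the Lagrange-multiplier remark in symbols: the notation here is introduced only for illustration and is an assumption, with f a candidate curve, J(f) some measure of its wiggliness, c a cap on that wiggliness, and lambda the multiplier. Roughly, and under the usual convexity conditions, the constrained and the penalized least-squares problems pick out the same curves:

```latex
\min_{f \,:\, J(f) \le c} \; \sum_{i=1}^{n} \bigl( y_i - f(x_i) \bigr)^2
\quad \Longleftrightarrow \quad
\min_{f} \; \sum_{i=1}^{n} \bigl( y_i - f(x_i) \bigr)^2 + \lambda \, J(f),
\qquad \lambda \ge 0
```

Each cap c corresponds to some multiplier lambda, and relaxing the smoothing as data accumulate means letting c grow, or equivalently letting the penalty count for less relative to the fit term.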

All of this carries over to estimating entire distributions rather than just curves. In the space of probability distributions, the entire sample is a single point, the empirical distribution. (I'm ignoring some subtleties about time series here.) The family of models we're willing to consider forms a manifold (or set of manifolds; more details) in the space of probability measures. In this space, the right way to measure distance isn't the Euclidean metric, but rather the relative entropy, a.k.a. the Kullback-Leibler divergence, which generates the information geometry. The maximum likelihood estimate is simply the geometric projection of the empirical distribution on to the manifold of models. In other words, the most likely predictive distribution comes from this single point on the manifold. (There are cleverer ways to get frequentist predictive distributions, though.) The Bayesian posterior predictive distribution gives some weight to all points on the manifold; this is the smoothing. The weights are proportional to the prior, and also decline exponentially with the product of sample size and relative entropy. The effect of the prior is to blunt the impact of what we actually see, keeping us from simply trying to match it — exactly like the effect of smoothing. The amount of smoothing done by the prior declines as the sample size grows.
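
A minimal sketch of the last few sentences, under assumptions which are mine and much tamer than anything in the paper: independent coin flips, and a "manifold" consisting of just four Bernoulli models (the names `thetas`, `prior` and the helper functions are hypothetical). In this toy case the ordinary posterior and "prior times exp(-n times relative entropy from the empirical distribution)" agree exactly after normalization, because the entropy of the sample is a factor common to every model.

```python
import math

def kl_bernoulli(p, q):
    """Relative entropy D(p || q) between Bernoulli(p) and Bernoulli(q), in nats."""
    total = 0.0
    if p > 0:
        total += p * math.log(p / q)
    if p < 1:
        total += (1 - p) * math.log((1 - p) / (1 - q))
    return total

def posterior_from_likelihood(data, thetas, prior):
    """Ordinary Bayes: prior times likelihood, then normalize."""
    k, n = sum(data), len(data)
    unnorm = [w * th ** k * (1 - th) ** (n - k) for th, w in zip(thetas, prior)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

def posterior_from_divergence(data, thetas, prior):
    """Prior times exp(-n * D(empirical || model)), then normalize."""
    n = len(data)
    p_hat = sum(data) / n  # the empirical distribution: one point in distribution space
    unnorm = [w * math.exp(-n * kl_bernoulli(p_hat, th)) for th, w in zip(thetas, prior)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

# Illustrative data: 50 flips, 35 of them heads.
data = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1] * 5
thetas = [0.3, 0.5, 0.7, 0.9]        # hypothetical finite model family
prior = [0.25, 0.25, 0.25, 0.25]     # hypothetical uniform prior

via_likelihood = posterior_from_likelihood(data, thetas, prior)
via_divergence = posterior_from_divergence(data, thetas, prior)
for th, a, b in zip(thetas, via_likelihood, via_divergence):
    print(f"theta={th:.1f}  likelihood route={a:.6f}  divergence route={b:.6f}")
```

The model closest to the empirical distribution in relative entropy gets the most weight, but the others keep some, and how much they keep shrinks as n grows; that residual weight is the smoothing.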

Why smooth or regularize in the first place? As every school-child knows, the answer lies in the bias-variance trade-off. (Pedantically, bias-variance is for mean-squared error, but other loss functions have similar decompositions.) Regularized estimates are less responsive to differences between data sets, which means that they have less variance. No matter what the data look like, the regularized estimates all have the same shape, so to speak. But unless reality also has that shape, this creates a systematic error, i.e., bias. So the more aggressively we smooth, the more we decrease variance, but the more we increase bias. If some degree of smoothing removes more variance than it adds bias, we come out ahead. Of course, the variance should fall automatically as we get more data, but the bias depends on the strength of smoothing, so the latter should shrink as the sample grows. The important thing in analyzing this type of estimation or prediction scheme is to check whether it relaxes its regularization at the right rate, so that it doesn't get killed by either bias or variance, but rather consistently converges on the truth, or at least the best approximation to the truth among the available models.
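
The trade-off can be made concrete with about the simplest regularized estimator there is. This is a sketch under assumptions of my choosing, not anything from the paper: we estimate a mean `mu` from `n` independent observations with variance `sigma2`, and the "smoothing" is shrinking the sample mean towards zero by a factor `lam` (all of these names are hypothetical). The risk formulas below are the exact squared bias and variance of that estimator.

```python
def shrinkage_risk(lam, mu, sigma2, n):
    """Squared bias, variance, and mean-squared error of lam * (sample mean)."""
    bias = (lam - 1.0) * mu          # systematic error introduced by shrinking
    variance = lam ** 2 * sigma2 / n  # shrinking damps sampling fluctuations
    return bias ** 2, variance, bias ** 2 + variance

def best_shrinkage(mu, sigma2, n):
    """The lam minimizing mean-squared error (found by setting the derivative to zero)."""
    return mu ** 2 / (mu ** 2 + sigma2 / n)

mu, sigma2 = 1.0, 4.0  # illustrative values only
for n in (10, 100, 1000):
    lam = best_shrinkage(mu, sigma2, n)
    b2, var, mse = shrinkage_risk(lam, mu, sigma2, n)
    _, _, mse_raw = shrinkage_risk(1.0, mu, sigma2, n)
    print(f"n={n:5d}  best lam={lam:.4f}  bias^2={b2:.5f}  "
          f"variance={var:.5f}  mse={mse:.5f}  unshrunk mse={mse_raw:.5f}")
```

Some shrinkage always beats none here, and the best amount of it fades (`lam` approaches 1) as `n` grows. In practice the optimal `lam` depends on the unknown `mu`, which is why the rate at which a procedure relaxes its regularization has to be analyzed rather than read off.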

All of this applies to Bayesian learning. Like any other regularization scheme, it is a way of deliberately introducing bias into our inferences, not so much on Burkean/Quinean grounds that our prejudices providentially point to the truth, but simply to reduce variance, the extent to which our inferences are at the mercy of Fortune (in her role as the goddess of sampling fluctuations). The question about it then becomes whether it gets the trade-off right, and manages to converge despite its bias. In other words, when does Bayesian inference possess frequentist consistency?

A surprisingly large number of people have been satisfied with a result given by the great probabilist J. L. Doob, which essentially says that under some reasonable-looking conditions, the Bayesian learner's prior probability of being inconsistent is zero. (More exactly, the posterior probability of any set of models containing the truth goes to 1, except on a set of sample paths whose prior probability is zero.) Even taken at face value, this just says that each Bayesian is convinced a priori that they'll converge on the truth, not that they actually are almost sure to find the truth.

Examining the reasonable-looking conditions of Doob's result, it turns out that they entail the existence of a consistent non-Bayesian estimator. (Doob's original assumptions can actually be weakened to just the existence of such an estimator; see Schervish's Theory of Statistics, ch. 7.) It is a curious fact that every proof of the consistency of Bayesian learning I know of requires the existence of a consistent non-Bayesian estimator. (Though it need not be the maximum likelihood estimator.) There don't seem to be any situations where Bayesian updating is the only convergent route to the truth.

It turns out that Bayesian inference is not consistent in general. The late David Freedman and the very-much-alive Persi Diaconis showed that if you choose a prior distribution which is badly adapted to the actual data-generating process, your posterior distribution will happily converge on the wrong model, even though the rest of the set-up is very tame — independent and identically distributed data in a boringly well-behaved sample space, etc.

Still, there are many situations where Bayesian learning does seem to work reasonably effectively, which in light of the Freedman-Diaconis results needs explaining, ideally in a way which gives some guidance as to when we can expect it to work. This is the origin of the micro-field of Bayesian consistency or Bayesian nonparametrics, and it's here that I find I've written a paper, rather to my surprise.

I never intended to work on this. In the spring of 2003, I was going to the statistics seminar in Ann Arbor, and one week the speaker happened to be Yoav Freund, talking about this paper (I think) on model averaging for classifiers. I got hung up on why the weights of different models went down exponentially with the number of errors they'd made. It occurred to me that this was what would happen in a very large genetic algorithm, if a solution's fitness was inversely proportional to the number of errors it made, and there was no mutation or cross-over. The model-averaged prediction would just be voting over the population. This made me feel better about why model averaging was working, because using a genetic algorithm to evolve classifier rules was something I was already pretty familiar with.

The next day it struck me that this story would work just as well for Bayesian model averaging, with weights depending on the likelihood rather than the number of errors. In fact, I realized, Bayes's rule just is the discrete-time replicator equation, with different hypotheses being so many different replicators, and the fitness function being the conditional likelihood.
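
A minimal sketch of that identification, assuming (for simplicity, and unlike the paper) independent coin flips and three hypothetical Bernoulli hypotheses; `replicator_step` and `likelihood` are illustrative names. One generation of fitness-proportional selection, with the likelihood of the newest observation as the fitness, is one application of Bayes's rule, and iterating it reproduces the batch posterior.

```python
def replicator_step(shares, fitness):
    """Discrete-time replicator equation: new share = old share * fitness / mean fitness."""
    mean_fitness = sum(s * f for s, f in zip(shares, fitness))
    return [s * f / mean_fitness for s, f in zip(shares, fitness)]

def likelihood(theta, x):
    """Probability that a Bernoulli(theta) hypothesis assigns to observation x."""
    return theta if x == 1 else 1.0 - theta

thetas = [0.2, 0.5, 0.8]           # hypothetical hypotheses, i.e. replicators
prior = [1 / 3, 1 / 3, 1 / 3]      # initial population shares, i.e. the prior
data = [1, 1, 0, 1, 1, 1, 0, 1]    # illustrative observations

# Sequential route: one round of selection per observation.
shares = list(prior)
for x in data:
    shares = replicator_step(shares, [likelihood(th, x) for th in thetas])

# Batch route: prior times the likelihood of the whole sample, normalized.
k, n = sum(data), len(data)
unnorm = [w * th ** k * (1 - th) ** (n - k) for th, w in zip(thetas, prior)]
batch = [u / sum(unnorm) for u in unnorm]

for th, s, b in zip(thetas, shares, batch):
    print(f"theta={th:.1f}  replicator share={s:.6f}  batch posterior={b:.6f}")
```

The mean fitness in each step is the marginal likelihood of the new observation, which is why the normalizations match. With dependent data the fitness would be the conditional likelihood given the past, as in the text, but the update has the same form.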

As you know, Bob, the replicator dynamic is a mathematical representation of the basic idea of natural selection. There are different kinds of things, the kinds being called "replicators", because things of one kind cause more things of that same kind to come into being. The average number of descendants per individual is the replicator's fitness; this can depend not only on the properties of the replicator and on time and chance, but also on the distribution of replicators in the population; in that case the fitness is "frequency dependent". In its basic form, fitness-proportional selection is the only evolutionary mechanism: no sampling, no mutation, no cross-over, and of course no sex. The result is that replicators with above-average fitness increase their share of the population, while replicators with below-average fitness dwindle.

This is a pretty natural way of modeling half of the mechanism Darwin and Wallace realized was behind evolution, the "selective retention" part — what it leaves out is "blind variation". Even with this omission, the replicator equation is a surprisingly interesting kind of dynamical system, especially when fitness is frequency-dependent, which opens up deep connections to evolutionary game theory. (ObBook.) Interestingly, however, Bayes is a very limited special case of the replicator equation, since fitness is frequency independent.

"Selective retention" is also the idea that lies behind reinforcement learning and Thorndike's law of effect. Crudely, these are all variations on the theme of "do more of what worked, and less of what didn't". Less crudely, there are at least three independent discoveries of how reinforcement learning, itself, leads to the replicator equation in the appropriate limit. So Bayesian updating isn't just a special case of evolutionary optimization; it's also something like habit formation.

Initially, I wrote this all up in the spring of 2003 and then set it aside, because, after making the formal connections, I didn't see what it was good for. Then, that summer, I went to the first "Science et Gastronomie" workshop, talked about this idea, and realized, from the conversations, that I actually could do something with it. The fitness function was going to end up being the relative entropy rate, and this would control where the posterior distribution concentrated in the long run. This would let me say something about the convergence of Bayesian learning with non-independent and even non-Markovian data, but also about what happens when the true distribution is not "in the support of the prior", i.e., when all the models really are wrong.

So I spent a lot of time in Lyon sweating over the asymptotic behavior of the integrated likelihood and so forth (literally, given the great heat-wave). By the end of the summer, I had versions of what are now Theorems 2, 3 and 4 in the paper. These say, respectively, that the posterior density goes to zero everywhere where fitness isn't maximized, and the rate is the fitness difference; that eventually the posterior distribution concentrates on the global peaks of the fitness landscape; and that the posterior distribution in a subset of model space not including those peaks is driven by the highest fitness in the subset. All of this was pretty nice and I was fairly pleased with myself.

Then I saw that if my results were correct, Bayesian updating should always be consistent. So I put the thing aside for a while....

Eventually I figured out what I was doing wrong; I was presuming that all the points in model-space were equally well-behaved. I was explicitly assuming that the (log) likelihood would eventually converge to the relative entropy for each hypothesis. This is on the one hand mathematically harmless (it's asymptotic equipartition), and on the other hand statistically I can't see how any likelihood-based method can hope to converge unless performance over a sufficiently long past is indicative of future results. (This is why such methods do not solve the problem of induction.) But I was further assuming, implicitly, that there was no way for the likelihood of a hypothesis with very high relative entropy to also converge very slowly. That is, I was assuming the Bayesian learner could not acquire bad habits. But of course it can; the bad habits just need to seem like good ideas at first. In human terms:

No one ever forgets how to do something that's worked for them in the past. Just replacing it with another behavior can be hard enough, and the old behavior is still going to be lurking there underneath it. Thieves keep stealing. Liars keep lying. Drunks never forget about chemically modifying their nervous systems.
Similarly, the Bayesian learner never forgets about the model which matches the data perfectly — until it turns out that the model had just memorized the first 13,127 observations, and then repeats them forever. When that model crashes, however, there is always another one which memorized the first 26,254 observations...

What's needed is some restrictions on the prior distribution which keep it from putting too much weight on these bad hypotheses, though actually in non-parametric problems one doesn't want to give them strictly zero weight, because, hey, maybe a particular sequence of 13,127 observations really will repeat forever. (Metaphorically: a little lying, stealing and drinking is part of a complete life.) We are back to regularization, which has the duty of promoting virtue and suppressing vice.

The most common ways of doing this divide the space of models into distinct classes, and then use statistical learning theory or empirical process theory to gauge the risk of over-fitting within these constrained subspaces. Typical arguments involve things like showing that every hypothesis in the constrained set is close to one of a finite number of hypotheses which "cover" or "bracket" the space, which gives uniform convergence. Then one relaxes the constraint as more data arrives, according to some deterministic schedule ("the method of sieves") or to optimize the trade-off between actual performance and a bound on over-fitting ("structural risk minimization"), etc.

Existing proofs of Bayesian consistency in non-parametric problems basically work similarly. To simplify just a bit, the trick has two parts. The first is to find constraints on the hypotheses which are strong enough to ensure uniform convergence, but can be relaxed to include everything; this is what you'd do anyway if you wanted a consistent non-parametric procedure. (Some people don't explicitly posit constraint sets, but do other things with much the same effect.) The second is to construct a prior whose bias towards the most-constrained sets is strong enough to keep the wild, high-capacity parts of the hypothesis space from dominating the posterior distribution, but isn't so biased that the constraint can't be overcome with enough data.

This is what I ended up having to do. (There was a gap of several years between seeing that some constraint was needed, and seeing what constraint would work. I am, in fact, a sloth.) My constraint was, roughly, "the log likelihood converges at least this fast". Unfortunately, I wasn't able to express it in relatively nice terms, like covering numbers, though I suspect someone cleverer could replace it with something along those lines. (It's about one-sided deviations of time-averages from their limiting values, so it feels empirical-process-y.) Anyone who actually cares will be better served by reading the paper than by my trying to re-express it verbally.

I didn't figure out the right constraint to impose, and the right way to relax it, until the summer of 2008. (This was what was on my mind when I read Chris Anderson's ode to overfitting.) Once I did, everything else was pretty direct, especially since it turned out I could salvage most (but definitely not all) of what I'd done in Lyon. One of the bigger efforts in actually writing the paper was uniformly eliminating talk of replicators and fitness, except for a small appendix, in favor of more statistical jargon. (I hope I succeeded.) Adding an example, namely trying to use Markov chains to predict a sofic system, took about a day. This is an amusing case, because the posterior distribution never converges — you can always do strictly better by moving to Markov chains of higher order. Nonetheless, the (Hellinger) distance between the posterior predictive distribution and the actual predictive distribution goes to zero.

Having submitted this, I'm going to put it aside again until I hear from the referees, because there's a lot of other work which needs finishing. (When I do hear from the referees, I anticipate a certain amount of gnashing of teeth.) But in the meanwhile, I'll use this space to jot down some ideas.

Manual Trackback: 3 Quarks Daily

Thanks to Nicolas Della Penna, Shiva Kaul and Chris Wiggins for typo-spotting.

Enigmas of Chance; Self-Centered

Posted by crshalizi at January 30, 2009 22:47 | permanent link

January 29, 2009

Deep Thought (On the Ethical Relevance of Economic Analysis)

Restoring stolen goods to their rightful owners is not Pareto-improving.

The Dismal Science

Posted by crshalizi at January 29, 2009 09:08 | permanent link

January 20, 2009

"To choose our better history"

Now would be an excellent time to begin working like we are living in the early days of a better nation.

Let those who do justice and love mercy say amen.

The Beloved Republic; The Progressive Forces

Posted by crshalizi at January 20, 2009 23:11 | permanent link

January 10, 2009

Geoghegan for Congress

If, even a month ago, you had asked me who I'd most like to see elected to Congress, I would not have mentioned Tom Geoghegan, because it wouldn't have even occurred to me that it was possible. Had you asked me about him in particular, I would have replied something on the order of "We should be so lucky". But he's running in the special election to replace Rahm Emanuel, and, well, we the people should be so lucky. The estimable Kathy G. has explained why Geoghegan deserves the support of progressives everywhere. I have nothing to add, other than to say that (1) if you haven't read Which Side Are You On?, you should buy it right now and read it at once; and (2) I have just put my money where my mouth is.

The Progressive Forces; The Beloved Republic

Posted by crshalizi at January 10, 2009 21:45 | permanent link

But It's Only a Little Demon

Attention conservation notice: ~500 words of excessively cute foundations-of-statistical-mechanics geekery. Inspired by this post at The Statistical Mechanic.

I have here on the table before me my favorite classical-mechanical assemblage of interacting particles, with 2n degrees of freedom, n being a macroscopically large number. (The factor of 2 is both because there are always position and velocity degrees of freedom, and to avoid some factors of 1/2 later.) It is in turn part of a larger assemblage with many more degrees of freedom, say 2N. Both the smaller and larger assemblages are highly unstable dynamically, so I can expect statistical mechanics to work quite well. (Really, I can.) On the other hand, I presume that they are very thoroughly isolated from the rest of the universe, so I can ignore interactions with the outside. (Don't ask me how I know what's going on in there in that case, though.)

I have also an Aberdeen Mfg. Mk. II "Neat-fingered" Maxwellian demon, which is capable of instantaneously reversing all the velocities of the particles in the small assemblage (i.e., it can flip the sign of n velocity degrees of freedom). If I had a bigger research budget, I could have bought a Mk. V "Vast and Considerable" demon, which could reverse the whole assemblage's N velocity degrees of freedom, but I don't have to tell you about grants these days.

Now, with the Mk. V, I'd know what to expect: it's the old familiar myth about time's arrow running backwards: sugar spontaneously crystallizing out of sweetened coffee, forming granules and leaping out of the cup into the tea-spoon, etc. But the Mk. II isn't capable of reversing the arrow of time for the whole assemblage, just for part of it. And so there are N-n degrees of freedom in the larger assemblage whose arrow of time points the same way as before. So what happens?

My intuition is that at first the arrow of time is reversed in the small assemblage, leading to the local equivalent of coffee unsweetening. ("At first" according to who? Don't go there.) Eventually, however, interactions with the N-n unreversed degrees of freedom should bring the n degrees of freedom back into line. If interactions are spatially local, then I imagine the time-reversed region gradually shrinking. Mythologically: The sugar crystallizes and forms granules, say, and even starts to leap out of the cup, but neither the air molecules nor the spoon are in the right place at the right time to exactly take them back to the sugar-jar, so they spill and make a mess, etc. More generally, an observer within the larger assemblage will first see a small region where, bizarrely, things happen in reverse, then a kind of hard-to-describe crawling molecular chaos, and then a restoration of the ordinary macroscopic natural order, albeit from a weird starting point. But this guess may be excessively shaped by the fluctuation-dissipation theorem. Does a single arrow of time have to get established at all? If so, how long does it typically take? (Intuition again, this time from large deviations: exponential in 2n-N.) Can the n reversed degrees of freedom ever impose their direction on the whole assemblage?
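
A toy numerical version of the two demons, offered only as a sketch: the system (a short chain of unit masses joined by linear-plus-cubic springs), the integrator, and every name and parameter below are assumptions made for illustration, not anything from the post. It checks just the mechanical point that flipping all N velocities retraces the trajectory, while flipping only n of them does not. It says nothing about how long a single arrow of time takes to re-establish itself.

```python
import random

def forces(x, beta=1.0):
    """Forces on a fixed-end chain with linear plus cubic neighbour springs."""
    padded = [0.0] + list(x) + [0.0]       # walls at both ends
    out = []
    for i in range(1, len(padded) - 1):
        right = padded[i + 1] - padded[i]
        left = padded[i] - padded[i - 1]
        out.append((right - left) + beta * (right ** 3 - left ** 3))
    return out

def evolve(x, v, steps, dt=0.01):
    """Velocity-Verlet integration with unit masses (a time-reversible scheme)."""
    x, v = list(x), list(v)
    f = forces(x)
    for _ in range(steps):
        v = [vi + 0.5 * dt * fi for vi, fi in zip(v, f)]
        x = [xi + dt * vi for xi, vi in zip(x, v)]
        f = forces(x)
        v = [vi + 0.5 * dt * fi for vi, fi in zip(v, f)]
    return x, v

def return_error(n_reversed, n_total=16, steps=500, seed=1):
    """Run forward, flip the first n_reversed velocities, run the same time again.

    Returns the largest distance of any particle from its starting position.
    """
    rng = random.Random(seed)
    x0 = [0.0] * n_total
    v0 = [rng.uniform(-1.0, 1.0) for _ in range(n_total)]
    x1, v1 = evolve(x0, v0, steps)
    v1 = [-vi if i < n_reversed else vi for i, vi in enumerate(v1)]
    x2, _ = evolve(x1, v1, steps)
    return max(abs(a - b) for a, b in zip(x2, x0))

print("all 16 velocities reversed:", return_error(16))
print("only 4 velocities reversed:", return_error(4))
```

With the full reversal the chain should come back to its starting positions up to rounding error; with the partial reversal it should not. Watching how the reversed block gets pulled back into line would need a larger system and some local measure of which way things are running.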

Somebody must have already looked into all this. Where?

Update, later that afternoon: I was probably unconsciously remembering this post by Sean Carroll. (Sean was very polite in pointing this out.) Also, John Burke answers my final "where" question "Budapest", which sounds about right.

Physics

Posted by crshalizi at January 10, 2009 13:45 | permanent link

January 08, 2009

Chaos, Complexity and Inference: 2009 Course Announcement

I will be teaching 36-462, "topics in statistics", in the spring. This is a special topics course for advanced undergraduates, intended to expose them to ideas they wouldn't see going through the ordinary curriculum; this year, like last, the subject will be "chaos, complexity and inference". It worked pretty well last time, though I am not sure what to make of the fact that half of those currently registered are graduate students from other departments...

Anyone interested in the readings, assignments and notes can follow them either at the class syllabus page, or from this post and its RSS feed. (Last year's post.) — There have been some requests (as in, more than one!) for podcasts of the lectures. If someone will point me at an idiot's guide to setting one up, and tell me about cheap but adequate microphones, I'm willing to try.

Description
This course will cover some key parts of modern theories of nonlinear dynamics ("chaos") and complex systems, and their connections to fundamental aspects of probability and statistics. By studying systems with many strongly-interacting components, students will learn how stochastic models can illuminate phenomena beyond the usual linear/Gaussian/independent realm, as well as gain a deeper understanding of why stochastic models work at all. Topics will include: chaos theory and nonlinear prediction; information; the distinction between randomness and determinism; self-organization and emergence; heavy-tailed and "scale-free" distributions; social and other complex networks, and the analysis of network data; interacting agents; and inference from simulations.
Full Syllabus
At the course webpage, together with links to readings
Venue
Tuesdays and Thursdays 12:00--1:20 in Scaife Hall 208. Office hours in 229C Baker Hall, Wednesdays 10--11 and Thursdays 4--5.
Required Textbooks
Gary William Flake, The Computational Beauty of Nature
John Miller and Scott Page, Complex Adaptive Systems
Leonard Smith, Chaos: A Very Short Introduction
Optional Textbooks
Peter Guttorp, Stochastic Modeling of Scientific Data
Paul Krugman, The Self-Organizing Economy
Andrew M. Fraser, Hidden Markov Models and Dynamical Systems
W. John Braun and Duncan J. Murdoch, A First Course in Statistical Programming with R (Use of R is not required, but ask before using other languages.)
Prerequisites
A previous course in mathematical statistics (such as 36-310, 36-401, or 36-625/626) and a course in probability including random processes (such as 36-217, 36-225/226, 36-410, or 36-625/626), or consent of instructor.
(See the handout for more on required background.)
Some programming experience will be extremely helpful.

Corrupting the Young; Complexity; Enigmas of Chance

Posted by crshalizi at January 08, 2009 23:59 | permanent link

Three-Toed Sloth:   Hosted, but not endorsed, by the Center for the Study of Complex Systems