June 30, 2009

Books to Read While the Algae Grow in Your Fur, June 2009

Walter Jon Williams, This Is Not a Game
Williams has been one of my favorite writers since I read Knight Moves in college, and I keep expecting him to get Discovered. With any luck, this novel — a techno-thriller about alternate reality games, gold-farming, third-world currency crises, social engineering on every possible scale, revenge, the power of story-telling, betrayal, moments of ad hoc solidarity, and generally what it's like to live in the early twenty-first century (only turned up to 10.5) — will do the trick. (But Dread Empire's Fall is remarkable too, to say nothing of the rest of his back-list.)
Update: see also Jo Walton.
Friedrich T. Sommer and Andrzej Wichert (eds.), Exploratory Analysis and Data Modeling in Functional Neuroimaging
Solid, if now a little dated. The chapters I found most interesting were those by Tang and Pearlmutter on independent component analysis, and by Tolias, Kourtzi and Logothetis on exploiting adaptation (habituation) to stimuli.
Kat Richardson, Underground
John Kenneth Galbriath, The Great Crash: 1929
I first read this book for a high school history class in October 1987, just before everything went ker-splat. It really does hold up exceptionally well: the origins of the bubble, its character, its collapse, and its aftermath, all elegantly laid out.
(One thing I've noticed from re-reading Galbraith is that I've gone from simply enjoying the way he could write to admiring it. Creating a style like that, maintaining it consistently and deploying it flexibly is hard.)
Laura E. Reeve, Peacekeeper
Mind-candy; about using alcohol to avoid dealing with guilt over having probably committed war crimes, but mind-candy. (Not only does the cover have nothing to do with the contents of the book, the heroine is clearly described in the first chapter and looks nothing like that.) Update: sequel.
Gillian Tett, Fool's Gold: How the Bold Dream of a Small Tribe at J. P. Morgan Was Corrupted by Wall Street Greed and Unleashed a Catastrophe
Full-length review: The Tragedy of Getting What You Want.

Books to Read While the Algae Grow in Your Fur; Scientifiction and Fantastica; Enigmas of Chance; The Dismal Science; Minds, Brains, and Neurons

Posted by crshalizi at June 30, 2009 23:59 | permanent link

June 19, 2009

Page vs. Shalizi in Live

From now until 7:30, I am hosting a "salon" on Scott Page's The Difference at firedoglake. I would like to thank United Airlines for making sure that this announcement could not be made with any sort of reasonable lead-time.

(Title a silly reference to one of my favorite albums of all time.)

Posted by crshalizi at June 19, 2009 17:00 | permanent link

June 16, 2009

Because You Really Wished I'd Write More About Books

... I present for your amusement my reviews of Flynn on the Flynn Effect, and Tett on synthetic collateralized debt obligations and their kin. The former is the original version of something which had to be drastically cut down to fit into American Scientist; the latter appears nowhere else. (But, y'know, make me an offer.)

Anyone who tries to take this as an opportunity to drag me back into arguing about IQ will be ignored.

IQ; The Great Transformation; The Dismal Science; The Continuing Crises; Self-centered

Posted by crshalizi at June 16, 2009 15:37 | permanent link

On the Certainty of the Bayesian Fortune-Teller

Attention conservation notice: 2300 words of technical, yet pretentious and arrogant, dialogue on a point which came up in a manuscript-in-progress, as well as in my long-procrastinated review of Plight of the Fortune Tellers. Why don't you read that book instead?

Q: You really shouldn't write in library books, you know; and if you do, your marginalia should be more helpful, or less distracting, than just "wrong wrong wrong!"

A: No harm done; my pen and I are both transparent rhetorical devices. And besides, Rebonato is wrong in those passages.

Q: Really? Isn't his point that it's absurd to pretend you could actually estimate a something like a probability of an interest rate jump so precisely that there's a real difference between calling it 0.500 000 and calling it 0.499 967? Isn't it yet more absurd to think that you could get the 99.5 percent annual value-at-risk — the amount of money you'd expect to lose once in two thousand years — down to four significant figures, from any data set, let alone one that covers just five years and so omits "not only the Black Death, the Thirty Years' War, the Barbarian invasions, and the fall of the Roman Empire, but even the economic recession of 1991 — the only meaningful recession in the last twenty years" (as of 2006), to say nothing of the "famous corporate loan book crises of the Paleochristian era" (p. 218)?

A: Of course all that's absurd, and Rebonato is right to call people on it. By the time his book came out it was too late to do much good, but if people had paid attention to such warnings I dare say we wouldn't be quite so badly off now, and they had better listen in the future.

Q: So what's your problem. Oh, wait, let me guess: you're upset because Rebonato's a Bayesian, aren't you? Don't bother, I can tell that that's it. Look, we all know that you've got objections to that approach, but at this point I'm starting to think that maybe you have issues. Isn't this sort of reflexive hostility towards a whole methodology — something you must run into every day of work — awkward and uncomfortable? Embarrassing, even? Have you thought about seeking help?

A: Actually, I have a serious point to make here. What Rebonato wants is entirely right-headed, but it fits very badly with his Bayesianism, because Bayesian agents are never uncertain about probabilities; at least, not about the probability of any observable event.

Q: But isn't Bayesianism about representing uncertainty, and making decisions under uncertainty?

A: Yes, but Bayesian agents never have the kind of uncertainty that Rebonato (sensibly) thinks people in finance should have.

Q: Let me try to pin you down in black and white. [Opens notebook] I have here on one side of the page our old friend, the well-known the probability space Omega F. Prob. Coming out of it, in the middle, is a sequence of random variables X1, X2, ... , Xn, ... , which have some joint distribution or other. (And nothing really depends on its being a sequence, I could use a random field on a network or whatever you like, add in covariates, etc.) On the other side of the random variables, looking at them, I have a standard-issue Bayesian agent. The agent has a hypothesis space, each point m of which is a probability distribution for the random sequence. This hypothesis space is measurable, and the agent also has a probability measure, a.k.a. prior distribution, on this space. The agent uses Bayes's rule to update the distribution by conditioning, so it has a sequence of measures D0, D1, etc.

A: I think you are missing an "As you know, Bob", but yes, this is the set-up I have in mind.

Q: Now I pick my favorite observable event f, a set in the joint sigma-field of the Xi. For each hypothesis m, the probability m(f) is well-defined. The Bayesian thinks this is a random variable M(f), since it has a distribution D on the hypothesis space. How is that not being uncertain about the probability of f?

A: Well, in the first place —

Q: I am not interested in quibbles about D being a Dirac delta function.

A: Fine, assume that D doesn't put unit mass on any single hypothesis, and that it gives non-zero weight to hypotheses with different values of m(f). But remember how Bayesian updating works: The Bayesian, by definition, believes in a joint distribution of the random sequence X and of the hypothesis M. (Otherwise, Bayes's rule makes no sense.) This means that by integrating over M, we get an unconditional, marginal probability for f:

Pn(f) = EDn[M(f|X1=x1, X2=x2, ... , Xn=xn)]

Q: Wait, isn't that the denominator in Bayes's rule?

A: Not quite, that equation defines a measure — the predictive distribution — and the denominator in Bayes's rule is the density of that measure (with n=0) at the observed sequence.

Q: Oh, right, go on.

A: As an expectation value, Pn(f) is a completely precise number. The Bayesian has no uncertainty whatsoever in the probabilities it gives to anything observable.

Q: But won't those probabilities change over time, as it gets new data?

A: Yes, but this just means that the random variables aren't independent (under the Bayesian's distribution over observables). Integrating m with respect to the prior D0 gives us the infinite-dimensional distribution of a stochastic process, one which is not (in general) equal to any particular hypothesis, though of course it lies in their convex hull; the simple hypotheses are extremal points. If the individual hypothesis are (laws of) independent, identically-distributed random sequences, their mixture will be exchangeable. If the individual hypotheses are ergodic, their mixture will be asymptotically mean-stationary.

Q: Don't you mean "stationary" rather than "asymptotically mean-stationary"?

A: No; see chapter 25 here, or better yet that trifler's authority.

Q: You were saying.

A: Right. The Bayesian integrates out m and gets a stochastic process where the Xi are dependent. As far as anything observable goes, the Bayesian's predictions, and therefore its actions, are those of an agent which treats this stochastic process as certainly correct.

Q: What happens if the Bayesian agent uses some kind of hierarchical model, or the individual hypotheses are themselves exchangeable/stationary?

A: The only thing that would change, for these purposes, is the exact process the Bayesian is committed to. Convex mixtures of convex mixtures of points in C are convex mixtures of points in C.

Q: So to sum up, you're saying that the Bayesian agent is uncertain about the truth of the unobservable hypotheses (that's their posterior distribution), and uncertain about exactly which observable events will happen (that's their predictive distribution), but not uncertain about the probabilities of observables.

A: Right. (Some other time I'll explain how that helps make Bayesian models testable.) And — here's where we get back to Rebonato — all the things he is worried about, like values-at-risk and so forth, are probabilities of observable events. Put a Bayesian agent in the risk-modeling situation he talks about, and it won't just say that the 99.5% VaR is 109.7 million euros rather than 110 million, it will give you as many significant digits as you have time for.

Q: So let me read you something form p. 194--195:

Once frequentists accept (at a given statistical level of confidence) the point estimate of a quantity (say, a percentile), they tend to act as if the estimated number were the true value of the parameter. Remember that, for a frequentist, a coin cannot have a 40% chance of being biased. Either the coin is fair or it is biased. Either we are in a recession or we are not. We simply accept or reject these black-or-white statements at a certain confidence level... A Bayesian approach automatically tells us that a parameter (say, a percentile) has a whole distribution of possible values attached to it, and that extracting a single number out of this distribution (as I suggested above, the average, the median, the mode, or whatever) is a possibly sensible, but always arbitrary, procedure. No single number distilled from the posterior distribution is a primus inter pares: only the full posterior distribution enjoys this privileged status, and it is our choice what use to make of it.

This seems entirely reasonable; where do you think it goes wrong?

A: You mean, other than the fact that point estimates do not have "statistical levels of confidence", and that Rebonato has apparently forgotten about actual confidence intervals?

Q: Let's come back to that.

A: He is running together parameters of the unobserved hypotheses, and the properties of the predictive distribution on which the Bayesian acts. I can take any function I like of the hypothesis, g(m) say, and use it as a parameter of the distribution. If I have enough parameters gi and they're (algebraically) independent of each other, there's a 1-1 map between hypotheses and parameter vectors — parameter vectors are unique names for hypotheses. I could make parts of those names be readily-interpretable aspects of the hypothetical distributions, like various percentiles or biases. The distribution over hypotheses then gives me a distribution over percentiles conditional on the hypothesis M. But we don't know the true hypothesis, and on the next page Rebonato goes on to cast "ontological" doubt about whether it even exists. (How he can be uncertain about the state of something he thinks doesn't exist is a nice question.) We only have the earlier observations, so we need to integrate or marginalize out M, and this collapses the distribution of percentiles down to a single exact value for that percentile.

Q: Couldn't we avoid that integration somehow?

A: Integrating over the posterior distribution is the whole point of Bayesian decision theory.

Q: Let's go back to the VaR example. If you try estimating the size of once-in-two-thousand-year losses from five years of data, your posterior distribution has got to be pretty diffuse.

A: Actually, it can be arbitrarily concentrated by picking the right prior.

Q: Fine, for any reasonable prior it needs to be pretty diffuse. Shouldn't the Bayesian agent be able to use this information to avoid recklessness?

A: That depends on the loss function. If the loss involves which hypothesis happens to be true, sure, it'll make a difference. (That's how we get the classic proof that if the loss is the squared difference between the true parameter and the point estimate, the best decision is the posterior mean.) But if the loss function just involves what observable events actually take place, then no. Or, more exactly, it might make sense to show more caution if your posterior distribution is very diffuse, but that's not actually licensed by Bayesian decision theory; it is "irrational" and sets you up for a Dutch Book.

Q: Should I be worried about having a Dutch Book made against me?

A: I can't see why, but some people seem to find the prospect worrying.

Q: So what should people do?

A: I wish I had a good answer.. Many of Rebonato's actual suggestions — things like looking at a range of scenarios, robust strategies, not treating VaR as the only thing you need, etc. — make a lot of sense. (When he is making these practical recommendations, he does not counsel people to engage in a careful quantitative elicitation of their subject prior probabilities, and then calculate posterior distributions via Bayes's rule; I wonder why.) I would also add that there are such things as confidence intervals, which do let you make probabilistic guarantees about parameters.

Q: What on earth do you mean by a "probabilistic guarantee"?

A: That either the right value of the parameter is in the confidence set, or you get very unlucky with the data (how unlucky depends on the confidence level), or the model is wrong. Unlike coherence, coverage connects you to reality. This is basically why Haavelmo told the econometricians, back in the day, that they needed confidence intervals, not point estimates.

Q: So how did the econometricians come to make fetishes of unbiased point-estimators and significance tests of equality constraints?

A: No doubt for the same reason they became convinced that linear and logistic regression was all you'd ever need to deal with any empirical data ever.

Q: Anyway, that "get the model right" part seems pretty tricky.

A: Everyone is going to have to deal with that. (You certainly still have to worry about mis-specification with Bayesian updating.) You can test your modeling assumptions, and you can weaken them so you are less susceptible to mis-specification.

Q: Don't you get weaker conclusions — in this case, bigger confidence intervals — from weaker modeling assumptions?

A: That's an unavoidable trade-off, and it's certainly not evaded by going Bayesian (as Rebonato knows full well). With very weak, and therefore very defensible, modeling assumptions, the confidence interval on, say, the 99.5% VaR may be so broad that you can't devise any sensible strategy which copes with that whole range of uncertainty, but that's the math's way of telling you that you don't have enough data, and enough understanding of the data, to talk about once-in-two-thousand-year events. I suppose that, if they have financial engineers in the stationary state, they might eventually be able to look back on enough sufficiently-converged data to do something at the 99% or even 99.5% level.

Q: Wait, doesn't that suggest that there is a much bigger problem with all of this? The economy is non-stationary, right?

A: Sure looks like it.

Q: So how can we use statistical models to forecast it?

A: If you want someone to solve the problem of induction, the philosophy department is down the stairs and to the left.

Enigmas of Chance; The Dismal Science

Posted by crshalizi at June 16, 2009 09:50 | permanent link

Chaos, Complexity and Inference: 2009 Syllabus

See this earlier post or the course homepage for more. This should be an RSS feed for this page, so you can follow updates, which will generally be posted after lectures.

Final exam
take-home, due 12 May
Lecture 28, (Thursday, 30 April): Agents 4: A real-world example of agents on networks
Notes
Hedstrom and Aberg, "Quantitative research, agent-based modelling and theories of the social", ch. 6 (pp. 114--144) in Hedstrom, Dissecting the Social [PDF]; (*) Guttorp, ch. 4
Lecture 27, (Tuesday, 28 April): Networks 5: Community Discovery
No slides
Readings: Clauset, Moore and Newman, arxiv:0811.0484 (code); Girvan and Newman, arxiv:cond-mat/0112110; Hofman and Wiggins, arxiv:0709.3512; Newman, arxiv:physics/0605087; Reichardt and Bornholdt, arxiv:cond-mat/0603718;
Reichardt and White, arxiv:0708.0958 [discussion]
Lecture 26, (Thursday, 23 April): Complex networks 4: inference for network models
Slides
Reading: Hanneke and Xing, "Discrete Temporal Models of Social Networks" [PDF]; Handcock and Jones, "Likelihood-based inference for stochastic models of sexual network formation" [PDF]; Hunter, Goodreau and Handcock, "Goodness of Fit of Social Network Models" [PDF]; Middendorf, Ziv and Wiggins, "Inferring Network Mechanisms: The Drosophila melanogaster Protein Interaction Network", arxiv:q-bio/0408010; Newman, "Structure and Function", sections IV B and V; Newman, Strogatz and Watts, "Random graphs with arbitrary degree distributions and their applications", arxiv:cond-mat/0007235; Wiuf, Brameier, Hagberg and Stumpf, "A likelihood approach to analysis of network data", Proceedings of the National Academy of Sciences (USA) 103 (2006): 7566--7570 [discussion]
Lecture 25, (Tuesday, 21 April): Agents 3: social complexity
Slides
Reading: Flake, ch. 17; Miller and Page, chs. 10--11; Krugman, ch. 6; Salganik, Dodds and Watts, "Experimental study of inequality and unpredictability in an artificial cultural market", Science 311 (2006):854--856; (*) Skyrms and Pemantle, "A Dynamic Model of Social Network Formation", arxiv:math.PR/0404101
Lecture 24, (Thursday, 9 April): Complex networks 3: contagion on networks
slides
Reading: Guttorp, sec. 2.11; Newman, "Structure and Function", sec. VIII; Davis, Trapman, Leirs, Begon and Heesterbeek, "The abundance threshold for plague as a critical percolation phenomenon", Nature 454 (2008): 634--637; Bell, Maiden, Munoz-Solomando and Reddy, "'Mind control experiences' on the Internet: Implications for the psychiatric diagnosis of delusions", Psychopathology 39 (2006): 87--91 [PDF];
Smith and Novella, "HIV Denial in the Internet Era", PLoS Medicine 4:8 (2007): e256; (*) Kenah and Robins, "Second look at the spread of epidemics on networks", arxiv:q-bio.QM/0610057
Lecture 23, (Tuesday, 7 April): Agents 2: collective phenomena and self-organization
Slides
Reading: Flake, ch. 16; Miller and Page, ch. 9; Krugman, introduction and ch. 1; Healy, Walking to School; Perlstein, The Meaning of Box 722
Lecture 22, (Thursday, 2 April): Agent-based models 1
Slides
Reading: Miller and Page, chs. 6 and 7; Flake, ch. 12
Lecture 21, (Tuesday, 30 March): Complex networks 2: growth models
Slides
Reading: Newman, "Structure and function", sec. VII
Lecture 20, (Thursday, 26 March): Complex networks 1: basics, network properties
Slides
Definition of networks and basic terms. Pictures, emphasizing sex and death. Bipartite networks and the Galois lattice construction. The small world problem. What makes a network complex? The Erdos-Renyi model: first properties and weaknesses.
Watts, "The 'New' Science of Networks", Annual Review of Sociology 30 (2004): 243--270
Newman, "The Structure and Function of Complex Networks", SIAM Review 45 (2003): 167--256, arxiv:cond-mat/0303516 (through sec. VI, but skipping or skimming IV B and V)
Lecture 19, (Tuesday, 24 March): Inference from simulations 2: direct and indirect estimation
Slides and R
The method of simulated moments. Example with the logistic map. Issues with the MSM. The trade-off between tractable models and good models. The indirect inference trick: don't match moments, match a bad-but-tractable model. Examples of indirect inference. Convergence properties.
Reading: Smith, "Indirect Inference"; Kendall et al., "Population Cycles in the Pine Looper Moth: Dynamical Tests of Mechanistic Hypotheses"; (*) Gourieroux, Monfort and Renault, "Indirect Inference" [JSTOR]
Lecture 18, (Thursday, 19 March): Inference from simulations 1
The virtues and limits of simulations, especially stochastic simulations. Active testing: deliberately trying to break your model to gain (or lose!) confidence in it. Bootstrapping once again.
Reading: ch. 5 and appendix B of Miller and Page; Miller, "Active Nonlinear Tests (ANTs) of Complex Simulation Models"
Lecture 17, (Tuesday, 17 March): Inference in general: error statistics and severe testing
Slides
Errors and reliable inference. Statistics as principle argument: mostly arguments that certain errors are (or are not) absent or small, unless we're very unlucky. Kinds of errors in statistical inferences. Construction of reliable hypotheses: confidence sets, "probably approximately correct", more. Severe testing. Null hypotheses. Testing for mis-specification, especially by looking at residuals. The two chief world systems.
Reading: Mayo and Cox, "Frequentist statistics as a theory of inductive inference", arxiv:math.ST/0610846; Mayo and Spanos, "Methodology in Practice: Statistical Misspecification Testing" [PDF]; idem, "Severe Testing as a Basic Concept in a Neyman-Pearson Philosophy of Induction" [journal]; Spanos, "Curve-Fitting, the Reliability of Inductive Inference and the Error-Statistical Approach" [journal]
Lecture 16, (Thursday, 5 March): Heavy-tailed distributions 4: Comparing models
Slides, R
Goodness-of-fit tests: Glivenko-Cantelli, "the fundamental theorem of statistics"; the Kolmgorov-Smirnov test; Monte Carlo version of the K-S test when parameters are estimated from data; general remarks on goodness-of-fit. Relative distribution methods for comparing distributions: their virtues; using relative distributions to check power laws; HTTP file size example. Vuong's likelihood-ratio test for model selection. The flight of the albatross. Zombie-hunting.
Readings: Clauset et al. continued; Handcock and Morris, "Relative Distribution Methods"; Freckleton and Sutherland, "Do power laws imply self-regulation?"; idem, "Do in-hospital waiting lists show self-regulation?" (thanks to Nick Watkins for point out those papers); Edwards et al. "Revisiting Lévy flight search patterns of wandering albatrosses, bumblebees and deer"; (*) Vuong, "Likelihood Ratio Tests for Model Selection and Non-Nested Hypotheses"
Lecture 15 (Tuesday, 3 March) Heavy-tailed distributions 3: Estimation
Slides
The right way to do it: maximum likelihood estimation for power laws; properties of the MLE. Nonparametric bootrstrapping for error estimation. The wrong way to do it: linear regression on a log-log plot; why it should never be used. Determining the scaling range. Examples with city sizes. Non-parametric density estimation, and special considerations for heavy-tailed distributions.
Readings: Clauset, Shalizi and Newman, "Power law distributions in empirical data", arxiv:0706.1062; (*) Markovitch and Krieger, "Nonparametric estimation of long-tailed density functions and its application to the analysis of World Wide Web traffic"
Lecture 14, (Thursday, 26 February): Heavy-tailed distributions 2: how they arise
Slides
"How the distributions got their tails". Deflating explanations: measuring ex instead of x; measuring x1/a instead of x; the central limit theorem for multiplicative terms. Everyday explanations: exponential growth from random times; why isn't Acoma, N.M., not the largest city in the US?; the Yule-Simon mechanism for random growth. Exciting and mysterious explanations: why physicists expect everything to follow a Gaussian, except at phase transitions; self-organized criticality; a skeptical note.
Readings: Newman, "Power laws", section IV; Krugman, ch. 3 and the last part of ch. 8; Mitzenmacher, "A Brief History of Generative Models for Power Law and Lognormal Distributions"; video of Mitzenmacher giving a talk on this material; Sornette, "Mechanism for Powerlaws without Self-Organization", arxiv:cond-mat/0110426
Lecture 13, (Tuesday, 24 February): Heavy-tailed distributions 1: what they are
Slides and R
Highly skewed and heavy tailed distributions: definitions. Real-data examples: words, cities, money, papers, phone calls, the Internet, earthquakes, wars, terrorism, solar flares ("die, puny humans!"). Models: the pure power laws (Pareto and Zipf distributions). Properties of power laws: scale-freedom, 80-20, order statistics. Inequality of condition in America. Modifications: generalized Pareto, Yule-Simon. Non-power-law distributions: truncated power laws, log-normal, stretched exponential/Weibull distribution.
Newman, "Power laws, Pareto distributions and Zipf's law", arxiv:cond-mat/0412004 (through section III)
Lecture 12, (Thursday, 19 February): Self-organization 2
Slides
Readings: Shalizi, Klinkner and Haslinger, "Quantifying Self-Organization with Optimal Predictors", arxiv:nlin.AO/0409024; Shalizi, Haslinger, Rouquier, Klinkner and Moore, "Automatic Filters for the Detection of Coherent Structure in Spatiotemporal Systems", arxiv:nlin.CG/0508001
Lecture 11, (Tuesday, 17 February): Cellular automata 2: excitable media; mechanisms and abstractions
Slides
The heart as an assemblage of excitable elements. Excitable media as an abstract mechanism. Philosophical excursus on "abstraction" and "mechanism". Examples of excitable media. Chemical oscillators: Belousov-Zhabotinsky type. Chemical oscillators: Turing type. City growth as Turing morphogenesis. Slime mold morphogenesis. The spiral ice-canyons of Mars. Cellular automata models of excitability: Greenberg-Hastings, cyclic CA. Some math for these models.
Readings: Fisch, Gravner and Griffeath, "Threshold-range scaling of excitable cellular automata" [PDF]; idem, "Cyclic Cellular Automata in Two Dimensions" [zipped PostScript]; Greenberg and Hastings, "Spatial Patterns for Discrete Models of Diffusion in Excitable Media" [JSTOR]; Griffeath, "Self-Organization of Random Cellular Automata: four snapshot" [zipped PostScript]; Krugman, ch. 2 and first part of ch. 8; ch. 4 of Guttorp
Lecture 10, (Thursday, 12 February): Cellular automata 1
Slides
What cellular automata are; basic elements. Examples: diffusion-limited aggregation, lattice gases, majority vote, Life. Mechanical self-reproduction.
Readings: Flake, ch. 15; Miller and Page, ch. 8
Lecture 9, (Tuesday, 10 February): Self-organization 1, some examples
Slides
Levels of description; micro/macro distinctions, aggregation. Emergence of higher levels from lower levels; how to be a good reductionist. Self-organization. Excursus into thermodynamic entropy and the second law of thermodynamics. How self-organization is compatible with the second law. Some examples from fluids. Classroom demo with fluids.
Reading: Miller and Page, ch. 4
Office of Charles and Ray Eames, Powers of Ten, narration by Philip Morrison
Lecture 8, (Thursday, 5 February): Randomness and determinism
Slides
Data compression and description length. Algorithmic information content as the length of the shortest effective description. Random sequences are incompressible; incompressible sequences have all the testable properties of randomness. "Any signal distinguishable from noise is insufficiently compressed." Algorithmic information content is uncomputable. Dynamical systems as sources of randomness; Brudno's theorem connecting algorithmic information content to entropy rate and so to Lyapunov exponents. "Any determinism distinguishable from randomness is insufficiently complicated."
Side-note: Algorithmic Information Content and Marginal Entropies
Readings: Flake, ch. 14; Smith, ch. 11; Poincaré, "Chance", from Science and Method [PDF]
Lecture 7 (Tuesday, 3 February): Information Theory
Slides
Entropy of random variables, interpretations (randomness, variability, uncertainty, coding). Mutual information, statistical independence, Markov properties. Relative entropy, a.k.a. Kullback-Leibler divergence. Relative entropy measures how easy it is to tell two distributions apart: statistical uses in sampling, large deviations, estimation (via Fisher information). Divergence is is negative expected log likelihood; maximum likelihood attempts to minimize divergence. Entropy rates measure dynamical randomness; connection between metric entropy rate, topological entropy rate, and Lyapunov exponents. Asymptotic equipartition; convergence of log likelihoods to their expected values.
Reading: Feldman, "Information Theory" [PDF]
M.C. Hawking, "Entropy", from Fear of a Black Hole [lyrics; mp3 (radio-safe Brief History of Rhyme version)]
Ray and Charles Eames, A Communications Primer (via Britta Gustafson)
Lecture 6 (Thursday, 29 January): Statistical Inference for Discrete Random Sequences
Slides
Inference for Markov chains: the likelihood function; the maximum likelihood estimate of the transition matrix; its convergence; MLE for parameterized Markov chains. Error estimation and hypothesis testing for Markov chains. Parametric bootrstrapping. Higher-order Markov chains. Random functions of Markov chains; stochastic finite automata and their inference. Variable length Markov chains. CSSR.
Handout: Maximum Likelihood Estimation for Markov Chains
Readings :Guttorp, 2.7--2.9 and 2.12 (I, II, III); Smith, ch. 10; Foulkes, "A Class of Machines Which Determine the Statistical Structure of a Sequence of Characters" [PDF]; (*) Smith, "The Maintenance of Uncertainty" [PDF]
Lecture 5 (Tuesday, 27 January): Symbolic Dynamics
Slides
Discrete coarse-grainings of the continuous state space. Symbol sequences determined by initial conditions. The shift map; moving the complexity from the update rule to the space. Motivation: "Continuous math is hard; let's go discretize." Refinement of partitions under dynamics. Generating partitions; discretizing continuous dynamics without loss of information. Symbolic dynamics as discrete stochastic processes; the full logistic map as the Bernoulli process. Allowed and forbidden words. The topological entropy rate; how many things can your process do? Formal languages as tools for describing discrete sequences ("we have ways of making your dynamics talk"). Regular expressions and their grammars. Regular expressions and their machines ("circles and arrows and a paragraph on the back of each one"). Examples of languages and machines from the logistic map. Finite-type vs. sofic processes.
Handout: More on the topological entropy rate
Readings: Daw, Finney and Tracy, "A review of symbolic analysis of experimental data" [reprint]
Lecture 4 (Thursday, 22 January): Attractor Reconstruction and Prediction
Slides, lorenz-generator.pl
"Geometry from a time series": how "time-delay embedding" and Takens's theorem let us reconstruct attractors without knowing equations. An example with the Lorenz system. Attractor reconstruction as a form of inference. Change of coordinates, change of observables. Choice of reconstruction settings, false nearest neighbor method. Prediction in the reconstructed state space. Nearest neighbor prediction; an example with the Lorenz system. Picking predictive settings via cross-validation.
Side-notes: "Smooth change of coordinates"; Nonlinear predictors
Readings: Kantz and Schreiber, chs. 3--4; Smith, chs. 7--9.
Lecture 3 (Tuesday, 20 January): Attractors and Mixing
Slides and R
Attractors as generalizations of fixed points and limit cycles. Geometry of attractors: where do trajectories go? The Hénon map, our first multi-dimensionsional dynamical system. Multiple dimensions and memory. Stretching and folding again. Stable and unstable directions. The Lorenz equations, our first continuous-time dynamical system. Lyapunov exponents to measure stability and instability. Probability on attractors. Invariant distributions again. Subtleties of convergence to the invariant distribution. Mixing and decay of correlations. A central limit theorem for mixing processes. CLT behavior of the logistic map.
Readings: Flake, ch. 11; Miller and Page, chs. 1--3; Smith, chs. 4--6.
Lecture 2 (Thursday, 15 January): More dynamics
Slides and R
Stability of fixed points and cycles. Bifurcations; bifurcation diagrams; the period-doubling route to chaos; periodic windows within chaos. Devaney's definition of chaos; how it implies sensitive dependence; stretching and folding. The Arnold cat map. Intermittency. More ergodicity: the evolution of densities and the Frobenius-Perron operator; time averages; the world's simplest ergodic theorem.
Readings: Flake, ch. 10 and sec. 11.1; Guttorp, ch. 1; Smith, chs. 1--3.
The Arnold Cat Map Movie (starring Marlowe the Cat, directed by Evelyn Sander)
Lecture 1 (Tuesday, 13 January): Introduction to the course; starting with dynamics
Slides and R
Administrivia. What is a model? What is a simulation? What is a dynamical system? What is chaos? First acquaintance with the logistic map. Fixed points and cycles. Exploring some properties of chaos.

Corrupting the Young; Complexity; Enigmas of Chance

Posted by crshalizi at June 16, 2009 09:24 | permanent link

Three-Toed Sloth:   Hosted, but not endorsed, by the Center for the Study of Complex Systems