March 31, 2009

Books to Read While the Algae Grow in Your Fur, March 2009

Diana Pharaoh Francis, The Cipher and The Black Ship
Fantasy brain-candy. The first book does not, despite the title, involve cryptography.
Fall of Cthulhu, vol. 4, Godwar
Pointless if you have not been following along (1, 2, 3).
Carrie Vaughn, Kitty Raises Hell
Patricia Briggs, Bone Crossed
Dean Baker, Plunder and Blunder: The Rise and Fall of the Bubble Economy
The best thing I've read on the current crisis: short, plainly written, and totally accurate. (You can get a sense of its contents here.)
Robert Sharer with Loa Traxler, The Ancient Maya, 6th edition
Massive (~800 pp.) textbook on Maya archaeology, supplemented with ethnohistory and ethnography. Covers the whole period from first settlement through the Spanish conquests, though naturally emphasizing the Classic period (+250 to +900 or +1100, depending on where you are).
Taylor Anderson, Into the Storm, Crusade and Maelstrom
More enjoyable than a trilogy of military SF novels which could be summarized as "what these lemurs need is a boatload of vintage honkeys" has any right being.
Felix Gilman, Thunderer: A Novel of High Fantasy
The city itself as the enchanted realm, with lost, mad and exploited gods, airships, music, feral children, and philosophes writing an encyclopedia. (He realizes that the ecology makes no sense.)

Books to Read While the Algae Grow in Your Fur; Scientifiction and Fantastica; The Dismal Science; The Progressive Forces; The Continuing Crises; Writing for Antiquity; Cthulhiana

Posted by crshalizi at March 31, 2009 23:59 | permanent link

March 27, 2009

Another Idle Question

How many of the people currently pushing or exploiting conspiracy theories about the introduction of a global currency also claim to support returning to the gold standard?

(And where's Dan Sperber when we need him?)

The Dismal Science; The Running Dogs of Reaction; Psychoceramics

Posted by crshalizi at March 27, 2009 21:10 | permanent link

March 26, 2009

Some Bayesian Finger-Puzzle Exercises, or: Often Wrong, Never In Doubt

Attention conservation notice: Clearing out my drafts folder. 600+ words on some examples that I cut from a recent manuscript. Only of interest to (bored) statisticians.

The theme here is to construct some simple yet pointed examples where Bayesian inference goes wrong, though the data-generating processes are well-behaved, and the priors look harmless enough. In reality, however, there is no such thing as a prior without bias, and in these examples the bias is so strong that Bayesian learning reaches absurd conclusions.

Example 1

The data X_i, i=1,2,3,..., come from a 50/50 mixture of two Gaussians, with means at -1 and +1, both with standard deviation 1. (They are independent and identically distributed.) The prior, by coincidence, is a 50/50 mix of two Gaussians, located at -1 and +1, both with standard deviation 1. So initially the posterior predictive distribution coincides exactly with the actual data-generating distribution. After n observations x_1, ..., x_n, whose sum is z, the likelihood ratio L(+1)/L(-1) is e^{2z} (so the log-likelihood ratio is 2z). Hence the posterior probability that the expectation is +1 is 1/(1+e^{-2z}), and the posterior probability that the expectation is -1 is 1/(1+e^{2z}). The sufficient statistic z itself follows an unbiased random walk, meaning that as n grows it tends to get further and further away from the origin, with a typical size growing roughly like n^{1/2}. It does keep returning to the origin, at intervals dictated by the arc sine law, but it spends more and more of its time very far away from it. The posterior estimate of the mean thus wanders from being close to +1 to being close to -1 and back erratically, hardly ever spending time near zero, even though (from the law of large numbers) the sample mean converges to zero.
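For readers who would rather watch this happen than take it on faith, here is a minimal Python sketch of the setup just described. It is not the R code linked from this post; the function names `posterior_plus` and `simulate_posterior`, the seed, and the sample size are all illustrative choices of mine.

```python
import math
import random


def posterior_plus(z):
    """Posterior probability of the +1 component, 1/(1 + exp(-2z)).

    Written in two branches so that exp() is only ever called on a
    non-positive argument and cannot overflow.
    """
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-2.0 * z))
    e = math.exp(2.0 * z)
    return e / (1.0 + e)


def simulate_posterior(n, seed=0):
    """Draw n points from the 50/50 mixture of N(-1,1) and N(+1,1).

    Returns two lists of length n: the running sum z after each draw,
    and the posterior probability of the +1 component at that point.
    """
    rng = random.Random(seed)
    z = 0.0
    z_path, p_path = [], []
    for _ in range(n):
        component_mean = rng.choice((-1.0, 1.0))  # pick a mixture component
        z += rng.gauss(component_mean, 1.0)       # add the new observation
        z_path.append(z)
        p_path.append(posterior_plus(z))
    return z_path, p_path


if __name__ == "__main__":
    z_path, p_path = simulate_posterior(10_000)
    # Fraction of steps at which the posterior is not nearly certain.
    undecided = sum(0.1 < p < 0.9 for p in p_path) / len(p_path)
    print("sample mean:", z_path[-1] / len(z_path))
    print("fraction of time with posterior in (0.1, 0.9):", undecided)
```

Trying a few different seeds should show the sample mean near zero while the posterior spends most of its time pinned near 0 or 1.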

This figure shows typical sample paths for z, for the posterior probability of the +1 mode, and for the relative entropy of the predictive distribution from the data-generating distribution. (The latter is calculated by Monte Carlo since I've forgotten how to integrate, so some of the fuzziness is MC noise.) Here is the R code.


Exercise 1: Confirm those calculations for the likelihood ratio and so for the posterior.

Exercise 2: Find the expected log-likelihood of an arbitrary-mean unit-variance Gaussian under this data-generating distribution.

Example 2

Keep the same data-generating distribution, but now let the prior be the conjugate prior for a Gaussian, namely another Gaussian, centered at zero. The posterior is then another Gaussian, which is a function of the sample mean, since the latter is a sufficient statistic for the problem.

Exercise 3: Find the mean and variance of the posterior distribution as functions of the sample mean. (You could look them up, but that would be cheating.)

As we get more and more data, the sample mean converges almost surely to zero (by the law of large numbers), which here drives the mean and variance of the posterior to zero almost surely as well. In other words, the Bayesian becomes dogmatically certain that the data are distributed according to a standard Gaussian with mean 0 and variance 1. This is so even though the sample variance almost surely converges to the true variance, which is 2. This Bayesian, then, is certain that the data are really not that variable, and any time now will start settling down.
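The mismatch is easy to check numerically. The following Python sketch (again mine, not the linked R code, with illustrative names `mixture_sample` and `true_variance`) draws from the mixture and compares the sample mean and sample variance with the values a standard Gaussian would imply. It deliberately stops short of the posterior formulas, so as not to give away Exercise 3.

```python
import random
import statistics


def mixture_sample(n, seed=0):
    """Draw n i.i.d. points from the 50/50 mixture of N(-1,1) and N(+1,1)."""
    rng = random.Random(seed)
    return [rng.gauss(rng.choice((-1.0, 1.0)), 1.0) for _ in range(n)]


def true_variance():
    """Variance of the mixture, from E[X^2] - (E[X])^2.

    Each component has E[X^2] = sd^2 + mean^2 = 1 + 1, and the
    mixture mean is 0, so the result is 2.
    """
    second_moment = 0.5 * (1.0 + (-1.0) ** 2) + 0.5 * (1.0 + 1.0 ** 2)
    mean = 0.5 * (-1.0) + 0.5 * 1.0
    return second_moment - mean ** 2


if __name__ == "__main__":
    xs = mixture_sample(100_000)
    print("sample mean:    ", statistics.fmean(xs))      # should be near 0
    print("sample variance:", statistics.pvariance(xs))  # should be near 2
    print("true variance:  ", true_variance())
```

The sample mean agrees with what the standard-Gaussian model expects; the sample variance does not, which is the whole problem.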

Exercise 4: Suppose that we take the prior from the previous example, set it to 0 on the interval [-1,+1], and increase the prior everywhere else by a constant factor to keep it normalized. Show that the posterior density at every point except -1 and +1 will go to zero. (Hint: use exercise 2 and see here.)

Update in response to e-mails, 27 March: No, I'm not saying that actual Bayesian statisticians are this dumb. A sensible practitioner would, as Andy Gelman always recommends, run a posterior predictive check, and discover that his estimated model looks nothing at all like the data. But that sort of thing is completely outside the formal apparatus of Bayesian inference. What amuses me in these examples is that the formal machinery becomes so certain while being so wrong, despite starting from the right answer (and all this while Theorem 5 from my paper still applies!). See the second post by Brad DeLong, linked to below.

Manual trackback: Brad DeLong; and again Brad DeLong (with a simpler version of example 1!); The Statistical Mechanic

Enigmas of Chance

Posted by crshalizi at March 26, 2009 10:45 | permanent link

March 24, 2009

Where Did the Steelworkers Go?

Attention conservation notice: back-of-the-envelope calculations about why the US has only about a fifth as many steelworkers now as it did in 1960. Not backed by any actual knowledge of the steel industry. Utterly untimely: it was, I think, prompted by a comment thread on Unfogged, but so long ago that I can't remember which.

In 1960, US primary steel production was 91 million tons, of which 2.95 million tons were exported; it also imported 3.24 million tons. This part of the industry employed 530,000 people in all capacities, for an annual output of 170 tons/employee.

In 2007, US primary steel production was 98.1 million tons, with exports of 10.1 million tons and imports of 30.2 million tons. Employment was only 97,540 people, coming to 1005 tons/employee.

Exports and imports in 1960 were a wash, nearly enough, so let's suppose trade patterns had remained comparable and say that all of the 2007 net imports (30.2 - 10.1 = 20.1 million tons) were to be made up by higher domestic production: (20.1 million tons)/(1005 tons/worker) = 20,000 extra workers. This would be a substantial increase, but it would still leave employment in steel at only 22% of its 1960 level. Where did the other four-fifths of the industry go?

The most obvious explanation is productivity. The industry in 2007 produced more than it did in 1960, with many fewer employees. In fact, output per employee grew 5.9 times over that period. A six-fold increase in productivity divided by a slight rise in total demand equals a roughly five-fold fall in employment.
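The arithmetic so far can be reproduced in a few lines of Python. This is only a sketch using the figures quoted above; the variable names are mine, and because it works with unrounded ratios the intermediate numbers differ slightly from the rounded 170 and 1005 tons/employee used in the text.

```python
# Figures quoted in the post (tons and employees).
production_1960 = 91e6
employment_1960 = 530_000
production_2007 = 98.1e6
employment_2007 = 97_540
exports_2007 = 10.1e6
imports_2007 = 30.2e6

# Output per employee in each year.
tons_per_worker_1960 = production_1960 / employment_1960
tons_per_worker_2007 = production_2007 / employment_2007
productivity_growth = tons_per_worker_2007 / tons_per_worker_1960

# Counterfactual: replace all 2007 net imports with domestic production
# at 2007 productivity.
net_imports_2007 = imports_2007 - exports_2007
extra_workers = net_imports_2007 / tons_per_worker_2007
share_of_1960 = (employment_2007 + extra_workers) / employment_1960

if __name__ == "__main__":
    print(f"tons/employee, 1960: {tons_per_worker_1960:.0f}")
    print(f"tons/employee, 2007: {tons_per_worker_2007:.0f}")
    print(f"productivity growth: {productivity_growth:.1f}x")
    print(f"extra workers from net imports: {extra_workers:.0f}")
    print(f"counterfactual employment vs 1960: {share_of_1960:.0%}")
```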

Now, this calculation understates the effect of trade because it only considers net imports of steel. But steel is used as an input to producing many other things, and a washing machine made of steel shows up in this sort of official statistic as an import of a manufactured good, not an import of steel. So to really see what US steel production would be if we retained 1960 trade patterns, we'd need to see what the change in the (foreign*) steel content of US net imports has been. Since I don't have Leontief input-output matrices for the US and its trading partners in the two years, I can't do this.

Failing actual knowledge, I'll turn to guesswork. Suppose the steel content of imports was equal to net direct imports; this seems high, but what do I know? This would just add another 20,000 jobs, and bring us up to 26% of the size of the industry in 1960. To get the same level of employment in steel production now as in 1960, the net increase in the foreign steel content of our imports would have to satisfy

(530,000 workers) - (117,540 workers for domestic production and direct imports) = (increase in net indirect imports)/(1005 tons/worker)
or 414,522,300 tons, i.e., about 3.5 times total production plus net direct imports. This is highly implausible.
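As a check on that last figure, here is a small Python sketch of the same calculation. It uses the rounded numbers from the text (1005 tons/worker, 117,540 workers); the function and variable names are illustrative, not taken from any source.

```python
TONS_PER_WORKER_2007 = 1005               # rounded 2007 output per employee
EMPLOYMENT_1960 = 530_000
EMPLOYMENT_WITH_DIRECT_IMPORTS = 117_540  # 97,540 actual + 20,000 counterfactual


def required_indirect_imports(target_workers, current_workers, tons_per_worker):
    """Tons of extra steel demand needed to employ the missing workers."""
    return (target_workers - current_workers) * tons_per_worker


required_tons = required_indirect_imports(
    EMPLOYMENT_1960, EMPLOYMENT_WITH_DIRECT_IMPORTS, TONS_PER_WORKER_2007
)

# Compare with 2007 production plus net direct imports (98.1 + 20.1 million tons).
baseline_tons = 98.1e6 + 20.1e6
multiple_of_baseline = required_tons / baseline_tons

# The guess in the text: indirect steel imports equal to net direct imports,
# adding another 20,000 jobs.
share_with_guess = (EMPLOYMENT_WITH_DIRECT_IMPORTS + 20_000) / EMPLOYMENT_1960

if __name__ == "__main__":
    print(f"required indirect imports: {required_tons:,} tons")
    print(f"multiple of production plus net direct imports: {multiple_of_baseline:.1f}")
    print(f"employment vs 1960 under the guess: {share_with_guess:.0%}")
```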

I conclude that domestic employment in steel production has collapsed largely because increases in productivity have not been matched by increases in demand. If someone can point out where this reasoning goes wrong, I'd appreciate it.

*: Foreign steel content, because if the washing machine is made abroad of steel exported by the US, replacing that washing machine with a US-made one will not increase the demand for American steel.

Sources: 2007 employment figure from BLS (NAICS code 3311). 1960 employment figure from Table 1 on p. 2 of Lebergott. (It does not, however, appear to be affected by some of the well-known problems with Lebergott's series for the 1930s.) Annual production, import and export figures from USGS.

Manual trackback: The Inverse Square Blog; Nothing Funny About Feldspar (with more facts; go read)

The Dismal Science

Posted by crshalizi at March 24, 2009 10:49 | permanent link

March 22, 2009

Special Function Invocation

O Hive Mind, o Lazy Web, Urania's child, I invoke thee! Is there a name for the function
\[ 
f_n(\theta) = \sum_{k=0}^{n}{{n \choose k} \theta^k {(1-\theta)}^{n-k} \log{k!}} 
 \]
i.e., for $ \mathbb{E}[\log{X!}] $ when X is binomially distributed?
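Name or no name, the function is at least easy to evaluate numerically. A minimal Python sketch, with the hypothetical name `expected_log_factorial`, sums the definition directly, using `math.lgamma(k + 1)` for log k!. It is meant for moderate n only: for n in the thousands the binomial coefficient becomes too large to multiply by a float.

```python
import math


def expected_log_factorial(n, theta):
    """E[log X!] for X ~ Binomial(n, theta), by direct summation.

    Implements f_n(theta) = sum_k C(n,k) theta^k (1-theta)^(n-k) log(k!).
    """
    total = 0.0
    for k in range(n + 1):
        pmf = math.comb(n, k) * theta ** k * (1.0 - theta) ** (n - k)
        total += pmf * math.lgamma(k + 1)  # lgamma(k + 1) = log(k!)
    return total


if __name__ == "__main__":
    for theta in (0.1, 0.5, 0.9):
        print(theta, expected_log_factorial(20, theta))
```

Two easy sanity checks: f_1 is identically zero, since log 0! = log 1! = 0, and f_n(1) = log n!.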

Enigmas of Chance

Posted by crshalizi at March 22, 2009 21:52 | permanent link

March 17, 2009

Idle Question of the Day

Exactly what bad consequences would follow if laws were passed by the relevant countries rendering credit default swap contracts void henceforth? (That is, canceling all the outstanding wagers because the bookies went bust.)

Update, 22 March: Well, one bad consequence would evidently be agreeing with Ben Stein. A bit from that link (by Felix Salmon, not Stein) is worth quoting:

There's a good chance, just for starters, that every major bank in America would go bust overnight: after all, they've been packaging up and selling off the credit risk on their multi-trillion-dollar loan portfolios [for years]. If Stein got his way, all that credit risk would suddenly reappear on the banks' balance sheets, and there's nothing they could do about it. Genius. Remember that those super-senior CDOs were the safest bits of the credit that they sold off. Just imagine what their balance sheets would look like if all the risky bits reappeared.

The issue he's raising is that if the banks can't say that they're covered for the risk of their loans defaulting (via the credit default swaps), they need to hold more capital as a protection against default. So as a legal or regulatory issue, ending the swaps would make the banks worse off. Substantively, however, this only makes sense if the swaps would, in fact, protect banks in the event of defaults — if they actually shifted the risk to the swap-sellers. Since we have just had pretty dramatic demonstrations that this is not something to be counted on, it's not at all clear to me that the banks ought to be able to keep that risk off their balance sheets. (In other words, the real value of the swaps to the banks is zero, or next to zero.) In any case, this objection could be countered by combining ending credit default swaps with public guarantees of the banks' existing positions — which is effectively what's happening anyway, only without making it harder to repeat the mistake in the future.

More broadly, ending credit default swaps would mean that those who sold such swaps would lose their stream of payments (a flow) but gain back their collateral and reserves (a stock); conversely buyers of default protection would gain a cash flow but take a hit to their capital stocks. Right now one imagines that even those selling the "end of the world trade" might prefer to get out of the game; I'd be interested to see an estimate of the effects of this on the stability of the financial world right now.

There is also the possibility that eliminating the swaps would deprive us of information about how risky different debts are. The value we should place on this, however, depends on how well these markets actually succeed in aggregating information about risk. I'd say there is abundant cause for skepticism about this — especially when things are, in fact, dangerous. Economic theory does not, in fact, provide any reason to think that such markets will be dominated by those with the most accurate beliefs (or even that the market as a whole will be more accurate than the best-informed trader), unless you assume a complete set of markets, which is a reductio ad absurdum if ever there were one. (When markets are incomplete, more markets are not necessarily better.)

To be clear, I am not asserting that credit default swaps should be ended. I honestly don't think I know enough to have an opinion about that, and while I'm obviously skeptical about their value, some serious and credible people (i.e., ones who do not have a vested interest in the matter) who've studied them in more depth see merit to them. If this is what a world with efficiently-allocated risk looks like, though, I'd hate to see a messed-up one.

(Thanks to readers D.H. and son1 for comments and pointers.)

The Dismal Science; The Continuing Crises; Modest Proposals

Posted by crshalizi at March 17, 2009 22:22 | permanent link

Three-Toed Sloth:   Hosted, but not endorsed, by the Center for the Study of Complex Systems