Markov Chains
When I talked to my friend (and Olsson’s coworker from long ago) Troy, I told him that my choice of the wood-grain design for the Olsson’s Testimonial page was an attempt to capture a feeling I had about the wooden bookshelves. His reply was “and, it’s like a coffin, too!”.
Yes, well…
Soon we were off on a tangent: Because of that page, and the necessity that someone moderate the comments, I have been doing it. I didn’t see a setting for allowing all comments, and frankly, I wanted control over the nonsensical - like botnet spam or incoherent ravings. To which Troy replied “you mean, like a Sarah Palin speech?”.
Not… exactly.
So that was weird, but in “The Drunkard’s Walk”, Leonard Mlodinow reminds us not to be fooled by patterns we think we see in random events. Sarah Palin is obviously in the news, so for someone to mention her name the same day I was just looking at a Sarah Palin interview parody is not weird at all. Nor is it weird that the very same book about randomness should talk about Markov chains, which are the method that generates those fake Palin speeches.
And Markov chains are the mechanism I couldn’t recall that generated the text in that old program we had on our Mac when I was a teenager (what was that thing called?… I found it fascinating). I started to think about how to reverse-engineer that program recently, and came up with a sketch of associative arrays in Perl, or the essentially identical hash object in Ruby. If you think about it, Markov chains are just the framework - probability for any word showing up after any other - and, that means a couple of things, at least: 1) You have to have one program component learn these probabilities from existing texts, unless you want completely synthetic results (I can’t help thinking about music synthesizers vs. music sampling). And, 2) you might need a lot of memory to store a probability for each ordered pair of two words. This gets worse if you plan to extend the concept to what word will follow a particular combination of words - the so-called ‘memory’ of the chain. This is what suggests associative arrays in the first place - they’re more efficient for large but sparse matrices.
If you take a look at the Wikipedia page on Markov chains, you’ll notice that there are a lot of equations. One of my high school physics teachers was convinced that normal people just ignore the equations in a text. Therefore, good science/math writing has to explain everything well, but I love equations because they save me the trouble of wading through so much descriptive text. It’s the same issue with computer code, like when I wanted to tell those guys on the bus the other night about how the politicians should shut up and show me the code revisions they were proposing.
If you’re not going to look at the equations with me, just remember this: The probabilities in that article are context dependent - Every different Markov chain is like a chart with values for those numbers filled in. The numbers are a kind of ‘signature’ that we experience subjectively as uncanny reproductions of Sarah Palin speeches, or scholarly journal articles. Every different generator (i.e. set of those probabilities) applied to a specific vocabulary falls on a point in a multidimensional phase-space - an inherent product of the vocabulary itself (this is referred to as the ‘state-space’ on the Wiki). Even if you consider a two word ‘vocabulary’ like the Heads and Tails of a coin toss, the Markov chain concept implies a continuous space of different actual Markov chains for that vocabulary. Texts generated from those mechanisms are going to feel subjectively similar if the points in phase space are close - i.e., all the probabilities are close enough. This is fortunate, because an exact point in that space can only be estimated in practice: We estimate it by analyzing the texts we want to imitate.
But rest assured, I’m often exhausted by equations like the ones in that article. I have a threshold for what equations I can comfortably examine while reading, too.
I feel a bit relieved by all this: If I ever want to take a break from blogging, at least I know that I could write a Markov chain generator and train it to the previous posts. How long would it take for people to decide that I wasn’t just writing in a self-conscious imitation of my own style?
Posted by Evan Bittner Thu, 02 Oct 2008 13:49:00 GMT
