This may be old news to some of you, but I just noticed the other day that Amazon has added a whole panel of "text stats" for many of its books. I noticed it because my last book, The Ghost Map, just came out in paperback (go read it, people -- it's a lot more fun than this post will turn out to be) and so I'm back into the swing of checking Amazon a few times a day. Text Stats is a pretty wonky page -- everything from some of the "readability" indices, to overall word count, to what Amazon calls "Fun stats" like "Words per dollar." (Quotes you never hear at Barnes and Noble: "This copy of Infinite Jest is such a bargain at only 39,574 words per dollar!")
But the two stats that I found totally fascinating were "Average Words Per Sentence" and "% Complex Words," the latter defined as words with three or more syllables -- words like "ameliorate", "protoplasm" or "motherf***er." I've always thought that sentence length is a hugely determining factor in a reader's perception of a given work's complexity, and I spent quite a bit of time in my twenties actively teaching myself to write shorter sentences. So this kind of material is fascinating to me, partially because it lets me see something statistically that I've thought a great deal about intuitively as a writer, and partially because I can compare my own stats to other writers' and see how I fare. (Perhaps there's a literary Rotisserie league lurking somewhere on those Text Stats pages.)
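For anyone who wants to run the same two numbers on their own prose, here is a minimal sketch in Python. I don't know how Amazon actually computes these stats, so everything here is an assumption: the function names are mine, the sentence splitter is naive (it will trip on abbreviations and initials like "E. O. Wilson"), and the syllable counter is a rough vowel-group heuristic rather than a dictionary lookup.

```python
import re


def split_sentences(text):
    # Naive assumption: a sentence ends at ., ! or ? followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def words(sentence):
    # Treat runs of letters (and apostrophes) as words; digits are ignored.
    return re.findall(r"[A-Za-z']+", sentence)


def count_syllables(word):
    # Rough heuristic: one syllable per run of vowels,
    # minus a silent trailing "e" (but not "-le", as in "table").
    w = word.lower()
    n = len(re.findall(r"[aeiouy]+", w))
    if w.endswith("e") and not w.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)


def text_stats(text):
    sentences = split_sentences(text)
    all_words = [w for s in sentences for w in words(s)]
    # "Complex" follows the definition above: three or more syllables.
    complex_words = [w for w in all_words if count_syllables(w) >= 3]
    return {
        "words_per_sentence": len(all_words) / len(sentences),
        "pct_complex": 100 * len(complex_words) / len(all_words),
    }


sample = "The cat sat. The protoplasm wobbled ominously today."
print(text_stats(sample))
```

On real book-length text the heuristic will miscount some words, so treat the output as a ballpark figure, not something comparable to Amazon's numbers to the decimal point.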
So I spent a few hours last week plugging in the numbers for my books, as well as a few other authors that I assembled in an entirely unscientific fashion: Malcolm Gladwell, Steven Pinker, Seth Godin, Christopher Hitchens -- and then, just to see how far I'd come, I threw in my intellectual (and, sadly, stylistic) heroes from my early twenties, the post-structuralist legends Michel Foucault and Fredric Jameson. I compiled stats for 3-4 books for each author, except Gladwell, who has written two, and then plotted them on a scatter chart, with the y axis representing % complex words and the x axis representing words per sentence. The results were pretty fascinating:
1. There's a clear cluster of Hitchens/Johnson/Pinker in the center. (From eyeballing some other Amazon pages, I think Dawkins, Michael Pollan, and E. O. Wilson would have been in that general area as well.) But what I thought was so striking was that even in that cluster, each author's books are closer to his other books than they are to the other two authors' books. In other words, each of us has a certain sweet spot of complexity that we come back to book after book. My first and last books, Interface Culture and Ghost Map, had the exact same words per sentence, down to the decimal point: 24.6. (My longest sentences turned out to be in Emergence, followed closely by Everything Bad, at 25.8 and 25.7 respectively.) Pinker tends to be just slightly less complex syntactically (with the one outlier Blank Slate, which is more complex than anything I've written). And Hitchens tends to write longer sentences by a couple of words.
2. Gladwell's sentences are fully 25% shorter than mine. I'm not sure if the average reader would notice the differences within the Johnson/Hitchens/Pinker cluster, but a 25% drop in sentence length has to alter the reading experience dramatically. Clearly, the only things separating me from selling ten million copies of my books are those extra 6.5 words per sentence.
3. Check out Foucault and Jameson. They are literally on another planet. The top spot goes to Jameson's "Postmodernism" book, which I read like scripture my first year of grad school: 53 words per sentence! Interestingly, most of the variation shows up in sentence length, not in word complexity -- you often hear people complain about the impenetrable jargon of critical theory, but it looks here like the sentence length is at least as much of a culprit.
4. I would love to see some stats on dynamic range here: not just average sentence length, but how much the sentence lengths vary over the course of each book. One of the things I learned when I started writing in a less academic style (largely when I was doing FEED) is the importance of throwing in a very short sentence for emphasis at regular intervals. (Come to think of it, I may have learned this from reading Gladwell's early pieces in the New Yorker.)
5. Is there a Literature grad school version of the Lazy Web? If so, I would love to see a study that cross-referenced sales and syntactical complexity across thousands of books and determined who had the highest sales-to-complexity ratio of all time.
6. After looking at the Jameson number, I went back to one of my papers from junior year at Brown to see how awful my prose was. I pulled up the scariest sentence in the first paragraph and did a quick word count: 75 words. 75! And no semi-colons either. I bet Fred Jameson's pretty psyched I never finished that PhD...
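A footnote on point 4: Amazon doesn't offer a "dynamic range" stat, but if you have the raw text of a chapter, a rough version is easy to sketch. The code below is only an illustration under my own assumptions: the function names are hypothetical, sentences are split naively on end punctuation, and I'm using the standard deviation of sentence length as a stand-in for "how much the lengths vary," which is just one of several reasonable measures.

```python
import re
import statistics


def sentence_lengths(text):
    # Naive assumption: a sentence ends at ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Count runs of letters (and apostrophes) as words.
    return [len(re.findall(r"[A-Za-z']+", s)) for s in sentences]


def length_spread(text):
    lengths = sentence_lengths(text)
    return {
        "mean": statistics.mean(lengths),
        # Population standard deviation: higher means more variation.
        "stdev": statistics.pstdev(lengths),
        "shortest": min(lengths),
        "longest": max(lengths),
    }


sample = "One two three four five six. Stop. One two three four five."
print(length_spread(sample))
```

Two books with the same average could look very different here: one that mixes long sentences with very short ones for emphasis would show a high spread, while one that stays near its average throughout would show a low one.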