Literary Style By The Numbers
This may be old news to some of you, but I just noticed the other day that Amazon has added a whole panel of "text stats" for many of its books. I noticed it because my last book The Ghost Map just came out in paperback (go read it people -- it's a lot more fun than this post will turn out to be) and so I'm back into the swing of checking Amazon a few times a day. Text Stats is a pretty wonky page -- everything from some of the "readability" indices, to overall word count, to what Amazon calls "Fun stats" like "Words per dollar." (Quotes you never hear at Barnes and Noble: "This copy of Infinite Jest is such a bargain at only 39,574 words per dollar!")
But the two stats that I found totally fascinating were "Average Words Per Sentence" and "% Complex Words," the latter defined as words with three or more syllables -- words like "ameliorate", "protoplasm" or "motherf***er." I've always thought that sentence length is a hugely determining factor in a reader's perception of a given work's complexity, and I spent quite a bit of time in my twenties actively teaching myself to write shorter sentences. So this kind of material is fascinating to me, partially because it lets me see something statistically that I've thought a great deal about intuitively as a writer, and partially because I can compare my own stats to other writers' and see how I fare. (Perhaps there's a literary Rotisserie league lurking somewhere on those Text Stats pages.)
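For anyone who wants to try these two numbers on their own prose, here is a rough Python sketch. It is not Amazon's method, which the Text Stats page does not spell out. The sentence splitter and the syllable counter are crude heuristics, and the function names (`split_sentences`, `count_syllables`, `text_stats`) are made up for illustration.

```python
import re

def split_sentences(text):
    # Naive split after ., ! or ? followed by whitespace.
    # Abbreviations like "E. O. Wilson" will be miscounted.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def find_words(sentence):
    # Letters, with an optional internal apostrophe ("I've", "author's").
    return re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", sentence)

def count_syllables(word):
    # Assumption: approximate syllables by counting runs of vowels,
    # then drop one for a likely silent trailing "e".
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def text_stats(text):
    sentences = split_sentences(text)
    all_words = [w for s in sentences for w in find_words(s)]
    if not sentences or not all_words:
        return {"words_per_sentence": 0.0, "pct_complex_words": 0.0}
    # "Complex" here follows the definition above: three or more syllables.
    complex_words = [w for w in all_words if count_syllables(w) >= 3]
    return {
        "words_per_sentence": len(all_words) / len(sentences),
        "pct_complex_words": 100 * len(complex_words) / len(all_words),
    }
```

Calling `text_stats(open("chapter.txt").read())` on a plain-text chapter (the filename is hypothetical) gives a ballpark figure. Expect it to drift from Amazon's numbers, since the heuristics will almost certainly differ.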
So I spent a few hours last week plugging in the numbers for my books, as well as a few other authors that I assembled in an entirely unscientific fashion: Malcolm Gladwell, Steven Pinker, Seth Godin, Christopher Hitchens -- and then, just to see how far I'd come, I threw in my intellectual (and, sadly, stylistic) heroes from my early twenties, the post-structuralist legends Michel Foucault and Frederic Jameson. I compiled stats for 3-4 books for each author, except Gladwell who has written two, and then plotted them on a scatter chart, with the y axis representing % complex words and the x axis representing words per sentence. The results were pretty fascinating:
Some observations:
1. There's a clear cluster of Hitchens/Johnson/Pinker in the center. (From eyeballing some other Amazon pages, I think Dawkins, Michael Pollan, and E. O. Wilson would have been in that general area as well.) But what I thought was so striking was that even in that cluster, each author's books are closer to his other books than they are to the other two authors' books. In other words, each of us has a certain sweet spot of complexity that we come back to book after book. My first and last books, Interface Culture and Ghost Map, had the exact same words per sentence, down to the decimal point: 24.6. (My longest sentences turned out to be in Emergence, followed closely by Everything Bad, at 25.8 and 25.7 respectively.) Pinker tends to be just slightly less complex syntactically (with the one outlier Blank Slate, which is more complex than anything I've written). And Hitchens tends to write longer sentences by a couple of words.
2. Gladwell's sentences are fully 25% shorter than mine. I'm not sure if the average reader would notice the differences within the Johnson/Hitchens/Pinker cluster, but a 25% drop in sentence length has to alter the reading experience dramatically. Clearly, the only things separating me from selling ten million copies of my books are those extra 6.5 words per sentence.
3. Check out Foucault and Jameson. They are literally on another planet. The top spot goes to Jameson's "Postmodernism" book, which I read like scripture my first year of grad school: 53 words per sentence! Interestingly, most of the variation shows up in sentence length, not in word complexity -- you often hear people complain about the impenetrable jargon of critical theory, but it looks here like the sentence length is at least as much of a culprit.
4. I would love to see some stats on dynamic range here: not just average sentence length, but how much the sentence lengths vary over the course of each book. One of the things I learned when I started writing in a less academic style (largely when I was doing FEED) is the importance of throwing in a very short sentence for emphasis at regular intervals. (Come to think of it, I may have learned this from reading Gladwell's early pieces in the New Yorker.)
5. Is there a Literature grad school version of the Lazy Web? If so, I would love to see a study that cross-referenced sales and syntactical complexity across thousands of books and determined who had the highest sales-to-complexity ratio of all time.
6. After looking at the Jameson number, I went back to one of my papers from junior year at Brown to see how awful my prose was. I pulled up the scariest sentence in the first paragraph and did a quick word count: 75 words. 75! And no semi-colons either. I bet Fred Jameson's pretty psyched I never finished that PhD...
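The dynamic-range wish in point 4 can be roughed out with Python's standard library. This is only a sketch under simple assumptions: sentences are split naively on end punctuation, and "range" is taken to mean the standard deviation plus the shortest and longest sentence. The function names are invented for the example.

```python
import re
import statistics

def sentence_lengths(text):
    # Naive sentence split; returns the word count of each sentence in order.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [len(re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", s)) for s in sentences]

def length_spread(text):
    # Ignore fragments that contain no words at all (e.g. a stray "...").
    lengths = [n for n in sentence_lengths(text) if n > 0]
    if not lengths:
        raise ValueError("no sentences found")
    return {
        "mean": statistics.mean(lengths),
        # Population standard deviation: how far sentences stray from the mean.
        "stdev": statistics.pstdev(lengths),
        "shortest": min(lengths),
        "longest": max(lengths),
    }
```

Two books with the same average could then be told apart by `stdev`: a writer who drops in very short sentences for emphasis should show a wider spread than one who never leaves the 25-word groove.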

Great post. But can anyone recommend software for measuring statistics on typed words and sentences, based on techniques similar to the ones Amazon has used?
Cheers in advance
Posted by: Michael | October 27, 2007 at 11:11 AM
"I'm just wondering what would the graph have looked like if you would have plotted Gayatri Chakravarti Spivak on it..."
If you check Of Grammatology, Derrida (surprisingly, to me) comes in lower than Steven! Go figure...
Posted by: Mike Wing | October 27, 2007 at 10:17 PM
Kharris makes a good point. For example, Constance Garnett is known for having simplified the texts of Dostoevsky and other Russian authors in her translations, which of course affects the statistics.
Posted by: John | October 31, 2007 at 03:41 AM
Text stats become more homogeneous with novel writing because dialogue lowers the average words per sentence. For instance, "Yes / Hello / No" all count as complete, one-word sentences and occur frequently in dialogue. That is why Dostoevsky's and Hemingway's average words per sentence are nearly the same even though the complexity of their writing differs dramatically.
Comparing Gladwell to Steven is easier since their works contain no dialogue, but with a novelist, a better comparison would probably be the standard deviation (or perhaps just the MEDIAN sentence length) rather than the average words per sentence.
Posted by: Michael | November 02, 2007 at 10:24 PM
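One way to see how dialogue pulls these measures around is to compare the mean, median, and standard deviation on a toy example. The sentence lengths below are invented for illustration, not measured from any novel, and `summarize` is a hypothetical helper.

```python
import statistics

# Invented sentence lengths (in words), for illustration only.
narration = [28, 35, 22, 41, 30]
dialogue = [1, 1, 2, 1, 3, 1]  # "Yes." "Hello." "No." and the like

def summarize(lengths):
    return {
        "mean": statistics.mean(lengths),
        "median": statistics.median(lengths),
        "stdev": statistics.pstdev(lengths),
    }

narration_only = summarize(narration)
with_dialogue = summarize(narration + dialogue)
```

In this toy case the one-word lines drag the mean from 31.2 down to 15 and the median from 30 down to 3, while the standard deviation rises. Which of these is the fairest basis for comparing novelists is a judgment call that the sketch does not settle.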
Interesting observations.
Perhaps the need to reread in order to understand is related to a limit in the brain. Isn't the limit around 7-8 for the average person without memory training?
When I was reading this, I was thinking of the idea of chunking. The strategy, not something you do after a big night out.
http://en.wikipedia.org/wiki/Chunking_(psychology)
Perhaps Gladwell and Godin are popular and sell well because their ideas are easily understood.
There is a Perl module for determining lots of statistics.
http://search.cpan.org/~kimryan/Lingua-EN-Fathom-1.11/lib/Lingua/EN/Fathom.pm
An explanation of the MS Word readability report can be found here
http://www.brainbell.com/tutorials/ms-office/Word/Make_Sense_Of_Word's_Readability_Statistics.htm
And the readability test mentioned has got a wiki entry here
http://en.wikipedia.org/wiki/Flesch-Kincaid_Readability_Test
Have Fun
Paul
Posted by: PaulM | November 03, 2007 at 05:51 PM
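For reference, the two Flesch formulas behind the readability test linked above are short enough to write out. The sketch takes raw counts as inputs, so it assumes you already have word, sentence, and syllable totals from some other tool. The function and parameter names are made up for the example.

```python
def flesch_reading_ease(total_words, total_sentences, total_syllables):
    # Higher scores mean easier text.
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word

def flesch_kincaid_grade(total_words, total_sentences, total_syllables):
    # Approximate U.S. school grade level needed to follow the text.
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
```

Both scores depend only on words per sentence and syllables per word, so longer sentences raise the grade level directly, whatever the vocabulary.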
If you're such a fan of Jameson, could you spell his first name correctly? Sheesh, man.
Posted by: John | November 10, 2007 at 03:10 PM
That's a great analysis. I always believed it from an anecdotal perspective, but having real numbers definitely confirms it.
I think Robert Kiyosaki put it best when he said his writing wasn't the most eloquent but it was still one of the best selling. He writes his thoughts in clear and simple sentences. He makes it very easy to understand his message rather than convoluting it with extensive verbiage.
Of course I'm not saying to dumb it all down, just to write what you mean. A great book that I strongly recommend on how to do this is called "On Writing Well". You can read my review of it at: http://www.followsteph.com/2007/07/14/book-recommendation-on-writing-well/
Posted by: Stephane Grenier | November 16, 2007 at 11:00 AM
Really interesting analysis! I'll try to write shorter sentences now.
Posted by: Colin | November 17, 2007 at 12:13 PM