This may be old news to some of you, but I just noticed the other day that Amazon has added a whole panel of "text stats" for many of its books. I noticed it because my last book The Ghost Map just came out in paperback (go read it people -- it's a lot more fun than this post will turn out to be) and so I'm back into the swing of checking Amazon a few times a day. Text Stats is a pretty wonky page -- everything from some of the "readability" indices, to overall word count, to what Amazon calls "Fun stats" like "Words per dollar." (Quotes you never hear at Barnes and Noble: "This copy of Infinite Jest is such a bargain at only 39,574 words per dollar!")
But the two stats that I found totally fascinating were "Average Words Per Sentence" and "% Complex Words," the latter defined as words with three or more syllables -- words like "ameliorate", "protoplasm" or "motherf***er." I've always thought that sentence length is a hugely determining factor in a reader's perception of a given work's complexity, and I spent quite a bit of time in my twenties actively teaching myself to write shorter sentences. So this kind of material is fascinating to me, partially because it lets me see something statistically that I've thought a great deal about intuitively as a writer, and partially because I can compare my own stats to other writers' and see how I fare. (Perhaps there's a literary Rotisserie league lurking somewhere on those Text Stats pages.)
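For the curious, here is a rough sketch of how those two numbers could be computed. Amazon doesn't say how it splits sentences or counts syllables, so both steps here are assumptions: sentences are split naively on ".", "!" and "?", and syllables are estimated by counting runs of vowels. The function names are made up for illustration.

```python
import re

def split_sentences(text):
    # Assumption: a sentence ends at ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def split_words(sentence):
    # Words are runs of letters and apostrophes; digits and punctuation are ignored.
    return re.findall(r"[A-Za-z']+", sentence)

def count_syllables(word):
    # Crude heuristic, not a dictionary lookup: count runs of vowels,
    # then discount a trailing silent "e" (but not "-le" or "-ee").
    w = word.lower()
    n = len(re.findall(r"[aeiouy]+", w))
    if n > 1 and w.endswith("e") and not w.endswith(("le", "ee")):
        n -= 1
    return max(n, 1)

def text_stats(text):
    # Expects text containing at least one sentence and one word.
    sentences = split_sentences(text)
    all_words = [w for s in sentences for w in split_words(s)]
    complex_words = [w for w in all_words if count_syllables(w) >= 3]
    return {
        "words_per_sentence": len(all_words) / len(sentences),
        "percent_complex": 100 * len(complex_words) / len(all_words),
    }
```

A splitter this naive will miscount around abbreviations (it treats "E. O. Wilson" as three sentence fragments), so its numbers should be read as approximations, not as a match for Amazon's.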
So I spent a few hours last week plugging in the numbers for my books, as well as a few other authors that I assembled in an entirely unscientific fashion: Malcolm Gladwell, Steven Pinker, Seth Godin, Christopher Hitchens -- and then, just to see how far I'd come, I threw in my intellectual (and, sadly, stylistic) heroes from my early twenties, the post-structuralist legends Michel Foucault and Frederic Jameson. I compiled stats for 3-4 books for each author, except Gladwell who has written two, and then plotted them on a scatter chart, with the y axis representing % complex words and the x axis representing words per sentence. The results were pretty fascinating:
Some observations:
1. There's a clear cluster of Hitchens/Johnson/Pinker in the center. (From eyeballing some other Amazon pages, I think Dawkins, Michael Pollan, and E. O. Wilson would have been in that general area as well.) But what I thought was so striking was that even in that cluster, each author's books are closer to his other books than they are to the other two authors' books. In other words, each of us has a certain sweet spot of complexity that we come back to book after book. My first and last books, Interface Culture and Ghost Map, had the exact same words per sentence, down to the decimal point: 24.6. (My longest sentences turned out to be in Emergence, followed closely by Everything Bad, at 25.8 and 25.7.) Pinker tends to be just slightly less complex syntactically (with the one outlier Blank Slate, which is more complex than anything I've written). And Hitchens tends to write longer sentences by a couple of words.
2. Gladwell's sentences are fully 25% shorter than mine. I'm not sure if the average reader would notice the differences within the Johnson/Hitchens/Pinker cluster, but a 25% drop in sentence length has to alter the reading experience dramatically. Clearly, the only things separating me from selling ten million copies of my books are those extra 6.5 words per sentence.
3. Check out Foucault and Jameson. They are literally on another planet. The top spot goes to Jameson's "Postmodernism" book, which I read like scripture my first year of grad school: 53 words per sentence! Interestingly, most of the variation shows up in sentence length, not in word complexity -- you often hear people complain about the impenetrable jargon of critical theory, but it looks here like the sentence length is at least as much of a culprit.
4. I would love to see some stats on dynamic range here: not just average sentence length, but how much the sentence lengths vary over the course of each book. One of the things I learned when I started writing in a less academic style (largely when I was doing FEED) is the importance of throwing in a very short sentence for emphasis at regular intervals. (Come to think of it, I may have learned this from reading Gladwell's early pieces in the New Yorker.)
5. Is there a Literature grad school version of the Lazy Web? If so, I would love to see a study that cross-referenced sales and syntactical complexity across thousands of books and determined who had the highest sales-to-complexity ratio of all time.
6. After looking at the Jameson number, I went back to one of my papers from junior year at Brown to see how awful my prose was. I pulled up the scariest sentence in the first paragraph and did a quick word count: 75 words. 75! And no semi-colons either. I bet Fred Jameson's pretty psyched I never finished that PhD...
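The "dynamic range" idea in item 4 could be approximated with a few summary statistics over the per-sentence word counts. This is only a sketch under stated assumptions: the sentence splitter is the same naive punctuation rule as before, standard deviation stands in for "how much the lengths vary," and the function names are hypothetical.

```python
import re
import statistics

def sentence_lengths(text):
    # Assumption: sentences end at ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [len(re.findall(r"[A-Za-z']+", s)) for s in sentences]

def dynamic_range(lengths):
    # Population standard deviation measures spread around the mean;
    # shortest/longest show the extremes a reader actually encounters.
    return {
        "mean": statistics.mean(lengths),
        "stdev": statistics.pstdev(lengths),
        "shortest": min(lengths),
        "longest": max(lengths),
    }
```

Calling dynamic_range(sentence_lengths(chapter_text)) on each chapter would show whether the short, emphatic sentences really do turn up at regular intervals or bunch together in a few places.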

Great post. But can anyone recommend software for measuring statistics on typed words and sentences, based on techniques similar to the ones Amazon has used?
Cheers in advance
Posted by: Michael | October 27, 2007 at 11:11 AM
"I'm just wondering what would the graph have looked like if you would have plotted Gayatri Chakravarti Spivak on it..."
If you check Of Grammatology, Derrida (surprisingly, to me) comes in lower than Steven! Go figure...
Posted by: Mike Wing | October 27, 2007 at 10:17 PM
Kharris makes a good point. For example, Constance Garnett is known for having simplified the texts of Dostoevsky and other Russian authors in her translations, which of course affects the statistics.
Posted by: John | October 31, 2007 at 03:41 AM
Text stats become more homogeneous with novel writing because dialogue lowers the average words per sentence. For instance, "Yes / Hello / No" all count as complete, one-word sentences and occur frequently in dialogue. That is why Dostoevsky's and Hemingway's average words per sentence are nearly the same even though the complexity of their writing differs dramatically.
Comparing Gladwell to Steven is easier since their works contain no dialogue, but with a novelist, a better comparison would probably be the standard deviation (or perhaps just the MEDIAN sentence length) rather than the average words per sentence.
Posted by: Michael | November 02, 2007 at 10:24 PM
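A small sketch of the point about averages hiding dialogue. The two lists below are invented sentence lengths, not measurements from any real book: one imitates evenly paced prose, the other a mix of one-word dialogue lines and a long narrative sentence. Both are built to have the same mean.

```python
import statistics

def summarize(lengths):
    # Returns (mean, median, population standard deviation).
    return (
        statistics.mean(lengths),
        statistics.median(lengths),
        statistics.pstdev(lengths),
    )

# Hypothetical data, chosen so that both samples average 10 words per sentence.
even_prose = [8, 9, 10, 11, 12]
dialogue_heavy = [1, 1, 2, 6, 40]

for name, sample in [("even prose", even_prose), ("dialogue heavy", dialogue_heavy)]:
    mean, median, stdev = summarize(sample)
    print(f"{name}: mean={mean}, median={median}, stdev={stdev:.1f}")
```

The means are identical, but the median and the standard deviation separate the two samples, which is the case for reporting them alongside the average.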
Interesting observations.
Perhaps the need to reread in order to understand is related to a limit in the brain. Isn't the limit around 7-8 for the average person without memory training?
When I was reading this, I was thinking of the idea of chunking. The strategy, not something you do after a big night out.
http://en.wikipedia.org/wiki/Chunking_(psychology)
Perhaps Gladwell and Godin are popular and sell well because their ideas are easily understood.
There is a Perl module for determining lots of statistics.
http://search.cpan.org/~kimryan/Lingua-EN-Fathom-1.11/lib/Lingua/EN/Fathom.pm
An explanation of the MS Word readability report can be found here
http://www.brainbell.com/tutorials/ms-office/Word/Make_Sense_Of_Word's_Readability_Statistics.htm
And the readability test mentioned has got a wiki entry here
http://en.wikipedia.org/wiki/Flesch-Kincaid_Readability_Test
Have Fun
Paul
Posted by: PaulM | November 03, 2007 at 05:51 PM
If you're such a fan of Jameson, could you spell his first name correctly? Sheesh, man.
Posted by: John | November 10, 2007 at 03:10 PM
That's a great analysis. I always believed it from an anecdotal perspective, but having real numbers definitely confirms it.
I think Robert Kiyosaki put it best when he said his writing wasn't the most eloquent but it was still one of the best selling. He writes his thoughts in clear and simple sentences. He makes it very easy to understand his message rather than convoluting it with extensive verbiage.
Of course I'm not saying to dumb it all down, just to write what you mean. A great book that I strongly recommend on how to do this is called "On Writing Well". You can read my review of it at: http://www.followsteph.com/2007/07/14/book-recommendation-on-writing-well/
Posted by: Stephane Grenier | November 16, 2007 at 11:00 AM
Really interesting analysis! I'll try to write shorter sentences now.
Posted by: Colin | November 17, 2007 at 12:13 PM
I like this idea of counting. It has legs. An author could cut all of his sentences in half and sell the result as the second edition.
An author could vary the lengths of sentences according to the Golden Ratio 1.6180339887498948482. For example, his mean sentence lengths could be 17, 27.5065778087482124194, and 44.506577808748212419225639951467.
An author could choose the text of a great book and measure word length, sentence length, paragraph length, chapter length and document length. Then, for example, he could write the sentence lengths on ping-pong balls and pick the balls from a lottery device. This method would generate the numerics of the book, and the author could concentrate on content.
I believe that the content could also be generated numerically. Put all of the settings such as war on ping-pong balls. Continue with the number of characters, genres, plots, etc.
With the draft written the author could proceed to editing which he could have his wife do. Of course he would need a geek for the computer work.
These numerical books could provide his income leaving him enough time to write his own book.
Voila!
Posted by: B.J. Henderson | January 25, 2009 at 07:14 PM
You're a published author, so please don't say that Foucault and Jameson are "literally on another planet."
They are not *literally* on another planet, unless there's something you know that I don't know about either life after death or Duke University.
Help stamp out the "literally" disease.
Posted by: cm | February 10, 2009 at 03:00 PM
Sorry about the literally gripe, it derailed me from the more important point of saying that this blog post rules the Earth with an iron fist. Not literally, but pretty close.
No, really--very interesting!
Posted by: cm | February 10, 2009 at 03:06 PM
Steven, this is great! Thanks for the tip. I'm doing more and more writing, which I'm glad for. I'm sure there's got to be some correlation with sales here. I wonder what the median and range are for something like the NYT bestseller lists.
Posted by: Term Papers | August 28, 2010 at 01:14 AM