
Literary Style By The Numbers

This may be old news to some of you, but I just noticed the other day that Amazon has added a whole panel of "text stats" for many of its books. I noticed it because my last book, The Ghost Map, just came out in paperback (go read it, people -- it's a lot more fun than this post will turn out to be) and so I'm back into the swing of checking Amazon a few times a day. Text Stats is a pretty wonky page -- everything from some of the "readability" indices, to overall word count, to what Amazon calls "Fun stats" like "Words per dollar." (Quotes you never hear at Barnes and Noble: "This copy of Infinite Jest is such a bargain at only 39,574 words per dollar!")

But the two stats that I found totally fascinating were "Average Words Per Sentence" and "% Complex Words," the latter defined as words with three or more syllables -- words like "ameliorate", "protoplasm" or "motherf***er." I've always thought that sentence length is a hugely determining factor in a reader's perception of a given work's complexity, and I spent quite a bit of time in my twenties actively teaching myself to write shorter sentences. So this kind of material is fascinating to me, partially because it lets me see something statistically that I've thought a great deal about intuitively as a writer, and partially because I can compare my own stats to other writers' and see how I fare. (Perhaps there's a literary Rotisserie league lurking somewhere on those Text Stats pages.)
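For anyone who wants to run the same two numbers on their own prose, here is a minimal sketch in Python. It is not Amazon's method, which isn't documented here; the sentence splitter and the syllable counter are rough heuristics of my own, and the function names (`split_sentences`, `count_syllables`, `text_stats`) are made up for illustration. The only thing taken from the Text Stats page is the definition of a "complex" word as one with three or more syllables.

```python
import re

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def words_in(text):
    """Words are runs of letters, optionally with internal apostrophes."""
    return re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)*", text)

def count_syllables(word):
    """Rough heuristic: count vowel groups, discounting a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def text_stats(text):
    """Return average words per sentence and percent of 3+ syllable words."""
    sentences = split_sentences(text)
    words = words_in(text)
    if not sentences or not words:
        raise ValueError("text contains no words")
    complex_words = [w for w in words if count_syllables(w) >= 3]
    return {
        "words_per_sentence": len(words) / len(sentences),
        "percent_complex": 100 * len(complex_words) / len(words),
    }

if __name__ == "__main__":
    sample = "The cat sat. Protoplasm is complicated stuff."
    print(text_stats(sample))
```

Abbreviations like "E. O. Wilson" will fool the splitter, and the syllable counter will miscount plenty of words, so treat the output as a ballpark figure rather than something comparable to Amazon's numbers.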

So I spent a few hours last week plugging in the numbers for my books, as well as for a few other authors that I chose in an entirely unscientific fashion: Malcolm Gladwell, Steven Pinker, Seth Godin, Christopher Hitchens -- and then, just to see how far I'd come, I threw in my intellectual (and, sadly, stylistic) heroes from my early twenties, the post-structuralist legends Michel Foucault and Fredric Jameson. I compiled stats for 3-4 books for each author, except Gladwell, who has written two, and then plotted them on a scatter chart, with the y axis representing % complex words and the x axis representing words per sentence. The results were pretty fascinating:

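[Chart: scatter plot of each author's books, with % complex words on the y axis and words per sentence on the x axis]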

Some observations:

1. There's a clear cluster of Hitchens/Johnson/Pinker in the center. (From eyeballing some other Amazon pages, I think Dawkins, Michael Pollan, and E. O. Wilson would have been in that general area as well.) But what I thought was so striking was that even in that cluster, each author's books are closer to his other books than they are to the other two authors' books. In other words, each of us has a certain sweet spot of complexity that we come back to book after book. My first and last books, Interface Culture and Ghost Map, had the exact same words per sentence, down to the decimal point: 24.6. (My longest sentences turned out to be in Emergence, followed closely by Everything Bad, at 25.8 and 25.7 respectively.) Pinker tends to be just slightly less complex syntactically (with the one outlier Blank Slate, which is more complex than anything I've written). And Hitchens tends to write longer sentences by a couple of words.

2. Gladwell's sentences are fully 25% shorter than mine. I'm not sure if the average reader would notice the differences within the Johnson/Hitchens/Pinker cluster, but a 25% drop in sentence length has to alter the reading experience dramatically. Clearly, the only things separating me from selling ten million copies of my books are those extra 6.5 words per sentence.

3. Check out Foucault and Jameson. They are literally on another planet. The top spot goes to Jameson's "Postmodernism" book, which I read like scripture my first year of grad school: 53 words per sentence! Interestingly, most of the variation shows up in sentence length, not in word complexity -- you often hear people complain about the impenetrable jargon of critical theory, but it looks here like the sentence length is at least as much of a culprit.

4. I would love to see some stats on dynamic range here: not just average sentence length, but how much the sentence lengths vary over the course of each book. One of the things I learned when I started writing in a less academic style (largely when I was doing FEED) is the importance of throwing in a very short sentence for emphasis at regular intervals. (Come to think of it, I may have learned this from reading Gladwell's early pieces in the New Yorker.)
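Amazon doesn't publish this, but if you have the text itself the "dynamic range" is easy enough to estimate. The sketch below is one way to do it, under a few assumptions of mine: sentences are split naively on end punctuation, and spread is measured with the population standard deviation alongside the shortest and longest sentence. The helper names `sentence_lengths` and `length_spread` are hypothetical.

```python
import re
import statistics

def sentence_lengths(text):
    """Word count of each sentence, using a naive end-punctuation split."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    word_pattern = r"[A-Za-z]+(?:'[A-Za-z]+)*"
    return [len(re.findall(word_pattern, s)) for s in sentences if s]

def length_spread(text):
    """Summarize how much sentence length varies across the text."""
    lengths = sentence_lengths(text)
    if not lengths:
        raise ValueError("text contains no sentences")
    return {
        "mean": statistics.mean(lengths),
        "stdev": statistics.pstdev(lengths),  # population standard deviation
        "shortest": min(lengths),
        "longest": max(lengths),
    }

if __name__ == "__main__":
    sample = "One two three four. Five six."
    print(sentence_lengths(sample))
    print(length_spread(sample))
```

A book that mixes long sentences with the occasional very short one for emphasis should show a noticeably larger standard deviation than one whose sentences all hover around the average.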

5. Is there a Literature grad school version of the Lazy Web? If so, I would love to see a study that cross-referenced sales and syntactical complexity across thousands of books and determined who had the highest sales-to-complexity ratio of all time.

6. After looking at the Jameson number, I went back to one of my papers from junior year at Brown to see how awful my prose was. I pulled up the scariest sentence in the first paragraph and did a quick word count: 75 words. 75! And no semi-colons either. I bet Fred Jameson's pretty psyched I never finished that PhD...


Comments

Great post. But can anyone recommend software for measuring statistics on typed words and sentences effectively, based on techniques similar to the ones Amazon has used?
Cheers in advance

"I'm just wondering what would the graph have looked like if you would have plotted Gayatri Chakravorty Spivak on it..."

If you check Of Grammatology, Derrida (surprisingly, to me) comes in lower than Steven! Go figure...

Kharris makes a good point. For example, Constance Garnett is known for having simplified the texts of Dostoevsky and other Russian authors in her translations, which of course affects the statistics.

Text stats become more homogeneous with novel writing because dialogue lowers the average words per sentence. For instance, "Yes / Hello / No" all count as complete, one-word sentences and occur frequently in dialogue. That is why Dostoevsky's and Hemingway's average words per sentence are nearly the same even though the complexity of their writing differs dramatically.

Comparing Gladwell to Steven is easier since their works contain no dialogue, but with a novelist, a better comparison would probably be the standard deviation (or perhaps just the MEDIAN sentence length) rather than the average words per sentence.
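A quick illustration of that point, as a sketch only: the two lists of sentence lengths below are invented for the example, not measured from any real novel. One author writes evenly mid-length sentences; the other mixes very long narration with one- and two-word lines of dialogue. The averages come out identical, while the median and standard deviation pull the two apart.

```python
import statistics

# Hypothetical sentence lengths in words, invented for illustration only.
even_stylist = [11, 12, 13, 14, 15]       # steady, mid-length sentences
dialogue_heavy = [1, 1, 2, 30, 31]        # long narration plus terse dialogue

def summarize(lengths):
    """Mean, median, and population standard deviation of sentence lengths."""
    return {
        "mean": statistics.mean(lengths),
        "median": statistics.median(lengths),
        "stdev": statistics.pstdev(lengths),
    }

if __name__ == "__main__":
    for name, lengths in [("even", even_stylist), ("dialogue", dialogue_heavy)]:
        print(name, summarize(lengths))
```

Both lists average 13 words per sentence, so the mean alone cannot tell them apart; the median and the spread can.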

Interesting observations.

Perhaps the need to reread in order to understand is related to a limit in the brain. Isn't the limit around 7-8 items for the average person without memory training?

When I was reading this, I was thinking of the idea of chunking. The strategy, not something you do after a big night out.
http://en.wikipedia.org/wiki/Chunking_(psychology)

Perhaps Gladwell and Godin are popular and sell well because their ideas are easily understood.

There is a Perl module for determining lots of these statistics.
http://search.cpan.org/~kimryan/Lingua-EN-Fathom-1.11/lib/Lingua/EN/Fathom.pm

An explanation of the MS Word readability report can be found here
http://www.brainbell.com/tutorials/ms-office/Word/Make_Sense_Of_Word's_Readability_Statistics.htm

And the readability test mentioned has got a wiki entry here
http://en.wikipedia.org/wiki/Flesch-Kincaid_Readability_Test

Have Fun

Paul

If you're such a fan of Jameson, could you spell his first name correctly? Sheesh, man.

That's a great analysis. I always believed it from an anecdotal perspective, but having real numbers definitely confirms it.

I think Robert Kiyosaki put it best when he said his writing wasn't the most eloquent but it was still one of the best selling. He writes his thoughts in clear and simple sentences. He makes it very easy to understand his message rather than convoluting it with extensive verbiage.

Of course I'm not saying to dumb it all down, just to write what you mean. A great book that I strongly recommend on how to do this is called "On Writing Well". You can read my review of it at: http://www.followsteph.com/2007/07/14/book-recommendation-on-writing-well/

Really interesting analysis! I'll try to write shorter sentences now.



    The Basics

    • I'm a father of three boys, husband of one wife, and author of five books. In early 2007 I went and foolishly got myself a day job running the hyperlocal community site outside.in, which I co-founded the year before. We spend most of the year in Park Slope, Brooklyn, though I'm on the road a lot giving talks. (You can see the full story here.) Personal correspondence should go to sbj6668 at earthlink dot net. Media requests should go to Matthew.Venzon at us.penguingroup dot com. If you're interested in having me speak at an event, drop a line to Wesley Neff at the Leigh Bureau (WesN at Leighbureau dot com).


    My Books

    • The Ghost Map
      The latest: the story of a terrifying outbreak of cholera in 1854 London that ended up changing the world. An idea book wrapped around a page-turner. I like to think of it as a sequel to Emergence if Emergence had been a disease thriller. You can see a trailer for the book here.

    • Everything Bad Is Good for You: How Today's Popular Culture Is Actually Making Us Smarter
      The title says it all. This one sparked a slightly insane international conversation about the state of pop culture -- and particularly games. There were more than a few dissenters, but the response was more positive than I had expected. And it got me on The Daily Show, which made it all worthwhile.

    • Mind Wide Open: Your Brain and the Neuroscience of Everyday Life
      My first best-seller, and the only book I've written in which I appear as a recurring character, subjecting myself to a battery of humiliating brain scans. The last chapter on Freud and the neuroscientific model of the mind is one of my personal favorites.

    • Emergence: The Connected Lives of Ants, Brains, Cities, and Software
      The story of bottom-up intelligence, from slime mold to Slashdot. Probably the most critically well-received of all my books, and the one that has influenced the most eclectic mix of fields: political campaigns, web business models, urban planning, the war on terror.

    • Interface Culture: How New Technology Transforms the Way We Create and Communicate
      My first. The book I wrote instead of finishing my dissertation. Still in print almost a decade later, and still relevant, I think. But I haven't read it in a while, so who knows what's in there!
