
Tool For Thought

This week's edition of the Times Book Review features an essay that I wrote about the research system I've used for the past few years: a tool for exploring the couple thousand notes and quotations that I've assembled over the past decade -- along with the text of finished essays and books. I suspect there will be a number of you curious about the technical details, so I've put together a little overview here, along with some specific observations. For starters, though, go read the essay and then come back once you've got an overview.

The software I use now is called DevonThink, and I'm sorry to report that it is only available for Mac OS X. (I know there are a number of advanced search tools available for Windows, so I'm sure most of what I describe here could be reproduced -- I just don't know enough about the search tools on that platform to recommend anything.)

I talked in the Times essay about using the tool as a springboard for new ideas and inspiration. Here's what that process looks like in practice. This is the window that shows me an overview of part of my "research library" in DevonThink:

screen1.jpg

These are all books that I have transcribed passages from over the past 10 years or so -- you can see how many quotes I have for each book in the little number in parentheses after each title. Oftentimes I'll start the exploration with a straightforward keyword search, in this case: "urban ecosystem." I plug that in, and get back one result, a short quote from Manuel DeLanda's excellent A Thousand Years of Nonlinear History.

screen2.jpg

This is where it gets interesting. I take that quote, and click on the "see also" button, which generates an instant list of other documents or quotes that have some semantic connection to the original one. I can see a few words from the entry, along with the author and book title.

screen3.jpg

I find another, more elaborate quote from DeLanda in that bunch:

screen4.jpg

And then I perform a "see also" on that quote. I get back a few pointers to essays that I've actually written -- and completely forgotten about -- including a review of an E.O. Wilson book on biodiversity that I wrote about three years ago. Ultimately, I end up with this wonderful quote from Jane Jacobs that draws an explicit analogy between natural and man-made ecosystems. The whole process takes me no more than a minute.

screen5.jpg

Over the past few years of working with this approach, I've learned a few key principles. The system works for three reasons:

1) The DevonThink software does a great job at making semantic connections between documents based on word frequency.

2) I have pre-filtered the results by selecting quotes that interest me, and by archiving my own prose. The signal-to-noise ratio is so high because I've eliminated 99% of the noise on my own.

3) Most of the entries are in a sweet spot where length is concerned: between 50 and 500 words. If I had whole eBooks in there, instead of little clips of text, the tool would be useless.
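
To make point 1 concrete, here is a minimal Python sketch of how a "see also" list can be built from word frequency alone. DevonThink's actual algorithm isn't described here, so this is an illustration under stated assumptions, not a description of the product: the function names (word_counts, cosine_similarity, see_also) and the three sample notes are invented for the example, and cosine similarity over raw word counts is simply one standard way to compare short texts.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text and count how often each word appears."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 to 1.0)."""
    shared = set(a) & set(b)
    dot = sum(a[w] * b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def see_also(query_title, notes, limit=3):
    """Rank every other note by word-frequency similarity to the chosen one."""
    query = word_counts(notes[query_title])
    scored = [
        (cosine_similarity(query, word_counts(text)), title)
        for title, text in notes.items()
        if title != query_title
    ]
    scored.sort(reverse=True)
    # Drop notes that share no words at all with the query note.
    return [title for score, title in scored[:limit] if score > 0]


# Hypothetical notes, invented for this illustration.
notes = {
    "delanda": "cities behave like an urban ecosystem with flows of energy",
    "jacobs": "a city is an ecosystem of streets and sidewalks and energy",
    "recipe": "whisk two eggs and sugar until pale",
}
print(see_also("delanda", notes))
```

In this toy version, common words like "of" and "an" count just as much as "ecosystem"; a real tool would presumably weight rare words more heavily. It also shows why point 3 matters: the longer a document gets, the more its word counts blur together, and the less a match tells you.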

I think #3 is the point that needs to be drilled home to people working on desktop search. It's been hidden from us largely because the web itself is broken up into pages that are often in that 500 word sweet spot. Think about the difference between Google and Google Desktop: Google gives you URLs in return for your search request; Google Desktop gives you files (and email messages or web pages where appropriate.) On the web, a URL is an appropriate search result because it's generally the right scale: a single web page generally doesn't include that much information (and of course a blog post even less.) So the page Google serves up is often very tightly focused on the information you're looking for.

But files are a different matter. Think of all the documents you have on your machine that are longer than a thousand words: business plans, articles, ebooks, pdfs of product manuals, research notes, etc. When you're making an exploratory search through that information, you're not looking for the files that include the keywords you've identified; you're looking for specific sections of text -- sometimes just a paragraph -- that relate to the general theme of the search query. If I do a Google Desktop search for "Richard Dawkins" I'll get dozens of documents back, but then I have to go through and find all the sections inside those documents that are relevant to Dawkins, which saves me almost no time.

So the proper unit for this kind of exploratory, semantic search is not the file, but rather something else, something I don't quite have a word for: a chunk or cluster of text, something close to those little quotes that I've assembled in DevonThink. If I have an eBook of Manuel DeLanda's on my hard drive, and I search for "urban ecosystem," I don't want the software to tell me that an entire book is related to my query. I want the software to tell me that these five separate paragraphs from this book are relevant. Until the tools can break out those smaller units on their own, I'll still be assembling my research library by hand in DevonThink.
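
Here is a rough Python sketch of what passage-level results might look like, as opposed to file-level ones. Everything in it is an assumption made for illustration: the names split_passages and search_passages, the in-memory library dictionary, and the simple rule that a passage is any block of text separated by blank lines. It does plain keyword matching, not semantic search.

```python
import re


def split_passages(text):
    """Treat each blank-line-separated block as one passage."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def search_passages(library, query):
    """Return (file name, passage number, passage) for every passage that
    contains all of the query words, instead of returning whole files."""
    terms = query.lower().split()
    hits = []
    for name, text in library.items():
        for number, passage in enumerate(split_passages(text), start=1):
            lowered = passage.lower()
            if all(term in lowered for term in terms):
                hits.append((name, number, passage))
    return hits


# Hypothetical one-file library, invented for this illustration.
library = {
    "ebook.txt": (
        "Markets grew around the harbor.\n\n"
        "The city is an urban ecosystem.\n\n"
        "Trade routes shifted north."
    ),
}
for name, number, passage in search_passages(library, "urban ecosystem"):
    print(f"{name}, passage {number}: {passage}")
```

The point of the design is the return value: the caller gets the paragraph itself plus just enough information to find it again, rather than a file name and a reading assignment.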

I wonder whether it might be possible to have software create those smaller clippings on its own: you'd feed the program an entire e-book, and it would break it up into 200-1000 word chunks of text, based on word frequency and other cues (chapter or section breaks perhaps.) Already DevonThink can take a large collection of documents and group them into categories based on word use, so theoretically you could do the same kind of auto-classification within a document. It still wouldn't have the pre-filtered property of my curated quotations, but it would make it far more productive to just dump a whole eBook into my digital research library.
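
As a starting point, here is a minimal Python sketch of the easy half of that idea: chunking on structural cues only. It groups paragraphs into chunks of roughly 200 to 1000 words and always starts a new chunk at a chapter or section heading. The function name chunk_book, the HEADING pattern, and the blank-line paragraph rule are all assumptions made for this example, and it makes no attempt at the harder word-frequency cues.

```python
import re

# Assumed heading convention: a paragraph starting with "Chapter" or "Section".
HEADING = re.compile(r"^(chapter|section)\b", re.IGNORECASE)


def chunk_book(text, min_words=200, max_words=1000):
    """Group paragraphs into chunks of roughly min_words to max_words,
    always starting a new chunk at a chapter or section heading."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks, current, count = [], [], 0
    for paragraph in paragraphs:
        words = len(paragraph.split())
        at_heading = bool(HEADING.match(paragraph))
        too_long = count + words > max_words
        # Close the current chunk at a heading, or when adding this
        # paragraph would overshoot and the chunk is already big enough.
        if current and (at_heading or (too_long and count >= min_words)):
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(paragraph)
        count += words
    if current:
        chunks.append("\n\n".join(current))
    return chunks


# Hypothetical book text built from filler words, for illustration only.
filler = " ".join(["word"] * 150)
book = "\n\n".join(["Chapter 1", filler, filler, filler, "Chapter 2", "A short ending."])
for chunk in chunk_book(book, min_words=200, max_words=350):
    print(len(chunk.split()), "words:", chunk[:20])
```

Two limits are worth knowing: a chunk that ends just before a heading can come out shorter than min_words, and a single paragraph longer than max_words is kept whole rather than cut mid-thought.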

The other thing that would be fascinating would be to open up these personal libraries to the external world. That would be a lovely combination of old-fashioned book-based wisdom, advanced semantic search technology, and the personality-driven filters that we've come to enjoy in the blogosphere. I can imagine someone sitting down to write an article about complexity theory and the web, and saying, "I bet Johnson's got some good material on this in his 'library.'" (You wouldn't be able to pull down the entire database, just query it, so there wouldn't be any potential for intellectual property abuse.) I can imagine saying to myself: "I have to write this essay on taxonomies, so I'd better sift through Weinberger's library, and that chapter about power laws won't be complete without a visit to Shirky's database."

These extra features would be wonderful, but the truth is I'm thrilled to have the software work as well as it does in its existing form. I've been fantasizing about precisely this kind of tool for nearly twenty years now, ever since I lost an entire semester building a Hypercard-based app for storing my notes during my sophomore year of college. There's a longstanding assumption that the modern, web-enabled PC is the realization of the Memex, but if you go back and look at Bush's essay, he was describing something more specific -- a personal research tool that would learn as you interacted with it. That's what I think about whenever I use this system to stumble across a genuinely useful new idea: finally, I have a Memex!


Comments

I'm trying to replicate your system. Do you name the individual entries with the text of the quote?

Also, can you go a bit into your quote-harvesting process? Do you input as you read, or ...?

Thanks.

I think plain old paragraphs fit your #3 requirement pretty well. They're units of text whose size is usually on the smaller side of the 50-500 "sweet spot", and almost always carry enough information to be somewhat self-contained in relation to the text around them.

Each file type usually has a specific way of defining paragraphs, and even in plain text there are a few common strategies most people use, such as keeping a blank line between two paragraphs, or preceding each one with a tab or a few spaces. For this reason, making a program to fetch paragraphs from a document wouldn't be too hard.
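
For plain text, a minimal Python sketch of such a program might look like the following. It handles only the two conventions mentioned above (a blank line between paragraphs, or a tab or a couple of spaces at the start of each one); the function name split_paragraphs and the two-space threshold for "indented" are assumptions made for this example.

```python
def split_paragraphs(text):
    """Split plain text into paragraphs using two common conventions:
    a blank line between paragraphs, or an indented first line."""
    paragraphs, current = [], []
    for line in text.splitlines():
        blank = not line.strip()
        indented = line.startswith(("\t", "  "))
        # A blank or indented line ends whatever paragraph was in progress.
        if (blank or indented) and current:
            paragraphs.append(" ".join(current))
            current = []
        if not blank:
            current.append(line.strip())
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


sample = (
    "First paragraph,\nstill first.\n\n"
    "Second one.\n"
    "\tThird, tab-indented.\n"
    "  Fourth, space-indented."
)
for paragraph in split_paragraphs(sample):
    print(paragraph)
```

The indent rule will mis-split text that uses indentation for other things, such as block quotes or code listings, so a real tool would need a per-format strategy.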

I use DevonThink for a similar purpose, and I love it. Although I should mention that in my browser I can't actually see your screenshots (?)

Also:
something I don't quite have a word for: a chunk or cluster of text

Have you considered using "lexia" as the word you're looking for?

I would like to see something akin to this for images. Any ideas?

One small, mildly off-topic request: would you mind changing the images in this post to be in PNG or JPEG format? Neither Firefox nor IE on Windows seems to be able to load them.

Sorry about the images -- could have sworn they were jpegs before. They should be viewable now.

As for how I capture the quotes themselves, I have long used an advanced piece of software called a "research assistant" to type in passages that I've marked. I just started experimenting with scanning and OCR'ing in though, which seems to work fairly well...

Very interesting -- thanks for sharing.

As for the entire book vs. quote -- I use a program on Windows called DTSearch which is basically a full-text search program on steroids.

One of the things it can do is show the results in context and use fuzzy searches, proximity settings, etc. so if I search for "concept X", rather than saying "oh, it's somewhere in this e-book here" it will show the relevant parts of the book that match the search.

Still a long way from being perfect and it can't do some of the things it looks like you're doing with DevonThink, but works pretty well.
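
To show roughly what a proximity search with in-context results involves, here is a small Python sketch. It is not DTSearch's API or algorithm, just an illustration under stated assumptions: the function name proximity_hits and its max_gap and context parameters are invented for this example, and matching is exact (no fuzzy search).

```python
import re


def proximity_hits(text, first, second, max_gap=5, context=4):
    """Find places where `first` and `second` occur within max_gap words of
    each other, and return each match with a few words of context."""
    words = text.split()
    # Strip punctuation and case so "gene." matches "gene".
    plain = [re.sub(r"\W+", "", w).lower() for w in words]
    hits = []
    for i, word in enumerate(plain):
        if word != first.lower():
            continue
        lo, hi = max(0, i - max_gap), min(len(plain), i + max_gap + 1)
        for j in range(lo, hi):
            if plain[j] == second.lower():
                start = max(0, min(i, j) - context)
                end = min(len(words), max(i, j) + context + 1)
                hits.append(" ".join(words[start:end]))
    return hits


# Hypothetical passage, invented for this illustration.
text = (
    "Dawkins argues that the gene is the unit of selection. "
    "Later chapters cover memes. "
    "Critics say Dawkins overstates the role of the selfish gene in evolution."
)
for hit in proximity_hits(text, "Dawkins", "gene", max_gap=5, context=2):
    print("...", hit, "...")
```

Widening max_gap trades precision for recall: with max_gap=7 the second sentence about Dawkins also matches, because "gene" falls seven words after his name there.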

I've looked at a lot of this stuff on Mac and Wintel, and it's kind of odd just how primitive the tools are on either OS for this sort of thing. If you had asked me in the mid-1990s, I'd have assumed that organizing and searching free-form info would have progressed a lot farther than it has.

This DevonThink app seems a lot like the new Spotlight feature in the upcoming Mac OS 10.4 Tiger. What sorts of features does DevonThink offer that Spotlight won't?

(As in, why should I buy DevonThink instead of waiting to upgrade to Tiger?)

Can someone recommend an equivalent to DevonThink for Windows? I don't even know how to do a google search for the software because I don't know what it is called in the general sense.

When and where will your piece on London sewers appear - sounds interesting (for a civil engineer like me anyway).

I'll second the request above for the names of Windows programs equivalent to Devon. Shouldn't all Devon's competitors be deluging you with emails after your article?

I'm very interested in mindhandling software and I am thus very glad about your post about DevonThink. Right now I'm testing it and will most probably buy it.

Please keep us informed about thinking and writing tools such as DevonThink, Ulysses and others.

Cheers, Stefan

Sorry -- comments were down for a few hours. Should be back up now.

Suddenly, DevonThink makes sense. As a returning student after many, many years away, I'm trying to find how to take best advantage of the technology which simply didn't exist before. DevonThink is a tool I've downloaded and tried, and never really had it click. It's clicking now.

What's problematic, however, is that now I've got one more tool which does one thing and that's it. Sure, I could compose in DT, but it's not its strength. So I compose in one location, save my research in DT, and my bibliographic info in EndNote (which I might drop for Sente or Bookends anyway). I suppose three tools isn't that bad, now that I think about it.

If you use Windows, check out www.asksam.com

Questia (an online library of ebooks) can do semantic searches, but it is handicapped by the fact that you're searching through whole e-books, even though it lets you search inside the book.

(www.questia.com)

I tried DevonThink some months ago. I initially liked it but then stopped using it, as it lacks multilingual capabilities. I usually store quotes or chunks of text in the language they were written in, and that approach unfortunately prevents DevonThink from doing its magic. Still looking for a piece of software with that capability.

I'd add that the useful chunk size online is often not the URL of a main page or an index but a permalink pointing to a specific, often brief, entry in a weblog.

btw, SBJ, ever experiment with Voodoo Pad?

Steven - this is a poignant post about search. We just completed a book titled "Lucene in Action" and I built a "search inside" the book website for it. The granularity of the search results is book sections, not pages. I am also capturing, but not yet exposing, each page of a section in order to display better information. I've also linked a blog into the table of contents page, so I can add commentary/errata to a book section after the fact. I will be building in "see related" types of connections that are not made explicit.

I'd be grateful if you would review what I've built and offer suggestions to further enhance this type of thing. I have not yet considered handling multiple books, but our publisher is definitely interested in adopting the system I've built, and these types of inter-book connections would be a great thing to have.

A much simpler (and of course less powerful) program for writers to keep track of notes of any kind (I use it for quotes) is Notational Velocity. It's free and is OS X only. It's my most used app. You can get it here: http://pubweb.nwu.edu/~zps869/nv.html

Steven:

You said: "I wonder whether it might be possible to have software create those smaller clippings on its own"

I have two possible solutions you could investigate:

1) Book2Pod is free, and converts etext into iPod-notes sized chunks -- each chunk is about 4K big, which works out to about 680 words - a bit higher than the sweet spot, but maybe not so bad. http://www.tomsci.com/book2pod/

2) The O'Reilly network published a 3 parter on how to build an eDoc reader for the iPod here: http://www.macdevcenter.com/pub/a/mac/2004/12/14/ipod_reader.html I think they have the finished software available for download, but, since they give you the source, you can probably hack it to generate notes much smaller than 4K (ie: somewhere in the 50-500 word zone)

Both of these are free. The second one is interesting because it can format text from pdf's into iPod sized notes.

Anyway: Thanks for sharing DevonThink with us. I've seen it before, but I think I'll go have a closer look at it in light of what you just wrote.

Hi Steven, Glad to come across your article in the Times and your site. I'm really curious how you digitize/save all your quotes from other sources. Are they Word documents, emails to self, some kind of database? I'm doing the same but am pretty haphazard about it and would love to hear your method. Thanks.

For almost a decade from ~1988 I kept my reading & research commonplace book in Persoft's IZE, a DOS textbase -- orphaned all too soon -- that did simple but very useful things with keywords presented in an indented hierarchy. The more entries and keywords I gave it, the more the hierarchies took on increasingly interesting and suggestive sequences; i.e. they looked more like *outlines.* IZE seemed to understand the content of the passages.

I knew perfectly well that appearance was "just" a reflection of my choices of keywords -- an embodiment of how I used and related words -- but it felt uncanny all the same.

Norretranders quotes Kline quotes Hertz on Maxwell's equations: "One cannot escape the feeling that these equations have an existence and an intelligence of their own, that they are wiser than we are, wiser even than their discoverers, that we get more out of them than was originally put into them."

"One of the new applications that came out last year was Google Desktop -- using the search engine's tools to filter through your personal files." Loading this Google software into at least a Windows machine opens a back door to the computer. Anyone can open this door and walk into your computer.

And, of course, ten minutes later I trip over New Scientist on semantic search for Google... now Slashdotted...

http://www.newscientist.com/article.ns?id=dn6924

