Okay, this will soon become one of the site's recurring themes. Like many of you, I'm obsessed with Google, and particularly obsessed with what Google can tell me about me. I'm sure someone has done this before, but it occurred to me today that you could use Google as an interesting yardstick for measuring how much a given word has come to be associated with a specific person, as in the tired saying, "You look up 'selfish' in the dictionary, and there's a picture of you."
Basically, all you do is search Google for a specific word, and get back the total number of results. Then you search that set for someone's name. Divide the second number by the first, and you get a percentage that shows you how much the person "owns" the word. Call it semantic mindshare. Or lexical penetration. Or whatever.
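For the arithmetically inclined, the calculation described above can be sketched in a few lines of Python. This is only a sketch: the function name `googleshare` and the sample counts are made up for illustration, and the hit counts are assumed to have been read off the results pages by hand.

```python
def googleshare(word_hits: int, word_and_name_hits: int) -> float:
    """Return the percentage of a word's results that also mention a name."""
    if word_hits <= 0:
        raise ValueError("word_hits must be a positive result count")
    # Second number divided by the first, expressed as a percentage.
    return 100.0 * word_and_name_hits / word_hits


# Hypothetical counts, not real Google results.
example_share = googleshare(word_hits=200_000, word_and_name_hits=5_000)
print(f"{example_share:.2f}%")
```

Keeping the function free of any search call means the same arithmetic works whether the counts come from a browser or from a script.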
At any rate, I tried it with three case studies: "emergence" and "interface" with "Steven Johnson"; and then, just because I blogged him earlier, "deconstruction" with "Jacques Derrida." I'm happy to report that Derrida is soundly kicking my ass. Let's go to the tape...
Emergence: 1,450,000 hits
Emergence with SBJ: 5,190
Mindshare: .3%
Interface: 21,800,000
Interface with SBJ: 3,790
Mindshare: .01%
Deconstruction: 179000
Deconstruction with JD: 20,500
Mindshare: 5.4%
Now, to a certain extent Derrida's got it easy, because deconstruction is a word that's primarily used in the context of philosophy, whereas interface and emergence have broader usages. (There's also the fact that I didn't invent a whole interpretative model the way Derrida did.) It would be interesting to apply the same technique to multiple authors for the same word: to see, for instance, whether Dawkins or Gould owns more mindshare of "Darwinism." Would be very interesting as well to see those percentages change over time.
(Not to put myself on the same plateau with Derrida, Dawkins, and Gould, but you can see how fun this is...)
In a similar vein, there's also googlefight.com. Perhaps someone could write an app that uses the Google API to calculate Googleshare...
Posted by: Gene | November 15, 2002 at 03:04 AM
Darwinism: 125,000
Darwinism with Dawkins: 10,200
Mindshare: 8.16%
Darwinism with Gould: 11,200
Mindshare: 8.96%
These percentages can add up to more than 100% because a single page can mention both names.
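A toy illustration of that overlap, using small sets of made-up page IDs (nothing here comes from real search results): pages that mention both names are counted once in each share, so the two shares can sum past 100% even though the share of pages mentioning either name cannot.

```python
def share(mentions: set, pages: set) -> float:
    """Percentage of `pages` that appear in `mentions`."""
    return 100 * len(mentions & pages) / len(pages)


# Hypothetical index of ten pages that all contain the word.
pages = set(range(10))
mentions_a = {0, 1, 2, 3, 4, 5}   # pages that also mention name A
mentions_b = {3, 4, 5, 6, 7, 8}   # pages that also mention name B

share_a = share(mentions_a, pages)
share_b = share(mentions_b, pages)
share_either = share(mentions_a | mentions_b, pages)

# Pages 3, 4 and 5 are counted in both individual shares.
print(share_a + share_b, share_either)
```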
Posted by: Dave Babbitt | November 15, 2002 at 07:47 AM
You can also use googlism.com to find out what Google thinks about you. On a search for Steven Johnson you'll discover that:
- steven johnson is the author of emergence
- steven johnson is a very necessary person at this time in human development
- steven johnson is very busy
- steven johnson is charged with reckless endangerment
Posted by: Janus | November 15, 2002 at 08:19 AM
I'd seen googlism before -- that's a pretty funny site. That reckless endangerment charge is news to me. :)
Very interesting that Dawkins and Gould are basically tied -- also noteworthy that their "penetration" is even larger than Derrida's, given that they didn't exactly invent Darwinism. (Shows you what great popularizers they were/are.) Maybe we should run similar numbers on Marxism...
Posted by: Steven Johnson | November 15, 2002 at 11:01 AM
Howdy Steven et al,
In lieu of a script I've owed Steven since the dawn of time, I whipped up a version of Googleshare in Perl.
I have it running at:
http://www.raelity.org/lang/perl/google/googleshare/
The source is available for your downloading, mutating, spindling pleasure at:
http://www.raelity.org/lang/perl/google/googleshare/googleshare.txt
Enjoy!
Rael
Posted by: Rael Dornfest | November 16, 2002 at 02:04 AM
Howdy,
Thanks to Steven for the further suggestion of allowing for multiple names to be compared against the initial query.
I've altered my Googleshare implementation to accept a comma-delimited list of names in the "Persons" field like so:
Steven Johnson, Slime Mold
Running that against Emergence returns:
--
Steven Johnson has a 0.34% googleshare of "Emergence"
Slime Mold has a 0.04% googleshare of "Emergence"
--
(Couldn't resist ;-)
You'll find the new implementation in the same place:
http://www.raelity.org/lang/perl/google/googleshare/
Enjoy!
Rael
P.S. Note that each of the names is run as a separate query, so 3 names will drain 4 queries from your Google API key (one for the initial query).
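For anyone counting queries, here is a rough Python sketch of how a comma-delimited names field turns into one query per name plus the initial one. It is not the Perl source linked above; the helper names `parse_names` and `build_queries` and the exact query string format are assumptions made for illustration.

```python
from typing import List


def parse_names(field: str) -> List[str]:
    """Split a comma-delimited field into trimmed, non-empty names."""
    return [name.strip() for name in field.split(",") if name.strip()]


def build_queries(word: str, names: List[str]) -> List[str]:
    """One query for the word alone, then one per name (assumed format)."""
    return [word] + [f'{word} "{name}"' for name in names]


names = parse_names("Steven Johnson, Slime Mold")
queries = build_queries("Emergence", names)
# Two names cost three queries; in general, len(names) + 1.
print(len(queries))
```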
Posted by: Rael Dornfest | November 16, 2002 at 02:28 AM
Googleshare reminds me of something I wrote a few months ago: http://www.squarefree.com/google/relatedness.html
Google Relatedness takes two words and makes three queries: each word separately and the two words together. It then compares how often pages actually contain both words with how often you would expect them to, given a database of a few billion pages and two words with known frequencies. Googleshare makes only two queries; I think it is both easier to understand and more useful.
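One way to sketch that comparison in Python, assuming the expected count is what you would get if the two words appeared independently of each other. The formula on the linked page may differ; the function name `relatedness` and the sample numbers are hypothetical.

```python
def relatedness(hits_a: int, hits_b: int, hits_both: int, index_size: int) -> float:
    """Ratio of observed co-occurrence to the count expected by chance.

    Assumes independence: expected = hits_a * hits_b / index_size.
    A ratio above 1 means the words co-occur more often than chance predicts.
    """
    if min(hits_a, hits_b, index_size) <= 0:
        raise ValueError("hit counts and index size must be positive")
    expected_both = hits_a * hits_b / index_size
    return hits_both / expected_both


# Hypothetical counts in a hypothetical index of one million pages.
print(relatedness(hits_a=1_000, hits_b=2_000, hits_both=10, index_size=1_000_000))
```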
Posted by: Jesse Ruderman | November 21, 2002 at 03:05 AM
Emergence returns: 1,300,000 pages
"Steven Johnson" Emergence returns: 4,520 pages
"Insect" Emergence returns: 55,700 pages
Well, you know what they say about the insects taking over ...
Posted by: Hugh Crawford | November 21, 2002 at 11:41 AM
To move this on: take the Googleshare of organisations or departments to rate their linkedness with a topic.
For example, MIT Media Lab's Googleshare of, say, the Semantic Web. Calculate the individual Googleshare of each faculty member with the phrase and average them. Or run all the faculty names in a single AND string with the phrase.
Posted by: Adam Smith | November 23, 2002 at 03:05 AM