Anybody out there know of an easy way to export my MovableType blog entries to text files? I would like a nice big Spotlight-friendly folder of text files, each one being a blog entry. I'd like the title, date, and primary text included, but have no interest in all the other info -- categories, comments, trackbacks, etc. (It'd also be nice to have the title of the document be the title of the blog entry, not some weird numerical thing.) It seems like the only default export option from MovableType itself gives you a giant file with all the info included. Surely there's got to be an easy way to do a more streamlined version?
Speaking of Spotlight, I never got around to doing a writeup on OS X Tiger given all the early Everything Bad craziness I was dealing with. The short report is that it's very nice, though a little less stable, but Spotlight is everything it's cracked up to be and more. It is easily the most profound change in the way I explore my personal information since the original Mac OS came out -- I basically just search for everything now: files, messages, applications, etc. I spend almost no time browsing through folders.
The one thing that's missing that I think would be pretty easy to do, and would make a big difference, is the ability to Spotlight your browser cache. I'm constantly thinking to myself: where was that article I just read yesterday about, say, teen drug use stats? Right now, the easiest way to retrieve it is to just reconstruct the original search I did in Google, which is sometimes easy to do if I remember the search query exactly and the page was one of the top results. But it would be just so much easier to do it through Spotlight. Those cached files are just sitting on my hard drive anyway -- why isn't Spotlight searching them?
I've never used Spotlight, so I don't know how restrictive the text file requirement is, but if it's relatively loose, here's a suggestion. In the MT installation, add a second individual entry archive, and set the archive file template to something like:
/.txt
[Or .html, if Spotlight has problems stripping the XHTML markup when looking at the files.]
Then rebuild the site. You should end up with a new subdirectory under mt/ for each existing category (I know you're not interested in categories, but they're an easy way to avoid duplicate filenames, and of course you can delete the category directories if you don't like them). Then just download those folders to your hard drive. It might work; like I said, I have no Spotlight experience, but it should come close to what you want.
If Spotlight can't read them, using .html in the archive file template will at least give you files named after the entries rather than numbers, which should help when you export the entries.
Posted by: Kenneth Rufo | August 04, 2005 at 02:08 AM
Oops, it cut the html/MT tags out of the comment. The /.txt line should read:
<$MTEntryCategory dirify="1"$>/<$MTEntryTitle dirify="1"$>.txt
And the ".html" reference in the last paragraph should read: <$MTEntryTitle dirify="1"$>.html
Posted by: Kenneth Rufo | August 04, 2005 at 02:11 AM
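To get a feel for the filenames Kenneth's archive template would produce, here is a rough Python approximation of what MT's dirify="1" attribute does, based on my reading of MT's implementation: fold accented characters to ASCII, lowercase, drop HTML tags and entities, drop punctuation, and turn runs of whitespace into underscores. It is a sketch, not MT's actual code, so check it against the output of your own MT version. The function names mt_dirify and archive_path are hypothetical.

```python
import re
import unicodedata


def mt_dirify(text: str, sep: str = "_") -> str:
    """Approximate Movable Type's dirify filter (a sketch, not MT's code)."""
    # Fold accented characters to plain ASCII, e.g. "é" -> "e".
    s = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    s = s.lower()
    s = re.sub(r"<[^>]*>", "", s)    # drop HTML tags
    s = re.sub(r"&[^;\s]+;", "", s)  # drop HTML entities such as &amp;
    s = re.sub(r"[^\w\s]", "", s)    # drop punctuation (keeps letters, digits, _)
    s = re.sub(r"\s+", sep, s)       # each run of whitespace becomes one separator
    return s


def archive_path(category: str, title: str, ext: str = "txt") -> str:
    """Mimic <$MTEntryCategory dirify="1"$>/<$MTEntryTitle dirify="1"$>.txt."""
    return f"{mt_dirify(category)}/{mt_dirify(title)}.{ext}"


print(archive_path("Books", "Everything Bad Is Good for You"))
```

Note that hyphens vanish rather than becoming separators ("Drug-Use" becomes "druguse"), and two entries with the same title in the same category still map to the same path, which is why the category directory only reduces collisions rather than eliminating them.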
There is a built in export function in Movable Type. It creates a large text file with all your entries. The html tags are left in, but I find them relatively unobtrusive. It's one of the tabs on the left, labeled import/export.
Posted by: Abe | August 04, 2005 at 02:22 AM
The Safari cache files are not plain text or plain images. They are binary files.
You can search the Safari history (titles, URLs, partial URLs) from the Bookmarks window. The search box at the bottom has an option to search the history.
Not as good as full-text on the page contents, but...
Posted by: Chris Janton | August 04, 2005 at 04:53 AM
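For anyone who would rather run the same title/URL history search from the command line, here is a small Python sketch. It assumes the older Safari layout, where history lives in ~/Library/Safari/History.plist as a "WebHistoryDates" list of dictionaries, with the URL stored under an empty-string key and the page title under "title". Those keys and the path are assumptions (later Safari versions moved to a different storage format), so inspect your own file first. Like the Bookmarks-window search, it only matches titles and URLs, not page contents.

```python
import plistlib
from pathlib import Path

# Assumed location for older Safari versions; adjust for your system.
DEFAULT_HISTORY = Path.home() / "Library" / "Safari" / "History.plist"


def search_history(query: str, plist_path: Path = DEFAULT_HISTORY) -> list[tuple[str, str]]:
    """Return (title, url) pairs whose title or URL contains query (case-insensitive)."""
    with open(plist_path, "rb") as f:
        data = plistlib.load(f)  # reads both XML and binary plists
    needle = query.lower()
    hits = []
    # Assumed layout: "WebHistoryDates" is a list of dicts, URL under the "" key.
    for item in data.get("WebHistoryDates", []):
        url = item.get("", "")
        title = item.get("title", "")
        if needle in url.lower() or needle in title.lower():
            hits.append((title, url))
    return hits
```

For example, `search_history("drug use")` would list every visited page whose title or address mentions drug use, assuming the plist layout above matches yours.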
Another way of getting the content out:
Create a new index template (copy your main index template so you have all the HTML headers, etc.) and replace the body with this:
Give the output file a name like "all_content.html". Save the file and rebuild the index. The page will be http://www.stevenberlinjohnson.com/movabletype/all_content.html
It will give you an enormous web page with the title, URL, date and content of all your entries.
Posted by: DonnaM | August 04, 2005 at 06:20 AM
Building on Donna's comment:
Make a master index file (a default is included in the distribution). Pull out all the extraneous information you don't want in the final file.
Rebuild to create the file.
Run the file through Lynx to strip the HTML out. (IMO, Lynx is the best HTML-to-ASCII converter out there, since it's a text-only browser.) From my previous work, I prefer the following options for converting HTML to text: -dump -force_html -image_links -pseudo_inlines -dont_wrap_pre -underscore -width=78
Those are actually the options I use to convert my blog into individual emails. (See http://www.inmff.net/peidm/mailer.txt for the source before it is de-html'd.)
Posted by: nickb | August 04, 2005 at 10:58 AM
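If Lynx isn't installed, a rough stand-in can be built from Python's standard-library html.parser. This is a different technique from Lynx and not equivalent to nickb's options: it produces no link lists (-image_links), no wrapping at 78 columns (-width=78), and no underscore emphasis. It simply keeps the visible text, drops script and style contents, and starts new lines at common block-level tags. The names html_to_text and _TextExtractor are hypothetical.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, inserting newlines around block-level tags."""

    BLOCK_TAGS = {"p", "div", "br", "li", "tr", "blockquote", "pre",
                  "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # turns &amp; etc. into characters
        self.parts: list[str] = []
        self._skip = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            if self._skip:
                self._skip -= 1
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse spaces/tabs within each line, then squeeze runs of blank lines.
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in "".join(parser.parts).splitlines()]
    return re.sub(r"\n{3,}", "\n\n", "\n".join(lines)).strip()
```

For example, `html_to_text(Path("all_content.html").read_text())` (with `from pathlib import Path`) would turn the big index page from Donna's comment into plain text.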
Don't forget, you don't have to strain to remember your exact Google searches. Safari keeps past searches as a pulldown menu in the built-in Google search box at the top of the browser window.
Posted by: jeffeng | August 04, 2005 at 11:17 AM
You are right, the ability to use Spotlight to search your browser cache would be terrific. Until Apple extends Spotlight, you might look at History Hound, a Mac OS X utility that is designed to do just that.
http://www.stclairsoft.com/HistoryHound/index.html
Posted by: Doug | August 06, 2005 at 01:52 AM
If you find out how to do it, please post it. I would love to move all my Movable Type entries into a better folder, too.
Posted by: Jim | August 10, 2005 at 05:28 AM