Archive for December, 2006

Where’s my data?

Jim Benson once again wrote something very insightful – I am detecting a pattern here. Free Services Are Unaccountable Stewards. If you rely on others to safeguard your data and to make sure it is accessible to you when you need it, there is always an SLA involved. A Service Level Agreement – think about it, and look it up for the services that you use. Sometimes it’s just implied (usually with systems that don’t require a login, like Google); in other cases there’s some legal language that defines the responsibilities and promises – or lack thereof. For example, look at the Terms of Service over at WordPress.Com. Here’s one of my favorite excerpts:

…in no event will Automattic, its suppliers or its licensors be liable to you or any other party for any direct, indirect, special, consequential or exemplary damages, regardless of the basis or nature of the claim, resulting from any use of the Website, or the contents thereof or of any hyperlinked website including without limitation any lost profits, business interruption, loss of data or otherwise, even if Automattic, its suppliers or its licensors were expressly advised of the possibility of such damages.

Automattic are the nice folks who make WordPress.Com available for free. Don’t get me wrong, there’s nothing wrong with this language – it’s a free service. What I am pointing out is the risk that you take with your data here. Same goes for Jim’s Gmail example. Or del.icio.us. These are all free and therefore you’re mostly on your own.

Personally I am rather uncomfortable with that. My Gmail account is almost exclusively used for throwaway email (registrations at web sites and other things likely to just attract spam). My main email address (hohndel.org) gets you to a server running in my office (server is a loose description – it’s a Mac Mini). My WordPress-based blogs run on that same server. I control where the data lives. I control the backup schedule. And yes, if power is out or my DSL link is down then my servers are down, too. That’s the price I pay. But at least my data is safe. Let me rephrase that. At least I control how safe my data is.

So many bots, so little time

The number of bots that are crawling my server is getting out of hand. A quick survey of the log files showed that about two thirds of all requests are coming from bots. Many are genuine (the nice folks at Bloglines or the billionaires at Google). But a lot are at least suspicious if not known to be evil.
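For anyone who wants to run a similar survey, here is a minimal sketch in Python. It assumes the access log is in Apache's "combined" format, where the user agent is the last quoted field on each line. The hint list (`BOT_HINTS`) and the function names are made up for this illustration, and real bots that disguise their agent string will not be counted.

```python
import re
from collections import Counter

# In the combined log format each line ends with: "referer" "user-agent"
_UA_RE = re.compile(r'"[^"]*" "(?P<ua>[^"]*)"\s*$')

# Hypothetical substrings that suggest a crawler; tune these to your own logs.
BOT_HINTS = ("bot", "crawl", "spider", "slurp", "bloglines")


def user_agent(line):
    """Return the user-agent field of one log line, or None if it doesn't parse."""
    match = _UA_RE.search(line)
    return match.group("ua") if match else None


def bot_share(lines, hints=BOT_HINTS):
    """Return the fraction of parseable requests whose agent looks like a bot."""
    counts = Counter()
    for line in lines:
        ua = user_agent(line)
        if ua is None:
            continue  # skip lines that are not in the expected format
        looks_like_bot = any(hint in ua.lower() for hint in hints)
        counts["bot" if looks_like_bot else "other"] += 1
    total = sum(counts.values())
    return counts["bot"] / total if total else 0.0
```

Feeding it an open log file (for example `bot_share(open("access_log"))`, with whatever path your server uses) gives a rough number to compare against the two thirds above.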

Googling for the bot name (if given in the HTTP_USER_AGENT part of the request) gets you to many discussion threads listing many of the crawlers that you don’t want to visit your site (email harvesters, image harvesters, spam bots, etc) and many that are of unknown purpose (which in this day and age means that most likely you want to block them). Very interesting is this three-part thread over at WebmasterWorld which discusses a few of the bots and, more importantly, good ways to get rid of them, especially those that ignore your robots.txt (and there are many other similar threads elsewhere).

I followed the consensus and decided to be a little more aggressive – a lengthy list of bots simply gets a Forbidden response from the Apache server. mod_rewrite is your friend.
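The decision such a rule makes is simple enough to sketch. What follows is Python, not Apache configuration, and it only illustrates the matching logic: compare the user agent against a list, case-insensitively, and answer Forbidden on a hit. The bot names in `BLOCKED_AGENTS` are placeholders invented for this example, not the list I actually use.

```python
import re

# Placeholder names for illustration only; substitute the agents you want to block.
BLOCKED_AGENTS = ["ExampleHarvester", "ExampleImageGrabber"]

# One case-insensitive pattern that matches any blocked name anywhere in the string.
_BLOCKED_RE = re.compile(
    "|".join(re.escape(name) for name in BLOCKED_AGENTS),
    re.IGNORECASE,
)


def status_for(user_agent):
    """Return 403 (Forbidden) for a blocked user agent, 200 otherwise."""
    return 403 if _BLOCKED_RE.search(user_agent) else 200
```

A list like this only catches bots that announce themselves in the user agent, and the nastier ones often don't.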

Since I started blocking most of the bots I have noticed two good side effects: on the one hand less clutter in the log files, on the other hand less traffic, which means better response times for the people actually looking at my blogs (I had one bot pulling about 50MB worth of images over and over again from the site).

Email clients

I have used so many of them. The original Berkeley mail. Then elm, pine, vm (under Xemacs) and finally mutt. Those are all text console based and (at the risk of getting myself flamed here) are sorted in order of usefulness – with mutt clearly superior to the rest. They work exceptionally well if you don’t get a lot of HTML emails and if you don’t expect seamless integration of pictures, rich text documents and other attachments. Which, btw, until only a few years ago, meant they worked very well with the vast majority of email.

I also was exposed to the frightening class of gui-based email programs. The distressing Lotus Notes (which back then didn’t even speak the most basic Internet email protocols correctly – allegedly that’s fixed now). The utterly frightening Outlook Express. The omnipresent Outlook (which is not terrible as far as gui-based email programs go, but has all of their shortcomings that I’ll get to in a moment). Right now I use Entourage for work email – which in many ways is nicer than Outlook (for example, it runs on OS X and is reasonably well integrated into that which gives it a nice touch compared to Outlook), but in other ways worse (as it competes with Outlook, is from the same small software company in Redmond, WA, and still isn’t able to fully integrate with that same company’s Exchange server – how ridiculous is that? Entourage doesn’t understand MAPI and instead uses WebDAV to talk to Exchange – which simply takes a lot of potential features away).

And it’s sad to say, there’s a group of programs that’s even worse – the open source gui email programs (like Evolution, Thunderbird or Kmail). Why am I so negative? Well, they compete with Outlook and they don’t come even close. None of them can really integrate with the Exchange calendar (Evolution tries to but fails badly). None of them has a gui that’s even close to what Outlook or Entourage have to offer. They are slow (try using them with a 250MB mailbox under Exchange) and are simply hard to use – even allowing for the fact that gui-clients in general are bad for email…

Here, I said it again… so why do I dislike gui-clients so much when it comes to email? Simple. If you are dealing with a lot of email (and who isn’t, given the spam pandemic) then the number one task of an email client is to allow you to quickly sort, view and discard email based on a variety of criteria. Mail thread boring? Delete all emails in it. Mail author annoying you? Gone are his emails. Which other emails have I received from this person? Which emails have the word “blog” in the subject?

Sure, you can do all of these with the gui programs. But that requires you to touch the mouse. Bzzzzt. Disqualified. If I get to an inbox with 400 new messages since yesterday evening (not unusual) I don’t have the time to keep moving from keyboard to mouse and back.

But let’s say for the sake of argument that there was a gui client that had a decent keyboard interface. That still leaves you with the problem that it will try to render all the stuff that people send you. Which is fine for the 5% of your email that you actually want to read in detail. And for the rest it is at best a waste (and with Outlook on Windows, often quite dangerous).

“But it’s so easy to use the gui clients!”, I hear you say. Yep, for the occasional or newbie user. But once you have spent some time with your email client (and again, this whole posting assumes that you get a serious amount of email – so you’ll be there soon enough) then everything that makes the gui clients so easy to use at first now makes them even more annoying.

Yes, for people who love to send pictures or other embedded objects around to others, mutt is not as pretty. And the learning curve is steep. But I think it’s worth it. I use it every day for all my email at hohndel.org and just love it. Even though I read those emails on Macs these days, which means I’d have access to Mail.app – one of the better gui-clients out there. But a good text based mailer like mutt beats Mail.app for large volumes of email, any time.

Migrating from Blosxom to WordPress

So I decided to move from Blosxom to WordPress, first for my personal blog and then for my tech blog. And since I had about 550 postings and around 40 or so comments in my personal blog I needed a way to migrate my data. Googling didn’t find anything even remotely useful (the “import via RSS” suggestions simply lost too much formatting – things looked terrible, given how many pictures I have). Instead I figured I’d write a perl script that would do the hard work; pull all the postings and comments from Blosxom and import them into WordPress. Looking at the structure of the existing import scripts (and the fact that I know far less php than perl) I decided not to integrate this into WordPress but instead to insert the data directly into the mysql database. That should be fun. And amazingly it took not nearly as long as I feared!

Now I want to share what I learned with the rest of you, but the more I look at the script that I wrote, the more I realize that it is based on so many assumptions that it might be almost useless to anyone else. But then again, maybe it can help someone in a similar situation as a starting point. Writing it certainly helped me understand why WordPress doesn’t have an import function for Blosxom.

Here’s the fundamental idea of what I did

  • install WordPress on the target system. One assumption made in the script is that you can access the mysql database from a system that has the Blosxom files accessible in its file system.
  • set up the new blog. Depending on your needs you may have to find (or write) a theme that is similar to your Blosxom theme. In my case (the personal blog, not this one) the formatting of many of the postings was based on this being a fixed width theme of a certain width with certain classes defined in the CSS, certain margins set around different HTML objects, etc. So I started from something reasonably similar and then more or less wrote my own theme.
  • delete the default posting and comment, make any other changes you want (blogroll, etc) and set up all your categories (important – the script will fail if it finds a category that doesn’t exist).
  • back up your database
  • I mean it. Use the wp-backup plugin. Or do it manually in mysql. But back it up. Really. I restored this backup quite a few times while working around bugs in the script, typos in the blog postings, etc.
  • download the blosxomtowp.pl script.
  • read the script. Edit the variables at the top of the script. Look through the assumptions made. Here are the ones I’m aware of, but you really might want to read through the script and compare with your file system layout, posting structure, etc.
    • it assumes that you have shell access to the machine that blosxom runs on and that you can connect to the WP mysql database from that machine
    • it assumes that you use directories under the main blosxom blog directory for your category hierarchy – just as with using the “categorytree” plugin
    • it assumes that you use the “meta” and “metadate” plugin to set the date on your postings (but it’s easy to change this to use the file time stamp instead – I just haven’t done that)
    • it assumes that you are using the “feedback” plugin for comments (but I think “writeback” and some others have similar file layouts and formats)
    • it assumes that you have already created /all/ categories that you have in blosxom in your WP database
    • it assumes the database table layout in WP-2.0.5

    Figure out what else you want to preserve (assuming you have different plugins than I had). Figure out what you can live without.

  • you did back up the wp database, right?
  • go to the main directory of your blosxom tree and run the script on one posting
    .../path/blosxomtowp.pl misc/aposting.blog (note that I used “.blog” as suffix – for most people that will be “.txt”).
  • check your blog in a web browser. Did the posting show up? Does everything look right?
  • start debugging. wp-phpmyadmin was a huge help for me to see what went wrong in the mysql database
  • once this works for a few postings you can slurp all of it in (don’t forget to restore the backup first, so you don’t get duplicate postings):
    find . -name \*.blog | xargs .../path/blosxomtowp.pl
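To give an idea of the first half of the job (getting the data out of the Blosxom files), here is a minimal sketch. It is in Python rather than perl and it is not the blosxomtowp.pl script. It assumes the layout described above: the directory path is the category, the first line of the file is the title, and the header lines written for the “meta” plugin look like `meta-key: value` and are followed by a blank line. That header format and the function name `parse_entry` are assumptions for this illustration, so check them against your own postings. The sketch does not touch the mysql database at all.

```python
from pathlib import PurePosixPath


def parse_entry(rel_path, text):
    """Split one Blosxom posting into category, slug, title, meta fields and body.

    rel_path is the path relative to the main blog directory,
    for example "misc/aposting.blog".
    """
    path = PurePosixPath(rel_path)
    parent = str(path.parent)
    category = "" if parent == "." else parent  # the directory doubles as the category

    lines = text.splitlines()
    title = lines[0].strip() if lines else ""

    # Assumed header layout: "meta-key: value" lines directly after the title.
    meta = {}
    i = 1
    while i < len(lines) and lines[i].startswith("meta-") and ":" in lines[i]:
        key, value = lines[i][len("meta-"):].split(":", 1)
        meta[key.strip()] = value.strip()
        i += 1

    # Skip the single blank line that separates the header from the body.
    if i < len(lines) and not lines[i].strip():
        i += 1

    return {
        "category": category,
        "slug": path.stem,  # file name without the ".blog" / ".txt" suffix
        "title": title,
        "meta": meta,
        "body": "\n".join(lines[i:]),
    }
```

Running something like this over a single posting and printing the result is a cheap way to spot surprises in your files before anything gets written to the database.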

I’m sure I’m forgetting a lot of things here. Please comment if you have additions, improvements, suggestions. The script is under the GPL, I’d be happy to accept fixes from anyone, but especially from people who actually are better at writing perl than I am (that’s not a high hurdle) and who can help me clean up the code.

What’s wrong with my Apples?

It’s just driving me insane… now my brand new MacBook Pro (Core 2 Duo) is broken. After an hour or so, its keyboard and touchpad stop working. If I connect a keyboard / mouse via USB everything is fine, but the built-in devices are simply dead.

Karen must be right – I have bad Apple karma or something; it is just astounding how frequently Apple hardware breaks for me.

Changes are here – the new blog (and a few small inconveniences)

So after my personal blog I now also moved the technical blog to WordPress.

As in the first case there are again a few things to point out:

  • in order for existing links to continue to work (and search engines to continue to be able to find old postings), the old blog will stay online, but won’t be updated anymore. In order to get to the Community Matters blog in the future you’ll have to use the new URL http://www.hohndel.org/communitymatters.
  • if you are using a feed reader like Bloglines or Netvibes or if you are using the RSS readers built into modern browsers like Firefox or Safari, you’ll have to resubscribe to this blog. I know that’s a pain, but given the way WordPress creates the feed, trying to use mod_rewrite to make this happen automagically turned out to be a lot harder than I thought. On the plus side, you can subscribe to the nice new Atom feed (which most of those readers should pick up by simply clicking on that link). Update: it seems that some readers want this as http reference instead of a feed reference as in the previous link – please try either.

Any problems, issues, complaints, compliments, praise, concerns for my mental health, etc… please leave a comment on the new site using this link.

This will also be the last posting to the old Blosxom blog. :-)