Cruft-free URLs

I just reconfigured my Movable Type blog to use cruft-free URLs. The basic premise is to clean up your URL scheme by removing non-essential information, thus ensuring that your links will live on in the future as you evolve with different web publishing tools and technologies.

In my case, I also wanted to change from my old method of daily archives (one file per calendar day, containing one or more posts) to individual archives (one post per file). Among other benefits, this should help with the searchability of the site, because each entry's title will also be the title of its page. (Right now, searching for something on my site in Google shows entries highlighting the date rather than the title.)

There are several good resources on this subject:

For my configuration, the main task was to change my individual entry URLs from the original format which stored multiple entries per day:

http://www.cantoni.org/yyyy/mm/dd.html#entry_title

to the new format which stores each entry on its own page (and drops the .html extension):

http://www.cantoni.org/yyyy/mm/dd/entry_title

This new URL seems like a good choice: it includes the entry title, which makes it human-readable, and it still uses the yyyy/mm/dd hierarchy to help date the entries. Dropping the .html makes it possible to use other techniques or web tools in the future (e.g., PHP).

Here are the detailed steps I used to implement my cruft-free URLs in Movable Type:

  1. Installed Brad Choate’s MT-IfEmpty plugin (this will be used in the next step to form the URL with either the dirified title or the short name in the keyword field)
  2. In Weblog Config, I switched from Daily to Individual archives and set the filename template using Mark’s example:
    <span class="code"><$MTArchiveDate format="%Y/%m/%d"$>/<MTIfEmpty var="EntryKeywords"><$MTEntryTitle dirify="1"$></MTIfEmpty><MTIfNotEmpty var="EntryKeywords"><$MTEntryKeywords$></MTIfNotEmpty></span>
  3. Rebuilt all individual archives
  4. Updated my .htaccess file to serve up files as text/html by default (because I dropped the .html extension, this allows the pages to be served as if they were HTML): <span class="code">DefaultType text/html</span>
  5. Created appropriate Redirect entries in my .htaccess to redirect from the old URL scheme to the new one (more on this below)
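
To make the filename template in step 2 a bit more concrete, here is a minimal Python sketch of the logic it expresses: use the keywords field if it is filled in, otherwise fall back to the dirified title. The `dirify` function below is only my rough approximation of Movable Type's filter, and the function names and the sample date are made up for illustration.

```python
import re
from datetime import date


def dirify(title: str) -> str:
    """Rough approximation of Movable Type's dirify filter.

    This is an assumption for illustration, not MT's actual implementation.
    """
    s = title.lower()
    s = re.sub(r"<[^>]+>", "", s)    # drop HTML tags
    s = re.sub(r"&[^;\s]+;", "", s)  # drop HTML entities
    s = re.sub(r"[^\w\s]", "", s)    # drop punctuation
    return re.sub(r"\s+", "_", s.strip())  # whitespace runs become underscores


def archive_path(entry_date: date, title: str, keywords: str = "") -> str:
    """Mirror the template: yyyy/mm/dd/ plus keywords if set, else dirified title."""
    slug = keywords.strip() or dirify(title)
    return f"{entry_date:%Y/%m/%d}/{slug}"


# Hypothetical entry date, used only as an example.
print(archive_path(date(2004, 1, 5), "Cruft-free URLs"))
print(archive_path(date(2004, 1, 5), "Cruft-free URLs", keywords="cruft-free"))
```

The keywords field acts as an override, which is handy when the dirified title would be long or awkward.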

Actually, the hardest part of this whole endeavor was step 5, redirecting from my old entries to the new ones. After brushing up on Apache’s mod_alias and then mod_rewrite, I tried numerous regular expressions to map my old-style URLs to the new ones.

After creating my regular expressions and carefully testing them in my text editor, I tried dozens of combinations without success. Finally I found the reason it wasn’t working: the anchor in the URL (the portion after the ‘#’ symbol) is never sent to the web server; it is used strictly by the browser. That threw out my plan for redirection, because my new scheme needs the entry title, which exists only as an anchor in the old scheme.
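
A small Python sketch shows the problem. It splits an old-style URL the way a browser does and rebuilds what ends up in the HTTP request line. The `request_target` helper and the date in the sample URL are hypothetical and only there for illustration.

```python
from urllib.parse import urlsplit


def request_target(url: str) -> str:
    """Return the path (plus query, if any) that goes into the HTTP request line.

    The fragment is left out on purpose, because browsers never send it.
    """
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target


# Old-style URL with a hypothetical date filled in.
old_url = "http://www.cantoni.org/2004/01/05.html#entry_title"

print(urlsplit(old_url).fragment)  # entry_title (stays in the browser)
print(request_target(old_url))     # /2004/01/05.html (all the server sees)
```

Since the server only ever sees the path, no rewrite rule on the server can recover the entry title from the old URL.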

My Plan B was to create a (long) list of manual redirects for the 150 or so web entries. It’s not an ideal solution, but it works for everything except the days that had multiple entries (which fortunately were few).
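
A list like that does not have to be typed by hand. As a sketch only (this is not how I actually built mine), a few lines of Python could generate the mod_alias `Redirect` lines from a list of day and entry-name pairs. The function name, the sample entries, and the choice to keep the first entry for a day with several entries are all assumptions for illustration.

```python
def redirect_lines(entries, host="http://www.cantoni.org"):
    """Build one mod_alias Redirect line per day.

    entries: iterable of (day, slug) pairs, e.g. ("2004/01/05", "entry_title").
    A day with several entries can only redirect to one of them; this sketch
    keeps the first one seen, which is an arbitrary choice.
    """
    seen = set()
    lines = []
    for day, slug in entries:
        if day in seen:
            continue  # old daily page already has a redirect target
        seen.add(day)
        lines.append(f"Redirect permanent /{day}.html {host}/{day}/{slug}")
    return lines


# Hypothetical entries, for illustration only.
sample = [
    ("2004/01/05", "first_post"),
    ("2004/01/05", "second_post"),
    ("2004/01/06", "another_post"),
]

for line in redirect_lines(sample):
    print(line)
```

The output can be pasted into .htaccess. The second entry on a shared day still has no redirect of its own, which is the same limitation described above.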