Monthly Archives: October 2004

Bloglines Enclosure Download

I just published my “Bloglines enclosure downloader” script. This started as an experiment with the Bloglines service APIs. As other applications (in particular, Doppler) continued to advance, I put this on hold. Doppler had promised Bloglines support in version 1, so I waited. Once the final Doppler release came out, I realized it didn’t work quite the way I wanted.

So, I did some more hacking on my little Perl script to create a 340-liner that works for me and might work for others.

This script has the key features I’m looking for, in particular the automatic association with a designated folder in the Bloglines service. You don’t need to load up my script with feed URLs; just point it to a single Bloglines folder and it’ll do the rest.

The script will automatically download audio files, tag them if needed, and create an M3U-format playlist. It doesn’t yet set up the files for automatic transfer to a media player, but that can certainly be added in.

If anyone finds this useful or has ideas for changes, please let me know!

Bloglines Enclosure Download Script

blogenc.pl — a Perl script for automatically downloading enclosures from RSS feeds tracked by Bloglines.com

Description

This script started as an experiment with the Bloglines service APIs, plus rudimentary download capabilities. As other applications (like iPodder and Doppler) continued to advance, I put it on hold, but returned to it when I realized the other applications didn’t work exactly the way I wanted.

The result? A 400-line Perl script that works for me and might work for others. Maybe it will be the starting point for others who don’t mind a little Perl hacking to create a solution that works for them.

The key features that describe this script are:

  • Uses the Bloglines published interface to find your list of feeds automatically. To help manage the feeds, the script only works with a designated ‘audio’ folder.
  • Uses the Bloglines published interface to download the feeds themselves, using the provided timestamps to pull only new items.
  • Once enclosure items are identified, uses wget to do the actual downloading.
  • For downloaded MP3s, sets the ID3 Genre field to a specified value (I use ‘Speech’, but this can be changed).
  • For downloaded MP3s whose sample rate is below 16 kHz, resamples them to 16 kHz using LAME http://www.mp3dev.org/. (This is controlled by $resample in the Configuration Options section of the script; set it to 0 if you don’t need the conversion.)
  • After all files are downloaded, creates a playlist in M3U format suitable for transfer to a portable device or desktop media player.
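The enclosure-finding and playlist steps in the list above can be sketched roughly as follows. This is not the script’s code: blogenc.pl is Perl and works through XML::RSS and the Bloglines interface, while this illustration uses Python’s standard library on a raw RSS 2.0 document. The function names (audio_enclosures, local_name, format_m3u, write_m3u) are hypothetical, and treating any audio/* type or .mp3 URL as audio is an assumption.

```python
# Illustrative sketch with hypothetical names; not code from blogenc.pl.
import posixpath
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def audio_enclosures(rss_xml):
    """Return enclosure URLs from RSS 2.0 <item> elements that look like audio."""
    root = ET.fromstring(rss_xml)
    urls = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is None:
            continue  # text-only item, nothing to download
        url = enclosure.get("url")
        mime = enclosure.get("type", "")
        # Assumption: audio/* MIME types, or a .mp3 URL, count as audio.
        if url and (mime.startswith("audio/") or url.lower().endswith(".mp3")):
            urls.append(url)
    return urls


def local_name(url):
    """Use the last path segment of the URL (query string ignored) as the file name."""
    return posixpath.basename(urlparse(url).path)


def format_m3u(paths):
    """Build a basic M3U playlist: one file path per line."""
    return "".join(path + "\n" for path in paths)


def write_m3u(paths, playlist_path):
    """Write the playlist text to disk."""
    with open(playlist_path, "w", encoding="utf-8") as f:
        f.write(format_m3u(paths))
```

A run would call audio_enclosures() on each feed in the Bloglines folder, download those URLs, and pass the local file names to write_m3u(). The one-path-per-line form is the simplest M3U variant; extended M3U adds #EXTM3U and #EXTINF lines.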

The script uses two local data files to keep track of feeds visited (feedsave.dat) and files downloaded (filesave.dat). To force revisiting a feed or redownloading a file, either or both of these files can be deleted prior to running the script.
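To illustrate how a delete-to-reset data file like filesave.dat can behave, here is a hypothetical Python sketch that stores one downloaded URL per line. The actual file format used by blogenc.pl isn’t documented on this page, so the format and function names are assumptions. The point is that a missing file reads as “nothing seen yet,” so deleting it makes every file eligible for download again.

```python
# Hypothetical state-file sketch; the one-URL-per-line format is an assumption.
import os


def load_seen(path):
    """Return the set of URLs recorded in path, or an empty set if the file is missing."""
    if not os.path.exists(path):
        return set()  # e.g. the file was deleted to force redownloading
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


def record_seen(path, url):
    """Append one downloaded URL so later runs skip it."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(url + "\n")


def new_urls(candidates, seen):
    """Return candidates not already in seen, in order, without duplicates."""
    skip = set(seen)  # copy so the caller's set is left unchanged
    fresh = []
    for url in candidates:
        if url not in skip:
            fresh.append(url)
            skip.add(url)
    return fresh
```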

Installation

Download: http://www.cantoni.org/files/blogenc.zip

This script was developed and tested on Windows XP with ActiveState Perl v5.6.1. It was also tested on FreeBSD with Perl v5.6.1 and should work on other platforms where Perl and wget are available.

This script requires wget, which must be in the system path, and LAME, whose location is set in the configuration ($lamepath). It also requires these Perl modules:

  • XML::RSS
  • HTML::Entities
  • WebService::Bloglines
  • MP3::Tag
  • MP3::Info
  • Log::Log4perl

To install the script:

  1. Install the Perl modules listed above
  2. Ensure wget is installed and in the system path (and LAME too, if you use resampling)
  3. Extract blogenc.pl to a directory somewhere
  4. Edit blogenc.pl, changing values in the “configuration options” section; most likely the only changes will be the Bloglines account info (username, password, folder) and, if resampling is enabled, $lamepath

The script can be run automatically at regular intervals. It uses Log::Log4perl, which currently sends output to STDERR, so to capture the output when running the script automatically, redirect STDERR to a file:
perl blogenc.pl 2>log.txt

Usage

After installation, you can run the script in three modes.

Test mode will use a test Bloglines account and ensure that the functionality is working:
perl blogenc.pl test

Dryrun mode will use your Bloglines account and will list (but not download) all the new enclosures:
perl blogenc.pl dryrun

Normal mode will use your Bloglines account and will download normally:
perl blogenc.pl
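A minimal sketch of how the three modes might map to actions, assuming (hypothetically) that an unrecognized argument falls back to normal mode. It is written in Python rather than the script’s Perl, leaves out the Bloglines account selection that distinguishes test mode, and uses hypothetical function names. The --timeout and -P (directory prefix) options are standard wget options; the 30-second value matches the v0.20 change in the revision history.

```python
# Illustrative mode-handling sketch with hypothetical names; not code from blogenc.pl.
import subprocess
import sys

WGET_TIMEOUT_SECONDS = 30  # same value as the v0.20 wget timeout


def mode_from_argv(argv):
    """Return 'test' or 'dryrun' if given as the first argument, otherwise 'normal'."""
    if len(argv) > 1 and argv[1] in ("test", "dryrun"):
        return argv[1]
    return "normal"


def wget_command(url, dest_dir):
    """Build (but do not run) the wget argument list for one enclosure."""
    return ["wget", "--timeout=%d" % WGET_TIMEOUT_SECONDS, "-P", dest_dir, url]


def plan(urls, mode, dest_dir="downloads"):
    """Dry runs only list URLs; test and normal runs download them."""
    if mode == "dryrun":
        return [("list", url) for url in urls]
    return [("download", wget_command(url, dest_dir)) for url in urls]


def run(urls, argv=None):
    """Carry out the plan: print URLs for a dry run, otherwise call wget."""
    mode = mode_from_argv(sys.argv if argv is None else argv)
    for action, payload in plan(urls, mode):
        if action == "list":
            print(payload)
        else:
            subprocess.run(payload, check=False)  # requires wget on the PATH
```

Keeping plan() separate from run() means the dry-run path and the download path share the same list of new enclosures; only the final action differs.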

Future

Some notes for future improvement:

  • Read feeds directly from each site rather than through Bloglines (which returns 304 if the user has already read the items in a web browser)
  • Hook into media player and/or device to set up automatic transfer of audio files
  • Fix the hack that finds the enclosure tags
  • For the sample-rate fix, do a better job with MP3 ID3 tags
  • Possible changes for any non-MP3 files (currently the script always assumes MP3)
  • Improve logging to make it easier to run the script automatically (e.g., detailed debug log written to a file, info log written to STDOUT)
  • Support for gzip server content (if it makes sense)
  • Better tracking of feeds/items, better DB storage

Revision History

2004-12-20 v0.40 — Changes for this release include:

  • Fixed a bug that prevented MP3 resampling from working correctly. Also added the $resample config variable, which controls whether or not the resampling step runs. (Those not using a Creative MP3 player can turn this off.)

2004-12-15 v0.30 — Changes for this release include:

  • I found that my Creative Zen Touch (20GB) MP3 player has trouble playing MP3 files with a sampling rate below 16 kHz, so I added a step to the script that uses the external LAME MP3 encoder to resample such files to 16 kHz. This required an additional Perl module (MP3::Info) to read the sample rate. There is also a new configuration setting ($lamepath) that must point to the LAME encoder on your system. See http://www.cantoni.org/2004/12/13/zenproblems
  • Cleaned up some of the log/debug messages.
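The resampling decision described in the v0.30 and v0.40 notes could be sketched like this. The 16 kHz threshold and the on/off switch ($resample) come from those notes; the Python function names are hypothetical, reading the sample rate (done by MP3::Info in the real script) is left out, and the LAME options shown (--mp3input to decode MP3 input, --resample with a value in kHz) are an assumption about how such a call could be made, not the script’s actual command line.

```python
# Illustrative resampling sketch with hypothetical names; not code from blogenc.pl.
MIN_SAMPLE_RATE_HZ = 16000  # 16 kHz threshold from the Zen Touch notes


def needs_resample(sample_rate_hz, enabled=True):
    """Mirror of the $resample switch: resample only when enabled and below 16 kHz."""
    return enabled and sample_rate_hz < MIN_SAMPLE_RATE_HZ


def lame_resample_command(lame_path, src, dst, target_khz=16):
    """Build (but do not run) a LAME command that re-encodes an MP3 at target_khz."""
    return [lame_path, "--mp3input", "--resample", str(target_khz), src, dst]
```

Writing to a separate destination file keeps the original download intact if LAME fails partway through.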

2004-11-09 v0.20 — Minor bug fixes and improvements:

  • If a feed had 0 unread items, it was skipped, which could lead to missed items if the user had viewed the channel from the web. Now the script relies on the ‘last seen’ timestamp and ignores the unread count.

  • Added a default 30-second timeout for the wget call (wget’s own default of 900 seconds is quite long).

2004-10-29 v0.10 — First release, supporting Bloglines interface, downloading with wget, and automatic M3U playlist generation.

Author

Written by Brian Cantoni (brian at cantoni dot org)

This script is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

Special thanks to Tatsuhiko Miyagawa for authoring the handy WebService::Bloglines Perl module: http://search.cpan.org/~miyagawa/

Some Perl tricks borrowed from Dave Slusher’s get_enclosures script: http://www.evilgeniuschronicles.org

Podcasts of Interest

I’ve been finding more and more podcasts out there and listening to as many as possible on my new Zen Touch. The problem — as others have pointed out — is not having enough hours in the day to listen to all of this. In Bloglines I subscribe to about 180 feeds, but I do a lot of skimming which lets me keep up. With audio, I’ve realized I need to be more selective on the feeds I follow.

With that in mind, here are podcasts that I’ve found consistently interesting:

  • Brainwagon is Mark VandeWettering talking about a variety of tech subjects and some insight from his work at Pixar Animation Studios.
  • GeekNewsCentral is Todd Cochrane’s site covering all sorts of technical content. In one of his earlier podcasts, Todd explained that he moved from the early BBS days to weblogs partly because of the people and the interesting ideas they exchange.
  • Treo Podcast is another technical podcast, but is recorded completely on a Treo 600. Lots of good Treo tips as you’d expect.
  • MWGblog is something non-tech for a change — Michael Geoghegan giving in-depth movie reviews. This one could be bad for the bank account; I’ve already put a few DVDs on my Amazon wishlist.

CityDesk Slideshows

Having previously shown how the S5 slideshow system can be integrated with MovableType, I thought CityDesk might also be a nice integration point. CityDesk makes it easy to edit web pages and has a pretty powerful template system that would allow creation of S5-formatted output files.

Result: after an hour or two of fiddling, I created a sample CityDesk project file that, when published, creates an S5 slideshow of the content inside.

CityDesk S5 Slideshows

Update: Now linked from CityDeskNews! If anyone reading this finds it useful or wants any help, just let me know.

S5 is a new standards-based slideshow system developed by Eric Meyer. Rather than using a proprietary tool like PowerPoint, this system uses standard XHTML files with JavaScript for navigation control and CSS for styling. The result is a presentation that can be viewed in any recent browser.

To aid those who might be reluctant to edit XHTML files directly, I created a sample CityDesk project template. CityDesk is an easy-to-learn Windows application for creating and maintaining websites.

In this scheme, the CityDesk project file corresponds to the presentation. You create a CityDesk article for each page of the presentation. When published, the resulting index file contains the entire presentation. All of the supporting files are included in this sample project, and the theme can be easily changed as well.

Download: Slideshow.zip (91 KB)

CityDesk Slideshow Screenshot

S5 and MovableType

(Updated: Corrected link to the sample presentation.)

Anil Dash from Six Apart posted instructions for implementing Eric Meyer’s S5 slideshow system with MovableType templates. This is a great example of both the simplicity of the S5 system and the power of a template-driven website.

In a couple of minutes, I implemented the technique on Cantoni.org to see how well it works (see Cantoni.org Slideshow for the result). The longer posts and images overflow the page, but it’s a quick example that shows what is possible.

Sports/Racing Blogs

There seems to be a scarcity of good quality weblogs about racing, but I just found Full Throttle which looks good. (Their RSS feed is not linked from the main page, but with a little digging I found it: Full Throttle RSS feed.)

I also found Sports Blog which is kind of a meta-weblog, bringing in reports from a bunch of authors on different sports, including racing.

I’ve made my own (scraped) feed for Jayski’s Silly Season, which is a really good NASCAR source. I need to suggest that they publish an RSS feed as well.