Bloglines Enclosure Download

I just published my “Bloglines enclosure downloader” script. This started as an experiment with the Bloglines service APIs. As other applications (in particular, Doppler) continued to advance, I put this on hold. Doppler had promised Bloglines support in version 1, so I waited. Once the final Doppler came out, I realized it didn’t work quite the way I wanted.

So, I did some more hacking on my little Perl script to create a 340-liner that works for me and might work for others.

This script has the key features I’m looking for, in particular the automatic association with a designated folder in the Bloglines service. You don’t need to load up my script with feed URLs, just point it to a single Bloglines folder and it’ll do the rest.

The script will automatically download audio files, tag them if needed, and create an M3U-format playlist. It doesn’t yet set up the files for automatic transfer to a media player, but that can certainly be added in.

If anyone finds this useful or has ideas for changes, please let me know!

Bloglines Enclosure Download Script

blogenc.pl — a Perl script for automatically downloading enclosures from RSS feeds tracked by Bloglines.com

Description

This script started as an experiment with the Bloglines service APIs with the addition of rudimentary download capabilities. As other applications (like iPodder and Doppler) continued to advance, I put this on hold but returned to it when I realized the other applications didn’t do it exactly the way I wanted.

The result? A 400-line Perl script that works for me and might work for others. Maybe it will be the starting point for others who don’t mind a little Perl hacking to create a solution that works for them.

The key features that describe this script are:

  • Uses the Bloglines published interface to automatically find your list of feeds. To help manage the feeds, the script only works with a designated ‘audio’ folder.
  • Uses the Bloglines published interface to download the feeds themselves, using the provided time-stamping to pull only new items.
  • Once enclosure items are identified, uses wget to do the actual downloading.
  • For downloaded MP3s, sets the ID3 Genre field to a specified value (I use ‘Speech’ but this can be changed).
  • For downloaded MP3s whose sample rate is less than 16 kHz, resamples them to 16 kHz using LAME (http://www.mp3dev.org/). (This is controlled by $resample in the Configuration Options section of the script. Set it to 0 if you don’t need the conversion.)
  • After all files are downloaded, creates a playlist in M3U format suitable for transfer to a portable device or desktop media player.
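
As a rough illustration of the last step, a plain M3U playlist is just a text file with one path per line. The sketch below is not taken from blogenc.pl; the helper name write_m3u and the file names are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper (not from blogenc.pl): write a plain M3U playlist,
# which is simply one file path per line.
sub write_m3u {
    my ($playlist, @files) = @_;
    open(my $fh, '>', $playlist) or die "Cannot write $playlist: $!";
    print {$fh} "$_\n" for @files;
    close($fh) or die "Cannot close $playlist: $!";
    return scalar @files;    # number of entries written
}

# Example with made-up file names
my $count = write_m3u('podcasts-example.m3u', 'show1.mp3', 'show2.mp3');
print "Wrote $count playlist entries\n";
```

Most players and portable devices accept this simple form; the extended "#EXTM3U" variant with titles and durations is optional.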

The script uses two local data files to keep track of feeds visited (feedsave.dat) and files downloaded (filesave.dat). To force revisiting a feed or redownloading a file, either or both of these files can be deleted prior to running the script.
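
To show why deleting a data file forces a redownload, here is a minimal sketch of this kind of tracking. It assumes a format of one URL per line, which may not match what filesave.dat actually contains; the helper names and the example file name are hypothetical.

```perl
use strict;
use warnings;

# Assumed format (not confirmed by blogenc.pl): one URL per line.
sub load_seen {
    my ($path) = @_;
    my %seen;
    return \%seen unless -e $path;    # a missing file means "nothing seen yet"
    open(my $fh, '<', $path) or die "Cannot read $path: $!";
    while (my $line = <$fh>) {
        chomp $line;
        $seen{$line} = 1 if length $line;
    }
    close($fh);
    return \%seen;
}

# Append one URL to the data file after a successful download.
sub mark_seen {
    my ($path, $url) = @_;
    open(my $fh, '>>', $path) or die "Cannot append to $path: $!";
    print {$fh} "$url\n";
    close($fh) or die "Cannot close $path: $!";
}

my $datafile = 'filesave-example.dat';    # hypothetical name for this sketch
unlink $datafile;                         # start fresh, as if the user deleted it
my $url  = 'http://example.com/show1.mp3';
my $seen = load_seen($datafile);
unless ($seen->{$url}) {
    # ... the download would happen here ...
    mark_seen($datafile, $url);
}
$seen = load_seen($datafile);
print "Already downloaded: $url\n" if $seen->{$url};
```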

Installation

Download: http://www.cantoni.org/files/blogenc.zip

This script was developed and tested on Windows XP with ActiveState Perl v5.6.1. It was also tested on FreeBSD with Perl v5.6.1 and should generally work on any platform.

This script requires wget, which must be in the path, and LAME, whose path is specified in the configuration. It also requires these Perl modules:

  • XML::RSS
  • HTML::Entities
  • WebService::Bloglines
  • MP3::Tag
  • MP3::Info
  • Log::Log4perl

To install the script:

  1. Install Perl modules listed above
  2. Ensure wget is installed and in system path
  3. Extract blogenc.pl to a directory somewhere
  4. Edit blogenc.pl, changing values in the “configuration options” section; most likely the only change will be the Bloglines account info (username, password, folder)

The script can be run automatically at regular intervals. It uses Log::Log4perl, which currently sends output to STDERR. To capture the output when running the script automatically, redirect STDERR to a file:
perl blogenc.pl 2>log.txt

Usage

After installation, you can run the script in one of three modes.

Test mode will use a test Bloglines account and ensure that the functionality is working:
perl blogenc.pl test

Dryrun mode will use your Bloglines account and will list (but not download) all the new enclosures:
perl blogenc.pl dryrun

Normal mode will use your Bloglines account and will download normally:
perl blogenc.pl
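
For anyone adapting the script, the three modes could be handled with a small dispatch on the first command-line argument. This is only a sketch of the idea, not the code in blogenc.pl; parse_mode and the variable names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sketch: map the optional first argument to a run mode.
sub parse_mode {
    my (@args) = @_;
    my $mode  = @args ? lc($args[0]) : 'normal';
    my %known = map { $_ => 1 } qw(normal test dryrun);
    die "Unknown mode '$mode' (expected test or dryrun)\n" unless $known{$mode};
    return $mode;
}

my $mode     = parse_mode(@ARGV);
my $use_test = $mode eq 'test';      # test mode would switch to the test account
my $download = $mode ne 'dryrun';    # dryrun lists enclosures but skips downloads
print "Mode: $mode, downloading: ", ($download ? 'yes' : 'no'), "\n";
```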

Future

Some notes for future improvement:

  • Read feeds directly from the site rather than from Bloglines (which will return 304 if the user reads items through the web browser)
  • Hook into media player and/or device to set up automatic transfer of audio files
  • Fix hack that finds the enclosure tags
  • For the sample rate fix hack, do a better job with MP3 ID3 tags
  • Possible changes for any non-MP3 files (currently the script always assumes MP3)
  • Improve logging to improve ability to run automatically (e.g., detailed debug log written to file, info log written to STDOUT)
  • Support for gzip server content (if it makes sense)
  • Better tracking of feeds/items, better DB storage

Revision History

2004-12-20 v0.40 — Changes for this release include:

  • Fixed bug that prevented MP3 resampling from working correctly. Also added $resample config variable that controls whether or not to do the resampling step. (Those not using a Creative MP3 player can turn this off.)
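
The resampling decision can be sketched roughly as follows. This is an illustration under assumptions, not the script's actual code: the helper names are hypothetical, the sample rate is taken to be in kHz (the unit MP3::Info reports in its FREQUENCY field), and the exact LAME options blogenc.pl passes are not known. The options shown (--mp3input, --resample) are standard LAME options.

```perl
use strict;
use warnings;

# Hypothetical helper: decide whether a file needs resampling, given its
# sample rate in kHz and the $resample on/off setting.
sub needs_resample {
    my ($khz, $resample) = @_;
    return 0 unless $resample;    # $resample = 0 disables the step
    return $khz < 16 ? 1 : 0;
}

# Build an argument list for LAME. "--resample 16" requests 16 kHz output;
# "--mp3input" tells LAME the input is already an MP3 file.
sub lame_command {
    my ($lamepath, $in, $out) = @_;
    return ($lamepath, '--mp3input', '--resample', '16', $in, $out);
}

my $lamepath = 'lame';    # assumed here: LAME is on the path
for my $khz (11.025, 16, 44.1) {
    if (needs_resample($khz, 1)) {
        print join(' ', lame_command($lamepath, 'in.mp3', 'out.mp3')), "\n";
    }
    else {
        print "$khz kHz: no resampling needed\n";
    }
}
```

Passing the command as a list (for example to system) avoids shell quoting problems with file names that contain spaces.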

2004-12-15 v0.30 — Changes for this release include:

  • I found that my Creative Zen Touch (20GB) MP3 player has trouble playing MP3 files where the sampling rate is less than 16 kHz. I added to the script an additional step that uses the external LAME MP3 encoder to re-sample such files to 16 kHz. This required an additional Perl module (MP3-Info) to read the sample rate. There is also a new configuration setting ($lamepath) that must point to the LAME encoder on your system. http://www.cantoni.org/2004/12/13/zenproblems
  • Cleaned up some of the log/debug messages.

2004-11-09 v0.20 — Minor bug fixes and improvements:

  • If a feed had 0 unread items, it would be skipped, but this could lead to missed items if the user had viewed the channel from the web. Now, the script relies on the ‘last seen’ timestamp and ignores the unread count.

  • Added a default 30-second timeout for the wget call (the previous default was 900 seconds, which is quite long).
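
A wget call with an explicit timeout might be assembled as in the sketch below. The helper name, URL, and directory are made up, and the real script may pass different options; --timeout and -P (directory prefix) are standard wget options.

```perl
use strict;
use warnings;

# Hypothetical helper: build a wget argument list with an explicit timeout.
sub wget_command {
    my ($url, $dir, $timeout) = @_;
    $timeout = 30 unless defined $timeout;    # 30-second default
    return ('wget', "--timeout=$timeout", '-P', $dir, $url);
}

my @cmd = wget_command('http://example.com/show1.mp3', 'downloads');
print "@cmd\n";
# To actually run it: system(@cmd) == 0 or warn "wget failed: $?\n";
```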

2004-10-29 v0.10 — First release, supporting Bloglines interface, downloading with wget, and automatic M3U playlist generation.

Author

Written by Brian Cantoni (brian at cantoni dot org)

This script is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

Special thanks to Tatsuhiko Miyagawa for authoring the handy WebService::Bloglines Perl module: http://search.cpan.org/~miyagawa/

Some Perl tricks borrowed from Dave Slusher’s get_enclosures script: http://www.evilgeniuschronicles.org