All posts by Brian Cantoni

Converting HTML to Markdown using Pandoc

Markdown is a great plain text format for a lot of applications and is often converted to HTML (for example on my WordPress blog here). I recently had a case where I needed to go the other way, from HTML to Markdown, and found that Pandoc makes it really easy.

What’s Pandoc?

Pandoc is an open-source utility for converting between a number of common (and rare) document types, for example plain text, HTML, Markdown, MS Word, LaTeX, wiki, and so on. The output formats list is really extensive, and people can write their own “filters” to handle other formats as well, or to customize the existing ones to their exact needs.

My Use Case

My particular use case here is converting about a dozen really old blog posts from this website. I wrote these back in the early days when I managed this site in CityDesk and later migrated to MovableType. The Google Search Console alerted me to some crawler errors which turned out to be caused by raw PHP file content being served instead of real HTML.

My plan for cleaning this up:

  1. Convert HTML original articles into Markdown format
  2. Do some manual cleanup editing and double-check links are still valid
  3. Drop the Markdown into the appropriate Posts within WordPress
  4. Modify my existing .htaccess files to do permanent (301) redirects for any of the old URLs that search engines may still have
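
The redirect step of that plan could be sketched roughly like this. It assumes the redirects are written with Apache's mod_alias `Redirect` directive; the `make_redirects` helper and the example paths are made up for illustration and are not the real URLs from this site.

```shell
#!/bin/bash
# Sketch only: print one "Redirect 301" line (Apache mod_alias) per old/new pair.
# make_redirects is a hypothetical helper; input is "old-path new-url" per line.
make_redirects() {
  while read -r old new; do
    # Skip blank lines and comment lines in the mapping
    case "$old" in ''|'#'*) continue ;; esac
    printf 'Redirect 301 %s %s\n' "$old" "$new"
  done
}

# Example mapping with placeholder paths; review the output before
# pasting it into .htaccess
make_redirects <<'EOF'
/archives/old-post.html https://www.example.com/2005/01/01/old-post
/archives/another.php https://www.example.com/2006/02/03/another
EOF
```

Keeping the mapping in a plain text file makes it easy to re-check each old URL after the new posts are published.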

Examples

Simple HTML Example

With Pandoc installed, you can try a simple test pulling down the installation instructions page:

curl --silent https://pandoc.org/installing.html | pandoc --from html --to markdown_strict -o installing.md

If we take a look at an HTML snippet:

<h2 id="compiling-from-source">Compiling from source</h2>
<p>If for some reason a binary package is not available for your platform, or if you want to hack on pandoc or use a non-released version, you can install from source.</p>
<h3 id="getting-the-pandoc-source-code">Getting the pandoc source code</h3>
<p>Source tarballs can be found at <a href="https://hackage.haskell.org/package/pandoc" class="uri">https://hackage.haskell.org/package/pandoc</a>. For example, to fetch the source for version 1.17.0.3:</p>
<pre><code>wget https://hackage.haskell.org/package/pandoc-1.17.0.3/pandoc-1.17.0.3.tar.gz
tar xvzf pandoc-1.17.0.3.tar.gz
cd pandoc-1.17.0.3</code></pre>

We can see the resulting Markdown looks like this:

## Compiling from source

If for some reason a binary package is not available for your platform, or if you want to hack on pandoc or use a non-released version, you can install from source.

### Getting the pandoc source code

Source tarballs can be found at <a href="https://hackage.haskell.org/package/pandoc" class="uri">https://hackage.haskell.org/package/pandoc</a>. For example, to fetch the source for version 1.17.0.3:

    wget https://hackage.haskell.org/package/pandoc-1.17.0.3/pandoc-1.17.0.3.tar.gz
    tar xvzf pandoc-1.17.0.3.tar.gz
    cd pandoc-1.17.0.3

My Blog Post Conversions

For my dozen old HTML articles, the straight conversion ended up being a bit noisy, mostly because of the template boilerplate around the content, which was no longer needed. I used a little bit of sed to strip that out before conversion:

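#!/bin/bash
# $1 is the HTML file to convert; the Markdown is written to $1.md
echo "converting $1"
# Drop the template above and below the post body, then convert what is left
sed '1,/<div class="asset-header">/d' "$1" | sed '/<div class="asset-footer">/,/<\/html>/d' | pandoc --wrap=none --from html --to markdown_strict > "$1.md"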

After that, I just needed to do some minor editing cleanups on the Markdown files before bringing them in to WordPress. Success!
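
To see what the two sed range deletions do without involving Pandoc, here is a small sketch that runs them on a made-up sample page. The `strip_template` name and the sample HTML are assumptions for illustration; only the two sed expressions come from the script above.

```shell
#!/bin/bash
# Sketch: the same two sed range deletions, wrapped in a function so they can
# be tried on a small sample. strip_template is a made-up name.
strip_template() {
  # 1) delete from line 1 through the line containing the asset-header div
  # 2) delete from the asset-footer div through the line containing </html>
  sed '1,/<div class="asset-header">/d' | sed '/<div class="asset-footer">/,/<\/html>/d'
}

# Made-up sample page standing in for one of the old articles
strip_template <<'EOF'
<html><body>
<div class="nav">site navigation</div>
<div class="asset-header">
<h2>Post title</h2>
<p>Post body.</p>
<div class="asset-footer">
<p>Posted by the author</p>
</body></html>
EOF
```

Only the title and body lines should survive; everything from the top of the file through the header div, and from the footer div to the end, is dropped.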

Further Reading

There are a few good online converters you can try; keep in mind some of these are limited in the number of characters they can handle:

To learn more and go deeper on Pandoc, they’ve got an excellent user’s guide.

And finally a big recommendation for Dillinger, a great online tool for editing Markdown text with live HTML rendering.

My Current Podcast Playlist

boy singing on microphone with pop filter

I like to periodically drop my podcast subscription list here for anyone interested, and so I can look back and see how my interests have changed :) (Search here for some previous updates.) Lately I’m mostly listening to software or startup podcasts, but have started following a lot of woodworking ones as well as I try to find time for my woodworking hobby!

Tech / Software

Hanselminutes – Fresh Talk and Tech for Developers

The Changelog

.NET Rocks!

Build Your SaaS – running a startup in 2019

The Ars Technicast

Startups / Business

DataSnax Podcast

MegaMaker

Import This

The Tim Ferriss Show

The Smart Passive Income Online Business and Blogging Podcast

Woodworking / Makers

Making It With Jimmy Diresta, Bob Clagett and David Picciuto

The Modern Maker Podcast

Made for Profit

The Green Woodworker Podcast

If You Build IT Podcast

Measuring Up Podcast

The Make or Break Show

Forked Up: A Thug Kitchen Podcast

Sports

NASCAR on NBC podcast

Sports Media with Richard Deitsch

Photo by Jason Rosewell on Unsplash

Best Practices for Leading Online Meetings

Team Meeting Fist Bump

Online meetings continue to rise in popularity, in particular for companies with remote workers or distributed teams. The effectiveness of online meetings can be improved significantly by following a few simple techniques and habits.

First of all, what kind and size of meetings are we talking about?

  1. One-on-one (2 people)
  2. Smaller team meetings (2-10 people)
  3. Medium team meetings, internal training or demo (10-20)
  4. Larger team meetings, company “all hands” (20+)
  5. Public facing webinar (marketing, sales, training)

This guide is targeting categories 3 & 4 – these meetings are big enough that you want to run them effectively but are still internal and less formal compared to public webinars or training. These types of meetings are often recorded for those not able to attend, so creating a good quality recording is important.

The advice here should be general enough to apply to most online meeting software, even though the exact features may vary and the apps themselves are constantly changing. Over time I’ve used Skype, GoToMeeting, Google Hangouts, WebEx, and most recently BlueJeans. In some cases I’ll reference particular options in certain software which are helpful.

With the intro out of the way, let’s dive in to the list of best practices. What follows here is opinionated based on my own experiences and needs, so make sure to tailor the advice here to fit your own situations.

Preparing

  • If you’re using a presentation, make sure to have a shareable version of it. Google Docs is good for sharing; PowerPoint/Keynote can be shared as PDF for those who might not have those apps.
  • On the title slide make sure to include the presenter’s name and date (this helps put the meeting in the proper context for anyone watching the recording later)
  • If applicable, post the slides and share the link ahead of the meeting
  • Configure your computer for effective visibility for meeting participants:
    • When showing a presentation, use slide show mode
    • In other apps, use full screen mode if available
    • Zoom in (increase font size) as needed, especially for anything involving code
    • Turn off 2nd monitor if needed (some older apps like WebEx had a real problem with this)
    • Quit or snooze any apps which may show notifications or reminders
  • Practice with your online meeting software if you’ve never presented before

Scheduling

  • Schedule a unique meeting in your system and include the pertinent details in the calendar invite:
    • Instructions for joining the meeting
    • Agenda
    • Where will chat/Q&A happen
    • Will recording be posted afterwards
  • Configure your meeting with settings to help minimize distractions:
    • Entry/exit tones: off (to avoid annoying beeps)
    • Mute on entry: on (not everyone will remember to mute themselves)
    • If using a separate system like HipChat or Slack for chat and Q&A, disable the built-in chat
  • For bigger meetings with moderators/presenters in multiple locations, consider using a back-channel for coordinating hand-offs and so on. Using a mobile app like WhatsApp, GroupMe or SMS has the added benefit of being available even if some participants’ internet connections have problems.

Running

  • Have a separate moderator who is not presenting; this lets the presenter focus on their content while the moderator focuses on the meeting itself (mute/unmute, watching for questions, etc.).
  • Join the meeting from a second device like a tablet or phone and leave it on your desk. This makes it easier to confirm and monitor what the attendee view looks like. (Make sure to mute and silence the 2nd device to avoid audio feedback.)
  • Start and join the meeting 10 minutes early and arrange for all presenters to do the same; check all the controls and screensharing before everyone joins. (For first-time presenters, you could do a separate dry-run meeting earlier to ensure their software is working correctly.)
  • When joining the meeting, make sure all presenters are identifiable by their names (as opposed to something like “guest_1” or a dial-in phone number)
  • If your software has the option, turn off entry/exit tones and select mute on entry
  • At the start of the meeting, make announcements a couple times while waiting for people to join:
    • Where the chat or Q&A will be happening
    • Please mute yourself
    • We’ll be starting soon
    • This will be recorded and posted afterwards

Recording

Recordings are helpful for anything that may have value later, especially internal product demos or training. They can also be useful for regular project/staff meetings for the benefit of people unable to attend.

  • If your meeting software has a built-in recording feature, use it. If not (or even in addition to it) you can use a desktop application like ScreenFlow or Camtasia. (For higher-quality recordings, I always use an external recording application.) Make sure to record the presented video screen, the meeting audio, and your local audio device.
  • If the meeting is important (e.g. you have a guest speaker), have a second person also record from their computer as a backup.
  • Don’t start the recording until the presentation is about to start (i.e. don’t record your announcements mentioned above).
  • When you’re ready to start, hit record, wait a moment, then give a good introduction before passing off to the first presenter. That gives your recording a clean starting point with the subject mentioned right away (and avoids all the pre-meeting dead time).

Q&A

  • For questions and discussion during or after the presentation, encourage everyone to unmute and ask their question live; this helps with those watching the recording later.
  • For questions read from chat or other sources, make sure to read the questions out loud before answering (again, for the benefit of the recording).

After

  • Clean up and edit the video as needed (depending on how polished you need it). (I like to at least run through the whole video and edit out obvious dead time, coughing, and “ums”.)
  • Upload/post the video and slides
  • Send an email to everyone with links to both

Photo by rawpixel on Unsplash

Running LinkChecker on a Mac

LinkChecker is a utility written in Python for scanning and checking web page links, usually used for finding invalid or outdated pointers which need to be updated. The LinkChecker project is in a bit of flux right now because the original project (GitHub wummel/linkchecker) has gone completely quiet and presumably the original author is no longer interested in maintaining it. Luckily, there is a new group of volunteers rallying around a new fork (GitHub linkcheck/linkchecker).

The project has a variety of packaged downloads, but they are not all updated yet from the newest source tree. On my Mac system I always had trouble making the old project work (usually getting an error like ImportError: No module named requests). Switching to the new LinkChecker source and using Virtualenv have solved my problems! These are my steps for making this work; it’s pretty straightforward if you have some experience with Python-based utilities.

Prerequisites

First Time Installation

The first step is to create a working directory for LinkChecker and set up the virtual Python environment:

mkdir ~/linkchecker
cd ~/linkchecker
virtualenv env
source env/bin/activate
python --version

Next we’ll clone the latest LinkChecker and install it in the virtual Python environment:

git clone https://github.com/linkcheck/linkchecker.git .
python setup.py sdist --manifest-only
python setup.py build
python setup.py install

Next, confirm that it’s installed and ready to run:

linkchecker
linkchecker --help

Finally, start using the tool and check some websites, for example:

linkchecker --timeout 5 --check-extern https://tweetfave.com/
linkchecker -r 1 --timeout 5 --check-extern http://www.cantoni.org/2017/07/27/podcast-update-feed-reader
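
If there are several sites to check regularly, the same command can be wrapped in a small loop. This is a minimal sketch that assumes `linkchecker` is on the PATH (that is, the virtualenv is active) and that it exits non-zero when it finds problems; `check_sites` is a hypothetical helper name, and the flags are the same ones used above.

```shell
#!/bin/bash
# Sketch: run linkchecker against each URL given as an argument, using the
# same flags as the examples above. check_sites is a hypothetical helper name.
check_sites() {
  local url failed=0
  for url in "$@"; do
    echo "== checking $url"
    # Count a site as failed when linkchecker exits with a non-zero status
    linkchecker --timeout 5 --check-extern "$url" || failed=$((failed + 1))
  done
  echo "$failed of $# site(s) reported problems"
  [ "$failed" -eq 0 ]
}

# Usage (inside the activated virtualenv):
#   check_sites https://tweetfave.com/ http://www.cantoni.org/
```

The function returns a failure status if any site had problems, so it can be used in a cron job or a larger script.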

Running LinkChecker

The above steps are only needed the first time. After that, you just need to activate the Virtualenv before running the tool:

cd ~/linkchecker
source env/bin/activate
linkchecker --help
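
To save retyping those commands, they could be wrapped in a shell function kept in `~/.bashrc` or similar. This is only a sketch: `lc` and the `LINKCHECKER_HOME` variable are made-up names, and the directory defaults to the `~/linkchecker` location used above.

```shell
#!/bin/bash
# Sketch: activate the virtualenv and run linkchecker in one step.
# lc and LINKCHECKER_HOME are hypothetical names, not part of LinkChecker.
# The body runs in a subshell, so the activated environment does not leak
# into the calling shell.
lc() (
  . "${LINKCHECKER_HOME:-$HOME/linkchecker}/env/bin/activate" || exit 1
  linkchecker "$@"
)

# Usage:
#   lc --help
#   lc --timeout 5 --check-extern https://tweetfave.com/
```

Because the activation happens inside a subshell, your regular shell keeps its normal Python environment after the check finishes.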

Tweetfave Passes 500K Tweets

My Tweetfave service has been running for just over 4 years now and just passed the 500K tweets mark! The usage has been pretty steady considering I haven’t done much to promote it. Over 270 users have tried the service, with about 100 still active. Luckily the service hasn’t required much maintenance over that time, just an occasional update to deal with webservice API changes or to fix minor bugs.

Here’s the growth chart covering the last 4 years:

Chart: Tweetfave Reaches 500K

If you use Twitter and use the “favorites” feature, give Tweetfave a try. The service will automatically email you (every couple of hours) all the tweets you liked.