Category Archives: Software

Over 1 Million Tweets Delivered by Tweetfave

My Tweetfave side project has just passed a major milestone – over 1 million Tweets delivered!

McDonalds Sign

Tweetfave is a simple service which sends emails with all tweets you’ve marked as favorites (or in today’s terminology “liked”). I launched this publicly over 6 years ago in May 2013 and it’s been running along quietly since then.

The chart below shows the cumulative total number of tweets delivered through the service. At the end of August 2019 we just passed the 1 million mark (on pace for about 1.1 million by the end of the year):

Tweetfave growth chart showing 1 million tweets after 6 years

The amount of usage is kind of impressive given the small number of users. Overall Tweetfave has had almost 300 people sign up and try it; currently 110 users are still active. The numbers are definitely skewed towards a few power users: the top 20 users account for almost 80% of the total usage. In fact, despite being “customer #1”, I’m only the 18th most active user on the system :)

So, what are some lessons learned along the way?

Building it was straightforward with PHP and libraries for everything. It wasn’t necessarily easy (in particular learning the Twitter API), but all the building blocks came together pretty smoothly. Everything I needed had PHP libraries, including the Twitter API, Mustache templates, composing/sending email, and the MySQL database. I wrote up more about all the software and services used back in 2013.

Left alone, it just kept working. For the most part the service just kept running along. There were a few outages along the way from Twitter and AWS S3. There were also a few updates needed to keep up with Twitter API changes, for example when support for longer tweets rolled out in 2016. One problem is there is no monitoring to speak of, but since I still use the system a lot I’ll eventually notice when things are broken.

Minimal marketing results in minimal usage. At the time I launched Tweetfave I pinged people I knew who might like it and got a few takers. I would also tweet about it occasionally and there were a few referrals from other users. For the most part I think people stumble across it, try it out, and some stick around. I imagine a real focus on marketing could drive more users; maybe some day in the future :)

Original feature set was pretty complete. If I were bolder I might call the original launch my Minimum Viable Product, which in some sense it was. I’ve really only made a few more feature tweaks over time, the biggest of which was adding RSS feeds in 2015.

Email delivery can be tricky. One area I didn’t have much experience with was delivering so much email. For the majority of the time I used SendGrid, who were great. (I used their tech support multiple times, which is very impressive considering I was always at the free level.) Recently I switched to Mailgun because I continued to have delivery problems to Hotmail/Outlook accounts; those are all working now. I’m still small enough to be on the free tier; it’s running about 3000 emails/month recently, with about a 32% open rate, which seems pretty good. Also, switching email providers when your interface is just SMTP is very easy.

Infrastructure needs some updates. The service is running on an unsupported and very old PHP (5.5.9) and unsupported Ubuntu (14.04). I also need to move the MySQL data from my shared web host account over to DigitalOcean (the service and website are already there). Once MySQL is updated, I need to revisit some database issues that show up when tweets contain Unicode characters. There is not a lot of test code, which makes me nervous about making big changes, so perhaps as part of this I’ll finally add some better tests.

Still useful as a bookmarking/read it later service. I originally created Tweetfave to help fit my model of reading Twitter and it’s still effective for me today. Like many “read it later” type services, I suffer from not always going back and reading everything but it’s very handy to have them all there in my inbox. Email is powerful for these types of services and I’ve got a few more related ideas that I’d like to build.

If you made it this far, give Tweetfave a try and let me know what you think (brian AT cantoni.org)!

Converting HTML to Markdown using Pandoc

Markdown is a great plain text format for a lot of applications and is often used to convert to HTML (for example on my WordPress blog here). There are also some good use cases for the opposite: converting from HTML into Markdown. I recently had such a case, converting some older blog posts from raw HTML into Markdown, and found that Pandoc made it really easy.

What’s Pandoc?

Pandoc is an open-source utility for converting between a number of common (and rare) document types, for example plain text, HTML, Markdown, MS Word, LaTeX, wiki, and so on. The output formats list is really extensive, and people can write their own “filters” to handle other formats as well, or to customize the existing ones to their exact needs. The project tagline sums it up nicely:

If you need to convert files from one markup format into another, pandoc is your swiss-army knife.

Screenshot of Pandoc website showing all the supported file formats
The Pandoc website lists all of the supported file types it can convert between

My Use Case

My particular use case was to convert about a dozen really old blog posts from this website. I wrote these back in the early days when I managed this site in CityDesk and later migrated to MovableType. The Google Search Console alerted me to some crawler errors which turned out to be caused by raw PHP file content being served instead of real HTML.

My approach for cleaning this up was as follows:

  1. Convert HTML original articles into Markdown format
  2. Do some manual cleanup editing and double-check links are still valid
  3. Drop the Markdown into the appropriate Posts within WordPress
  4. Modify my existing .htaccess files to do permanent (301) redirects for all of the old URLs
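As a rough sketch of step 4, the redirect rules can be generated from a list of old and new paths. This assumes Apache's mod_alias `Redirect` directive is usable in the site's .htaccess; the paths shown are hypothetical placeholders rather than the real URLs, and `make_redirects` is just an illustrative helper name.

```shell
#!/bin/bash
# Sketch: turn "old-path new-path" pairs into Apache mod_alias redirect rules.
# The paths below are hypothetical placeholders, not the site's real URLs.

make_redirects() {
  # Read one "old new" pair per line from stdin; skip blank lines and # comments.
  while read -r old new; do
    case "$old" in
      ''|'#'*) continue ;;
    esac
    printf 'Redirect 301 %s %s\n' "$old" "$new"
  done
}

make_redirects <<'EOF'
# old URL path          new URL path
/archives/old-post.php  /2004/01/old-post/
/archives/another.php   /2004/02/another/
EOF
```

The printed lines can be reviewed and then pasted into .htaccess; keeping the mapping in one place makes it easier to double-check each old URL against its new post.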

Examples

Simple HTML Example

With Pandoc installed, you can try a simple test pulling down the installation instructions page:

curl --silent https://pandoc.org/installing.html | pandoc --from html --to markdown_strict -o installing.md

To see the result, consider this HTML snippet from installing.html:

<h2 id="compiling-from-source">Compiling from source</h2>
<p>If for some reason a binary package is not available for your platform, or if you want to hack on pandoc or use a non-released version, you can install from source.</p>
<h3 id="getting-the-pandoc-source-code">Getting the pandoc source code</h3>
<p>Source tarballs can be found at <a href="https://hackage.haskell.org/package/pandoc" class="uri">https://hackage.haskell.org/package/pandoc</a>. For example, to fetch the source for version 1.17.0.3:</p>
<pre><code>wget https://hackage.haskell.org/package/pandoc-1.17.0.3/pandoc-1.17.0.3.tar.gz
tar xvzf pandoc-1.17.0.3.tar.gz
cd pandoc-1.17.0.3</code></pre>

We can see the resulting Markdown turned out very well:

## Compiling from source

If for some reason a binary package is not available for your platform, or if you want to hack on pandoc or use a non-released version, you can install from source.

### Getting the pandoc source code

Source tarballs can be found at <a href="https://hackage.haskell.org/package/pandoc" class="uri">https://hackage.haskell.org/package/pandoc</a>. For example, to fetch the source for version 1.17.0.3:

    wget https://hackage.haskell.org/package/pandoc-1.17.0.3/pandoc-1.17.0.3.tar.gz
    tar xvzf pandoc-1.17.0.3.tar.gz
    cd pandoc-1.17.0.3

My Blog Post Conversions

For my dozen old HTML articles, the straight conversion ended up being a bit noisy, especially with some old CMS template boilerplate around the content which was no longer needed. I used a little bit of Sed to strip that out before conversion:

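#!/bin/bash
# Takes one HTML file as $1 and writes the Markdown next to it as $1.md
echo "converting $1"
# Quote "$1" so filenames with spaces work; sed reads the file directly (no cat needed)
sed '1,/<div class="asset-header">/d' "$1" | sed '/<div class="asset-footer">/,/<\/html>/d' | pandoc --wrap=none --from html --to markdown_strict > "$1.md"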

(The above Sed commands clean up the HTML source in two passes: first removing everything from top of file to <div class="asset-header">, which is where the blog post started; and then removing all from <div class="asset-footer"> to the end of file.)
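To see what the two Sed passes described above actually keep, here is a small self-contained sketch run against a made-up page (the HTML content and the `strip_boilerplate` name are illustrative assumptions, and Pandoc is left out so it runs anywhere). Note that the marker lines themselves are deleted along with the boilerplate.

```shell
#!/bin/bash
# Sketch: show the two sed range deletions on a tiny, made-up HTML page.

strip_boilerplate() {
  # Pass 1: delete line 1 through the first asset-header line (inclusive).
  # Pass 2: delete from the asset-footer line through </html> (inclusive).
  sed '1,/<div class="asset-header">/d' | sed '/<div class="asset-footer">/,/<\/html>/d'
}

strip_boilerplate <<'EOF'
<html>
<body>
<div class="asset-header">
<h1>Post title</h1>
<p>Post body.</p>
<div class="asset-footer">
<p>Footer links</p>
</body>
</html>
EOF
```

Only the title and body lines survive, which is the part that then gets piped into Pandoc.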

After that, I just needed to do some minor editing cleanups on the Markdown files before bringing them in to WordPress. Success!

Further Reading

There are a few good online converters you can try; keep in mind some of these are limited in the number of characters they can handle:

To learn more and go deeper on Pandoc, I recommend going through their excellent user’s guide.

And finally a big recommendation for Dillinger, a great online tool for editing Markdown text with live HTML rendering. I use that for writing these blog articles as well, before moving them in to WordPress.

DataStax Installer with Vagrant

I’ve continued to make improvements to my “Cassandra on Vagrant” project (Using Vagrant for Local Cassandra Development) which shows how to install open-source Cassandra or DataStax Enterprise in a variety of different ways. Using Vagrant is very helpful for local development and testing. Virtual images can be created very quickly and can be erased when done, keeping your primary development system clean.

Recently I added an example which uses the DataStax Enterprise (DSE) standalone installer which first appeared in DSE 4.5. The standalone installer normally runs in a graphical UI mode, but can also be run in an unattended mode which I’m using here.

To play with the examples, grab a copy of the Vagrant projects from GitHub: bcantoni/vagrant-cassandra. Once you have Vagrant and VirtualBox set up, check out example 5. DSE Installer and go through the setup.

On my Mac laptop, creating a 3-node DSE cluster takes less than 5 minutes. (The speed is greatly improved because we only need to download the installer once.) The installer has several options for running in unattended mode, so the installation can be customized as needed.

See the code and more details at bcantoni/vagrant-cassandra.

Tech Advent Calendars – 2014

Update: For the latest, check out Tech Advent Calendars – 2016

It’s that time of the year again – Advent calendars for many tech communities. As in past years (2011, 2012, 2013), I’ve gathered a few here that should be interesting:
* Perf Planet Advent (performance) – Feed
* 24ways Advent (web design/development) – Feed
* Perl Advent (Perl language) – Feed
* Java Advent (Java language) – Feed
* UXMas (UX for everyone) – Feed
* SysAdvent (Sysadmin) – Feed

I have a combined RSS feed (created with Yahoo! Pipes) that picks up all of these advent calendars: http://feeds.feedburner.com/TechAdventCalendars (Yahoo Pipe source).

Quick Guide to Vagrant on Amazon EC2

Here’s a really quick guide to using Vagrant to create virtual machines on Amazon Web Services EC2. I’ve gotten a lot of use out of Vagrant for local development, but sometimes it’s helpful to build out VMs in the cloud. (In particular, if your local machine isn’t very powerful.)

These steps assume you already have Vagrant installed and have an Amazon Web Services account (and know how to use both).

Installation

First you’ll need to install the Vagrant AWS plugin:

vagrant plugin install vagrant-aws
vagrant box add dummy https://github.com/mitchellh/vagrant-aws/raw/master/dummy.box

Next, log in to your AWS console to get a few things:

  • AWS access key
  • AWS secret key
  • SSH keypair name
  • SSH private key file (.pem extension)
  • Make sure the default security group enables SSH (port 22) access from anywhere

I like to set these up as environment variables to keep them out of the Vagrantfile. On Mac or Linux systems you can add this to your ~/.profile file:

export AWS_KEY='your-key'
export AWS_SECRET='your-secret'
export AWS_KEYNAME='your-keyname'
export AWS_KEYPATH='your-keypath'
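Since the Vagrantfile reads these values from the environment, it can help to confirm they are all set before running Vagrant. The following is a minimal sketch; `check_aws_env` is a hypothetical helper name, not part of Vagrant or the vagrant-aws plugin.

```shell
#!/bin/bash
# Sketch: fail early if any of the AWS-related variables is missing or empty.

check_aws_env() {
  missing=0
  for name in AWS_KEY AWS_SECRET AWS_KEYNAME AWS_KEYPATH; do
    # Look up the variable whose name is stored in $name (portable indirect expansion).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

if check_aws_env; then
  echo "AWS environment looks complete"
else
  echo "set the variables listed above before running vagrant up" >&2
fi
```

This only checks that the variables are non-empty; it does not verify that the keys are valid or that the private key file exists.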

Vagrantfile

Now we can configure our Vagrantfile with the specifics needed for AWS. Refer to the vagrant-aws documentation to understand all the options. In the example below we have all the AWS-related settings in the x.vm.provider :aws block:

VAGRANTFILE_API_VERSION = "2"
Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  config.vm.define :delta do |x|
    x.vm.box = "hashicorp/precise64"
    x.vm.hostname = "delta"

    x.vm.provider :virtualbox do |v|
      v.name = "delta"
    end

    x.vm.provider :aws do |aws, override|
      aws.access_key_id = ENV['AWS_KEY']
      aws.secret_access_key = ENV['AWS_SECRET']
      aws.keypair_name = ENV['AWS_KEYNAME']
      aws.ami = "ami-a7fdfee2"
      aws.region = "us-west-1"
      aws.instance_type = "m3.medium"

      override.vm.box = "dummy"
      override.ssh.username = "ubuntu"
      override.ssh.private_key_path = ENV['AWS_KEYPATH']
    end
  end
end

See this GitHub gist for a longer example file.

Now you can bring up the VM by specifying the AWS plugin as the provider:

vagrant up --provider=aws

After about a minute, the VM should be up and running and available for SSH:

$ vagrant up --provider=aws
Bringing machine 'delta' up with 'aws' provider...
==> delta: Launching an instance with the following settings...
==> delta:  -- Type: m3.medium
==> delta:  -- AMI: ami-a7fdfee2
==> delta:  -- Region: us-west-1
==> delta:  -- Keypair: briancantoni
==> delta:  -- Block Device Mapping: []
==> delta:  -- Terminate On Shutdown: false
==> delta:  -- Monitoring: false
==> delta:  -- EBS optimized: false
==> delta:  -- Assigning a public IP address in a VPC: false
==> delta: Waiting for instance to become "ready"...
==> delta: Waiting for SSH to become available...
==> delta: Machine is booted and ready for use!
==> delta: Rsyncing folder: /Users/briancantoni/dev/vagrant/aws/ => /vagrant

$ vagrant ssh
Welcome to Ubuntu 14.04 LTS (GNU/Linux 3.13.0-29-generic x86_64)

ubuntu@ip-172-31-30-167:~$

Notes

  • You need to configure a specific AMI for Vagrant to use. I find the Ubuntu Amazon EC2 AMI Finder very helpful for matching the version and region I want to use.
  • A common tripping point is the default security group not allowing SSH (port 22) from any IP address. Also make sure to add any other ports depending on your application (e.g., port 80 for HTTP).
  • Once you have the basics working, make sure to read through the vagrant-aws project to understand all the options available.
  • Make sure to vagrant destroy your VMs when done, and check the AWS Console to make sure they were terminated correctly (to avoid unexpected charges).