Monthly Archives: March 2006

Amazon S3 Simple Storage Service

Amazon’s Simple Storage Service just launched:

Amazon S3 is storage for the Internet. It is designed to make web-scale computing easier for developers.

Amazon S3 provides a simple web services interface that can be used to store and retrieve any amount of data, at any time, from anywhere on the web. It gives any developer access to the same highly scalable, reliable, fast, inexpensive data storage infrastructure that Amazon uses to run its own global network of web sites. The service aims to maximize benefits of scale and to pass those benefits on to developers.

This looks pretty interesting — no startup or minimum monthly fees, just “pay as you go” for storage and bandwidth on a monthly basis. My initial thought is it would be good for hosting larger files like podcasts, screencasts, or video. I wonder if the implementation would also work for small, more granular data. Could you build a product or website completely using S3 as the backend database?

Playing with VideoEgg

I first heard about VideoEgg from the excellent Venture Voice podcast (shows 14, 15). At the time, they had an initial deal with TypePad blogs, but now they’ve opened it up for other publishing destinations as well.

Perhaps as a testament to the site’s design and interface, I was able to upload and publish a sample video in under a minute. (Click below to see the extended entry.)

Continue reading Playing with VideoEgg

Switched to AWStats

My hosting provider offers built-in support for two web log analysis packages: Analog and Webalizer. I’ve tried both, but neither gave me much control over the data, so I decided to switch to AWStats, which seems to be fairly popular. At first the installation steps were kind of painful, since they seemed to assume an installation where I had full control over Apache (which I don’t). I finally found a short write-up from another Pair user that didn’t need the AWStats install script.

Some background: My site has recently been bombarded with thousands of requests originating from China, some of which might be from the Baidu search engine, although I’ve read of other bots that impersonate Baidu, so it’s hard to tell. A typical request fetches something like “http://www.yd136.com/click.asp?id=857”, which is clearly not a page under cantoni.org.

The result: I knew my real data would be hard to find under all those bogus requests, but was amazed to learn that about 90% of my page hits were bogus (for February 2006, my logs had a total of 1.1M entries, of which only 100,000 were valid). I’m now blocking (sometimes large) IP ranges to reject the sources of these bogus hits. By setting the right criteria in AWStats, I’m able to filter out those bad requests and find the real data.
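To make the idea concrete, here is a minimal Python sketch of how bogus hits like the one above could be separated from real ones. It is not how AWStats does it. It assumes access logs in the usual Apache common/combined format and treats any request whose target is a full URL on some other host as bogus. The names MY_HOSTS, is_bogus, and bogus_share are made up for this illustration, and the sample log lines use placeholder IP addresses.

```python
import re
from urllib.parse import urlsplit

# Assumption: the hostnames that count as "my site".
MY_HOSTS = {"cantoni.org", "www.cantoni.org"}

# Pulls the method and target out of the quoted request field,
# e.g. "GET /index.html HTTP/1.1".
REQUEST_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<target>\S+)(?: HTTP/[\d.]+)?"')


def is_bogus(log_line):
    """Return True if the request does not look like a request for this site."""
    match = REQUEST_RE.search(log_line)
    if match is None:
        # Unparseable request field; counted as bogus in this sketch.
        return True
    target = match.group("target")
    if target.startswith("/"):
        # Ordinary request for a path on this server.
        return False
    # Absolute URL (proxy-style request): valid only if it names one of my hosts.
    return urlsplit(target).hostname not in MY_HOSTS


def bogus_share(lines):
    """Fraction of log lines classified as bogus (0.0 for an empty log)."""
    total = 0
    bogus = 0
    for line in lines:
        total += 1
        if is_bogus(line):
            bogus += 1
    return bogus / total if total else 0.0


# Made-up sample lines; the IP addresses are documentation placeholders.
sample = [
    '192.0.2.1 - - [01/Feb/2006:00:00:01 -0800] "GET /index.html HTTP/1.1" 200 1234',
    '198.51.100.7 - - [01/Feb/2006:00:00:02 -0800] '
    '"GET http://www.yd136.com/click.asp?id=857 HTTP/1.0" 404 210',
]
print(bogus_share(sample))
```

For the February numbers quoted above, the same ratio is (1,100,000 - 100,000) / 1,100,000, which comes out to roughly 91%.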

The scoop: So what did I learn from the month of February?

  • Publishing a screencast drives up the bandwidth in a hurry
  • All the major search engine spiders are visiting my site each day
  • Of links attributed to a search engine, about 90% are from Google, 5% from Yahoo, and each of the rest accounts for less than 1%
  • My PDA links page is still the most popular, drawing 5 times more visits than the next page
  • The most popular search term was “Chuck Norris Facts”!

Next step: Set up my hosting provider and AWStats to also track stats for some smaller sites I’m running. I also need to dig into AWStats some more to see what adjustments I can make, including adding support for mobile device user agent detection.
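As a rough idea of what mobile user agent detection involves, independent of however AWStats handles it, here is a small Python sketch that checks a User-Agent string against a few substrings. The substring list and the function name is_mobile_user_agent are assumptions for illustration only; a real list would be longer and would need regular upkeep.

```python
# Illustrative substrings only (assumption); compared in lowercase.
MOBILE_HINTS = (
    "windows ce",
    "palmsource",
    "blazer",
    "blackberry",
    "symbian",
    "opera mini",
)


def is_mobile_user_agent(user_agent):
    """Return True if the User-Agent contains any of the mobile hints."""
    ua = user_agent.lower()
    return any(hint in ua for hint in MOBILE_HINTS)


print(is_mobile_user_agent("Mozilla/4.0 (compatible; MSIE 4.01; Windows CE; PPC; 240x320)"))
```

Substring matching is crude, but it is easy to extend as new devices show up in the logs.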

Stats! Stats! Stats!