Tonight I attended the SDForum Web Services SIG meeting whose topic was “Semantic XHTML — Can your website be your API?”. The presenters were Kevin Marks and Tantek Çelik from Technorati. Following are my rough notes from this interesting presentation.
Update 2004-10-05: Slides from this talk now posted on Tantek’s site
Semantic XHTML
Can your website be your API?
SDForum Web Services SIG, 2004-09-28
Some SDForum general topics:
- A monthly Web Services Working Group will probably be formed in a couple of months
- Forming a new Web Client SIG, topics to include RSS, Atom, SOAP, REST, etc.; looking for a host
- New PayPal Hacks book coming out
Background on Technorati
Tracking 4 million blogs now (was 3 million in June). About 4 million posts per week. New Politics site tracks and summarizes about 10,000 political blogs. Link analysis is the key attribute of their processing. For international blogs, they use UTF-8 internally and can convert from the majority of encodings as needed. There is not as much content searching for international blogs yet, but that is less critical for now because they rely on links rather than content.
Presentation
HTML started structured, became presentational during browser wars. Explosive growth because of error tolerance. Table abuse & font tagitis & spacer GIF layouts caused two backlashes:
- Backlash for structure — XML; draconian error checking, freedom to make own schemas, appeals to programmers
- Backlash for layout — CSS; move presentation away from structure, content independence, appeals to designers, http://www.csszengarden.com
Where does XML fail?
- schema explosion (everyone makes their own)
- tag/attribute battling
- abstraction ratholes - BTO ontology
- not human readable (partly by design)
- doesn’t work on “the Web” today
Where does CSS fail?
- folk coding (design rather than engineering community)
- variable implementations
- visual designers thinking about presentation as structure
- structure hacks to fix presentation
Can we re-integrate these strands?
- XHTML is XML (XHTML = HTML made into XML)
- parseable, modular
- XHTML supports CSS
- everyone already has a viewer
- everyone can make queries
Example - Politics Site. Sample problem:
- wanted a chart of the top 3 links on a page
- dynamically generated using some complex app logic to choose the link title based on transient data
- solution: use the site output page as input, easily parsable to extract desired information
- this web page wasn’t originally designed with that in mind, but due to its structure was reusable
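The notes don't show the actual Politics page markup, so here is only a minimal sketch of the idea in Python. It assumes the page is well-formed XHTML and that the links sit in an ordered list with a known id. The id `top-links`, the URLs, and the function name `top_links` are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# XHTML elements live in this namespace once the page is parsed as XML.
XHTML_NS = "{http://www.w3.org/1999/xhtml}"

# Hypothetical page; the real Politics page structure is not in the notes.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <ol id="top-links">
      <li><a href="http://example.com/a">Story A</a></li>
      <li><a href="http://example.com/b">Story B</a></li>
      <li><a href="http://example.com/c">Story C</a></li>
      <li><a href="http://example.com/d">Story D</a></li>
    </ol>
  </body>
</html>"""


def top_links(xhtml, list_id, count=3):
    """Return (title, href) pairs for the first `count` links in the list."""
    root = ET.fromstring(xhtml)
    for ol in root.iter(XHTML_NS + "ol"):
        if ol.get("id") == list_id:
            anchors = ol.findall(f"{XHTML_NS}li/{XHTML_NS}a")
            return [(a.text, a.get("href")) for a in anchors[:count]]
    return []  # no list with that id on the page


for title, href in top_links(PAGE, "top-links"):
    print(title, href)
```

The point is that an ordinary XML parser is enough: no separate API is needed as long as the page stays well-formed and keeps its structure.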
XHTML building blocks
- most applications reuse a lot of common concepts
- strings
- lists, correspond to program arrays (<ol> and <ul>)
- tables, can be used for 2D array
- links with a ‘rel’ attribute, which explicitly defines the relationship; it is extensible and multivalued
- definition lists, key/value pairs or hashtables
- citations and quotes; cite a person or source by name, popular use in weblogs
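To make the correspondence with program data structures concrete, here is a rough Python sketch that reads three of these building blocks back into a list, a dictionary, and a 2D list. The fragment and the helper names (`list_items`, `definitions`, `table_rows`) are invented for illustration, and the markup is left without a namespace to keep the example short.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment containing a list, a definition list, and a table.
FRAGMENT = """<div>
  <ol><li>alpha</li><li>beta</li></ol>
  <dl>
    <dt>name</dt><dd>Technorati</dd>
    <dt>topic</dt><dd>blogs</dd>
  </dl>
  <table>
    <tr><td>1</td><td>2</td></tr>
    <tr><td>3</td><td>4</td></tr>
  </table>
</div>"""


def list_items(el):
    """<ol> or <ul> -> Python list."""
    return [li.text for li in el.findall("li")]


def definitions(el):
    """<dl> -> dict, pairing each <dt> with the <dd> in the same position."""
    terms = [dt.text for dt in el.findall("dt")]
    values = [dd.text for dd in el.findall("dd")]
    return dict(zip(terms, values))


def table_rows(el):
    """<table> -> list of rows, each row a list of cell strings."""
    return [[td.text for td in tr.findall("td")] for tr in el.findall("tr")]


root = ET.fromstring(FRAGMENT)
print(list_items(root.find("ol")))
print(definitions(root.find("dl")))
print(table_rows(root.find("table")))
```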
Existing examples
- XFN - XHTML friends network; just add ‘rel’ to your blogroll links; define profile using a dictionary: http://gmpg.org/xfn/1
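As a sketch of how a program might consume this, the Python below reads a blogroll and collects the ‘rel’ values for each link. The blogroll, its URLs, and the function name `relationships` are hypothetical; the values shown (friend, met, colleague) are meant as examples of XFN terms, and since ‘rel’ is multivalued, each link gets a list.

```python
import xml.etree.ElementTree as ET

# Hypothetical blogroll; only links carrying a rel attribute express a relationship.
BLOGROLL = """<ul>
  <li><a href="http://example.org/alice" rel="friend met">Alice</a></li>
  <li><a href="http://example.org/bob" rel="colleague">Bob</a></li>
  <li><a href="http://example.org/news">News site</a></li>
</ul>"""


def relationships(xhtml):
    """Map each link's href to the list of rel values on that link."""
    root = ET.fromstring(xhtml)
    result = {}
    for a in root.iter("a"):
        rel = a.get("rel")
        if rel:
            # rel is a space-separated list, so split it into values
            result[a.get("href")] = rel.split()
    return result


for href, values in relationships(BLOGROLL).items():
    print(href, values)
```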
Future examples
- attention.xml; what you are reading, how often you read it, etc., with the goal of an application that can help synchronize what you’re reading and highlight things that you are interested in
- XSPF - play lists (XML shared playlist format)
New types - Methodology
- map existing data structures into XHTML equivalents
- enable new stylable building blocks
- readily exchange data as mapping is 1:1
New type - People
- RFC 2426 vCard <-> hCard
- create an XHTML representation of this
- embed within a webpage, share to and from the web
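No hCard markup was captured in these notes, so the following Python is just a sketch of the 1:1 mapping idea: each vCard property becomes an element whose class name is the lower-cased property name. The class names used here (`vcard`, `fn`, `org`, `url`) are my assumption of how that mapping would look, the sample card is made up, and the parser ignores vCard parameters and line folding.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical vCard; only simple NAME:value lines are handled.
VCARD = """BEGIN:VCARD
VERSION:3.0
FN:Jane Example
ORG:Example Corp
URL:http://example.com/jane
END:VCARD"""


def parse_vcard(text):
    """Read NAME:value lines into a dict (no parameters, no folded lines)."""
    props = {}
    for line in text.strip().splitlines():
        name, _, value = line.partition(":")  # split on the first colon only
        props[name.strip().upper()] = value.strip()
    return props


def to_hcard(props):
    """Emit an XHTML fragment with one classed element per vCard property."""
    lines = ['<div class="vcard">']
    name = escape(props.get("FN", ""))
    if "URL" in props:
        href = quoteattr(props["URL"])  # adds quotes and escapes the value
        lines.append(f'  <a class="url fn" href={href}>{name}</a>')
    else:
        lines.append(f'  <span class="fn">{name}</span>')
    if "ORG" in props:
        lines.append(f'  <span class="org">{escape(props["ORG"])}</span>')
    lines.append("</div>")
    return "\n".join(lines)


print(to_hcard(parse_vcard(VCARD)))
```

Because the result is plain XHTML, it can be styled with CSS like any other part of the page.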
New type - Events
- RFC 2445 iCalendar <-> hCalendar
- describe events
- display them and enable parsing
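Going the other direction, here is a rough Python sketch that pulls an event out of a page and turns it back into iCalendar-style lines. The markup is hypothetical: I'm assuming class names that mirror the iCalendar property names (`vevent`, `summary`, `dtstart`) and a machine-readable date kept in a `title` attribute. The helper names are invented too.

```python
import xml.etree.ElementTree as ET

# Hypothetical event markup; class names are assumed to mirror iCalendar properties.
PAGE = """<div>
  <div class="vevent">
    <span class="summary">Semantic XHTML</span> on
    <abbr class="dtstart" title="20040928">September 28, 2004</abbr>
  </div>
</div>"""


def has_class(el, name):
    """True if `name` is one of the space-separated class values on `el`."""
    return name in el.get("class", "").split()


def events(xhtml):
    """Collect a dict of iCalendar-style properties for each vevent element."""
    root = ET.fromstring(xhtml)
    found = []
    for node in root.iter():
        if not has_class(node, "vevent"):
            continue
        event = {}
        for child in node.iter():
            if has_class(child, "summary"):
                event["SUMMARY"] = (child.text or "").strip()
            elif has_class(child, "dtstart"):
                # prefer the machine-readable title over the display text
                event["DTSTART"] = child.get("title") or (child.text or "").strip()
        found.append(event)
    return found


def to_ical(event):
    """Serialize one event dict as a VEVENT block."""
    lines = ["BEGIN:VEVENT"]
    lines += [f"{key}:{value}" for key, value in event.items()]
    lines.append("END:VEVENT")
    return "\r\n".join(lines)


for event in events(PAGE):
    print(to_ical(event))
```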