Markup Proposal (Idle Words)

05.22.2003

Part of the problem in indexing weblogs is finding them in the first place. Weblogs.com and sites like it are a start, but there are plenty of weblogs that don't announce their updates anywhere. The only way to find them is by crawling.

Once you've found a weblog, you still have a problem. It's not easy to find dates, link lists, or boundaries between weblog posts. There are a zillion different formats, and none of them are all that consistent. It would be nice to offer a per-post search engine, for example, but right now it's not feasible. Is that text a post, or a comment, or a TrackBack, or part of the template, or what, exactly?

A couple of days ago, I traded ideas with Dave Sifry and Steve Nieker, and we came up with a proposal for blog tool writers. It comes down to four small changes that would make weblog pages much easier to identify and parse:

  1. An identifying tag in the HTML header:

  2. Delimiters around each post (with an optional GMT datestamp):


    ...

  3. A delimiter around the blogroll:


    ...

  4. Permalinks explicitly labeled:

Crufty, inelegant, and a pale shadow of what RSS offers, sure. But it's something that would make a majority of sites more visible to search engines.
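
For a sense of what this would buy a crawler, here is a minimal sketch in Python. Everything specific in it is an assumption made up for illustration: the `weblog` meta name, the `post-begin`/`post-end` and `linklist-begin`/`linklist-end` comment markers, and the `rel="permalink"` attribute are hypothetical stand-ins, not the proposal's actual syntax. The point is only that once a page carries markers of this general kind, splitting it into posts, dates, permalinks, and a link list takes a few dozen lines of standard-library code.

```python
from html.parser import HTMLParser

# Hypothetical marker names throughout; the real proposal may spell them differently.
SAMPLE = """<html><head>
<meta name="weblog" content="true">
</head><body>
<!-- post-begin 2003-05-22T14:00:00Z -->
<p>First post. <a rel="permalink" href="/archive/1.html">link</a></p>
<!-- post-end -->
<!-- post-begin -->
<p>Second post, no datestamp.</p>
<!-- post-end -->
<!-- linklist-begin -->
<a href="http://example.com/">Example</a>
<!-- linklist-end -->
</body></html>"""


class WeblogParser(HTMLParser):
    """Collects posts, permalinks, and link-list URLs from a marked-up page."""

    def __init__(self):
        super().__init__()
        self.is_weblog = False      # set when the identifying meta tag is seen
        self.posts = []             # one dict per delimited post
        self.links = []             # hrefs found inside the link-list delimiters
        self._post = None           # the post currently being read, if any
        self._in_linklist = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") == "weblog":
            self.is_weblog = True
        elif tag == "a":
            href = attrs.get("href")
            if self._post is not None and attrs.get("rel") == "permalink":
                self._post["permalink"] = href
            elif self._in_linklist and href:
                self.links.append(href)

    def handle_comment(self, data):
        words = data.split()
        if not words:
            return
        if words[0] == "post-begin":
            # The datestamp is optional, so fall back to None when it is absent.
            date = words[1] if len(words) > 1 else None
            self._post = {"date": date, "permalink": None, "text": []}
        elif words[0] == "post-end" and self._post is not None:
            # Join the collected text fragments and collapse runs of whitespace.
            self._post["text"] = " ".join("".join(self._post["text"]).split())
            self.posts.append(self._post)
            self._post = None
        elif words[0] == "linklist-begin":
            self._in_linklist = True
        elif words[0] == "linklist-end":
            self._in_linklist = False

    def handle_data(self, data):
        if self._post is not None:
            self._post["text"].append(data)


def parse_weblog(html):
    """Feed a page through the parser and return it with results attached."""
    parser = WeblogParser()
    parser.feed(html)
    parser.close()
    return parser
```

Calling `parse_weblog(SAMPLE)` gives back an object whose `is_weblog`, `posts`, and `links` attributes hold the extracted pieces. Anything outside the post delimiters (template text, comments, TrackBacks) is simply ignored, which is exactly the ambiguity that is so hard to resolve on unmarked pages.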

Textpattern gets not only fulsome praise for being the first CMS to sign on, but additional style points for requiring me to replace the word 'blogroll' with something less linguistically odious.

What do you think, gentle reader?

Your Host

Maciej Cegłowski
maciej @ ceglowski.com

Threat

Please ask permission before reprinting full-text posts or I will crush you.