Blog Categorization (Idle Words)


I got invited into an interesting discussion today. The topic is blog categorization: given a set of weblogs (or blog posts), how do you label them as belonging to a certain topic, and presumably make it possible to search by category, and so on.

Much of the discussion centers around taxonomies - how to create them, how to choose between them, how to make sure they don't discriminate.

I'm not sure I'm big on taxonomies. I like it better when data can decide on its own how it wants to group together. I like clustering, and statistical methods, and topicalization - using relationships detected within the data itself to generate the top-level categories. There's something fashionably jujitsu about it. Taxonomies make me think of Yahoo! and its infinite levels and sub-levels of folders, or else yellowed card catalogs with categories and sub-sub-categories a foot long.

And now I just realized I'm likely to be in the last generation to remember card catalogs.
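
Going back to the idea of letting the data group itself: here is a minimal Python sketch of what that could look like, not a method anyone in the discussion proposed. It weights words by TF-IDF, compares posts by cosine similarity, and merges any two posts that are similar enough into the same group. The sample posts, the stopword list, the 0.1 threshold, and the function names are all assumptions made up for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical toy posts, invented for this example.
POSTS = [
    "knitting a wool sweater with new knitting needles",
    "wool yarn and knitting patterns for a scarf",
    "python script to parse server logs",
    "debugging a python script that reads logs",
]

# A tiny, made-up stopword list; a real one would be much longer.
STOPWORDS = {"a", "and", "for", "that", "the", "to", "with"}


def tokenize(text):
    """Lowercase the text and keep alphabetic words that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def tfidf_vectors(posts):
    """Return one sparse {term: weight} dict per post."""
    docs = [Counter(tokenize(p)) for p in posts]
    doc_freq = Counter(term for d in docs for term in d)
    n = len(docs)
    return [{t: c * math.log(n / doc_freq[t]) for t, c in d.items()} for d in docs]


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(
        sum(w * w for w in v.values())
    )
    return dot / norm if norm else 0.0


def cluster(posts, threshold=0.1):
    """Group posts: any pair at or above the threshold ends up in one group."""
    vecs = tfidf_vectors(posts)
    labels = list(range(len(posts)))  # every post starts in its own group
    for i in range(len(posts)):
        for j in range(i + 1, len(posts)):
            if cosine(vecs[i], vecs[j]) >= threshold:
                old, new = labels[j], labels[i]
                labels = [new if lab == old else lab for lab in labels]
    groups = {}
    for idx, lab in enumerate(labels):
        groups.setdefault(lab, []).append(idx)
    return sorted(groups.values())


def label(posts, members, k=2):
    """Name a group after the k terms that appear in the most of its posts."""
    counts = Counter(t for i in members for t in set(tokenize(posts[i])))
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


if __name__ == "__main__":
    for members in cluster(POSTS):
        print(label(POSTS, members), members)
```

Nobody told the code that "knitting" or "python" were categories; the group names fall out of which words the grouped posts share. The threshold does the work a taxonomy editor would otherwise do, which is also where a sketch like this gets fragile on real data.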

But enough of this posting about weblogs, and self-referential hooey. Let's talk about China: