assign (Idle Words)

05.06.2003

An exciting assignment at work*: the boss says to me, "Go forth unto the Internet, and find me every weblog you can get your hands on!" It seems we need a large, live collection to prove our search algorithms on. Not exactly a Mt. Everest of data, but something more than the little molehills of documents we've conquered so far. So I have dutifully started crawling the Web and asking for contributions from the many other people who already maintain extensive lists, to try to put together an authoritative collection.

One of the immediate goals of the project is to gather reliable, quantitative data on weblogs, both for our own work and for the benefit of others. It seems wasteful to make everyone interested in researching social networks and other oddities start a crawl from scratch, so we intend to maintain a large blog database that will be accessible to anyone who wants to do a research project. At the very least, it will spare people from having to download half the Web over a DSL line.

If you can spare the time, pay a visit to the crawl stats page and submit your URL to make sure it's included in our list. That means you, Kottke! The page updates every five minutes with the latest figures from our crawl, as well as some gratuitous and completely unscientific statistics on CMS market share that are bound to get me in some kind of trouble.

And if you are one of the Brahmins who already has a large list of blog URLs on hand, consider giving the gift of data!

* I work for an entity called NITLE, a non-profit cabal of liberal arts colleges.