Geek Factor 10 (Idle Words)

06.10.2002

Today I got my research account for the Internet Archive - five terabytes of data to play with, and tickle, and cover with soft kisses! If only it wouldn't tease me so... I spent a bewildering half hour navigating various /usr/local/home/libs and other deep directories, but no sign of anything so far. I guess I will have to wait for documentation.

Also wending its way to me by post is a Reuters article corpus, for further natural language processing shenanigans.

An interesting question is whether one can get large research corpora for other languages from the comforts of one's own Vermont manor. And to that question I intend to devote the remainder of the evening. That's the kind of crazy living I like to do when the boss isn't here.