Language Barriers in Blogging (Idle Words)

05.23.2003

For a while now, I've been interested in how language barriers affect our ability to communicate online. With some real blog census data now coming in (and with the better half gone to her sister's graduation, and so unable to keep me from wasting a perfectly good Saturday) I spent today trying to measure how high those barriers are.

As I write this, the database has 380,000 entries and is pretty evenly split between weblogs in English and weblogs in other languages. If language barriers meant nothing, and bloggers could read material no matter what language it was written in, you would expect the average link to have about a 54/46 chance of hitting an English versus a non-English weblog.

Of course, language does matter, so links tend not to cross language boundaries. If you look at all the outgoing links from English language blogs, only about 1.75% point to a non-English weblog. In the reverse direction, however, the figure is much higher. A full 7% of links from non-English-language weblogs point to an English site.

This means that non-English speakers, on average, link into our community at four times the rate at which we link into the rest of the world. This is a kind of one-way mirror effect: because English dominates the Internet, we are less likely to see anything outside our own community, while non-English speakers will still be exposed to a lot of what goes on here. In the global conversation, we're the ones standing at the microphone.

That figure of 7% is for the aggregate of all links coming from non-English-language weblogs. The effect for any individual language community will necessarily be more pronounced, especially if the community is a small one.

Take Iceland, for example. The Icelanders are avid bloggers, with about 3500 weblogs (out of an online population of about 160,000). In any given Icelandic weblog, 12% of the links will point to a site written in English. So even those Icelandic readers who don't speak any English are fairly likely to come into contact with ideas that cross over from the English-language Internet.

But in the other direction, my own chance as an English speaker of coming across a link to an Icelandic site is a whopping 0.02%. In fact, I've found fewer than 300 such links to Icelandic sites across the entire data set. Unless I happen to read Kristiv's Weird Existence or Digital Dreaming, there's no way I'll ever hear about anything cool happening among Icelandic bloggers.

"Big deal, Mr. Multicultural", you'll think to yourself, "of course you won't find links to Icelandic blogs. Iceland is a tiny country, and we've got them outnumbered". But the imbalance in links is far greater than relative numbers would suggest. Once again, if you assumed that links were completely independent of language, you would expect about 54% of all Icelandic links to point to English sites, and 0.9% of English links to point to Icelandic ones. Predictably enough, both languages have fewer links to each other because of the language barrier, but to a very different degree. Icelandic blogs underlink to English ones by a factor of about 4.5 (54% predicted, 12% actual). But English blogs underlink to Icelandic ones by a factor of 80. Just the fact that they're writing in Icelandic makes these 3,500 bloggers eighty times less visible to us than an equivalent group of English-language bloggers would be.

I propose we call this 'underlinking' coefficient the "Bennett Factor", in honor of the great thinker who said "Our common language is English. And our common task is to ensure that our non-English-speaking children learn this common language." Our Bennett Factors to other languages remain astronomical. We continue to keep ourselves isolated from world opinion, which is particularly troubling at a time when our country's politics are becoming more exceptionalist and unilateral.
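
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the underlinking coefficient. It assumes the coefficient is simply the share of links you would expect if language did not matter, divided by the share actually observed. The function names are made up for illustration, and the inputs are the rounded figures quoted above, not the raw census data.

```python
def expected_share(target_blogs: int, total_blogs: int) -> float:
    """Share of links that would hit the target group if language did not matter."""
    return target_blogs / total_blogs


def underlinking_factor(expected: float, observed: float) -> float:
    """How many times fewer links are observed than the language-blind baseline predicts."""
    if observed <= 0:
        raise ValueError("observed share must be positive")
    return expected / observed


# Rounded figures quoted in the post (illustrative inputs, not raw data).
TOTAL_BLOGS = 380_000
ICELANDIC_BLOGS = 3_500

# Language-blind baseline for English -> Icelandic links.
english_to_icelandic_expected = expected_share(ICELANDIC_BLOGS, TOTAL_BLOGS)

# Icelandic -> English: 54% predicted, 12% observed.
icelandic_to_english_factor = underlinking_factor(0.54, 0.12)

print(f"{english_to_icelandic_expected:.1%}")  # 0.9%
print(f"{icelandic_to_english_factor:.1f}")    # 4.5
```

Feeding in the rounded English-to-Icelandic figures (0.9% expected, 0.02% observed) gives a factor closer to 45 than 80, so the factor of 80 presumably rests on unrounded counts that are not reproduced here.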

Of course, having a lingua franca is a blessing. It makes it possible to communicate in a common forum, and it's vastly better than having the kind of language soup you find at EU headquarters or the UN. But for us English speakers it's a mixed blessing, because it tempts us to get all solipsistic and insular.

The problem isn't that everyone else is learning English. It's that Americans, as a rule, do not bother to learn foreign languages. In 1998, across the entire United States, there were only 841 students studying Hindustani, a language with half a billion speakers. A grand total of 5055 were studying Arabic.

You'd think that in an age of empire, there would be a strong incentive for us to pick up the lingo. But of course you would be wrong:

What [the WMD search team] could not do was ask a question, should they find someone there. Yet they were supposed to ask questions under the guidelines for surveying a suspected secret police site such as this. One suggested query is, "Was there a lot of noise, such as people screaming?" Others ask about covered buses and unusual activity at night.

Anderson, the only team member learning Arabic, still does not have the ability to ask those questions. He has taught himself five phrases so far: "Good morning," "Good evening," "Drop your weapon," "That's dangerous," and "Keep away."

As Team 3 worked, it became evident more than once that even a passive reading knowledge would help.

On its way through one darkened corridor, the team reached an especially recalcitrant door. Sgt. Ivan Westrick, the team's explosive ordnance technician, swung the sledgehammer in a powerful arc that struck sparks with every blow, like flint on steel. A reporter later translated a snapshot of a sign across that door. It said, "No Smoking."

A longer announcement, in bold red and blue strokes, attracted the team's attention. The sign had been positioned in such a way that Saddam Hussein, gazing sternly off the canvas of a youthful portrait, appeared to be reading it. Anderson wondered briefly what it might say.

Had anyone known the answer then, the chamber of vacuum cleaners in the next corridor would have come as no surprise. Neither would the contents of the other sealed rooms: air conditioners, rolls of fabric, marble facing stones.

"Honorable Brother and Packer," the sign began. "Packaged goods cannot be returned after leaving the depot." The sign welcomed suggestions, apologized for delays, and thanked patrons for their cooperation. It concluded with a two-word signature: "STORAGE ADMINISTRATION."

We can't change the Internet overnight just by worrying about language. But we should at least recognize the magnitude of the problem. BlogTalk in Vienna is a good first step. My proposal to Tim O'Reilly: Hold the 2004 Emerging Technology conference in Brazil.