Why say "That Whoopie Pie was delicious!" rather than "The Whoopie Pie, which consists of a mass of frosting sandwiched between two moderately sized cakes, tasted good"? Aside from the obvious reason of not wanting to bore everyone, the first sentence is more efficient and conveys the general idea in a much shorter time. Using the word "delicious," for example, tells more about the tasty experience than "good." Steve Piantadosi, an MIT graduate student in the department of Brain and Cognitive Sciences, recently researched how the length of a given word relates to its meaning.
Some cognitive scientists believe that language is merely a biological happenstance not intended for communication and, in listening to bits of the mile-a-minute, irrelevant chatter within certain circles, I can see their point. But on the other hand, Piantadosi believes that language is specifically designed to serve a communicative purpose. Working with fellow MIT linguists Ted Gibson and Hal Tily, Piantadosi investigated how efficiency and language interact.
"If you're trying to be efficient in communicating," said Piantadosi, "then you should try to keep the amount of information you convey per unit time roughly constant."
The MIT team drew inspiration from the work of renowned Harvard linguist George Zipf. In the 1930s and 1940s, Zipf postulated that redundancies in speech should be reduced in order to squeeze more information into a shorter period of time. Building on this idea, Piantadosi's team had to figure out a way of measuring which words carried the most information.
"The information content of a word depends on what we're talking about, what words mean, what other words they could have said and ambiguity," stated Piantadosi, listing only a few examples. In short, to measure information content, approximating provided the best and easiest means of calculation. And here enters the power of Google.
Every so often, Google releases a giant mass of text consisting of roughly a trillion (10^12) words pulled from all across the Internet. This is no small collection. Moreover, rather than including entire bodies of text, the Google data lists sequences of about five words at a time. One can only imagine the strings of "a's," "the's," and garbled nonsense that repeat themselves endlessly across the pages. The information content of a word can be measured by how predictable it is in context, but before delving into the relevant Google data, Piantadosi first had to weed out all the excess.
In this instance, it turns out that movie subtitles are actually good for something. Subtitles use fairly structured wording and are also available in a range of languages. Piantadosi decided to look into 11 different languages including English, French, German, Czech, and Romanian. He returned to the Google texts only after determining which words were used most frequently in the movies. After narrowing the mass of web documents down to only the phrases containing the frequent words from the subtitles, the team established the predictability of each word, and thereby its degree of information content, using information theory and an N-gram model.
Information theory states that the information content of a word can be found by taking the negative logarithm of its probability. Piantadosi applied this formula to the text, estimating each word's probability in context with an N-gram model, which predicts the next term in a sequence from the terms that precede it.
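As a rough illustration of the negative-log formula, the short Python sketch below converts a probability into information content. The base-2 logarithm (which gives an answer in bits), the function name, and the sample probabilities are assumptions made for illustration; the article does not say which base the team used.

```python
import math

def information_content(probability: float) -> float:
    """Return the information content, in bits, of an event with this probability.

    log2(1 / p) is the same quantity as -log2(p).
    """
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be greater than 0 and at most 1")
    return math.log2(1.0 / probability)

# Made-up probabilities: the less predictable the word, the more bits it carries.
print(information_content(0.5))    # 1.0
print(information_content(0.125))  # 3.0
```

A word that is certain in its context (probability 1) carries zero bits, while rarer, less predictable words carry more.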
To demonstrate the N-gram model, Piantadosi pulled up a series of bar graphs on his computer screen. Each language was tested using three models: 2-gram, 3-gram and 4-gram. Simply put, a 2-gram model looks at the specified word plus the one word before it. The 3-gram model includes the two words before the specified word, and so on. The 3-gram model ended up being the best indicator of information content, since it supplies a decent amount of preceding context without making the phrase so long that it tangles the probability calculations. For every language but Polish, the trend came out as predicted, with the bar for information content standing higher than the bar for frequency, meaning that information content tracked word length more closely than frequency did.
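The idea of looking at one or two preceding words can be sketched in a few lines of Python. This is a minimal toy, not the team's actual pipeline: the tiny corpus, the function names, the plain relative-frequency estimate with no smoothing, and the base-2 logarithm are all assumptions made for illustration.

```python
import math
from collections import Counter

def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def conditional_probability(tokens, context, word):
    """Estimate P(word | context) as a relative frequency (no smoothing)."""
    context = tuple(context)
    counts = ngram_counts(tokens, len(context) + 1)
    # Total occurrences of the context followed by any word at all.
    context_total = sum(c for gram, c in counts.items() if gram[:-1] == context)
    if context_total == 0:
        return 0.0
    return counts[context + (word,)] / context_total

def surprisal_bits(probability):
    """Information content in bits; an unseen word is infinitely surprising."""
    return math.inf if probability == 0 else -math.log2(probability)

# Hypothetical toy corpus, invented for this example.
tokens = ("the pie was delicious and the pie was good "
          "and the pie was delicious").split()

# 3-gram model: condition on the two preceding words.
p_three = conditional_probability(tokens, ("pie", "was"), "delicious")
# 2-gram model: condition on the one preceding word.
p_two = conditional_probability(tokens, ("was",), "good")

print(p_three, surprisal_bits(p_three))
print(p_two, surprisal_bits(p_two))
```

A real corpus would need smoothing for word sequences that never appear, which is one reason longer contexts such as 4-grams become harder to estimate reliably.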
Piantadosi's findings allow him to conclude that a word's frequent usage does not necessarily make it more significant. Additionally, using shorter, more frequent words, which levels out spikes in the flow of information, may make it easier for the listener to understand the broad ideas.
Piantadosi cited Florian Jaeger, a cognitive scientist at the University of Rochester, to explain how the elements of surprise and suspense manifest in language. "If you're going to say something surprising, you had better spread it out," summarized Piantadosi. Highly predictable phrases, by contrast, tend to be shortened.
Jaeger's work also described the idea of optional syntactic elements, such as the word "that." In most cases, "that" is completely unnecessary and English professors set on brevity will most likely condemn their students for using it in an indecent manner. And yet "that" is rampant in the English language. Jaeger found it possible to predict whether someone will use "that" based on the information content of what they're saying.
"People insert ‘that' when they're saying something that is really high information content, giving the speaker more time to spread out the information and not overload the language processing system," explained Piantadosi. Though the word itself has little meaning attached to it, its contribution to phrase length adds to the overall contextual significance.
Currently Piantadosi is beginning to look into ambiguity and how context serves to separate definitions of words that may be spelled and sound the same way—for example, a "run in baseball" and a "run in a pair of tights." Words like "run" are used frequently because of their brevity and ease, which contribute to efficiency. "If language wasn't like that, we would talk more like legalese," noted Piantadosi.
For highly intellectual conversations with esteemed Nobel Prize winners, showing off those ponderous words jam-packed with information might aid in bringing about a propitious connection. But, as always, simplicity has its advantages.




