Mark Kamlet, Carnegie Mellon University; Ashique KhudaBukhsh, Carnegie Mellon University, and Tom Mitchell, Carnegie Mellon University
It’s no secret that U.S. politics has become highly polarized.
Even so, there are probably few living Americans who ever witnessed anything that quite compares with this fall’s first presidential debate.
Was it really the case that the nation could do no better than a verbal food fight, with two candidates hurling fourth-grade insults and talking past each other?
To us, the discordant debate was just one more symptom of the nation’s fraying civic discourse, which, as we showed in a recent study, extends to the very words we use to talk about politics.
Earlier this year, we started constructing a data set of all the viewer comments on YouTube videos posted by four television networks – MSNBC, CNN, Fox News and One America News Network – that target different slices of the political spectrum. In total, the data set contains over 85 million comments on over 200,000 videos from 6.5 million viewers since 2014.
We studied whether there are distinct variants of English written in the comments sections, akin to the distinction between British English and American English.
Using machine learning methods, we found that these variants do exist. Moreover, we can rank them in terms of their “left-ness” and “right-ness.” To the best of our knowledge, this is the first empirical demonstration of quantifiable linguistic differences among news audiences.
Our second finding, however, was even more unexpected.
Our machine translation system found that words with vastly different meanings, such as “KKK” and “BLM,” were used in essentially identical contexts, depending on the YouTube channel being analyzed.
The company a word keeps
When translating two different languages – say, Spanish and English – automated translation systems like Google Translate begin with a large training set of texts in both languages. The system then applies machine learning methods to become better at translating.
Over the years, this technology has become increasingly accurate, thanks to two key insights.
The first dates back to the 1950s, when linguist John Rupert Firth came up with the aphorism “You shall know a word by the company it keeps.”
To modern machine translation systems, the “company” a word keeps is its “context,” or the words surrounding it. For example, the English word “grape” occurs in contexts such as “grape juice” and “grape vine,” while the equivalent word in Spanish, uva, occurs in the same contexts – jugo de uva, vid de uva – in Spanish sentences.
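Firth’s idea can be made concrete with co-occurrence counts. The sketch below is a toy illustration, not our actual pipeline: the three-sentence “corpora” and the sentence-level notion of context are invented for demonstration, whereas real systems learn dense vectors from millions of sentences with sliding context windows.

```python
from collections import Counter

# Tiny invented corpora; real systems train on millions of sentences.
english = ["grape juice", "grape vine", "apple juice"]
spanish = ["jugo de uva", "vid de uva", "jugo de manzana"]

def context_vector(word, corpus):
    """Count which other words co-occur with `word` in the same sentence."""
    ctx = Counter()
    for sentence in corpus:
        tokens = sentence.split()
        if word in tokens:
            ctx.update(t for t in tokens if t != word)
    return ctx

print(context_vector("grape", english))  # Counter({'juice': 1, 'vine': 1})
print(context_vector("uva", spanish))    # Counter({'de': 2, 'jugo': 1, 'vid': 1})
```

Setting aside the function word “de,” the company “grape” keeps (juice, vine) mirrors the company “uva” keeps (jugo, vid) – which is exactly the signal a translation system exploits.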
The second important discovery came rather recently. A 2013 study found a way to identify – and thereby link – a word’s context in one language to its context in another. Modern machine translation depends heavily on this process.
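The linking step works roughly like this: each language gets its own embedding space, and a mapping between the two spaces is fitted from a handful of “anchor” word pairs, after which any word can be carried across and matched to its nearest neighbor. The sketch below is a deliberately simplified illustration with invented 2-D vectors and a closed-form 2-D rotation; real systems fit a high-dimensional linear map over independently trained embeddings.

```python
import math

def rotate(v, theta):
    """Rotate a 2-D vector by angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

# Invented 2-D embeddings for dialect A.
emb_a = {"red": (1.0, 0.0), "blue": (0.0, 1.0), "mask": (0.8, 0.6)}

# Dialect B: same geometry but a rotated coordinate system, standing in
# for an independently trained embedding space. The word filling
# "mask"'s contexts is spelled "muzzle" there.
theta_true = 1.1
emb_b = {"red": rotate(emb_a["red"], theta_true),
         "blue": rotate(emb_a["blue"], theta_true),
         "muzzle": rotate(emb_a["mask"], theta_true)}

# Fit the rotation from anchor pairs assumed to mean the same thing
# (closed-form 2-D orthogonal Procrustes solution).
anchors = [("red", "red"), ("blue", "blue")]
num = sum(emb_a[a][0] * emb_b[b][1] - emb_a[a][1] * emb_b[b][0] for a, b in anchors)
den = sum(emb_a[a][0] * emb_b[b][0] + emb_a[a][1] * emb_b[b][1] for a, b in anchors)
theta_fit = math.atan2(num, den)

def translate(word):
    """Map a dialect-A word into B-space; return the nearest B word."""
    v = rotate(emb_a[word], theta_fit)
    return max(emb_b, key=lambda w: v[0] * emb_b[w][0] + v[1] * emb_b[w][1])

print(translate("mask"))  # prints "muzzle"
```

The key point is that the anchors pin the two spaces together, and a non-anchor word then lands next to whatever word occupies the same position in the other space – even when the surface forms differ entirely.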
What we have done is to use this type of translation in an entirely new way: to translate English to English.
When ‘Trumptards’ become ‘snowflakes’
That may sound bizarre. Why translate English to English?
Well, consider American English and British English. Many words are the same in both languages. Yet there can be subtle differences. For instance, “apartment” in American English may translate into “flat” in British English.
For the purposes of our study, we labeled the language used in each network’s comment section “MSNBC-English,” “CNN-English,” “Fox-English” and “OneAmerica-English.” After analyzing the comments, our translation algorithms uncovered two different patterns of “misaligned words” – terms that aren’t identical across the comment sections but are used in the same contexts.
One type was similar to “flat” and “apartment,” in the sense that both describe ostensibly the same thing. However, the word pairs we uncovered carry different connotations. For example, we found that what one community calls “Pelosi,” the other one calls “Pelousy”; and “Trump” in one news-language translates into “Drumpf” in another.
A second – and deeper – kind of misalignment occurred when the two words refer to two fundamentally different things.
For example, we found that in CNN-English, “KKK” – the abbreviation for the Ku Klux Klan – is translated by our algorithm to “BLM” – shorthand for Black Lives Matter – in Fox-English. The algorithm is basically finding that the comments made by one community about the KKK are very much like the comments made by the other about BLM. While the belief systems of the KKK and BLM are about as different as can be, depending on the comment section, each seems to represent something similarly ominous and threatening.
CNN-English and Fox-English are not the only two languages displaying these types of misalignments. The conservative end of the spectrum itself breaks into two languages. For example, “mask” in Fox-English translates to “muzzle” in OneAmerica-English, reflecting the differing attitudes across these subcommunities.
There seems to be a mirrorlike duality at play. “Conservatism” becomes “liberalism,” “red” is translated to “blue,” while “Cooper” is converted into “Hannity.”
There’s also no lack of what can only be called childish name-calling.
“Trumptards” in CNN-English translates to “snowflakes” in Fox-English; “Trumpty” in CNN-English translates to “Obummer” in Fox-English; and “republicunts” in CNN-English translates to “democraps” in Fox-English.
Linguists have long emphasized how effective communication among people with different beliefs requires common ground. Our findings show that the way we talk about political issues is becoming more divergent; depending on who’s writing, a common word can be imbued with an entirely different meaning.
We wonder: How far are we from the point of no return when these linguistic differences begin to erode the common ground needed for productive communication?
Have echo chambers on social media exacerbated political polarization to the point where these linguistic misalignments have become ingrained in political discourse?
When will “democracy” in one language variant stop translating into “democracy” in the other?
Mark Kamlet, University Professor of Economics and Public Policy, Carnegie Mellon University; Ashique KhudaBukhsh, Project Scientist at the School of Computer Science, Carnegie Mellon University, and Tom Mitchell, Founders University Professor of Machine Learning, Carnegie Mellon University
This article is republished from The Conversation under a Creative Commons license. Read the original article.