Next time you tweet “yeah suuuuuuure” @ someone, what you don’t know is that an artificial brain somewhere is getting upgraded with those tweets.
Wait a second before you delete the Twitter app. This might sound like next-level internet stalking or material for an upcoming Black Mirror episode, but it isn’t. How we type out the way we talk could help AI understand us in the future. To the horror of spelling and grammar tyrants everywhere, University of Vermont researcher Tyler Gray and his team are using the “stretched words” often found on social media to develop AI that understands not only a word’s definition but also a tone of voice that can’t be heard.
“Stretched words, as in the example above, sometimes called elongated words, are also an integral part of many languages, especially in spoken language,” Gray said in a study recently published in PLOS One. “However, rather than completely changing the meaning of the word, this stretching … is often used to modify the meaning of the base word in some way.”
Take all the “yeah suuuuuuure” tweets floating around in cyberspace. “Sure” is usually positive in written language, meaning “yes” or “without a doubt”. The exception could be a “‘Sure,’ she sighed, rolling her eyes” in a novel or script. Comics have the added advantage of actually showing a sarcastic “sure” when it appears in a word bubble right next to a speaker rolling their eyes. However, when “sure” evolves into its stretched form “suuuuuuure” without that surrounding context, it is more likely to come from a place of sarcasm than anything else.
You won’t find stretched words in any dictionary. The thing is, they are still an important part of spoken language, however much they’d make your high school English teacher cringe. Besides giving away tone of voice such as sarcasm, they can also be used to up the emphasis in reactions like “that rocket is huuuuuuuge”, as well as showing emotion on an otherwise emotionless screen by upping the excitement in “yeeeeeessssssss” or the possible fear and disappointment in “noooooooooo”. Gray’s team used a random sample of about 10 percent of all tweets (so something you said could possibly be in there) but did not share the individual messages (so even if they did pick up something you said, it won’t go viral).
Tweets were then broken down almost like molecules. Finding stretchable words was made easier by making all text lowercase and collecting what Gray calls tokens, which needed to have at least 30 characters and either a single character repeated consecutively at least 29 times or a pair of characters on repeat 28 times. These were shortened to the simplest form of those sets of letters, the kernel, with brackets around repeating letters. For example, “hahahahahahahaha” would end up as the kernel “[h][a]”. He then excluded anything that wasn’t a letter. Related stretched words (different stretched versions of the same base word) sometimes ended up as different kernels, depending on which letters were repeated: “yeeeeeessssss” would be processed as “y[e][s]” while “yyyyeeeeees” would be “[y][e]s”.
“It is known that natural language processing (NLP) can be hard with social media because of the nonstandard language that is often used,” Gray said. “Natural language processing software and toolkits could use the techniques we developed to help with processing stretched words. For example, stretched words could first be distilled to their kernels, and the base word could be extracted from that.”
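To make the kernel idea concrete, here is a minimal Python sketch of how a stretched token might be distilled to a kernel and then to a base word. It is an illustration only, not the team’s actual code: the function names `to_kernel` and `base_word`, the `min_run` threshold of three repeats, and the ASCII-only letter filter are all assumptions made for this example. It also handles only runs of a single repeated letter, so alternating pairs such as “haha” are left out.

```python
import re
from itertools import groupby


def to_kernel(token: str, min_run: int = 3) -> str:
    """Collapse each run of a repeated letter into one bracketed letter.

    Hypothetical helper: min_run (an assumed threshold) is how many
    consecutive copies a letter needs before it counts as stretched.
    """
    # Lowercase, then drop anything that is not an ASCII letter.
    letters = re.sub(r"[^a-z]", "", token.lower())
    parts = []
    for char, run in groupby(letters):
        length = len(list(run))
        if length >= min_run:
            parts.append(f"[{char}]")    # stretched letter
        else:
            parts.append(char * length)  # ordinary letter(s), kept as typed
    return "".join(parts)


def base_word(kernel: str) -> str:
    """Naive base-word guess: strip the brackets from a kernel."""
    return kernel.replace("[", "").replace("]", "")


for token in ["yeeeeeessssss", "yyyyeeeeees", "suuuuuuure"]:
    kernel = to_kernel(token)
    print(token, kernel, base_word(kernel))
```

Under these assumptions, the two differently stretched versions of “yes” produce different kernels but the same base-word guess, which is the situation the article describes. A real system would still need a dictionary lookup, since simply stripping brackets cannot tell whether a base word has a legitimate double letter.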
With over 7,500 kernels in hand, Gray and his team were able to identify the stretched words that matched each one. They measured stretched words using two characteristics they call balance and stretch. Balance is how evenly the stretching is spread across the letters of the word, as in “hahahahahahahaha”, and overall stretch is the total number of letters. Making AI understand altered meanings of stretched words is still a work in progress, never mind overcoming typos. But how cool would it be if the robot that made your pizza heard “Duuuuuude, there’s no pepperoni!” and interpreted the stretched “dude” as “Oh no”, getting your order right without spitting on it?
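The two measurements can be roughed out in a few lines of Python. This is a sketch under stated assumptions rather than the formulas from the study: stretch follows the article’s description (the total number of letters), while balance is computed here as a normalized entropy of the letter-run lengths, which is one common way to score evenness and is chosen for illustration. The function names `run_lengths`, `stretch`, and `balance` are hypothetical.

```python
import math
from itertools import groupby


def run_lengths(token: str) -> list[int]:
    """Lengths of each run of identical consecutive letters."""
    letters = [c for c in token.lower() if c.isalpha()]
    return [len(list(run)) for _, run in groupby(letters)]


def stretch(token: str) -> int:
    """Total number of letters in the token (the article's description)."""
    return sum(run_lengths(token))


def balance(token: str) -> float:
    """Evenness of the runs, from near 0 (lopsided) to 1 (perfectly even).

    Assumption: normalized Shannon entropy of each run's share of the
    token. This is an illustrative stand-in, not the study's definition.
    """
    counts = run_lengths(token)
    if len(counts) < 2:
        return 1.0  # a single run is trivially even
    total = sum(counts)
    entropy = -sum((n / total) * math.log(n / total) for n in counts)
    return entropy / math.log(len(counts))


even = "ha" * 8                      # every letter appears once per run
lopsided = "y" + "e" * 10 + "s"      # nearly all the stretch is on "e"
print(stretch(even), round(balance(even), 3))
print(stretch(lopsided), round(balance(lopsided), 3))
```

With this scoring, a laugh like “hahahaha” comes out as evenly balanced, while a “yes” that stretches only its “e” scores much lower. Averaging such scores over every token that matches a kernel would give a per-kernel figure.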
It’s possible something you tweet will end up being dissected in Gray’s research. Just remember that his team, and others who may do further studies on how stretched words translate to typed form, aren’t trying to creep on you.