5 min read

With a little help from machine learning, you might know what the people on the other end of a Hangouts session are really looking at on their screens. According to research presented at the CRYPTO 2018 conference in Santa Barbara last week, a webcam can give away details of what's on the screen beside it, so long as someone on the other end is listening the right way. All you'll need to do is process the audio picked up by their microphones.

Daniel Genkin of the University of Michigan, Mihir Pattani of the University of Pennsylvania, Roei Schuster of Cornell Tech and Tel Aviv University, and Eran Tromer of Tel Aviv University and Columbia University investigated a potential new avenue of remote surveillance, dubbed "Synesthesia": a side-channel attack that can reveal the contents of a remote screen, providing access to potentially sensitive information based solely on "content-dependent acoustic leakage from LCD screens."

Anyone who remembers working with cathode ray tube (CRT) monitors is familiar with the phenomenon of coil whine. Even though LCD screens consume far less power than the old CRTs, they still generate the same sort of noise, though in a totally different frequency range.

Because of the way computer screens render a display—sending signals to each pixel of each line with varying intensity levels for each sub-pixel—the power sent to each pixel fluctuates as the monitor goes through its refresh scans. Variations in the intensity of each pixel create fluctuations in the sound created by the screen’s power supply, leaking information about the image being refreshed—information that can be processed with machine learning algorithms to extract details about what’s being displayed.
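To make that mechanism concrete, here is a minimal, purely illustrative Python sketch of the leakage model. It assumes, as a toy simplification rather than the paper's actual signal model, that the power-supply noise behaves like an ultrasonic carrier whose amplitude tracks the average intensity of whichever scan line is currently being refreshed. The refresh rate, line count, carrier frequency, and sample rate below are all hypothetical values.

```python
import math

REFRESH_HZ = 60          # assumed refresh rate
SAMPLE_RATE = 96_000     # hypothetical microphone sample rate
CARRIER_HZ = 20_000      # illustrative ultrasonic carrier frequency

def line_intensities(image_rows):
    """Average intensity (0..1) of each scan line of an image,
    given rows of per-pixel intensities."""
    return [sum(row) / len(row) for row in image_rows]

def leak_signal(intensities, seconds=0.1):
    """Toy leakage model: the emitted noise is a carrier whose
    amplitude follows the intensity of the line being refreshed."""
    line_time = 1.0 / (REFRESH_HZ * len(intensities))
    n = int(SAMPLE_RATE * seconds)
    samples = []
    for i in range(n):
        t = i / SAMPLE_RATE
        line = int(t / line_time) % len(intensities)
        samples.append(intensities[line] * math.sin(2 * math.pi * CARRIER_HZ * t))
    return samples
```

The point of the toy model is only that the emitted sound becomes a function of screen content: brighter lines modulate the carrier harder, so two different images produce two different acoustic traces.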

That audio could be captured and recorded in a number of ways, as demonstrated by the researchers:

  • Over a device’s embedded microphone or an attached webcam microphone during a Skype, Google Hangouts, or other streaming audio chat
  • Through recordings from a nearby device, such as a Google Home or Amazon Echo
  • Over a nearby smartphone, or with a parabolic microphone from as far as 10 meters away
  • With even a reasonably cheap microphone, since the noise sits just at the edge of the human hearing range

And it turns out that audio can be exploited with a little bit of machine learning black magic. The researchers began by attempting to recognize simple, repetitive patterns. They created "a simple program that displays patterns of alternating horizontal black and white stripes of equal thickness (in pixels), which shall be referred to as Zebras," as they recounted in their paper. These "zebras" each had a different period, measured by the distance in pixels between black stripes. As the program ran, the team recorded the sound emitted by a Soyo DYLM2086 monitor. With each different stripe period, the frequency of the ultrasonic noise shifted in a predictable manner.
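That relationship between stripe period and tone frequency can be sketched numerically. The code below is a hedged illustration, not the researchers' actual tooling: it builds the per-line intensity profile of a zebra pattern and finds the dominant frequency of that profile with a brute-force DFT. Multiplying the resulting cycles-per-frame by the refresh rate would give the (toy) acoustic frequency.

```python
import cmath

def zebra(lines, stripe_px):
    """Per-line intensity of alternating black/white stripes of
    equal thickness stripe_px (1.0 = white, 0.0 = black)."""
    return [1.0 if (i // stripe_px) % 2 == 0 else 0.0 for i in range(lines)]

def dominant_cycles(seq):
    """Cycles-per-frame index of the strongest nonzero DFT bin
    of a per-line intensity sequence (brute-force DFT)."""
    n = len(seq)
    mean = sum(seq) / n
    centered = [x - mean for x in seq]
    best_mag, best_k = 0.0, 0
    for k in range(1, n // 2):
        s = sum(centered[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n))
        if abs(s) > best_mag:
            best_mag, best_k = abs(s), k
    return best_k
```

For example, `dominant_cycles(zebra(480, 8))` yields 30 cycles per frame while `dominant_cycles(zebra(480, 4))` yields 60: halving the stripe thickness doubles the frequency, matching the predictable shift the researchers observed.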

The variations in the audio only really provide reliable data about the average intensity of a particular line of pixels, so it can’t directly reveal the content of a screen. However, by applying supervised machine learning in three different types of attacks, the researchers demonstrated that it was possible to extract a surprising amount of information about what was on the remote screen.
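The supervised setup can be caricatured in a few lines. The sketch below is an assumption-laden stand-in for the paper's pipeline (which used neural networks on richer spectrogram features): it reduces each recording to a coarse magnitude spectrum and classifies by nearest centroid, i.e. one averaged "acoustic fingerprint" per on-screen label.

```python
import math

def spectrum_features(samples, n_bins=32):
    """Coarse magnitude spectrum of a recording, used as a
    feature vector (a stand-in for spectrogram features)."""
    n = len(samples)
    feats = []
    for k in range(1, n_bins + 1):
        re = sum(samples[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = sum(samples[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        feats.append(math.hypot(re, im) / n)
    return feats

class NearestCentroid:
    """Minimal supervised classifier: average the feature vectors
    recorded for each on-screen label, then match new audio to the
    closest averaged fingerprint."""
    def __init__(self):
        self.centroids = {}

    def fit(self, traces):  # traces: {label: [feature_vector, ...]}
        for label, vecs in traces.items():
            n = len(vecs)
            self.centroids[label] = [sum(col) / n for col in zip(*vecs)]

    def predict(self, vec):
        def dist(label):
            c = self.centroids[label]
            return sum((a - b) ** 2 for a, b in zip(vec, c))
        return min(self.centroids, key=dist)
```

Real traces are far noisier and the real features richer, but the structure is the same: label recordings by what was on screen, learn a per-label fingerprint, then match new audio against those fingerprints.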

After training, a neural-network-generated classifier was able to reliably identify which of the Alexa top 10 websites was being displayed on a screen based on audio captured over a Google Hangouts call—with 96.5 percent accuracy. In a second experiment, the researchers were able to reliably capture on-screen keyboard strokes on a display in portrait mode (the typical tablet and smartphone configuration) with 96.4 percent accuracy, for transition times of one and three seconds between key “taps.” On a landscape-mode display, accuracy of the classifiers was much lower, with a first-guess success rate of only 40.8 percent. However, the correct typed word was in the top three choices 71.9 percent of the time for landscape mode, meaning that further human analysis could still result in accurate data capture. (The correct typed word was in the top three choices for the portrait mode classifier 99.6 percent of the time.)

In a third experiment, the researchers used supervised machine learning in an attempt to extract text from displayed content based on the audio—a much more fine-grained sort of data capture than detecting on-screen keyboard taps. In this case, the experiment focused on a test set of 100 English words and used somewhat ideal display settings for this sort of capture: all the letters were capitalized (in the Fixedsys Excelsior typeface, with characters 175 pixels wide) and rendered black on an otherwise white screen. The results, as the team reported them, were promising:

The per-character validation set accuracy (containing 10% of our 10,000 trace collection) ranges from 88% to 98%, except for the last character where the accuracy was 75%. Out of 100 recordings of test words, for two of them preprocessing returned an error. For 56 of them, the most probable word on the list was the correct one. For 72 of them, the correct word appeared in the list of top-five most probable words.

While these tests were all done with a single monitor type, the researchers also demonstrated that a “cross screen” attack was possible—by using a remote connection to display the same image on a remote screen and recording the audio, it was possible to calibrate a baseline for the targeted screen.

It’s clear that there are limits to the practicality of acoustic side channels as a means of remote surveillance. But as people move to mobile devices such as smartphones and tablets for more computing tasks—with embedded microphones, limited screen sizes, and a more predictable display environment—the potential for these sorts of attacks could rise. And mitigating the risk would require re-engineering of current screen technology. So, while it remains a small risk, it’s certainly one that those working with sensitive data will need to keep in mind—especially if they’re spending much time in Google Hangouts with that data on-screen.



