In this era of digitalization, instances of cybercrime have become frequent occurrences. The realm of cybersecurity is constantly haunted by threats such as phishing attacks, malware infections and password theft, to name a few.
However, a new and disconcerting peril has surfaced as a result of the rapid progress in artificial intelligence.
Artificial intelligence has now attained the capability to discern typing sounds — and then decipher the text being typed.
This form of attack, known as an acoustic side-channel attack, poses a nascent but significant threat to the digital domain. Experts are expressing concerns that video conferencing platforms like Zoom, and the routine use of built-in microphones, are exponentially amplifying these risks.
Prominent researchers such as Joshua Harrison, Ehsan Toreini and Maryam Mehrnezhad emphasize that due to recent advancements in deep learning and the surge in online activities through personal devices, the menace of acoustic side-channel attacks is more pronounced than ever before.
How Does Acoustic Snooping Function?
The mechanism underpinning the acoustic attack involves the intricate mapping of typed letters to the auditory emissions generated by keystrokes.
Whenever a key is pressed, it emits a distinctive acoustic signature, a complex interplay of factors such as the activation of the key’s switch, the depression of the keycap itself, the duration of the resulting sound and even the intervals between successive keystrokes.
With all this going on, each key on a keyboard produces a subtly unique sound profile when struck — imperceptible to the human ear but carrying valuable information about the specific keystrokes being executed.
Remarkably, advanced artificial intelligence systems possess the capability to decipher this symphony of typing sounds.
By analyzing audio recordings capturing these seemingly mundane auditory cues, the AI system can reconstruct the exact letters, symbols and numbers that were typed.
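As a toy illustration of this idea (not the researchers' actual model, which relied on deep learning over spectrogram features), the pipeline can be sketched as: extract a crude amplitude-envelope "signature" from each keystroke clip, then match an unknown clip against labelled recordings. Everything below, including the synthetic decaying-tone "keystrokes", is an illustrative assumption, not real attack code.

```python
import math

def envelope_features(samples, n_bins=8):
    """Split a keystroke clip into coarse time bins and return the mean
    absolute amplitude of each bin -- a crude acoustic signature."""
    bin_size = max(1, len(samples) // n_bins)
    return [
        sum(abs(s) for s in samples[i:i + bin_size]) / bin_size
        for i in range(0, bin_size * n_bins, bin_size)
    ]

def classify_keystroke(clip, labelled_clips):
    """Nearest-neighbour match of an unknown clip against labelled clips."""
    target = envelope_features(clip)
    def distance(labelled):
        feats = envelope_features(labelled[1])
        return sum((a - b) ** 2 for a, b in zip(target, feats))
    return min(labelled_clips, key=distance)[0]

# Synthetic "keystrokes": each key rings with a slightly different decay.
def synth_key(decay, n=800):
    return [math.exp(-decay * t / n) * math.sin(0.3 * t) for t in range(n)]

training = [("a", synth_key(3.0)), ("b", synth_key(6.0)), ("c", synth_key(9.0))]
print(classify_keystroke(synth_key(6.1), training))  # → b
```

A real attack would use far richer features (e.g. mel-spectrograms) and a trained neural classifier, but the core idea is the same: each key's sound maps to a distinguishable point in feature space.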
A concerning facet of this technique is that any device equipped with a microphone can potentially fall victim to this form of covert surveillance, be it a laptop, a smartphone or any other gadget with audio input capabilities.
The AI system, as evaluated by researchers from the Universities of London, Durham and Surrey, demonstrated an impressive level of performance. Specifically, when tested on a MacBook Pro keyboard, the system exhibited a remarkable accuracy rate of 93-95%. Its proficiency lay in identifying the exact keys that were pressed solely through analysis of the corresponding audio recordings.
Exploring the Threats Posed by Acoustic Attacks
The emergence of acoustic intrusion, enabled by artificial intelligence’s capability to decipher keystrokes from typing sounds, has become a concerning attack vector.
The widespread adoption of video conferencing apps like Zoom, MS-Teams and Skype during the remote work surge has amplified the risk of acoustic eavesdropping. Notably, Zoom’s user base skyrocketed from 10 million in 2019 to over 300 million in 2020 during the pandemic alone.
Modern device microphones have the potential to effortlessly capture typing sounds, forwarding them to AI surveillance tools. This significantly heightens the threat of acoustic attacks, enabling the theft of sensitive data like passwords, financial details, credit card information and confidential messages.
Often, users unknowingly vocalize confidential information while engaged in audio/video calls, oblivious to the fact that an omnipresent microphone is broadcasting distinct keystroke sounds.
Exploiting this, attackers can deploy basic speech recognition algorithms to sift through these audio streams, separating voices from typing sounds. Ironically, the noise reduction technologies that enhance call quality by eliminating background noise also deliver cleaner acoustic keystroke signals to the eavesdropper. The spectrum of risks associated with acoustic espionage is expanding, affecting both individuals and entire enterprise networks.
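Picking the typing sounds out of a call recording can be approximated with simple short-time energy thresholding: scan the audio in fixed-size frames and flag the moments where the energy spikes. The frame size and threshold below are arbitrary assumptions for a sketch; a real attack would operate on denoised, speech-separated audio.

```python
def detect_keystrokes(samples, frame=256, threshold=0.2):
    """Return start offsets of keystroke events, found by flagging frames
    whose short-time energy exceeds a threshold and merging consecutive
    hot frames into a single event."""
    events, active = [], False
    for start in range(0, len(samples) - frame + 1, frame):
        energy = sum(s * s for s in samples[start:start + frame]) / frame
        if energy > threshold and not active:
            events.append(start)   # rising edge: a new keystroke begins
            active = True
        elif energy <= threshold:
            active = False         # energy fell back: the event has ended
    return events

# A silent stream with two synthetic keystroke bursts.
stream = [0.0] * 4096
for pos in (1024, 3072):
    for i in range(300):
        stream[pos + i] = 0.8

print(detect_keystrokes(stream))  # → [1024, 3072]
```

Once each keystroke is isolated this way, the individual clips can be fed to a classifier like the one sketched earlier.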
Some areas where these attacks can be used:
- Sensitive financial and confidential data is susceptible to compromise, including but not limited to social security numbers, credit card details, bank account information, and passwords. Such breaches could potentially contribute to identity theft and various forms of financial fraud.
- Cybercriminals have the capability to acoustically intercept password manager master keys, enabling them to decrypt entire identity databases within an organization.
- Instances of corporate espionage attacks involve covertly eavesdropping on business plans, confidential company information and financial details.
- Advertising technology companies have the potential to clandestinely gather extensive personal and behavioral data by covertly monitoring users’ typing patterns.
- All personal information faces high risk, including intimate details that could be exposed or traded. This poses a threat to professionals like politicians, journalists, and lawyers, potentially leading to embarrassment and damage to reputation.
How to Protect Yourself?
User vigilance and caution stand out as the paramount considerations for safeguarding one’s privacy.
When engaging with video chat applications, it becomes imperative to manage microphones carefully, deploy anomaly detection, employ encryption and use purpose-built physical microphone blockers, all aimed at preventing acoustic eavesdropping.
The following security measures can help:
- Utilize highly randomized and strong passwords.
- Input passwords silently.
- Activate two-factor or multifactor authentication.
- Employ audio masking devices.
- Refrain from entering sensitive information while on video calls.
- Prefer keyboards equipped with acoustic shielding.
- Employ touch-sensitive keyboards.
- Incorporate ambient background sounds.
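One of the measures above, audio masking, can be approximated in software by overlaying random noise on the outgoing microphone buffer so that the sharp transients of keystrokes sink below the noise floor. This is an illustrative sketch under simplified assumptions, not a vetted defence; the function name and parameters are hypothetical.

```python
import random

def mask_keystrokes(samples, noise_level=0.5, seed=None):
    """Overlay uniform random noise on an audio buffer. For masking to
    work, noise_level must be comparable to (or exceed) the amplitude
    of the keystroke transients being hidden."""
    rng = random.Random(seed)
    return [s + rng.uniform(-noise_level, noise_level) for s in samples]

# A quiet buffer containing one loud keystroke click in the middle.
buffer = [0.0] * 50 + [0.9] * 5 + [0.0] * 50
masked = mask_keystrokes(buffer, noise_level=0.5, seed=42)
```

Dedicated masking devices take the same approach physically, emitting ambient sound near the keyboard rather than modifying the audio stream in software.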
Do We Really Need to Worry?
While the risk is a constant presence, the threat may not be as severe as it first sounds.
Firstly, the efficacy of such methods relies heavily on acquiring a suitable and distinct set of sample data, a requirement that varies across systems such as smartphones, tablets, MacBooks or external keyboards.
Another influential aspect is the individual’s typing style — ranging from forceful key presses to gentle touches. This variation in sound dynamics can challenge the precision of sound-capturing devices. Consequently, constructing an accurate dataset that aligns with your unique typing patterns and keyboard becomes a complex endeavour.
Furthermore, interpreting the sequence of captured sounds and distinguishing between elements like passwords and email addresses poses a formidable challenge. Some AI models might also encounter difficulties in discerning between specific keys like ‘Shift’ or ‘Ctrl’.
The rise of acoustic side-channel attacks signifies a potentially transformative shift within the domain of cyber threats. These techniques of acoustic eavesdropping challenge the very foundations of privacy and confidentiality during data input.
While users might feel assured when visual access is restricted, interpreting keystroke sounds can compromise supposedly secure environments.
Machine learning algorithms, having been trained on audio snippets of typing, exhibit astonishing proficiency in extracting textual information.
With various research findings demonstrating accuracy levels surpassing 90% in isolating keystrokes from audio data alone, addressing this menace necessitates the implementation of innovative and highly advanced defensive strategies.