EchoSpeech: Revolutionizing Communication with Silent-Speech Recognition Technology

Researchers at Cornell University have developed EchoSpeech, a silent-speech recognition interface that uses acoustic sensing and artificial intelligence to continuously recognize up to 31 unvocalized commands based on lip and mouth movements. This low-power, wearable interface runs on a smartphone and requires only a few minutes of user training data for command recognition.

Ruidong Zhang, a doctoral student of information science, is the lead author of “EchoSpeech: Continuous Silent Speech Recognition on Minimally-obtrusive Eyewear Powered by Acoustic Sensing,” which will be presented at the Association for Computing Machinery Conference on Human Factors in Computing Systems (CHI) this month in Hamburg, Germany.

“For people who cannot vocalize sound, this silent speech technology could be an excellent input for a voice synthesizer. It could give patients their voices back,” Zhang said, highlighting the technology’s potential applications with further development.

Real-World Applications and Privacy Advantages

In its current form, EchoSpeech could be used for communicating with others via smartphone in environments where speech is inconvenient or inappropriate, such as noisy restaurants or quiet libraries. The silent speech interface can also be paired with a stylus and used with computer-aided design (CAD) software, significantly reducing the need for a keyboard and a mouse.

Equipped with microphones and speakers smaller than pencil erasers, the EchoSpeech glasses function as a wearable AI-powered sonar system, sending and receiving sound waves across the face and detecting mouth movements. A deep learning algorithm then analyzes these echo profiles in real time with approximately 95% accuracy.
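
To give a rough sense of what an "echo profile" can look like in code, here is a minimal sketch of the general sonar idea, not the EchoSpeech implementation. It assumes the probe signal is a short linear chirp and that the profile is obtained by cross-correlating the received signal with that chirp, a standard sonar technique that the article does not confirm for this system. The function names, the 16 to 20 kHz sweep, the 48 kHz sample rate, and the simulated echo are all illustrative assumptions.

```python
import math

def make_chirp(n_samples, f_start, f_end, sample_rate):
    """Linear frequency sweep used as the probe signal (assumed, not from the article)."""
    duration = n_samples / sample_rate
    sweep_rate = (f_end - f_start) / duration  # Hz per second
    chirp = []
    for i in range(n_samples):
        t = i / sample_rate
        chirp.append(math.sin(2 * math.pi * (f_start * t + 0.5 * sweep_rate * t * t)))
    return chirp

def simulate_echo(chirp, delay, gain, total_len):
    """Hypothetical received signal: one attenuated copy of the chirp, delayed by `delay` samples."""
    received = [0.0] * total_len
    for i, sample in enumerate(chirp):
        if delay + i < total_len:
            received[delay + i] += gain * sample
    return received

def echo_profile(received, chirp):
    """Cross-correlate the received signal with the chirp; one value per candidate delay."""
    n_lags = len(received) - len(chirp) + 1
    return [
        sum(received[lag + i] * c for i, c in enumerate(chirp))
        for lag in range(n_lags)
    ]

def strongest_delay(profile):
    """Delay (in samples) at which the echo is strongest."""
    return max(range(len(profile)), key=lambda lag: abs(profile[lag]))

if __name__ == "__main__":
    chirp = make_chirp(n_samples=64, f_start=16_000, f_end=20_000, sample_rate=48_000)
    received = simulate_echo(chirp, delay=25, gain=0.6, total_len=200)
    profile = echo_profile(received, chirp)
    print("strongest echo at delay:", strongest_delay(profile))
```

In this toy setup the strongest correlation should fall at the simulated delay. In a system like the one described, a sequence of such profiles over time, changing as the mouth moves, would be the kind of input a deep learning model could learn to classify into commands.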

“We’re moving sonar onto the body,” said Cheng Zhang, assistant professor of information science and director of Cornell’s Smart Computer Interfaces for Future Interactions (SciFi) Lab.

Existing silent-speech recognition technology typically relies on a limited set of predetermined commands and requires the user to face or wear a camera. Cheng Zhang explained that this is neither practical nor feasible and also raises significant privacy concerns for both the user and those they interact with.

EchoSpeech’s acoustic-sensing technology eliminates the need for wearable video cameras. Moreover, since audio data is smaller than image or video data, it requires less bandwidth to process and can be transmitted to a smartphone via Bluetooth in real time, according to François Guimbretière, professor in information science.

“And because the data is processed locally on your smartphone instead of uploaded to the cloud,” he said, “privacy-sensitive information never leaves your control.”
