Smart necklace recognizes silent commands
Conventional speech recognition technologies require audible speech. What if a person is unable to speak, or if speaking aloud in a particular setting is not appropriate?
Cheng Zhang, assistant professor of information science in the Cornell Ann S. Bowers College of Computing and Information Science, and doctoral student Ruidong Zhang have an answer: SpeeChin, a silent-speech recognition (SSR) device that can identify silent commands using images of skin deformation in the neck and face, captured by a neck-mounted infrared (IR) camera.
The technology is detailed in “SpeeChin: A Smart Necklace for Silent Speech Recognition,” published December 31 in the Proceedings of the Association for Computing Machinery on Interactive, Mobile, Wearable and Ubiquitous Technologies.
Ruidong Zhang will also present the paper in October at the Ubiquitous Computing conference (UbiComp 2022).
“There are two questions: First, why a necklace? And second, why silent speech?” Cheng Zhang said. “We feel a necklace is a form factor that people are used to, as opposed to ear-mounted wearables, which may not be as comfortable. As for silent speech, people might think, ‘I’ve already got a speech recognition device on my phone.’ But you need to vocalize for those, and that may not always be socially appropriate, or the person may not be able to vocalize the words.
“The device is capable of learning a person’s speech patterns, even when they speak silently,” he said.
“We are introducing a whole new form factor, new hardware, into the field,” said Ruidong Zhang, who built the initial prototype in 2020 at his home in China.
The device is superficially similar to NeckFace, a technology that Cheng Zhang and members of his SciFi Lab team unveiled last year. NeckFace continuously monitors facial expressions by using infrared cameras to capture images of the chin and face from below the neck.
Like NeckFace, SpeeChin has an IR camera mounted on a 3D-printed necklace case, which hangs on a silver chain with the camera pointing up toward the wearer’s chin. To increase stability, the developers designed a wing on each side of the case and attached a coin to the bottom.
Convenience and privacy are two reasons why necklace-mounted IR cameras may be preferable to traditional front-facing cameras, said Cheng Zhang. “A camera in front of you is taking a picture of what’s behind you, and that raises privacy concerns,” he said.
For their initial trial, with 20 participants (10 English speakers and 10 Mandarin Chinese speakers), measurements were taken to determine the baseline position of the chin, then differential imaging was used to train the device to recognize simple commands.
Ruidong Zhang had participants silently utter 54 commands in English, including digits, interactive commands, voice assistant commands, punctuation commands and navigation commands. He then did the same with 44 simple Mandarin words or phrases.
SpeeChin recognized commands in English and Mandarin with an average accuracy of 90.5% and 91.6%, respectively. To further test its limits, the researchers ran another study with 10 participants, all of whom silently uttered a specially designed list of 72 “nonwords” built from phonemes: combinations of 18 consonants and four vowels.
Finally, the researchers recruited six participants to utter 10 Mandarin phrases and 10 English phrases while walking. The success rate was lower in this study, partly due to variation in walking style (e.g., more versus less head movement) among participants.
The project exemplifies the power of determination: Ruidong Zhang built a lab in his home, complete with a soldering station, and recruited people from his hometown as research participants.
“But because I live in a small city, it’s very difficult to find English speakers,” he said, “so we actually went to Hangzhou, to Zhejiang University, to recruit English speakers. It was an unforgettable experience for me.”
Support for this work came from the Cornell Department of Information Science, and in part from a Shanghai Jiaotong University-Cornell seed grant from the Cornell China Center.