- Researchers use a recurrent neural network to help people who are paralyzed and can’t communicate.
- The network converts the neural activity into speech acoustics.
- It can help patients communicate unconstrained vocabularies at a natural speaking rate.
Many neurological conditions result in the loss of the ability to communicate, leaving patients to rely entirely on assistive devices. These devices let them type sentences character by character at up to 10 words per minute. That pace falls far short of everyday conversation, which runs at roughly 150 words per minute.
To enable far higher, even natural, communication rates, researchers at the University of California, San Francisco took a biomimetic approach that focuses on vocal tract movements and the sounds they produce.
They have shown that it is possible to generate synthesized speech directly from brain signals. These signals precisely coordinate about 100 muscles to move the lips, tongue, jaw, and larynx, shaping breath into sounds that eventually form words and sentences.
The team recorded high-density electrocorticography signals from five participants who were being treated for epilepsy (a neurological disorder). Each participant was asked to read sentences aloud while electrodes placed on the surface of the brain recorded the accompanying cortical activity.
Recurrent Neural Network
The researchers developed a recurrent neural network to decode cortical signals with an explicit intermediate representation of the articulatory dynamics, and eventually synthesize audible speech.
Reference: Nature | DOI:10.1038/s41586-019-1119-1 | UC San Francisco
The neural network was trained on audio of the participants speaking sentences aloud, paired with the simultaneously recorded cortical signals. The researchers used the Adam optimizer to train the algorithm, with batch sizes of 256 and 25 for the first and second training stages, respectively.
The stacked encoder-decoder network first decoded the articulatory movements that are the primary physiological correlate of the neural activity, and then transformed those movements into speech acoustics. Although it passes through this intermediate representation, the pipeline as a whole is optimized to produce acoustics from the electrode recordings.
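For readers who want a concrete picture of such a pipeline, the PyTorch sketch below shows one way a two-stage recurrent decoder of this kind could be wired together. It is not the authors' code: the Adam optimizer and the per-stage batch sizes mentioned above come from the article, while the electrode count, feature dimensions, layer sizes, and learning rate are illustrative assumptions.

```python
# Minimal sketch of a two-stage recurrent decoder (not the authors' code).
# Stage 1: brain signals -> articulatory kinematics.
# Stage 2: kinematics -> speech acoustics.
# All dimensions and the learning rate below are illustrative assumptions.
import torch
import torch.nn as nn

class ArticulatoryDecoder(nn.Module):
    """Stage 1: map ECoG features to articulatory kinematics."""
    def __init__(self, n_electrodes=256, n_kinematic=32, hidden=128):
        super().__init__()
        self.blstm = nn.LSTM(n_electrodes, hidden, num_layers=2,
                             batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_kinematic)

    def forward(self, ecog):                  # ecog: (batch, time, electrodes)
        h, _ = self.blstm(ecog)
        return self.out(h)                    # (batch, time, kinematic features)

class AcousticDecoder(nn.Module):
    """Stage 2: map decoded kinematics to acoustic features."""
    def __init__(self, n_kinematic=32, n_acoustic=32, hidden=128):
        super().__init__()
        self.blstm = nn.LSTM(n_kinematic, hidden, num_layers=2,
                             batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_acoustic)

    def forward(self, kin):                   # kin: (batch, time, kinematic features)
        h, _ = self.blstm(kin)
        return self.out(h)                    # (batch, time, acoustic features)

# Each stage gets its own optimizer so it can be trained with its own batch
# size (256 for the first stage, 25 for the second, per the article). Adam is
# named in the article; the learning rate here is assumed.
stage1, stage2 = ArticulatoryDecoder(), AcousticDecoder()
opt1 = torch.optim.Adam(stage1.parameters(), lr=1e-3)
opt2 = torch.optim.Adam(stage2.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
```

Keeping the two stages as separate modules mirrors the explicit intermediate articulatory representation described above and lets each stage be trained with its own batch size.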
Speech synthesis from neurally decoded spoken sentences | Courtesy of researchers
This statistical mapping enables generalization from limited training data. The researchers achieved satisfactory performance with as little as 25 minutes of speech, and performance continued to improve as more data was added.
What’s Next?
The study presents an advanced method for tackling a major obstacle faced by patients with neurological disorders. The generalization results suggest that speakers share a similar kinematic state-space representation that is largely speaker-independent, so the model's knowledge of how kinematics map to sound can be transferred across participants.
Tapping into this low-dimensional representation of neural activity from different people could make brain-computer interfaces easier to learn, and the findings open a path toward speech restoration for patients with paralysis.
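Continuing the sketch above, one way to picture this kind of transfer is to keep a kinematics-to-acoustics decoder trained on one participant fixed and fit only a new brain-signal-to-kinematics stage for another. The snippet below is a hypothetical illustration of that idea, not the study's procedure; the checkpoint name and all training details are assumptions, and it reuses the classes defined in the earlier sketch.

```python
# Hypothetical illustration of cross-participant transfer, reusing the
# ArticulatoryDecoder and AcousticDecoder classes from the sketch above.
import torch
import torch.nn as nn

pretrained_stage2 = AcousticDecoder()
# pretrained_stage2.load_state_dict(torch.load("stage2_participant_A.pt"))  # hypothetical checkpoint
for p in pretrained_stage2.parameters():
    p.requires_grad = False                   # keep the shared kinematics-to-sound mapping fixed

new_stage1 = ArticulatoryDecoder()            # learned anew from the new participant's brain signals
opt = torch.optim.Adam(new_stage1.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def training_step(ecog, target_acoustics):
    """One assumed training step: only the new participant's stage 1 is updated."""
    kin = new_stage1(ecog)                    # decode kinematics from the new brain signals
    pred = pretrained_stage2(kin)             # reuse the transferred acoustic decoder
    loss = loss_fn(pred, target_acoustics)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```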
The neural network developed in this study makes it possible to communicate unconstrained vocabularies at a natural speaking rate. This direct speech-synthesis approach captures prosodic elements of speech, such as pitch intonation, that are not available with text output. Moreover, it may be easier and more intuitive to learn for patients in whom the cortical processing of articulation is still intact.