- A new neural network can analyze raw audio and text data to discover patterns indicating depression.
- It’s a context-free model that doesn’t rely on any specific sets of questions and their responses.
- The model’s average score across precision and recall is 77%.
The Patient Health Questionnaire (PHQ) is the standard tool for screening and diagnosing depression. It asks a set of questions to determine whether the patient finds pleasure in doing things, feels tired, has a poor appetite, struggles to sleep, and so on.
In the last couple of years, AI has proven highly effective in the medical field, which has led to machine learning methods that identify words and intonations in speech that may indicate depression.
However, these methods are based on an individual’s specific answers to certain questions. Although they are fairly accurate, their dependence on types of questions being asked limits where and how they can be utilized.
To remove such limitations, a team of researchers from MIT presented a new neural network that can analyze raw audio and text data from interviews to discover patterns indicating depression. It’s capable of accurately predicting whether the patient is depressed, without requiring any additional information from Q&A.
How Does It Work?
Speech is the main source through which we can identify if an individual is sad, happy, excited, or has some serious cognitive condition like depressive disorder. To develop an effective depression-detection model, one needs to minimize the number of constraints on the types of data being used.
The new neural network follows the same principle: it can be deployed in any casual conversation, where it picks up on the patient’s state from the natural interaction. Since each patient talks in a different manner, the model looks at every tiny detail.
The model detects the patterns that indicate depression and maps them to new patients, with no additional data. The authors call it “context-free modeling” because it doesn’t rely on any constraints on the types of questions and their responses.
Other models, by contrast, require a specific set of inputs, such as a simple, to-the-point query like ‘Do you have a history of depression?’ Such methods use direct responses to these types of questions to determine if a patient is depressed. In reality, however, that is not how conversations work.
The new model uses sequence modeling, in which audio and text data obtained from both non-depressed and depressed people are processed one by one. It focuses on words like ‘down’, ‘low’, or ‘sad’, and pairs them with flatter and monotone audio signals. How fast or slow a person is speaking is also taken into account.
The neural network analyzes the speaking style and the sequence of words, and determines whether those patterns are more likely to be observed in individuals with or without depression. Then, if the model detects similar sequences in new people, it can tell whether they are depressed too.
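To make the idea of per-response cues concrete, here is a toy sketch that turns an interview into a sequence of simple features: a count of mood-related words and a speaking rate. It is not the researchers' network. The `Utterance` type, the `CUE_WORDS` set (taken from the example words above), and the assumption that each response comes with a known audio duration are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical cue words, borrowed from the examples mentioned in the article.
CUE_WORDS = {"down", "low", "sad"}


@dataclass
class Utterance:
    text: str          # transcript of one response
    duration_s: float  # length of the audio segment in seconds (assumed known)


def utterance_features(u: Utterance) -> Tuple[int, float]:
    """Return (number of cue words, speaking rate in words per second)."""
    # Strip surrounding punctuation and lowercase each token.
    words = [w.strip(".,!?;:'\"").lower() for w in u.text.split()]
    words = [w for w in words if w]
    cue_count = sum(1 for w in words if w in CUE_WORDS)
    rate = len(words) / u.duration_s if u.duration_s > 0 else 0.0
    return cue_count, rate


def interview_sequence(utterances: List[Utterance]) -> List[Tuple[int, float]]:
    """Process responses one by one, preserving their order in the interview."""
    return [utterance_features(u) for u in utterances]


if __name__ == "__main__":
    interview = [
        Utterance("I feel low and sad.", 5.0),
        Utterance("Fine.", 1.0),
    ]
    print(interview_sequence(interview))
```

A real sequence model would consume much richer audio and text representations than these two numbers. The sketch only shows the general shape of the input: an ordered list of per-response feature vectors.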
Testing and Applications
The neural network was trained on 142 interactions between virtual agents and patients with mental disorders. All of these data were collected from the Distress Analysis Interview Corpus, which consists of text, audio, and video interviews.
Each individual is rated for depression on a scale from 0 to 27. The score represents the depression level: a higher score (between 20 and 27) means the person is severely depressed. In this experiment, 28 individuals were marked as depressed.
The neural network was evaluated on two metrics: precision and recall. Precision measures what fraction of the people the model marked as depressed were actually diagnosed as depressed, whereas recall measures what fraction of all people in the dataset diagnosed as depressed the model correctly identified.
The model achieved 71% precision and 83% recall. Taking all errors into account, the average score was 77%. These results are impressive compared to other models.
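As a worked illustration of these two metrics, the sketch below computes precision and recall from binary labels and then combines the reported 71% and 83% figures. The article does not say which kind of average produced 77%, so the code shows both the arithmetic mean and the harmonic mean (F1) as assumptions; both round to 77%. The toy labels are made up purely for the example and are not data from the study.

```python
from typing import Sequence, Tuple


def precision_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Compute precision and recall for binary labels (1 = depressed)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)      # true positives
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)  # false positives
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)  # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Made-up toy labels, only to exercise the function.
    y_true = [1, 1, 1, 0, 0, 0]
    y_pred = [1, 1, 0, 1, 0, 0]
    print(precision_recall(y_true, y_pred))

    # Figures reported in the article.
    p, r = 0.71, 0.83
    print(round((p + r) / 2, 2))      # arithmetic mean
    print(round(f1_score(p, r), 2))   # harmonic mean (F1)
```

Recall is usually the more important number for a screening tool, since a missed case (false negative) tends to cost more than a false alarm that a clinician can follow up on.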
Researchers plan to test this network on more subjects with other types of disorders like dementia. Right now, the model can be considered as a black box, but they will try to find what types of patterns the network detects across scores of raw data.
In the future, the method could be deployed in mobile applications to monitor users’ voice and text for any kind of distress and send alerts. This would be extremely helpful for people who cannot get to doctors due to cost, distance or lack of awareness.