- Facebook posts alone can predict diseases like diabetes, depression, anxiety, and psychoses.
- Like genomic information, social media content is capable of personalizing health care.
More than 2 billion people share information about their everyday lives over social media platforms, often revealing their personality, sentiments, and demographics. The number is expected to reach over 3 billion [monthly active social media users] by 2021, around 1/3rd of the entire population.
Such information contains useful health signals at the population level. Recently, researchers at Penn Medicine and Stony Brook University linked patients’ electronic medical records (EMR) with their social media data to identify certain markers of disease.
The research team included 999 patients who agreed to share their medical records and social media information. They analyzed approximately 949,000 Facebook status updates containing over 20 million words. Each participant’s post contained a minimum of 500 words.
Researchers used natural language processing — a subfield of artificial intelligence concerned with interactions between human (natural) and computer languages — to encode each participant language as a 700 dimensional patient language encoding.
They then categorized the diagnoses from participant’s EMR into 21 groups based on prevalence within the sample and Elixhauser Comorbidity Index.
In other words, researchers analyzed the language patterns [of facebook posts] — words, sentences, bunch of related words — and their connection with 21 standard categories of EMR diagnoses.
Overall, they used three models to examine the predictive power for the patients –
- The first model analyzed the language of Facebook post
- The second one used demographics like sex and age.
- The third model merged the two datasets.
Facebook content substantially improved upon the accuracy of predicting 18 of the 21 disease categories. It was proven highly efficient at predicting diabetes, pregnancy, depression, anxiety, psychoses and other mental health conditions.
In fact, 10 categories were more effectively predicated by Facebook post than by the traditional demographic factors (sex, age, and race).
“We can treat language pattern analogous to a genome and see similar diseases seem to have similar linguistic patterns” – Andrew Schwartz, senior author.
Medical diagnoses linked with encoded social media language can serve as a screening tool and used to elucidate disease epidemiology.
Like genomic information, social media content is capable of personalizing health care. By examining several medical conditions, researchers can better understand how these conditions are associated with each other, which can enable new applications for artificial intelligence for medicine.
To further improve the results, future studies can compare differences in health related data disclosed by patients of different demographic populations and on other platforms like Twitter.