- A new deep-learning model analyzes breast tissue in mammograms and accurately estimates density ratings.
- Breast density is an independent risk factor for breast cancer.
- The model takes less than one second to process a mammogram, and it could be easily scaled across hospitals.
Among women in the United States, breast cancer death rates are higher than those of any other type of cancer except lung cancer. According to breastcancer.org, about 12.4 percent of women in the US develop invasive breast cancer over the course of their lifetime.
Mammography uses low-dose X-rays to examine the human breast for screening and diagnosis. However, dense tissue can make this process difficult because it can mask cancers on the mammogram. Breast density is typically evaluated through subjective human assessment, so the results vary from one radiologist to another.
Now, researchers at Massachusetts General Hospital (MGH) and MIT have built an automated tool that reliably assesses dense breast tissue in mammograms. It’s a deep-learning model trained on tens of thousands of high-resolution digital mammograms so that it can learn to distinguish between different types of breast tissue.
Given a new mammogram, the tool produces a density rating that is as reliable as those of expert radiologists. According to the authors, this is the first deep-learning model of its kind to be successfully used on patients in a clinical setting. They believe the technology could be implemented broadly across the country and bring greater reliability to breast density assessments.
The tool is based on a convolutional neural network, which is made up of neurons with learnable weights and biases. The researchers trained and tested the network on a large dataset of more than 58,000 mammograms randomly selected from screenings of 39,000 women between 2009 and 2011. About 41,000 of these images were used for training and about 8,600 for testing.
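The article only says the tool is a convolutional neural network; it doesn’t describe the architecture. As a rough illustration of what “neurons with learnable weights and biases” means in such a network, the sketch below implements a single convolutional filter followed by a ReLU activation in plain Python. The function name `conv2d_relu`, the 2×2 kernel and the toy pixel grid are hypothetical; in a real CNN, many such filters are stacked in layers and their weights are learned from data rather than set by hand.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_relu(image: Matrix, kernel: Matrix, bias: float) -> Matrix:
    """Slide one filter over the image (no padding), then apply ReLU.

    Each output value acts like one 'neuron': a weighted sum of a small
    image patch (the kernel holds the learnable weights) plus a learnable
    bias. Like most deep-learning libraries, this computes cross-correlation
    (the kernel is not flipped).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out: Matrix = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = bias
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(max(0.0, total))  # ReLU: negative responses become 0
        out.append(row)
    return out


# Toy 3x4 "image": dark pixels on the left, bright pixels on the right.
image = [[0.0, 0.0, 1.0, 1.0],
         [0.0, 0.0, 1.0, 1.0],
         [0.0, 0.0, 1.0, 1.0]]
# Hypothetical hand-set filter that responds to dark-to-bright vertical edges.
kernel = [[-1.0, 1.0],
          [-1.0, 1.0]]
print(conv2d_relu(image, kernel, 0.0))  # [[0.0, 2.0, 0.0], [0.0, 2.0, 0.0]]
```

The filter fires only where the dark region meets the bright one; during training, a CNN adjusts thousands of such weights and biases so that its filters pick up patterns useful for telling tissue types apart.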
Each mammogram carries a standard BI-RADS (Breast Imaging Reporting and Data System) density rating in one of four groups:
- Dense (extremely dense)
- Heterogeneous (mostly dense)
- Scattered Density
- Fatty
Across the training and testing sets, nearly 40% of the images were rated as dense or heterogeneous. During training, the network is fed random mammograms and gradually learns to map each one to a rating that closely aligns with the experts’ density ratings.
For example, fatty breast tissue appears as thinner networks with gray areas throughout, whereas dense breasts contain fibrous and glandular connective tissue that appears as a tightly packed network of solid white patches and thick white lines. During testing, the network sees new mammograms and estimates the most likely density group for each.
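The article doesn’t give training details, so the sketch below swaps the CNN for a single linear layer over a hypothetical two-number feature vector, just to show the general idea of the two phases: during training, each expert-labelled example nudges the learnable weights and biases toward the expert’s rating (here by gradient descent on cross-entropy loss), and at test time a softmax turns the outputs into probabilities so the most likely density group can be reported. The names `GROUPS`, `train_step` and `predict`, the learning rate and the toy features are all illustrative assumptions, not the researchers’ code.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical label order for the four BI-RADS density groups.
GROUPS = ("fatty", "scattered", "heterogeneous", "dense")


def softmax(logits: Sequence[float]) -> List[float]:
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward(W: List[List[float]], b: List[float], x: Sequence[float]) -> List[float]:
    """Linear layer (a stand-in for the CNN): one score per density group."""
    return [sum(w * xi for w, xi in zip(row, x)) + bias for row, bias in zip(W, b)]


def train_step(W: List[List[float]], b: List[float], x: Sequence[float],
               label: int, lr: float = 0.1) -> float:
    """One gradient-descent step on cross-entropy loss; updates W and b in place."""
    probs = softmax(forward(W, b, x))
    loss = -math.log(probs[label])
    # Gradient of the loss w.r.t. score k is p_k - 1 if k is the expert's label, else p_k.
    for k, p in enumerate(probs):
        g = p - (1.0 if k == label else 0.0)
        for j, xj in enumerate(x):
            W[k][j] -= lr * g * xj
        b[k] -= lr * g
    return loss


def predict(W: List[List[float]], b: List[float], x: Sequence[float]) -> Tuple[str, float]:
    """Return the most likely density group and its probability."""
    probs = softmax(forward(W, b, x))
    k = max(range(len(probs)), key=probs.__getitem__)
    return GROUPS[k], probs[k]


W = [[0.0, 0.0] for _ in GROUPS]
b = [0.0] * len(GROUPS)
x = [0.9, 0.2]  # toy features for one image (hypothetical)
for _ in range(50):
    loss = train_step(W, b, x, label=3)  # the expert rated this image "dense"
print(predict(W, b, x)[0])  # dense
```

A real system learns its features directly from the pixels and trains on many thousands of different images, but the loop of “compare with the expert’s rating, then adjust the weights” is the same in spirit.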
The tool was deployed in MGH’s breast imaging department, where it runs on a separate machine. Normally, a mammogram is generated and sent to an expert radiologist for evaluation. After reviewing it, the radiologist assigns a density rating to the mammogram.
With the tool in place, when radiologists pull up a scan, they see the rating assigned by the deep-learning model, which they can then accept or reject.
The network takes less than one second to process a mammogram, and it could be scaled across hospitals without much added cost or staff.
Radiologist assessment vs deep learning (DL) assessment for binary test | Courtesy of researchers
Between January and May 2018, the network assessed more than 10,000 mammograms. It agreed with the radiologists 94% of the time in a binary test, in which each breast had to be classified as either dense/heterogeneous or scattered/fatty. Across all four BI-RADS groups, it matched the experts’ ratings 90% of the time.
Radiologist assessment vs deep learning (DL) assessment for 4 BI-RADS groups | Courtesy of researchers
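One way to see why the binary figure is higher than the four-group figure: disagreements between neighbouring groups (for example “dense” vs “heterogeneous”) disappear once the groups are merged into two. The sketch below computes simple percent agreement on made-up ratings; the function names, the grouping set and the toy data are hypothetical, and the article doesn’t describe the study’s own evaluation code.

```python
from typing import Sequence

# Binary grouping used in the test described above (assumed label spellings).
DENSE_GROUPS = {"heterogeneous", "dense"}


def to_binary(rating: str) -> str:
    """Collapse a four-group rating into 'dense' or 'non-dense'."""
    return "dense" if rating in DENSE_GROUPS else "non-dense"


def percent_agreement(a: Sequence[str], b: Sequence[str]) -> float:
    """Share of cases (in %) where two raters give the same rating."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty rating lists of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def binary_agreement(a: Sequence[str], b: Sequence[str]) -> float:
    """Percent agreement after merging the four groups into two."""
    return percent_agreement([to_binary(r) for r in a], [to_binary(r) for r in b])


# Toy ratings for four mammograms (not data from the study).
radiologist = ["fatty", "scattered", "heterogeneous", "dense"]
model = ["scattered", "scattered", "heterogeneous", "heterogeneous"]
print(percent_agreement(radiologist, model))  # 50.0
print(binary_agreement(radiologist, model))   # 100.0
```

Both of the model’s “mistakes” here fall between adjacent groups on the same side of the dense/non-dense line, so they count against the four-group score but not the binary one.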
In general testing on the original dataset’s test images, the network matched the radiologists’ interpretations 87% of the time in the binary test and 77% of the time across the four BI-RADS groups.
Agreement between raters is commonly measured with the kappa score, where 1 means the two sets of ratings agree every time and lower values indicate less agreement. Existing methods reach kappa scores of up to 0.6, whereas the new model reaches 0.85 in clinical use and 0.76 in general testing. This indicates that the new tool agrees with radiologists more closely than conventional techniques do.
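A common form of the kappa score is Cohen’s kappa, which corrects raw agreement for the agreement two raters would reach by chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the chance agreement implied by each rater’s label frequencies. The article doesn’t say which variant (for example, unweighted or weighted) the researchers used, so the unweighted sketch below is an illustration only; the function name and toy ratings are hypothetical.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Unweighted Cohen's kappa between two raters' labels."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    ca, cb = Counter(a), Counter(b)
    # Chance agreement: probability both raters pick the same label at random,
    # given how often each of them uses each label.
    p_e = sum(ca[label] * cb[label] for label in ca) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label for every case; kappa is
        # undefined here, and this sketch reports perfect agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Toy ratings (not study data): raters agree on 3 of 4 cases.
radiologist = ["dense", "dense", "fatty", "fatty"]
model = ["dense", "fatty", "fatty", "fatty"]
print(cohens_kappa(radiologist, model))  # 0.5
```

Here raw agreement is 75%, but half of that would be expected by chance alone (p_e = 0.5), so kappa comes out at 0.5; this is why a kappa of 0.85 represents much stronger agreement than the raw percentage might suggest.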