Multicenter Study
J Am Med Inform Assoc. 2012 Sep-Oct;19(5):913-6.
doi: 10.1136/amiajnl-2011-000607. Epub 2012 Jan 29.

Automatic classification of mammography reports by BI-RADS breast tissue composition class

Bethany Percha et al. J Am Med Inform Assoc. 2012 Sep-Oct.

Abstract

Because breast tissue composition partially predicts breast cancer risk, classifying mammography reports by breast tissue composition is important from both a scientific and a clinical perspective. We present a method that uses the unstructured text of mammography reports to classify them into BI-RADS breast tissue composition categories. The algorithm uses regular expressions to assign each report to a single BI-RADS composition class: 'fatty', 'fibroglandular', 'heterogeneously dense', 'dense', or 'unspecified'. We evaluated its performance on mammography reports from two institutions, the Marshfield Clinic (Wisconsin) and Stanford University, where it achieved >99% classification accuracy on a test set. Since large-scale studies of breast cancer rely heavily on breast tissue composition information, this method could facilitate such research by helping mine large datasets to correlate breast composition with other covariates.

Conflict of interest statement

Competing interests: None.

Figures

Figure 1
A diagrammatic explanation of the rules used to assign reports to different BI-RADS tissue composition classes. Each row represents a pattern unique to the class shown at the left. White rectangles represent sets of words or word stems that must be present at a given location to fulfill the rule. Gray rectangles represent words/stems that cannot be present at a location for the rule to be fulfilled. The small gray boxes represent unspecified words. The asterisk (*) is used to denote multiple possible word endings. So, for example, a report would be assigned to class 2 if it contained the stem scatter followed by 0, 1, or 2 other words, and then the stem fibrogland or fibronodul. Similarly, a report would be assigned to class 1 if it contained the word breast(s) or tissue followed immediately by the phrase is/are fatty, but the stem fibrogland or fibronodul did not occur immediately after fatty.
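
The two rules spelled out in the caption can be approximated with ordinary regular expressions. The following Python sketch is illustrative only and is not the authors' implementation: the pattern details, the function name classify, the order in which rules are checked, and the mapping of class 1 to 'fatty' and class 2 to 'fibroglandular' are all assumptions, and the rules for the remaining classes are not described in the caption and are therefore omitted.

```python
import re

# Class 2 rule (as described in the caption): the stem "scatter", then
# 0, 1, or 2 other words, then the stem "fibrogland" or "fibronodul".
# Assumption: words are separated by whitespace only (no punctuation).
CLASS_2 = re.compile(
    r"\bscatter\w*(?:\s+\w+){0,2}?\s+(?:fibrogland|fibronodul)\w*",
    re.IGNORECASE,
)

# Class 1 rule (as described in the caption): "breast(s)" or "tissue"
# followed immediately by "is/are fatty", where the stem "fibrogland" or
# "fibronodul" does NOT occur immediately after "fatty".
CLASS_1 = re.compile(
    r"\b(?:breasts?|tissue)\s+(?:is|are)\s+fatty\b"
    r"(?!\s+(?:fibrogland|fibronodul))",
    re.IGNORECASE,
)


def classify(report: str) -> str:
    """Return a composition label for a report (hypothetical helper).

    Assumptions: class 2 is checked before class 1, and any report that
    matches neither illustrative rule is labeled 'unspecified'.
    """
    if CLASS_2.search(report):
        return "fibroglandular"
    if CLASS_1.search(report):
        return "fatty"
    return "unspecified"


if __name__ == "__main__":
    for text in (
        "There are scattered areas of fibroglandular density.",
        "The breasts are fatty.",
        "The breast tissue is fatty fibroglandular in composition.",
    ):
        print(classify(text), "<-", text)
```

The negative lookahead in CLASS_1 plays the role of the gray rectangles in the figure (stems that must not appear at a location), while the bounded repetition in CLASS_2 plays the role of the small gray boxes (unspecified words).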
