BMC Bioinformatics. 2017 Aug 3;18(1):360.
doi: 10.1186/s12859-017-1768-8.

Parallel multiple instance learning for extremely large histopathology image analysis

Yan Xu et al. BMC Bioinformatics.

Abstract

Background: Histopathology images are critical for medical diagnosis and treatment, e.g., of cancer. A standard histopathology slice can easily be scanned at a high resolution of, say, 200,000×200,000 pixels. At this resolution, most existing image processing tools become infeasible or less effective when run on a single machine with limited memory, disk space and computing power.
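To make the scale concrete, the following rough calculation estimates the uncompressed size of one such image. It assumes 8-bit RGB storage (3 bytes per pixel) and a hypothetical 1,000×1,000 tile size; neither figure comes from the paper.

```python
def raw_image_bytes(width, height, channels=3, bytes_per_channel=1):
    """Uncompressed size in bytes of a width x height raster image.

    channels=3 and bytes_per_channel=1 (8-bit RGB) are assumptions.
    """
    return width * height * channels * bytes_per_channel


def tile_count(width, height, tile):
    """Number of tile x tile patches needed to cover the image."""
    cols = -(-width // tile)   # ceiling division
    rows = -(-height // tile)
    return cols * rows


size = raw_image_bytes(200_000, 200_000)
print(size / 1e9)                           # size in gigabytes: 120.0
print(tile_count(200_000, 200_000, 1_000))  # 40000 tiles of 1,000 x 1,000
```

Under these assumptions a single image needs about 120 GB uncompressed, more than the memory of a typical single machine, which is why the data must be split across nodes.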

Results: In this paper, we propose an algorithm that tackles this emerging "big data" problem using parallel computing on High-Performance-Computing (HPC) clusters. Experimental results on a large-scale data set (1318 images at a scale of 10 billion pixels each) demonstrate the efficiency and effectiveness of the proposed algorithm for low-latency real-time applications.

Conclusions: The proposed framework provides an effective and efficient system for extremely large histopathology image analysis. It is based on the multiple instance learning formulation of weakly-supervised learning for image classification, segmentation and clustering. When a max-margin concept is adopted for different clusters, we obtain a further improvement in clustering performance.

Keywords: Histopathology image analysis; Microscopic image analysis; Multiple instance learning; Parallelization.

Conflict of interest statement

Ethics approval and consent to participate

The study protocol was approved by the Research Ethics Committee of the Department of Pathology in Zhejiang University. All the individuals used for the analyses have provided written, informed consent.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

Fig. 1
Parallel Multiple Instance Learning (P-MIL) on a High-Performance-Computing (HPC) cluster. Red: positive instances; Green: negative instances. First, we divide the data and distribute it to the nodes. The master collects the results calculated by the individual nodes, trains multiple classifiers and chooses the best one. Next, the slaves receive the best weak classifier and each calculates its own α value. The master node then synchronizes all the nodes, chooses the best α and broadcasts it. Finally, all the nodes update their classifiers with the best α and update the clusters with the new classifiers through communication, which the master coordinates to ensure data coherence. The program repeats these steps until the loop terminates
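The communication pattern in this caption can be sketched as a single-process simulation. This is not the authors' implementation: it uses plain instance labels instead of bags, omits the clustering step, and assumes 1-D features, decision stumps as weak classifiers, an exponential loss, and a fixed grid of candidate α values. The rule for choosing the best α (lowest loss summed over all nodes) is also an assumption, and all function names are hypothetical.

```python
import math


def stump(threshold, sign):
    """Weak classifier on a 1-D feature: returns +1 or -1."""
    return lambda x: sign if x > threshold else -sign


def local_errors(shard, scores, candidates):
    """Node step: weighted error of every candidate on this node's shard."""
    return [sum(math.exp(-y * s)
                for (x, y), s in zip(shard, scores) if h(x) != y)
            for h in candidates]


def local_loss(shard, scores, h, alpha):
    """Node step: exponential loss on this shard after adding alpha * h."""
    return sum(math.exp(-y * (s + alpha * h(x)))
               for (x, y), s in zip(shard, scores))


def pmil_sketch(shards, candidates, alphas, rounds=3):
    # Each node keeps the current strong-classifier score of its instances.
    scores = [[0.0] * len(shard) for shard in shards]
    model = []
    for _ in range(rounds):
        # Master: sum per-node errors and choose the best weak classifier.
        per_node = [local_errors(sh, sc, candidates)
                    for sh, sc in zip(shards, scores)]
        totals = [sum(col) for col in zip(*per_node)]
        best_h = candidates[totals.index(min(totals))]
        # Nodes: each proposes the alpha that minimizes its local loss.
        proposals = [min(alphas, key=lambda a: local_loss(sh, sc, best_h, a))
                     for sh, sc in zip(shards, scores)]
        # Master: keep the proposal with the lowest loss over all nodes.
        best_alpha = min(proposals, key=lambda a: sum(
            local_loss(sh, sc, best_h, a) for sh, sc in zip(shards, scores)))
        # All nodes: apply the broadcast (best_alpha, best_h) update.
        for sh, sc in zip(shards, scores):
            for i, (x, _) in enumerate(sh):
                sc[i] += best_alpha * best_h(x)
        model.append((best_alpha, best_h))
    return model


def predict(model, x):
    """Sign of the weighted vote of all selected weak classifiers."""
    return 1 if sum(a * h(x) for a, h in model) > 0 else -1


# Toy data: two shards of (feature, label) pairs, separable at 0.5.
shards = [[(0.1, -1), (0.2, -1), (0.8, 1)],
          [(0.3, -1), (0.7, 1), (0.9, 1)]]
candidates = [stump(t, 1) for t in (0.25, 0.5, 0.75)]
model = pmil_sketch(shards, candidates, alphas=[0.25, 0.5, 1.0])
print([predict(model, x) for shard in shards for x, _ in shard])
```

In a real cluster the list comprehensions over `shards` would become messages between the master and the nodes; only error sums and scalar α proposals would need to cross the network, not the image data.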
Fig. 2
Illustration of the max-margin concept using linear classifiers. Green, red and purple dots represent three specific cancer subtypes, while black dots represent non-cancer instances. Linear boundaries are trained to separate the cancer subtypes from each other (intra-class) and from the non-cancer instances (inter-class)
Fig. 3
Illustration of cluster competition using max-margin linear classifiers. Green and red dots represent two classes. In a, the two classes are initialized by the K-means method. In b-d, cluster competition takes place until the model converges. Specifically, the instances in each class are classified by the linear classifiers, and their labels are updated accordingly. Then, a new classifier is trained on the new labels. The cluster competition converges when both the classifiers and the instance labels reach a stable state
Fig. 4
The mirrored Receiver Operating Characteristic (ROC) curve for comparisons of piece-level classification results with Multiple Instance Learning (MIL), Multiple Clustered Instance Learning (MCIL) and Parallel Multiple Instance Learning (P-MIL). The generalized mean (GM) model is the soft-max function in the methods
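The caption only states that the GM model is the soft-max function used in the methods. As an illustration, the sketch below uses the common generalized-mean form, ((1/m) Σ p_i^r)^(1/r), to turn instance probabilities into a bag-level probability; the exact formula and the value r = 20 are assumptions here, not taken from the paper.

```python
def generalized_mean(probs, r=20.0):
    """Soft approximation of max(probs): ((1/m) * sum(p**r)) ** (1/r).

    r = 1 gives the arithmetic mean; larger r moves the result toward the
    maximum. The default r = 20 is an illustrative choice.
    """
    m = len(probs)
    return (sum(p ** r for p in probs) / m) ** (1.0 / r)


instance_probs = [0.1, 0.2, 0.9]  # hypothetical instance-level scores
print(generalized_mean(instance_probs, r=1))   # arithmetic mean
print(generalized_mean(instance_probs, r=20))  # closer to the max, 0.9
```

A smooth surrogate such as this is useful in multiple instance learning because a bag is positive when at least one instance is positive, and the hard max that expresses this rule is not differentiable everywhere.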
Fig. 5
Image types: a: The original images. b-d: The instance-level segmentations for MIL, MCIL and P-MIL, respectively. e: The ground truth. ANC: abnormal; NC: normal. Different colors represent different cancer subtypes
