Nat Biotechnol. 2010 Aug;28(8):827-38.
doi: 10.1038/nbt.1665. Epub 2010 Jul 30.

The MicroArray Quality Control (MAQC)-II study of common practices for the development and validation of microarray-based predictive models

Leming Shi, Gregory Campbell, Wendell D Jones, Fabien Campagne, Zhining Wen, Stephen J Walker, Zhenqiang Su, Tzu-Ming Chu, Federico M Goodsaid, Lajos Pusztai, John D Shaughnessy Jr, André Oberthuer, Russell S Thomas, Richard S Paules, Mark Fielden, Bart Barlogie, Weijie Chen, Pan Du, Matthias Fischer, Cesare Furlanello, Brandon D Gallas, Xijin Ge, Dalila B Megherbi, W Fraser Symmans, May D Wang, John Zhang, Hans Bitter, Benedikt Brors, Pierre R Bushel, Max Bylesjo, Minjun Chen, Jie Cheng, Jing Cheng, Jeff Chou, Timothy S Davison, Mauro Delorenzi, Youping Deng, Viswanath Devanarayan, David J Dix, Joaquin Dopazo, Kevin C Dorff, Fathi Elloumi, Jianqing Fan, Shicai Fan, Xiaohui Fan, Hong Fang, Nina Gonzaludo, Kenneth R Hess, Huixiao Hong, Jun Huan, Rafael A Irizarry, Richard Judson, Dilafruz Juraeva, Samir Lababidi, Christophe G Lambert, Li Li, Yanen Li, Zhen Li, Simon M Lin, Guozhen Liu, Edward K Lobenhofer, Jun Luo, Wen Luo, Matthew N McCall, Yuri Nikolsky, Gene A Pennello, Roger G Perkins, Reena Philip, Vlad Popovici, Nathan D Price, Feng Qian, Andreas Scherer, Tieliu Shi, Weiwei Shi, Jaeyun Sung, Danielle Thierry-Mieg, Jean Thierry-Mieg, Venkata Thodima, Johan Trygg, Lakshmi Vishnuvajjala, Sue Jane Wang, Jianping Wu, Yichao Wu, Qian Xie, Waleed A Yousef, Liang Zhang, Xuegong Zhang, Sheng Zhong, Yiming Zhou, Sheng Zhu, Dhivya Arasappan, Wenjun Bao, Anne Bergstrom Lucas, Frank Berthold, Richard J Brennan, Andreas Buness, Jennifer G Catalano, Chang Chang, Rong Chen, Yiyu Cheng, Jian Cui, Wendy Czika, Francesca Demichelis, Xutao Deng, Damir Dosymbekov, Roland Eils, Yang Feng, Jennifer Fostel, Stephanie Fulmer-Smentek, James C Fuscoe, Laurent Gatto, Weigong Ge, Darlene R Goldstein, Li Guo, Donald N Halbert, Jing Han, Stephen C Harris, Christos Hatzis, Damir Herman, Jianping Huang, Roderick V Jensen, Rui Jiang, Charles D Johnson, Giuseppe Jurman, Yvonne Kahlert, Sadik A Khuder, Matthias Kohl, Jianying Li, Li Li, Menglong Li, Quan-Zhen Li, Shao Li, Zhiguang Li, Jie Liu, Ying Liu, Zhichao Liu, Lu Meng, Manuel Madera, Francisco Martinez-Murillo, Ignacio Medina, Joseph Meehan, Kelci Miclaus, Richard A Moffitt, David Montaner, Piali Mukherjee, George J Mulligan, Padraic Neville, Tatiana Nikolskaya, Baitang Ning, Grier P Page, Joel Parker, R Mitchell Parry, Xuejun Peng, Ron L Peterson, John H Phan, Brian Quanz, Yi Ren, Samantha Riccadonna, Alan H Roter, Frank W Samuelson, Martin M Schumacher, Joseph D Shambaugh, Qiang Shi, Richard Shippy, Shengzhu Si, Aaron Smalter, Christos Sotiriou, Mat Soukup, Frank Staedtler, Guido Steiner, Todd H Stokes, Qinglan Sun, Pei-Yi Tan, Rong Tang, Zivana Tezak, Brett Thorn, Marina Tsyganova, Yaron Turpaz, Silvia C Vega, Roberto Visintainer, Juergen von Frese, Charles Wang, Eric Wang, Junwei Wang, Wei Wang, Frank Westermann, James C Willey, Matthew Woods, Shujian Wu, Nianqing Xiao, Joshua Xu, Lei Xu, Lun Yang, Xiao Zeng, Jialu Zhang, Li Zhang, Min Zhang, Chen Zhao, Raj K Puri, Uwe Scherf, Weida Tong, Russell D Wolfinger, MAQC Consortium

Leming Shi et al. Nat Biotechnol. 2010 Aug.

Abstract

Gene expression data from microarrays are being applied to predict preclinical and clinical endpoints, but the reliability of these predictions has not been established. In the MAQC-II project, 36 independent teams analyzed six microarray data sets to generate predictive models for classifying a sample with respect to one of 13 endpoints indicative of lung or liver toxicity in rodents, or of breast cancer, multiple myeloma or neuroblastoma in humans. In total, >30,000 models were built using many combinations of analytical methods. The teams generated predictive models without knowing the biological meaning of some of the endpoints and, to mimic clinical reality, tested the models on data that had not been used for training. We found that model performance depended largely on the endpoint and team proficiency and that different approaches generated models of similar performance. The conclusions and recommendations from MAQC-II should be useful for regulatory agencies, study committees and independent investigators that evaluate methods for global gene expression analysis.

Conflict of interest statement

COMPETING FINANCIAL INTERESTS

The authors declare competing financial interests: details accompany the full-text HTML version of the paper at http://www.nature.com/naturebiotechnology/.

Figures

Figure 1
Experimental design and timeline of the MAQC-II project. Numbers (1–11) order the steps of analysis. Step 11 indicates when the original training and validation data sets were swapped to repeat steps 4–10. See main text for description of each step. Every effort was made to ensure the complete independence of the validation data sets from the training sets. Each model is characterized by several modeling factors and seven internal and external validation performance metrics (Supplementary Tables 1 and 2). The modeling factors include: (i) organization code; (ii) data set code; (iii) endpoint code; (iv) summary and normalization; (v) feature selection method; (vi) number of features used; (vii) classification algorithm; (viii) batch-effect removal method; (ix) type of internal validation; and (x) number of iterations of internal validation. The seven performance metrics for internal validation and external validation are: (i) MCC; (ii) accuracy; (iii) sensitivity; (iv) specificity; (v) AUC; (vi) mean of sensitivity and specificity; and (vii) r.m.s.e. s.d. of metrics are also provided for internal validation results.
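The threshold-based metrics listed in this caption can all be derived from the four counts of a binary confusion matrix. The following is a minimal sketch using the standard textbook definitions, not the consortium's own code; the function name `binary_metrics` and the example counts are illustrative assumptions. AUC and r.m.s.e. are omitted because they require continuous prediction scores rather than counts.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Compute threshold-based metrics from confusion-matrix counts.

    tp/fp/tn/fn are true positives, false positives, true negatives
    and false negatives. Undefined ratios fall back to 0.0.
    """
    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    # Matthews correlation coefficient (MCC), ranging from -1 to +1.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "mcc": mcc,
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mean_sens_spec": (sensitivity + specificity) / 2,
    }

# Hypothetical counts, for illustration only.
metrics = binary_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when the two classes are of unequal size.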
Figure 2
Model performance on internal validation compared with external validation. (a) Performance of 18,060 models that were validated with blinded validation data. (b) Performance of 13 candidate models. r, Pearson correlation coefficient; N, number of models. Candidate models with binary and continuous prediction values are marked as circles and squares, respectively. The standard error estimate was obtained by resampling the prediction results from each model 500 times with bagging. (c) Distribution of MCC values of all models for each endpoint in internal (left, yellow) and external (right, green) validation performance. Endpoints H and L (sex of the patients) are included as positive controls and endpoints I and M (randomly assigned sample class labels) as negative controls. Boxes indicate the 25th and 75th percentiles, and whiskers indicate the 5th and 95th percentiles.
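A standard error obtained by resampling prediction results can be illustrated with an ordinary bootstrap. The sketch below is an assumption about the general idea, not the exact procedure used in the study: it draws samples with replacement 500 times, recomputes MCC each time, and reports the standard deviation of the resampled values. The function names and the toy labels are hypothetical.

```python
import math
import random

def mcc(y_true, y_pred):
    """Matthews correlation coefficient for 0/1 labels (0.0 if undefined)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def bootstrap_se(y_true, y_pred, n_resamples=500, seed=0):
    """Estimate the standard error of MCC by resampling with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    values = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        values.append(mcc([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    return math.sqrt(variance)

# Toy labels and predictions, for illustration only.
y_true = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
print(f"MCC = {mcc(y_true, y_pred):.3f}, SE = {bootstrap_se(y_true, y_pred):.3f}")
```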
Figure 3
Performance, measured using MCC, of the best models nominated by the 17 data analysis teams (DATs) that analyzed all 13 endpoints in the original training-validation experiment. The median MCC value for an endpoint, representative of the level of predictability of the endpoint, was calculated based on values from the 17 data analysis teams. The mean MCC value for a data analysis team, representative of the team’s proficiency in developing predictive models, was calculated based on values from the 11 non-random endpoints (excluding negative controls I and M). Red boxes highlight candidate models. Lack of a red box in an endpoint indicates that the candidate model was developed by a data analysis team that did not analyze all 13 endpoints.
Figure 4
Correlation between internal and external validation is dependent on data analysis team. Pearson correlation coefficients between internal and external validation performance in terms of MCC are displayed for the 14 teams that submitted models for all 13 endpoints in both the original (x axis) and swap (y axis) analyses. The unusually low correlation in the swap analysis for DAT3, DAT11 and DAT36 is a result of their failure to accurately predict the positive control endpoint H, likely due to operator errors (Supplementary Table 6).
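The per-team statistic in this figure is a Pearson correlation between paired internal and external MCC values. A minimal sketch of that calculation follows; the function name `pearson_r` and the MCC values are made up for illustration and are not taken from the study.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical MCC values for one team, one pair per endpoint.
internal_mcc = [0.85, 0.40, 0.62, 0.05, 0.91]
external_mcc = [0.80, 0.35, 0.50, 0.02, 0.95]
print(f"r = {pearson_r(internal_mcc, external_mcc):.3f}")
```

A value of r near 1 would indicate that a team's internal validation estimates track its external validation results closely across endpoints.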
Figure 5
Effect of modeling factors on estimates of model performance. (a) Random-effect models of external validation performance (MCC) were developed to estimate a distinct variance component for each modeling factor and several selected interactions. The estimated variance components were then divided by their total in order to compare the proportion of variability explained by each modeling factor. The endpoint code contributes the most to the variability in external validation performance. (b) The best linear unbiased prediction (BLUP) plots of the corresponding factors having proportion of variation larger than 1% in a. Endpoint abbreviations (Tox., preclinical toxicity; BR, breast cancer; MM, multiple myeloma; NB, neuroblastoma). Endpoints H and L are the sex of the patient. Summary and normalization abbreviations (GA, genetic algorithm; RMA, robust multichip analysis). Classification algorithm abbreviations (ANN, artificial neural network; DA, discriminant analysis; Forest, random forest; GLM, generalized linear model; KNN, K-nearest neighbors; Logistic, logistic regression; ML, maximum likelihood; NB, Naïve Bayes; NC, nearest centroid; PLS, partial least squares; RFE, recursive feature elimination; SMO, sequential minimal optimization; SVM, support vector machine; Tree, decision tree). Feature selection method abbreviations (Bscatter, between-class scatter; FC, fold change; KS, Kolmogorov-Smirnov algorithm; SAM, significance analysis of microarrays).
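The normalization step described in panel (a), dividing each estimated variance component by the total, can be sketched as below. The component values are hypothetical placeholders rather than estimates from the study, and fitting the random-effect model itself is outside the scope of this sketch.

```python
def variance_proportions(components):
    """Convert variance components into proportions of their total."""
    total = sum(components.values())
    if total <= 0:
        raise ValueError("total variance must be positive")
    return {name: value / total for name, value in components.items()}

# Hypothetical variance components, for illustration only.
components = {
    "endpoint": 0.060,
    "organization": 0.010,
    "classification algorithm": 0.004,
    "residual": 0.026,
}
proportions = variance_proportions(components)
# Print factors from largest to smallest share of variability.
for name, share in sorted(proportions.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {share:.1%}")
```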
