Journal of Computational Biology
A Bayesian Network Classification Methodology for Gene Expression Data

To cite this article:
Paul Helman, Robert Veroff, Susan R. Atlas, Cheryl Willman. Journal of Computational Biology. 2004, 11(4): 581-615. doi:10.1089/cmb.2004.11.581.

Published in Volume: 11 Issue 4: January 20, 2005


Paul Helman
Computer Science Department, University of New Mexico, Albuquerque, NM 87131.
Robert Veroff
Computer Science Department, University of New Mexico, Albuquerque, NM 87131.
Susan R. Atlas
Department of Physics and Astronomy and Center for Advanced Studies, University of New Mexico, Albuquerque, NM 87131.
Cheryl Willman
Department of Pathology and UNM Cancer Research and Treatment Center, UNM School of Medicine, University of New Mexico, Albuquerque, NM 87131.

We present new techniques for the application of a Bayesian network learning framework to the problem of classifying gene expression data. The focus on classification permits us to develop techniques that address, in several ways, the complexities of learning Bayesian nets. Our classification model reduces the Bayesian network learning problem to the problem of learning multiple subnetworks, each consisting of a class label node and its set of parent genes. We argue that this classification model is more appropriate for the gene expression domain than are other structurally similar Bayesian network classification models, such as Naive Bayes and Tree Augmented Naive Bayes (TAN), because our model is consistent with prior domain experience suggesting that a relatively small number of genes, taken in different combinations, is required to predict most clinical classes of interest. Within this framework, we consider two different approaches to identifying parent sets that are supported by the gene expression observations and any other currently available evidence. One approach employs a simple greedy algorithm to search the universe of all genes; the second approach develops and applies a gene selection algorithm whose results are incorporated as a prior to enable an exhaustive search for parent sets over a restricted universe of genes. Two other significant contributions are the construction of classifiers from multiple, competing Bayesian network hypotheses and algorithmic methods for normalizing and binning gene expression data in the absence of prior expert knowledge. Our classifiers are developed under a cross-validation regimen and then validated on corresponding out-of-sample test sets. The classifiers attain a classification rate in excess of 90% on out-of-sample test sets for two publicly available datasets. We present an extensive compilation of results reported in the literature for other classification methods run against these same two datasets. Our results are comparable to, or better than, any we have found reported for these two sets, when a train-test protocol as stringent as ours is followed.
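
The greedy search for a parent set of the class label node can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name a scoring metric, so the sketch assumes a K2-style Bayesian Dirichlet score with one pseudo-count per class, assumes the expression values have already been binned to small integers, and uses hypothetical names throughout (`log_marginal_likelihood`, `greedy_parent_set`, `predict_proba`, `max_parents`).

```python
from math import lgamma


def log_marginal_likelihood(rows, labels, parents, n_classes=2):
    """Log marginal likelihood of the labels given a parent gene set.

    Assumed scoring rule (not taken from the paper): K2-style score
    with a uniform Dirichlet prior, one pseudo-count per class.
    rows[i][g] is the binned expression value of gene g in sample i.
    """
    counts = {}
    for row, y in zip(rows, labels):
        config = tuple(row[g] for g in parents)
        counts.setdefault(config, [0] * n_classes)[y] += 1
    score = 0.0
    for per_class in counts.values():
        n = sum(per_class)
        score += lgamma(n_classes) - lgamma(n_classes + n)
        score += sum(lgamma(c + 1) for c in per_class)
    return score


def greedy_parent_set(rows, labels, max_parents=3, n_classes=2):
    """Add one gene at a time while the score strictly improves."""
    n_genes = len(rows[0])
    parents = []
    best = log_marginal_likelihood(rows, labels, parents, n_classes)
    while len(parents) < max_parents:
        scored = [
            (log_marginal_likelihood(rows, labels, parents + [g], n_classes), g)
            for g in range(n_genes)
            if g not in parents
        ]
        if not scored:
            break
        top_score, top_gene = max(scored)
        if top_score <= best:
            break  # no single gene improves the score; stop
        parents.append(top_gene)
        best = top_score
    return parents, best


def predict_proba(rows, labels, parents, sample, n_classes=2):
    """Posterior predictive class distribution for one new sample."""
    target = tuple(sample[g] for g in parents)
    per_class = [0] * n_classes
    for row, y in zip(rows, labels):
        if tuple(row[g] for g in parents) == target:
            per_class[y] += 1
    total = sum(per_class) + n_classes  # counts plus pseudo-counts
    return [(c + 1) / total for c in per_class]


# Toy data (invented for illustration): gene 1 tracks the class label,
# genes 0 and 2 carry no class information.
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]
toy_rows = list(zip([0, 1, 0, 1, 0, 1, 0, 1],
                    toy_labels,
                    [0, 0, 1, 1, 0, 0, 1, 1]))
toy_parents, toy_score = greedy_parent_set(toy_rows, toy_labels)
```

A classifier built from multiple competing hypotheses, as the abstract describes, would combine the predictions of several high-scoring parent sets rather than rely on the single set returned here; how those hypotheses are weighted is not specified in the abstract and is left out of the sketch.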
