Biostatistics, Aventis, Mail Stop B-203C, P.O. Box 6800, Bridgewater, NJ 08807-6800.
Charles B. Epstein
Aventis Cambridge Genomics Center, 26 Landsdowne St., Cambridge, MA 02139-4234.
Automated high-throughput sequencing of cDNA clones from numerous libraries has generated a wealth of information about both genome sequence and relative transcript abundance. A common statistical challenge in analyzing library sequences is to infer whether the same transcript is differentially expressed under two conditions, such as normal and diseased tissue. In contrast to the continuously variable intensity measurements from microarray experiments, data from cDNA library sequencing take the form of discrete counts of the occurrences of a clone or transcript in a finite sample. In this paper, we first propose a statistical model for data generated by cDNA library sequencing. The model is based on the Poisson distribution mixed with a generalized inverse Gaussian distribution (PGIG), introduced by Sichel (1971, 1975). The PGIG has been used in modeling population abundance, in ecological studies, and in analyses of word frequencies in publications, among other applications. Using data from the literature, we show that the proposed model fits the observed data well. Building on this model, we then develop an empirical Bayesian significance test (EBST) for inferring the statistical significance of differential gene expression from discrete data.
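For readers unfamiliar with the PGIG, the mixture can be sketched as follows. This is the standard textbook form of a Poisson distribution mixed over a generalized inverse Gaussian rate, not necessarily the parametrization used in the paper; the symbols $p$, $a$, $b$ and the rate $\lambda$ are assumptions introduced here for illustration, and $K_p$ denotes the modified Bessel function of the second kind.

```latex
% Count X for a transcript, given its (unobserved) expression rate \lambda
P(X = x \mid \lambda) = \frac{e^{-\lambda}\,\lambda^{x}}{x!}, \qquad x = 0, 1, 2, \ldots

% Generalized inverse Gaussian mixing density for \lambda (assumed parametrization)
f(\lambda;\, p, a, b) = \frac{(a/b)^{p/2}}{2\,K_{p}\!\left(\sqrt{ab}\right)}\,
  \lambda^{\,p-1} \exp\!\left\{-\tfrac{1}{2}\left(a\lambda + \frac{b}{\lambda}\right)\right\},
  \qquad \lambda > 0,\; a > 0,\; b > 0

% Marginal (PGIG) probability of observing x copies
P(X = x) = \int_{0}^{\infty} \frac{e^{-\lambda}\,\lambda^{x}}{x!}\, f(\lambda;\, p, a, b)\, d\lambda
```

Under this reading, each transcript has its own rate drawn from the mixing density, which lets the marginal count distribution carry a heavier tail than a single Poisson distribution would.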