Journal of Computational Biology
Methods in Comparative Genomics: Genome Correspondence, Gene Identification and Regulatory Motif Discovery
To cite this article:
Manolis Kellis, Nick Patterson, Bruce Birren, Bonnie Berger, Eric S. Lander.
Journal of Computational Biology. March 2004, 11(2-3): 319-355. doi:10.1089/1066527041410319.
Manolis Kellis, Whitehead Institute Center for Genome Research, MIT, Cambridge, MA 02139; Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA 02139
Nick Patterson, Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA 02139
Bruce Birren, Whitehead Institute Center for Genome Research, MIT, Cambridge, MA 02139; Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA 02139
Bonnie Berger, Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA 02139
Eric S. Lander, Department of Biology, MIT, Cambridge, MA 02139

In Kellis et al. (2003), we reported the genome sequences of S. paradoxus, S. mikatae, and S. bayanus and compared these three yeast species to their close relative, S. cerevisiae. Genomewide comparative analysis allowed the identification of functionally important sequences, both coding and noncoding. In this companion paper we describe the mathematical and algorithmic results underpinning the analysis of these genomes.

(1) We present methods for the automatic determination of genome correspondence. The algorithms enabled the automatic identification of orthologs for more than 90% of genes and intergenic regions across the four species despite the large number of duplicated genes in the yeast genome. The remaining ambiguities in the gene correspondence revealed recent gene family expansions in regions of rapid genomic change.

(2) We present methods for the identification of protein-coding genes based on their patterns of nucleotide conservation across related species. We observed the pressure to conserve the reading frame of functional proteins and developed a test for gene identification with high sensitivity and specificity. We used this test to revisit the genome of S. cerevisiae, reducing the overall gene count by 500 genes (10% of previously annotated genes) and refining the gene structure of hundreds of genes.
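The reading-frame idea in (2) can be illustrated with a small sketch. It assumes a gapped pairwise alignment between a candidate ORF in the reference species and the orthologous sequence in a related species, and it scores how often the two sequences stay in the same codon frame. The function names `frame_labels` and `reading_frame_conservation` are hypothetical, and the scoring is a simplified illustration rather than the exact test used in the paper.

```python
def frame_labels(aligned_seq, offset=0):
    """Label each non-gap base with its codon position (0, 1, 2); gaps get None."""
    labels = []
    pos = offset
    for ch in aligned_seq:
        if ch == "-":
            labels.append(None)
        else:
            labels.append(pos % 3)
            pos += 1
    return labels


def reading_frame_conservation(ref_aln, other_aln):
    """Fraction of columns (ungapped in both rows) whose codon frames agree.

    The second sequence is tried at all three starting offsets and the
    best agreement is returned. Illustrative scoring only (assumption).
    """
    if len(ref_aln) != len(other_aln):
        raise ValueError("aligned sequences must have equal length")
    ref = frame_labels(ref_aln)
    best = 0.0
    for offset in range(3):
        other = frame_labels(other_aln, offset)
        pairs = [(r, o) for r, o in zip(ref, other)
                 if r is not None and o is not None]
        if not pairs:
            return 0.0
        agree = sum(r == o for r, o in pairs)
        best = max(best, agree / len(pairs))
    return best


# A 3-base gap keeps the frame; a 1-base gap shifts it downstream.
in_frame = reading_frame_conservation("ATGAAACCCGGGTTT", "ATG---CCCGGGTTT")
shifted = reading_frame_conservation("ATGAAACCCGGGTTT", "ATGA-ACCCGGGTTT")
```

Under this scoring, gaps whose lengths are multiples of three leave the score untouched, while other indels lower it. This is the signal one would expect to separate functional genes from spurious ORFs.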
(3) We present novel methods for the systematic de novo identification of regulatory motifs. The methods do not rely on previous knowledge of gene function and in that way differ from the current literature on computational motif discovery. Based on genomewide conservation patterns of known motifs, we developed three conservation criteria that we used to discover novel motifs. We used an enumeration approach to select strongly conserved motif cores, which we extended and collapsed into a small number of candidate regulatory motifs. These include most previously known regulatory motifs as well as several noteworthy novel motifs. The majority of discovered motifs are enriched in functionally related genes, allowing us to infer a candidate function for novel motifs.

Our results demonstrate the power of comparative genomics to further our understanding of any species. Our methods are validated by the extensive experimental knowledge in yeast and will be invaluable in the study of complex genomes like that of the human.

This paper was cited by:
Global alignment of multiple protein interaction networks with application to functional orthology detection. R. Singh, J. Xu, B. Berger. Proceedings of the National Academy of Sciences. Oct 2008, Vol. 105, No. 35: 12763-12768.
Delineating Slowly and Rapidly Evolving Fractions of the Drosophila Genome. Jonathan M. Keith, Peter Adams, Stuart Stephen, John S. Mattick. Journal of Computational Biology. May 2008, Vol. 15, No. 4: 407-430.
Orthology and Functional Conservation in Eukaryotes. Kara Dolinski, David Botstein. Annual Review of Genetics. Jan 2008, Vol. 41, No. 1: 465-507.
Natural history and evolutionary principles of gene duplication in fungi. Ilan Wapinski, Avi Pfeffer, Nir Friedman, Aviv Regev. Nature. Oct 2007, Vol. 449, No. 7158: 54-61.
Current awareness on comparative and functional genomics. Comparative and Functional Genomics. Mar 2005, Vol. 6, No. 1-2: 97-112.
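As an illustration of the enumeration approach to motif discovery summarized in (3) of the abstract, the following sketch counts, for every k-mer in the reference row of a set of multiple alignments, how often it occurs and how often it is perfectly conserved across all aligned species. The function name `kmer_conservation`, the ungapped k-mer model, and the toy input are assumptions made for illustration; the motif cores and conservation criteria in the paper are more elaborate.

```python
from collections import Counter


def kmer_conservation(alignments, k=6):
    """Return {kmer: (conserved_count, total_count)} over all alignments.

    Each alignment is a list of equal-length rows; row 0 is the reference
    species. A k-mer occurrence counts as conserved when every other row
    carries the identical string in the same columns (simplifying assumption).
    """
    total = Counter()
    conserved = Counter()
    for rows in alignments:
        ref = rows[0]
        for i in range(len(ref) - k + 1):
            kmer = ref[i:i + k]
            if "-" in kmer:
                continue  # skip windows that contain alignment gaps
            total[kmer] += 1
            if all(row[i:i + k] == kmer for row in rows[1:]):
                conserved[kmer] += 1
    return {m: (conserved[m], total[m]) for m in total}


# Toy two-species alignment (made-up sequences, not real data).
counts = kmer_conservation([["ACGTACGT", "ACGTACTT"]], k=4)
# Rank k-mers by the fraction of occurrences that are conserved.
ranked = sorted(counts, key=lambda m: counts[m][0] / counts[m][1], reverse=True)
```

K-mers whose conserved fraction is well above that of typical intergenic sequence would be the natural candidates to carry forward as motif cores.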