Journal of Computational Biology
Nonrandom Clusters of Palindromes in Herpesvirus Genomes

To cite this article:
Ming-Ying Leung, Kwok Pui Choi, Aihua Xia, Louis H.Y. Chen. Journal of Computational Biology. April 2005, 12(3): 331-354. doi:10.1089/cmb.2005.12.331.

Published in Volume: 12 Issue 3: April 21, 2005



Ming-Ying Leung
Department of Mathematical Sciences, University of Texas at El Paso, El Paso, TX 79968-0514.
Kwok Pui Choi
Department of Statistics and Applied Probability, National University of Singapore, Singapore 117543.
Aihua Xia
Department of Mathematics and Statistics, University of Melbourne, VIC 3010, Australia.
Louis H.Y. Chen
Institute for Mathematical Sciences, National University of Singapore, Singapore 118402.

Palindromes are symmetrical words of DNA in the sense that they read exactly the same as their reverse complementary sequences. When the occurrences of palindromes in a DNA molecule are represented as points on the unit interval, scan statistics can be used to identify regions with an unusually high concentration of palindromes. In previous studies, such regions have been associated with the replication origins of a few herpesviruses. However, the use of scan statistics requires the assumption that the points representing the palindromes are independently and uniformly distributed on the unit interval. In this paper, we provide a mathematical basis for this assumption by showing that, in randomly generated DNA sequences, the occurrences of palindromes can be approximated by a Poisson process. An easily computable upper bound on the Wasserstein distance between the palindrome process and the Poisson process is obtained. This bound is then used as a guide for choosing an optimal palindrome length in the analysis of a collection of 16 herpesvirus genomes. Regions harboring significant palindrome clusters are identified and compared with the known locations of replication origins. This analysis suggests a few extensions of the scan statistics that can help formulate an algorithm for more accurate prediction of replication origins.
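The two ideas in the abstract, a DNA palindrome as a word equal to its own reverse complement and a scan statistic as the largest number of points falling in any window of fixed width, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the choice to report every fixed-length window that is a palindrome, and the window width are all assumptions made for illustration; the paper's own counting convention and choice of palindrome length may differ.

```python
from bisect import bisect_right

# Watson-Crick base pairing used to build the complementary strand.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def palindrome_positions(seq, length):
    """Start indices of all windows of `length` bases that equal their
    own reverse complement (illustrative fixed-length convention)."""
    seq = seq.upper()
    return [
        i
        for i in range(len(seq) - length + 1)
        if seq[i:i + length] == reverse_complement(seq[i:i + length])
    ]


def scan_statistic(points, window):
    """Largest number of points in any closed interval [x, x + window].

    It suffices to start the window at each observed point, because
    sliding a window right until its left edge hits a point never
    loses any of the points it contains.
    """
    pts = sorted(points)
    best = 0
    for i, p in enumerate(pts):
        j = bisect_right(pts, p + window)  # first index beyond the window
        best = max(best, j - i)
    return best


if __name__ == "__main__":
    genome = "GAATTCAAAA"  # toy sequence, not real viral data
    starts = palindrome_positions(genome, 6)
    # Map positions to the unit interval, as described in the abstract.
    unit_points = [s / len(genome) for s in starts]
    print(starts, scan_statistic(unit_points, 0.1))
```

In a real analysis the observed scan statistic would be compared with its distribution under the independent uniform (Poisson-like) model that the paper justifies; that significance step is omitted here.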


This paper was cited by:

A Latent Model to Detect Multiple Clusters of Varying Sizes
Minge Xie, Qiankun Sun, Joseph Naus
Biometrics. May 2009
Statistical properties of thermodynamically predicted RNA secondary structures in viral genomes
M. Spanò, F. Lillo, S. Miccichè, R. N. Mantegna
The European Physical Journal B. Nov 2008, Vol. 65, No. 3: 323-331
Inverted and mirror repeats in model nucleotide sequences
Fabrizio Lillo, Marco Spanò
Physical Review E. Nov 2007, Vol. 76, No. 4