Journal of Computational Biology
Numerical Comparison of Several Approximations of the Word Count Distribution in Random Sequences

To cite this article:
Stéphane Robin, Sophie Schbath. Journal of Computational Biology. September 2001, 8(4): 349-359. doi:10.1089/106652701752236179.


Stéphane Robin
INRA, Unité Mathématique, Informatique & Génome, F78026 Versailles.
Sophie Schbath
INRA, Unité Mathématique, Informatique & Génome, F78026 Versailles.

The exact distribution of word counts in random sequences and several approximations have been proposed in the past few years. The exact distribution has no theoretical limit but may require prohibitive computation time. On the other hand, approximate distributions can be rapidly calculated but, in practice, are only accurate under specific conditions. After making a survey of these distributions, we compare them according to both their accuracy and computational cost. Rules are suggested for choosing between Gaussian approximations, compound Poisson approximation, and exact distribution. This work is illustrated with the detection of exceptional words in the phage Lambda genome.

This paper was cited by:

Compound Poisson Approximation of the Number of Occurrences of a Position Frequency Matrix (PFM) on Both Strands
Utz J. Pape, Sven Rahmann, Fengzhu Sun, Martin Vingron
Journal of Computational Biology. Jul 2008, Vol. 15, No. 6: 547-564

Assessing the Exceptionality of Network Motifs
F. Picard, J.-J. Daudin, M. Koskas, S. Schbath, S. Robin
Journal of Computational Biology. Jan 2008, Vol. 15, No. 1: 1-20

String Matching and 1d Lattice Gases
Muhittin Mungan
Journal of Statistical Physics. Feb 2007, Vol. 126, No. 1: 207-242

Searching for Multiple Words in a Markov Sequence
Yonil Park, John L. Spouge
INFORMS Journal on Computing. Oct 2004, Vol. 16, No. 4: 341-347

A compound Poisson model for word occurrences in DNA sequences
Stephane Robin
Journal of the Royal Statistical Society: Series C (Applied Statistics). Nov 2002, Vol. 51, No. 4: 437-451