Journal of Computational Biology
Numerical Comparison of Several Approximations of the Word Count Distribution in Random Sequences
To cite this article:
Stéphane Robin, Sophie Schbath. Journal of Computational Biology. September 2001, 8(4): 349-359. doi:10.1089/106652701752236179.
Stéphane Robin
INRA, Unité Mathématique, Informatique & Génome, F78026 Versailles.

Sophie Schbath
INRA, Unité Mathématique, Informatique & Génome, F78026 Versailles.

The exact distribution of word counts in random sequences and several approximations have been proposed in the past few years. The exact distribution has no theoretical limit but may require prohibitive computation time. On the other hand, approximate distributions can be rapidly calculated but, in practice, are only accurate under specific conditions. After making a survey of these distributions, we compare them according to both their accuracy and computational cost. Rules are suggested for choosing between Gaussian approximations, compound Poisson approximation, and exact distribution. This work is illustrated with the detection of exceptional words in the phage Lambda genome.

This paper was cited by:

Compound Poisson Approximation of the Number of Occurrences of a Position Frequency Matrix (PFM) on Both Strands. Utz J. Pape, Sven Rahmann, Fengzhu Sun, Martin Vingron. Journal of Computational Biology. Jul 2008, Vol. 15, No. 6: 547-564.

Assessing the Exceptionality of Network Motifs. F. Picard, J.-J. Daudin, M. Koskas, S. Schbath, S. Robin. Journal of Computational Biology. Jan 2008, Vol. 15, No. 1: 1-20.

String Matching and 1d Lattice Gases. Muhittin Mungan. Journal of Statistical Physics. Feb 2007, Vol. 126, No. 1: 207-242.

Searching for Multiple Words in a Markov Sequence. Yonil Park, John L. Spouge. INFORMS Journal on Computing. Oct 2004, Vol. 16, No. 4: 341-347.

A compound Poisson model for word occurrences in DNA sequences. Stephane Robin. Journal of the Royal Statistical Society: Series C (Applied Statistics). Nov 2002, Vol. 51, No. 4: 437-451.