Journal of Computational Biology
Checking Homogeneity of Motifs' Distribution in Heterogenous Sequences
To cite this article: Sabrina Ledent, Stéphane Robin. Journal of Computational Biology, July/August 2005, 12(6): 672-685. doi:10.1089/cmb.2005.12.672.
Sabrina Ledent
Unité Mathématique, Informatique et Génome, Institut National de la Recherche Agronomique (INRA), F-78350 Jouy-en-Josas, France.

Stéphane Robin
Unité Mathématique, Informatique et Génome, Institut National de la Recherche Agronomique (INRA), F-78350 Jouy-en-Josas, France.
Unité mixte de recherche Institut National Agronomique Paris-Grignon/INRA de Mathématiques et Informatique Appliquées, 5 rue Claude Bernard, F-75005 Paris, France.

Studying the distribution of a motif along sequences may help in understanding its biological function or in detecting regions of interest. A statistical model is needed to assess the significance of the observed distribution. We propose a heterogeneous compound Poisson process that accounts both for possible overlaps between occurrences and for heterogeneity of the sequence known a priori. We describe the parameter estimation procedure and propose tests of homogeneous sub-models. We also consider the detection of rich regions, using either cumulated distances or moving intervals, via a homogenization technique. The method is illustrated with applications to bacterial genomes.
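The abstract does not give the estimators or test statistics, so the following is only a minimal Python sketch of the general ideas, not the authors' procedure. It assumes that overlapping occurrences have already been merged into clumps, that clumps arrive as a Poisson process whose rate is constant within each a priori segment type (the clump-size, i.e. compound, part of the model is ignored), and that the segments tile the sequence without gaps. Under these assumptions the maximum-likelihood rate for a type is its clump count divided by its total length; equality of rates across K types can be tested with a likelihood-ratio statistic that is asymptotically chi-square with K - 1 degrees of freedom; and mapping each position x to Λ(x), the integrated intensity up to x, turns the fitted heterogeneous process into a unit-rate homogeneous one on which moving-interval counts become comparable. All function names, segment labels and positions below are hypothetical.

```python
import math
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # inclusive start position
    end: float    # exclusive end position
    kind: str     # segment type known a priori (hypothetical labels)


def segment_index(segments, x):
    """Index of the segment containing x; segments must be sorted and contiguous."""
    i = bisect_right([s.start for s in segments], x) - 1
    if i < 0 or not (segments[i].start <= x < segments[i].end):
        raise ValueError(f"position {x} lies outside the sequence")
    return i


def estimate_rates(segments, clumps):
    """MLE of the clump rate for each segment type: clump count / total length."""
    length, count = {}, {}
    for seg in segments:
        length[seg.kind] = length.get(seg.kind, 0.0) + (seg.end - seg.start)
        count.setdefault(seg.kind, 0)
    for x in clumps:
        count[segments[segment_index(segments, x)].kind] += 1
    rates = {k: count[k] / length[k] for k in length}
    return rates, count, length


def homogeneity_lr(segments, clumps):
    """Likelihood-ratio statistic for H0: a single clump rate for all types.

    Returns (statistic, degrees of freedom); the statistic is compared with a
    chi-square quantile, e.g. 3.84 for 1 degree of freedom at the 5% level.
    """
    rates, count, length = estimate_rates(segments, clumps)
    rate0 = sum(count.values()) / sum(length.values())
    # At the MLEs the linear terms of both log-likelihoods cancel, leaving
    # 2 * sum_k n_k * log(rate_k / rate0), with 0 * log 0 taken as 0.
    stat = 2.0 * sum(n * math.log(rates[k] / rate0) for k, n in count.items() if n > 0)
    return stat, len(count) - 1


def homogenize(segments, clumps, rates):
    """Map each position x to Lambda(x), the integrated intensity up to x."""
    cum = [0.0]  # integrated intensity at the start of each segment
    for seg in segments:
        cum.append(cum[-1] + rates[seg.kind] * (seg.end - seg.start))
    out = []
    for x in clumps:
        i = segment_index(segments, x)
        out.append(cum[i] + rates[segments[i].kind] * (x - segments[i].start))
    return out


def max_window_count(points, width):
    """Largest number of points inside any closed window of the given width."""
    pts = sorted(points)
    best, j = 0, 0
    for i in range(len(pts)):
        while pts[i] - pts[j] > width:
            j += 1
        best = max(best, i - j + 1)
    return best


# Hypothetical toy sequence: two coding stretches around one non-coding stretch.
segments = [Segment(0, 1000, "coding"), Segment(1000, 1500, "noncoding"),
            Segment(1500, 3000, "coding")]
clumps = [100, 400, 900, 1100, 1200, 1300, 1400, 2000, 2500]
rates, _, _ = estimate_rates(segments, clumps)
print(rates)                          # clumps per base, per segment type
print(homogeneity_lr(segments, clumps))
print(max_window_count(clumps, 300))  # densest 300-bp window, raw scale
print(max_window_count(homogenize(segments, clumps, rates), 1.7))
```

On the homogenized scale a window of width w contains w clumps on average under the fitted model, so window widths are expressed in expected clumps rather than base pairs. Deciding whether a maximal window count is significant requires the null distribution of the scan statistic, which this sketch does not compute.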