Hello. Sign in to personalize your visit. New user? Register now.  
Journal of Computational Biology
Efficient Tools for Computing the Number of Breakpoints and the Number of Adjacencies between Two Genomes with Duplicate Genes

To cite this article:
Sébastien Angibaud, Guillaume Fertin, Irena Rusu, Annelyse Thévenin, Stéphane Vialette. Journal of Computational Biology. October 2008, 15(8): 1093-1115. doi:10.1089/cmb.2008.0061.

Full Text: • PDF for printing (1,205.8 KB) • PDF w/ links (1,261 KB)


Sébastien Angibaud 
Laboratoire d'Informatique de Nantes-Atlantique (LINA), UMR CNRS 6241, Université de Nantes, Nantes Cedex, France.
Guillaume Fertin 
Laboratoire d'Informatique de Nantes-Atlantique (LINA), UMR CNRS 6241, Université de Nantes, Nantes Cedex, France.
Irena Rusu 
Laboratoire d'Informatique de Nantes-Atlantique (LINA), UMR CNRS 6241, Université de Nantes, Nantes Cedex, France.
Annelyse Thévenin 
Laboratoire de Recherche en Informatique (LRI), UMR CNRS 8623, Université Paris-Sud, Orsay, France.
Stéphane Vialette 
IGM-LabInfo, UMR CNRS 8049, Université Paris-Est, Marne-la-Vallée, France.

Comparing genomes of different species is a fundamental problem in comparative genomics. Recent research has resulted in the introduction of different measures between pairs of genomes: for example, reversal distance, number of breakpoints, and number of common or conserved intervals. However, classical methods used for computing such measures are seriously compromised when genomes have several copies of the same gene scattered across them. Most approaches to overcome this difficulty are based either on the exemplar model, which keeps exactly one copy in each genome of each duplicated gene, or on the maximum matching model, which keeps as many copies as possible of each duplicated gene. The goal is to find an exemplar matching, respectively a maximum matching, that optimizes the studied measure. Unfortunately, it turns out that, in presence of duplications, this problem for each above-mentioned measure is NP-hard. In this paper, we propose to compute the minimum number of breakpoints and the maximum number of adjacencies between two genomes in presence of duplications using two different approaches. The first one is an exact, generic 0–1 linear programming approach, while the second is a collection of three heuristics. Each of these approaches is applied on each problem and for each of the following models: exemplar, maximum matching and intermediate model, that we introduce here. All these programs are run on a well-known public benchmark dataset of γ-Proteobacteria, and their performances are discussed.

Free first page
All articles
Previous Next