Results 1 - 10 of 10
1.
Bioinformatics ; 39(3), 2023 03 01.
Article in English | MEDLINE | ID: mdl-36790056

ABSTRACT

MOTIVATION: The rank distance model represents genome rearrangements in multi-chromosomal genomes as matrix operations, which allows the reconstruction of parsimonious histories of evolution by rearrangements. We seek to generalize this model by allowing for genomes with different gene content, to accommodate a broader range of biological contexts. We approach this generalization by using a matrix representation of genomes. This leads to simple distance formulas and sorting algorithms for genomes with different gene contents, but without duplications.

RESULTS: We generalize the rank distance to genomes with different gene content in two different ways. The first approach adds insertions, deletions and the substitution of a single extremity to the basic operations. We show how to efficiently compute this distance. To avoid genomes with incomplete markers, our alternative distance, the rank-indel distance, only uses insertions and deletions of entire chromosomes. We construct phylogenetic trees with our distances and the DCJ-Indel distance for simulated data and real prokaryotic genomes, and compare them against reference trees. For simulated data, our distances outperform the DCJ-Indel distance using the Quartet metric as baseline. This suggests that rank distances are more robust for comparing distantly related species. For real prokaryotic genomes, all rearrangement-based distances yield phylogenetic trees that are topologically distant from the reference (65% similarity with Quartet metric), but are able to cluster related species within their respective clades and distinguish the Shigella strains as the farthest relative of the Escherichia coli strains, a feature not seen in the reference tree.

AVAILABILITY AND IMPLEMENTATION: Code and instructions are available at https://github.com/meidanis-lab/rank-indel.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
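As a rough illustration of the basic rank distance that this abstract generalizes, the sketch below handles only the simplest case: two genomes with the same gene content. It is not taken from the rank-indel repository. It assumes each genome is given as an involution over gene extremities (an adjacency pairs two extremities, a telomere maps to itself), which corresponds to a symmetric permutation matrix. The function names are hypothetical. The rank of the difference of two permutation matrices is obtained from the identity rank(A - B) = n - (number of cycles of the composed permutation), so no linear algebra library is needed.

```python
def compose(a, b):
    """Return the permutation x -> a[b[x]]."""
    return [a[b[x]] for x in range(len(a))]


def count_cycles(perm):
    """Count the cycles of a permutation, fixed points included."""
    seen = [False] * len(perm)
    cycles = 0
    for start in range(len(perm)):
        if not seen[start]:
            cycles += 1
            x = start
            while not seen[x]:
                seen[x] = True
                x = perm[x]
    return cycles


def rank_distance(a, b):
    """Rank of A - B for the permutation matrices of involutions a and b.

    Assumption: a and b are involutions over the same n extremities.
    Uses rank(I - P) = n - cycles(P) for a permutation matrix P.
    """
    if len(a) != len(b):
        raise ValueError("genomes must share the same extremities")
    return len(a) - count_cycles(compose(a, b))


# Extremities: 0 = gene 1 tail, 1 = gene 1 head, 2 = gene 2 tail, 3 = gene 2 head.
genome_a = [0, 2, 1, 3]   # linear chromosome (1 2): adjacency {1h, 2t}
genome_b = [0, 3, 2, 1]   # linear chromosome (1 -2): adjacency {1h, 2h}
genome_c = [0, 1, 2, 3]   # two single-gene linear chromosomes, no adjacency

print(rank_distance(genome_a, genome_b))  # reversal of gene 2
print(rank_distance(genome_a, genome_c))  # one cut
```

In this toy instance the reversal has weight 2 and the single cut has weight 1, which is how the rank distance weighs those operations.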


Subjects
Genomics, Genetic Models, Phylogeny, Genome, INDEL Mutation, Algorithms
3.
Bull Math Biol ; 78(4): 786-814, 2016 04.
Article in English | MEDLINE | ID: mdl-27072561

ABSTRACT

The genome median problem is an important problem in phylogenetic reconstruction under rearrangement models. It can be stated as follows: Given three genomes, find a fourth that minimizes the sum of the pairwise rearrangement distances between it and the three input genomes. In this paper, we model genomes as matrices and study the matrix median problem using the rank distance. It is known that, for any metric distance, at least one of the corners is a 4/3-approximation of the median. Our results allow us to compute up to three additional matrix median candidates, all of them with approximation ratios at least as good as the best corner, when the input matrices come from genomes. We also show a class of instances where our candidates are optimal. From the application point of view, it is usually more interesting to locate medians farther from the corners, and therefore, these new candidates are potentially more useful. In addition to the approximation algorithm, we suggest a heuristic to get a genome from an arbitrary square matrix. This is useful to translate the results of our median approximation algorithm back to genomes, and it has good results in our tests. To assess the relevance of our approach in the biological context, we ran simulated evolution tests and compared our solutions to those of an exact DCJ median solver. The results show that our method is capable of producing very good candidates.
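The corner bound mentioned in this abstract follows from the triangle inequality alone, and a small sketch can make the argument concrete. The code below is an illustration, not the paper's algorithm: it takes only the three pairwise distances between the input genomes A, B and C (under any metric) and computes each corner's score together with a lower bound on the score of any median. The function names are hypothetical. For any candidate M, d(M,A) + d(M,B) >= d(A,B), and similarly for the other two pairs, so the score of M is at least half the sum of the pairwise distances. The average corner score is two thirds of that sum, hence the best corner is within a factor of 4/3 of the lower bound.

```python
def corner_scores(d_ab, d_ac, d_bc):
    """Median score obtained by using each input genome as the candidate."""
    return {
        "A": d_ab + d_ac,
        "B": d_ab + d_bc,
        "C": d_ac + d_bc,
    }


def median_lower_bound(d_ab, d_ac, d_bc):
    """Lower bound on the score of any median, from the triangle inequality."""
    return (d_ab + d_ac + d_bc) / 2


# Hypothetical pairwise distances, chosen only for illustration.
scores = corner_scores(4, 6, 8)
bound = median_lower_bound(4, 6, 8)
best_corner = min(scores, key=scores.get)

print(scores, bound, best_corner)
```

With these made-up distances the best corner is A, and its score stays below 4/3 of the lower bound, as the argument predicts for any metric.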


Subjects
Genome, Genetic Models, Algorithms, Computer Simulation, Molecular Evolution, Mathematical Concepts, Statistical Models, Phylogeny
4.
Article in English | MEDLINE | ID: mdl-23702549

ABSTRACT

Recently, the Single-Cut-or-Join (SCJ) operation was proposed as a basis for a new rearrangement distance between multichromosomal genomes, leading to very fast algorithms, both in theory and in practice. However, it was not clear how well this new distance fares when it comes to using it to solve relevant problems, such as the reconstruction of evolutionary history. In this paper, we advance current knowledge, by testing SCJ's ability regarding evolutionary reconstruction in two aspects: 1) How well does SCJ reconstruct evolutionary topologies? and 2) How well does SCJ reconstruct ancestral genomes? In the process of answering these questions, we implemented SCJ-based methods, and made them available to the community. We ran experiments using as many as 200 genomes, with as many as 3,000 genes. For the first question, we found out that SCJ can recover typically between 60 percent and more than 95 percent of the topology, as measured through the Robinson-Foulds distance (a.k.a. split distance) between trees. In other words, 60 percent to more than 95 percent of the original splits are also present in the reconstructed tree. For the second question, given a topology, SCJ's ability to reconstruct ancestral genomes depends on how far from the leaves the ancestral is. For nodes close to the leaves, about 85 percent of the gene adjacencies can be recovered. This percentage decreases as we move up the tree, but, even at the root, about 50 percent of the adjacencies are recovered, for as many as 64 leaves. Our findings corroborate the fact that SCJ leads to very conservative genome reconstructions, yielding very few false-positive gene adjacencies in the ancestrals, at the expense of a relatively larger amount of false negatives. In addition, experiments with real data from the Campanulaceae and Protostomes groups show that SCJ reconstructs topologies of quality comparable to the accepted trees of the species involved. 
As far as time is concerned, the methods we implemented can find a topology for 64 genomes with 2,000 genes each in about 10.7 minutes, and reconstruct the ancestral genomes in a 64-leaf tree in about 3 seconds, both on a typical desktop computer. It should be noted that our code is written in Java and we made no significant effort to optimize it.
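The topology comparison described in this abstract (the share of original splits also present in the reconstructed tree) can be sketched as follows. This is not the authors' Java code. It assumes trees are given as nested Python tuples with string leaf names, treats them as unrooted, and ignores trivial splits that separate a single leaf from the rest. All function names and the two example trees are hypothetical.

```python
def leaf_set(tree):
    """Collect the leaf names of a tree given as nested tuples."""
    if isinstance(tree, tuple):
        out = set()
        for child in tree:
            out |= leaf_set(child)
        return out
    return {tree}


def splits(tree):
    """Non-trivial splits of the tree, each stored as the side without a reference leaf."""
    all_leaves = frozenset(leaf_set(tree))
    ref = min(all_leaves)  # fixed reference leaf makes each split canonical
    found = set()

    def visit(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        clade = frozenset()
        for child in node:
            clade |= visit(child)
        side = clade if ref not in clade else all_leaves - clade
        if 1 < len(side) < len(all_leaves) - 1:
            found.add(side)
        return clade

    visit(tree)
    return found


def robinson_foulds(t1, t2):
    """Number of splits present in exactly one of the two trees."""
    return len(splits(t1) ^ splits(t2))


def recovered_fraction(original, reconstructed):
    """Share of the original tree's splits found in the reconstructed tree."""
    s_orig = splits(original)
    return len(s_orig & splits(reconstructed)) / len(s_orig)


original = ((("A", "B"), "C"), ("D", "E"))
reconstructed = ((("A", "C"), "B"), ("D", "E"))

print(robinson_foulds(original, reconstructed))
print(recovered_fraction(original, reconstructed))
```

In this toy pair, each tree has two non-trivial splits and they share one, so half of the original splits are recovered.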


Subjects
Gene Rearrangement, Genomics/methods, Genetic Models, Phylogeny, Animals, Campanulaceae, Computer Simulation, Molecular Evolution, Genome, Software
5.
J Hered ; 103(3): 342-8, 2012.
Article in English | MEDLINE | ID: mdl-22315242

ABSTRACT

Cattle are divided into 2 groups referred to as taurine and indicine, both of which have been under strong artificial selection due to their importance for human nutrition. A side effect of this domestication is a loss of genetic diversity within each specialized breed. Recently, the first taurine genome was sequenced and assembled, allowing for a better understanding of this ruminant species. However, genetic information from indicine breeds has been limited. Here, we present the first genome sequence of an indicine breed (Nellore), generated with 52X coverage on the SOLiD sequencing platform. As expected, both genomes share high similarity at the nucleotide level for all autosomes and the X chromosome. For the Y chromosome, the homology was considerably lower, most likely due to the incomplete assembly of the taurine Y chromosome. We were also able to cover 97% of the annotated taurine protein-coding genes.


Subjects
Cattle/genetics, Genome, Animals, Mammalian Chromosomes/genetics, Codon/genetics, Contig Mapping, Male, DNA Sequence Analysis, Nucleic Acid Sequence Homology
6.
Article in English | MEDLINE | ID: mdl-21339538

ABSTRACT

The breakpoint distance is one of the most straightforward genome comparison measures. Surprisingly, when it comes to defining it precisely for multichromosomal genomes with both linear and circular chromosomes, there is more than one way to go about it. Pevzner and Tesler gave a definition in a 2003 paper, Tannier et al. defined it differently in 2008, and in this paper we provide yet another alternative, calling it SCJ for single-cut-or-join, in analogy to the popular double cut and join (DCJ) measure. We show that several genome rearrangement problems, such as median and halving, become easy for SCJ, and provide linear and higher polynomial time algorithms for them. For the multichromosomal linear genome median problem, this is the first polynomial time algorithm described, since for other distances this problem is NP-hard. In addition, we show that small parsimony under SCJ is also easy, and can be solved by a variant of Fitch's algorithm. In contrast, big parsimony is NP-hard under SCJ. This new distance measure may be of value as a speedily computable, first approximation to distances based on more realistic rearrangement models.
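A minimal sketch of why single-cut-or-join problems tend to be easy: if a genome is viewed as a set of adjacencies, each cut removes one adjacency and each join adds one. The code below assumes, as an illustration rather than a restatement of the paper, that the distance is the number of adjacencies present in exactly one of the two genomes, and that a median candidate for three genomes can be formed from the adjacencies found in at least two of them. The extremity labels such as "1h" and "2t" (head of gene 1, tail of gene 2) and the function names are hypothetical.

```python
from collections import Counter


def adjacency(x, y):
    """An adjacency is an unordered pair of gene extremities."""
    return frozenset((x, y))


def scj_distance(genome_a, genome_b):
    """Cuts needed to remove adjacencies only in A plus joins to add those only in B."""
    return len(genome_a - genome_b) + len(genome_b - genome_a)


def majority_candidate(genomes):
    """Adjacencies present in more than half of the input genomes.

    Two adjacencies sharing an extremity cannot both reach a strict majority,
    because some genome would then contain both, so the result is conflict-free.
    """
    counts = Counter(adj for genome in genomes for adj in genome)
    return {adj for adj, c in counts.items() if 2 * c > len(genomes)}


genome_a = {adjacency("1h", "2t"), adjacency("2h", "3t")}
genome_b = {adjacency("1h", "2t"), adjacency("2h", "3h")}
genome_c = {adjacency("1h", "3t"), adjacency("2h", "3h")}

candidate = majority_candidate([genome_a, genome_b, genome_c])
score = sum(scj_distance(candidate, g) for g in (genome_a, genome_b, genome_c))

print(scj_distance(genome_a, genome_b), scj_distance(genome_a, genome_c))
print(sorted(sorted(adj) for adj in candidate), score)
```

Both functions run in time linear in the number of adjacencies, in line with the fast algorithms the abstract reports. The majority rule shown here ignores any extra constraint on chromosome type (for example, linear-only genomes).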


Subjects
Algorithms, Gene Rearrangement/genetics, Genomics/methods, Genetic Models, Phylogeny
7.
Genet Mol Res ; 5(1): 269-83, 2006 Mar 31.
Article in English | MEDLINE | ID: mdl-16755517

ABSTRACT

Nowadays, there are many phylogeny reconstruction methods, each with advantages and disadvantages. We explored the advantages of each method, putting together the common parts of trees constructed by several methods, by means of a consensus computation. A number of phylogenetic consensus methods are already known. Unfortunately, there is also a taboo concerning consensus methods, because most biologists see them mainly as comparators and not as phylogenetic tree constructors. We challenged this taboo by defining a consensus method that builds a fully resolved phylogenetic tree based on the most common parts of fully resolved trees in a given collection. We also generated results showing that this consensus is in a way a kind of "median" of the input trees; as such it can be closer to the correct tree in many situations.


Subjects
Algorithms, Consensus Sequence/genetics, Molecular Evolution, Genetic Models, Phylogeny, Animals, Cluster Analysis, Humans, Software
8.
J Comput Biol ; 9(5): 743-5, 2002.
Article in English | MEDLINE | ID: mdl-12487761

ABSTRACT

One possible model to study genome evolution is to represent genomes as permutations of genes and compute distances based on the minimum number of certain operations (rearrangements) needed to transform one permutation into another. Under this model, the shorter the distance, the closer the genomes are. Two operations that have been extensively studied are the reversal and the transposition. A reversal is an operation that reverses the order of the genes on a certain portion of the permutation. A transposition is an operation that "cuts" a certain portion of the permutation and "pastes" it elsewhere in the same permutation. In this note, we show that the reversal and transposition distance of the signed permutation π(n) = (-1 -2 ... -(n-1) -n) with respect to the identity is ⌊n/2⌋ + 2 for all n ≥ 3. We conjecture that this value is the diameter of the permutation group under these operations.
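The two operations defined in this abstract can be written down directly. The sketch below assumes signed permutations stored as Python lists of nonzero integers and, as is usual for signed permutations, that a reversal also flips the sign of every gene it reverses. The index conventions and function names are hypothetical. The last function only evaluates the closed formula stated in the abstract; it does not search for an optimal sequence of operations.

```python
def reversal(perm, i, j):
    """Reverse the block perm[i..j] (inclusive) and flip the sign of each gene in it."""
    return perm[:i] + [-g for g in reversed(perm[i:j + 1])] + perm[j + 1:]


def transposition(perm, i, j, k):
    """Cut the block perm[i:j] and paste it just before position k (i < j < k)."""
    return perm[:i] + perm[j:k] + perm[i:j] + perm[k:]


def all_negative(n):
    """The signed permutation (-1 -2 ... -n) discussed in the abstract."""
    return [-g for g in range(1, n + 1)]


def stated_distance(n):
    """Distance of (-1 -2 ... -n) to the identity as stated in the abstract, n >= 3."""
    if n < 3:
        raise ValueError("the stated formula applies for n >= 3")
    return n // 2 + 2


identity = [1, 2, 3, 4]
print(reversal(identity, 1, 2))
print(transposition([1, 2, 3, 4, 5], 1, 3, 4))
print(reversal(identity, 0, 3), all_negative(4))
print(stated_distance(4))
```

Reversing the whole identity yields (-4 -3 -2 -1), not (-1 -2 -3 -4), which hints at why the all-negative permutation needs several operations rather than one.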


Subjects
Molecular Evolution, Genome, Genetic Models, Computational Biology/methods, Genes, Mathematics
9.
Microbiol Mol Biol Rev ; 66(2): 272-99, 2002 Jun.
Article in English | MEDLINE | ID: mdl-12040127

ABSTRACT

The transport systems of the first completely sequenced genome of a plant parasite, Xylella fastidiosa, were analyzed. In all, 209 proteins were classified here as constitutive members of transport families; thus, we have identified 69 new transporters in addition to the 140 previously annotated. The analysis led to several hints on potential ways of controlling the disease this bacterium causes on citrus trees. An ADP:ATP translocator, previously found in intracellular parasites only, was found in X. fastidiosa. A P-type ATPase is missing: among the 24 completely sequenced eubacteria to date, only three (including X. fastidiosa) do not have a P-type ATPase, and they are all parasites transmitted by insect vectors. An incomplete phosphotransferase system (PTS) was found, without the permease subunits; we conjecture either that they are among the hypothetical proteins or that the PTS plays a solely metabolic regulatory role. We propose that the Ttg2 ABC system might be an import system possibly involved in glutamate import rather than a toluene exporter, as previously annotated. X. fastidiosa exhibits fewer proteins with ≥4 alpha-helical transmembrane spanners than any other completely sequenced prokaryote to date. X. fastidiosa has only 2.7% of all open reading frames identifiable as major transporters, which makes it the eubacterium with the lowest percentage of open reading frames involved in transport, closer to two archaea, Methanococcus jannaschii (2.4%) and Methanobacterium thermoautotrophicum (2.4%).


Subjects
Bacterial Proteins/genetics, Carrier Proteins/genetics, Gammaproteobacteria/genetics, Gammaproteobacteria/metabolism, Bacterial Outer Membrane Proteins/genetics, Bacterial Outer Membrane Proteins/metabolism, Bacterial Proteins/metabolism, Active Biological Transport, Carrier Proteins/metabolism, Gammaproteobacteria/pathogenicity, Bacterial Genome, Plants/microbiology
10.
Genet. mol. biol ; 24(1/4): 9-15, 2001. illus
Article in English | LILACS | ID: lil-313867

ABSTRACT

The SUCEST project (Sugarcane EST Project) produced 291,904 sugarcane ESTs. In this project, the Bioinformatics Laboratory created the web site that served as the "meeting point" for the 74 sequencing and data mining laboratories that were part of the project consortium. The Bioinformatics Laboratory (LBI) received, processed, and analyzed the data, and provided tools for exploring them. This article describes the data, services, and programs implemented by the LBI for the project, including the clustering procedure that generated 43,141 clusters.


Subjects
Computational Biology, Expressed Sequence Tags, Cluster Analysis, Gene Library, Plants, Software