Results 1-20 of 51
1.
Theor Appl Genet ; 134(11): 3577-3594, 2021 Nov.
Article in English | MEDLINE | ID: mdl-34365519

ABSTRACT

KEY MESSAGE: We propose using the natural variation between individuals of a population for genome assembly scaffolding. In today's genome projects, multiple accessions are sequenced, leading to variant catalogs. Using such information to improve genome assemblies is attractive both cost-wise and scientifically, because the value of an assembly increases with its contiguity. We conclude that haplotype information is a valuable resource to group and order contigs toward the generation of pseudomolecules. Quinoa (Chenopodium quinoa) has been under cultivation in Latin America for more than 7500 years. Recently, quinoa has gained increasing attention due to its stress resistance and its nutritional value. We generated a novel quinoa genome assembly for the Bolivian accession CHEN125 using PacBio long-read sequencing data (assembly size 1.32 Gbp, initial N50 size 608 kbp). Next, we re-sequenced 50 quinoa accessions from Peru and Bolivia. This set of accessions differed from CHEN125 at 4.4 million single-nucleotide variant (SNV) positions (1.4 million SNV positions on average per accession). We show how to exploit variation in distantly related accessions to establish a genome-wide ordered set of contigs for guided scaffolding of a reference assembly. The method is based on detecting shared haplotypes and their expected continuity throughout the genome (i.e., the effect of linkage disequilibrium), as an extension of what is expected in mapping populations where only a few haplotypes are present. We test the approach using Arabidopsis thaliana data from different populations. After applying the method to our CHEN125 quinoa assembly, we validated the results with mate-pairs, genetic markers, and another quinoa assembly originating from a Chilean cultivar.
We show consistency between these information sources and the haplotype-based relations we determined, and obtain an improved assembly with an N50 size of 1079 kbp and ordered contig groups of up to 39.7 Mbp. We conclude that haplotype information in distantly related individuals of the same species is a valuable resource to group and order contigs according to their adjacency in the genome toward the generation of pseudomolecules.
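Records 1, 7 and 10 report assembly contiguity as N50. As a point of reference, the sketch below computes N50 under its usual definition: sort the contig (or scaffold) lengths in descending order and return the length at which the running total first reaches at least half of the total assembly size. It is a minimal illustration, not code from any of the cited studies; the function name `n50` is hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the largest length L such that contigs of length >= L
    together cover at least half of the total assembly size.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid rounding issues with total / 2.
        if 2 * running >= total:
            return length
```

For example, `n50([10, 8, 6, 4, 2])` returns 8, because 10 + 8 = 18 already covers at least half of the total of 30. Because N50 weights each contig by its length, it can be much larger than the median contig length, as in record 10 (median 443 bp, N50 2454 bp).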


Subjects
Chenopodium quinoa/genetics; Genetic Variation; Genome, Plant; Arabidopsis/genetics; Bolivia; Chile; Contig Mapping; Genetic Markers; Genetics, Population; Haplotypes; Peru
2.
BMC Genomics ; 21(1): 148, 2020 Feb 11.
Article in English | MEDLINE | ID: mdl-32046653

ABSTRACT

BACKGROUND: RNA-Seq is the preferred method to explore transcriptomes and to estimate differential gene expression. When an organism has a well-characterized and annotated genome, reads obtained from RNA-Seq experiments can be mapped directly to that genome to estimate the number of transcripts present and their relative expression levels. However, for unknown genomes, de novo assembly of RNA-Seq reads must be performed to generate a set of contigs that represents the transcriptome. These contig sets contain multiple transcripts, including immature mRNAs, spliced transcripts and allele variants, as well as products of close paralogs or gene families that can be difficult to distinguish. Thus, tools are needed to select a less redundant set of contigs to represent the transcriptome for downstream analyses. Here we describe the development of Compacta to produce contig sets from de novo assemblies. RESULTS: Compacta is a fast and flexible computational tool that allows selection of a representative set of contigs from de novo assemblies. Using a graph-based algorithm, Compacta groups contigs into clusters based on the proportion of shared reads. The user can set the minimum coverage of the contigs to be clustered, as well as a threshold for the proportion of shared reads in the clustered contigs, thus providing a dynamic range of transcriptome compression that can be adapted to experimental aims. We compared the performance of Compacta against state-of-the-art clustering algorithms on assemblies from Arabidopsis, mouse and mango, and found that Compacta produced results more rapidly and had competitive precision and recall. We describe and demonstrate a pipeline to tailor Compacta parameters to specific experimental aims. CONCLUSIONS: Compacta is a fast and flexible algorithm for determining optimum contig sets that represent the transcriptome for downstream analyses.
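The abstract of record 2 describes grouping contigs into clusters by the proportion of reads they share, with user-set thresholds for minimum coverage and shared-read proportion. The Python sketch below illustrates that general idea as connected components of a contig graph, found with union-find; it is not Compacta's implementation. The input format (a mapping from contig ID to the set of read IDs assigned to it), the use of a read count as a stand-in for coverage, the use of the smaller read set as the denominator of the shared-read proportion, and the choice of the most-supported contig as cluster representative are all assumptions for illustration.

```python
from itertools import combinations


def cluster_contigs(contig_reads, min_reads=1, min_shared=0.5):
    """Cluster contigs that are linked by a high proportion of shared reads.

    contig_reads: dict mapping contig ID -> set of read IDs mapped to it.
    min_reads: contigs with fewer mapped reads are excluded (hypothetical
        stand-in for a minimum-coverage filter; at least 1 is enforced).
    min_shared: minimum proportion of shared reads for two contigs to be
        linked, measured (assumption) relative to the smaller read set.
    Returns a sorted list of (representative, sorted member list) tuples.
    """
    kept = {c: r for c, r in contig_reads.items() if len(r) >= max(min_reads, 1)}
    parent = {c: c for c in kept}

    def find(c):
        # Union-find lookup with path halving.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    # Add a graph edge (merge components) for every sufficiently similar pair.
    for a, b in combinations(kept, 2):
        shared = len(kept[a] & kept[b])
        if shared / min(len(kept[a]), len(kept[b])) >= min_shared:
            parent[find(a)] = find(b)

    groups = {}
    for c in kept:
        groups.setdefault(find(c), []).append(c)

    clusters = []
    for members in groups.values():
        # Representative: the contig with the most reads (ties go to the smallest ID).
        rep = max(sorted(members), key=lambda c: len(kept[c]))
        clusters.append((rep, sorted(members)))
    return sorted(clusters)
```

With toy input `{"c1": {"r1", "r2", "r3", "r4"}, "c2": {"r3", "r4", "r5"}, "c3": {"r9"}}`, contigs c1 and c2 share 2 of c2's 3 reads, so they merge at `min_shared=0.5` but stay separate at `min_shared=0.7`. Raising or lowering the threshold in this way is one simple means of trading compression against redundancy.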


Subjects
Contig Mapping/methods; RNA-Seq/methods; Software; Algorithms; Arabidopsis/genetics; Cluster Analysis
3.
Gigascience ; 8(12)2019 12 01.
Article in English | MEDLINE | ID: mdl-31782791

ABSTRACT

BACKGROUND: Sugarcane cultivars are polyploid interspecific hybrids of giant genomes, typically with 10-13 sets of chromosomes from 2 Saccharum species. The ploidy, hybridity, and size of the genome, estimated at >10 Gb, pose a challenge for sequencing. RESULTS: Here we present a gene space assembly of SP80-3280, including 373,869 putative genes and their potential regulatory regions. The alignment of single-copy genes in diploid grasses to the putative genes indicates that we could resolve 2-6 (up to 15) putative homo(eo)logs that are 99.1% identical within their coding sequences. Dissimilarities increase in their regulatory regions, and gene promoter analysis shows differences in regulatory elements within gene families that are expressed in a species-specific manner. We exemplify these differences for sucrose synthase (SuSy) and phenylalanine ammonia-lyase (PAL), 2 gene families central to carbon partitioning. SP80-3280 has particular regulatory elements involved in sucrose synthesis not found in the ancestor Saccharum spontaneum. PAL regulatory elements are found in co-expressed genes related to fiber synthesis within gene networks defined during plant growth and maturation. Comparison with sorghum reveals predominantly bi-allelic variations in sugarcane, consistent with the formation of 2 "subgenomes" after their divergence ∼3.8-4.6 million years ago, and reveals single-nucleotide variants that may underlie their differences. CONCLUSIONS: This assembly represents a large step towards a whole-genome assembly of a commercial sugarcane cultivar. It includes a rich diversity of genes and homo(eo)logous resolution for a representative fraction of the gene space, relevant to improving biomass and food production.


Subjects
Contig Mapping/methods; Glucosyltransferases/genetics; Phenylalanine Ammonia-Lyase/genetics; Saccharum/growth & development; Biomass; Crops, Agricultural/genetics; Crops, Agricultural/growth & development; Genetic Variation; Genome Size; Genome, Plant; Multigene Family; Plant Proteins/genetics; Polyploidy; Promoter Regions, Genetic; Saccharum/genetics
4.
Mol Phylogenet Evol ; 135: 193-202, 2019 06.
Article in English | MEDLINE | ID: mdl-30914393

ABSTRACT

Holoparasitism has led to extreme plastome reduction. Plastomes in Pilostyles (Apodanthaceae), a holoparasite of legumes, are the most reduced in both size and gene content known so far in Embryophytes. Here, we found that the plastome of Pilostyles boyacensis, the only American species sequenced so far, is reduced to seven functional genes: accD, rpl2, rrn16 (=16S), rrn23 (=23S), rps3, rps12 and a putative oxidoreductase (PbOx). An additional gene, not annotated in the genome, is actively transcribed between accD and rps12; based on synteny, we predict that it corresponds to rps4. We present data on plastome assembly and transcriptomic data that confirm the transcriptional activity of all genes, and we describe for the first time six transcript variants of a putative ORF likely having oxidoreductase activity. Our data show that such extreme reduction in P. boyacensis is similar but not identical to that reported in one Australian and one African species of the genus. Such intercontinental similarity suggests that the legume-Pilostyles holoparasitism was already in place during the main African-Australian-South American break-up. We compare plastome content and synteny among the three sequenced species, perform phylogenetic analyses across angiosperms of the six annotated plastome genes, and discuss the odd phylogenetic affinities of 16S and 23S, likely caused by horizontal gene transfer (HGT) prior to the diversification of both legumes and Pilostyles.


Subjects
Genes, Plant; Genome, Plastid/genetics; Magnoliopsida/genetics; Africa; Amino Acid Sequence; Australia; Base Sequence; Contig Mapping; Molecular Sequence Annotation; Phylogeny; Synteny/genetics; Transcription, Genetic
5.
Gene ; 691: 96-105, 2019 Apr 05.
Article in English | MEDLINE | ID: mdl-30630096

ABSTRACT

Vriesea carinata is a bromeliad endemic to the Brazilian Atlantic Forest. Its leaves have trichomes and a tank system that allow it to absorb water and nutrients. It belongs to the Bromeliaceae family, which includes several species highly enriched in cysteine proteases (CysPs). These proteolytic enzymes regulate processes such as senescence, cell differentiation, pathogen-linked programmed cell death and protein mobilization. Despite their biological importance, there are no genomic resources for V. carinata that could help identify these enzymes and understand the molecular mechanisms by which they act in different biological processes. Thus, high-throughput transcriptome sequencing of V. carinata is necessary to generate sequences for gene discovery and functional genomic studies. In the present study, we sequenced and assembled the V. carinata transcriptome to identify CysPs. A total of 43,232 contigs were assembled for the leaf tissue. BLAST analysis indicated that 23,803 contigs exhibited similarity to non-redundant Viridiplantae proteins. Of the contigs, 28.24% were classified into the COG database, and gene ontology categorized them into 61 functional groups. A metabolic pathway analysis with KEGG assigned 9679 contigs to 31 metabolic pathways. Of the 16 full-length CysPs identified, 11 were evaluated with respect to their expression patterns in the leaf apex, leaf base and inflorescence tissues. The results showed that expression levels of legumain, metacaspase, pyroglutamyl and papain-like CysPs differed depending on the leaf region. These results provide a global overview of V. carinata gene functions and of CysP expression activity in those tissues.


Subjects
Bromeliaceae/genetics; Contig Mapping/methods; Cysteine Proteases/genetics; Gene Expression Profiling/methods; Gene Expression Regulation, Plant; High-Throughput Nucleotide Sequencing; Metabolic Networks and Pathways; Molecular Sequence Annotation; Multigene Family; Plant Leaves/genetics; Plant Proteins/genetics; Sequence Analysis, RNA
6.
Gene ; 654: 23-35, 2018 May 15.
Article in English | MEDLINE | ID: mdl-29425825

ABSTRACT

Retinoic acid receptors (RAR) and retinoid X receptors (RXR) are ligand-mediated transcription factors that synchronize intricate signaling networks in metazoans. Dimer formation between these two nuclear receptors mediates the recruitment of co-regulatory complexes that coordinate the progression of signaling cascades during developmental and regenerative events. In the present study, we identified and characterized the retinoic acid receptors of the sea cucumber Holothuria glaberrima, a model system capable of regenerative organogenesis during adulthood. Molecular characterization revealed three isoforms of RAR and two of RXR, generated by alternative splicing. Various analyses, including primary structure sequencing, phylogenetic analysis, protein domain prediction, and multiple sequence alignment, further confirmed their identity. Semiquantitative reverse transcription PCR analysis of each receptor isoform identified here showed that the retinoid receptors are expressed in all tissues sampled: the mesenteries, respiratory trees, muscles, gonads, and digestive tract. During regenerative organogenesis, two of the receptors (RAR-L and RXR-T) showed differential expression in the posterior segment of the intestine, while RAR-S was differentially expressed in the anterior segment. This work presents the first description of the components relaying retinoic acid signaling in this model system.


Subjects
Gene Expression Profiling; Holothuria/physiology; Intestines/physiology; Receptors, Retinoic Acid/metabolism; Alternative Splicing; Animals; Computational Biology; Contig Mapping; DNA, Complementary/metabolism; Gene Expression Regulation; Holothuria/genetics; Open Reading Frames; Phylogeny; Regeneration; Retinoid X Receptors/metabolism; Reverse Transcriptase Polymerase Chain Reaction; Sequence Analysis, DNA; Signal Transduction
7.
Gigascience ; 7(2)2018 02 01.
Article in English | MEDLINE | ID: mdl-29267857

ABSTRACT

Background: For more than 25 years, the golden mussel, Limnoperna fortunei, has aggressively invaded South American freshwaters, having travelled more than 5000 km upstream across 5 countries. Along the way, the golden mussel has outcompeted native species and caused economic harm to aquaculture, hydroelectric power generation, and ship transit. We sequenced the complete genome of the golden mussel to understand the molecular basis of its invasiveness and to search for ways to control it. Findings: We assembled the 1.6-Gb genome into 20 548 scaffolds with an N50 length of 312 kb using a hybrid and hierarchical assembly strategy based on short and long DNA reads and transcriptomes. A total of 60 717 coding genes were inferred from a customized, transcriptome-trained AUGUSTUS run. We also compared the predicted protein set with those of complete molluscan genomes, revealing an exacerbation of protein-binding domains in L. fortunei. Conclusions: We built one of the best bivalve genome assemblies available with a cost-effective approach using Illumina paired-end, mate-paired, and PacBio long reads. We expect that the continuous and careful annotation of L. fortunei's genome will contribute to the investigation of bivalve genetics, evolution, and invasiveness, as well as to the development of biotechnological tools for aquatic pest control.


Subjects
Contig Mapping/methods; Genome; Introduced Species; Mytilidae/genetics; Proteins/genetics; Transcriptome; Animals; Brazil; Gene Ontology; High-Throughput Nucleotide Sequencing; Molecular Sequence Annotation; Mytilidae/classification; Open Reading Frames; Pest Control; Phylogeny; Protein Interaction Domains and Motifs; Proteins/metabolism
8.
J Comput Biol ; 25(2): 194-199, 2018 02.
Article in English | MEDLINE | ID: mdl-29039688

ABSTRACT

The development of next-generation sequencing platforms has substantially increased data-generation capacity. In addition, the cost of whole-genome sequencing has fallen in recent years, making this technology more accessible. As a result, storing and analyzing the data generated has become a challenge, driving the development of bioinformatic tools, such as programs and programming languages, able to store, process, and analyze this huge amount of information. In this article, we present MELC genomics, a framework for genome assembly with a simple and fast workflow.


Subjects
Contig Mapping/methods; Genomics/methods; Software; Whole Genome Sequencing/methods; Animals; Humans
9.
BMC Genomics ; 18(1): 204, 2017 02 27.
Article in English | MEDLINE | ID: mdl-28241794

ABSTRACT

BACKGROUND: The parasite Echinococcus canadensis (G7) (phylum Platyhelminthes, class Cestoda) is one of the causative agents of echinococcosis. Echinococcosis is a worldwide chronic zoonosis affecting humans as well as domestic and wild mammals, and it has been reported as a prioritized neglected disease by the World Health Organisation. No genomic data, comparative genomic analyses or efficient therapeutic and diagnostic tools are available for this severe disease. The information presented in this study will help in understanding the peculiar biological characteristics of this parasite and in designing species-specific control tools. RESULTS: We sequenced, assembled and annotated the 115-Mb genome of E. canadensis (G7). Comparative genomic analyses using whole-genome data of three Echinococcus species not only confirmed the status of E. canadensis (G7) as a separate species but also demonstrated high nucleotide sequence divergence in relation to E. granulosus (G1). The E. canadensis (G7) genome contains 11,449 genes, with a core set of 881 orthologs shared among five cestode species. Comparative genomics revealed that there are more single nucleotide polymorphisms (SNPs) between E. canadensis (G7) and E. granulosus (G1) than between E. canadensis (G7) and E. multilocularis. This result was unexpected, since E. canadensis (G7) and E. granulosus (G1) were considered to belong to the species complex E. granulosus sensu lato. We described SNPs in known drug targets and metabolism genes in the E. canadensis (G7) genome. Regarding gene regulation, we analysed three particular features: CpG island distribution along the three Echinococcus genomes, the DNA methylation system and the small RNA pathway. The results suggest the occurrence of yet unknown gene regulation mechanisms in Echinococcus. CONCLUSIONS: This is the first work that addresses Echinococcus comparative genomics. The resources presented here will promote the study of mechanisms of parasite development as well as new tools for drug discovery.
The availability of a high-quality genome assembly is critical for fully exploring the biology of a pathogenic organism. The E. canadensis (G7) genome presented in this study provides a unique opportunity to address the genetic diversity within the genus Echinococcus and its particular developmental features. At present, there is no unequivocal taxonomic classification of Echinococcus species; however, the genome-wide SNP analysis performed here revealed the phylogenetic distance among these three Echinococcus species. Additional cestode genomes need to be sequenced to resolve their phylogeny.
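Record 9 compares Echinococcus species by the number of SNPs between their genomes. As a much-simplified illustration of that kind of pairwise comparison, the sketch below counts single-nucleotide differences between two sequences that are assumed to be already aligned, skipping gap ('-') and ambiguous ('N') positions. Genome-wide analyses such as the one described rely on read mapping and variant calling rather than a direct string comparison; the function name and the gap and ambiguity handling here are assumptions for illustration.

```python
def snp_differences(seq_a, seq_b):
    """Count single-nucleotide differences between two aligned sequences.

    Positions where either sequence has a gap ('-') or an ambiguous
    base ('N') are skipped; the comparison is case-insensitive.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    skip = {"-", "N"}
    count = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in skip or b in skip:
            continue  # uninformative column, not counted as a SNP
        if a != b:
            count += 1
    return count
```

Applied to each pair of aligned genomes, a lower count indicates a closer relationship. This is the sense in which the abstract reports more SNPs between E. canadensis (G7) and E. granulosus (G1) than between E. canadensis (G7) and E. multilocularis.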


Subjects
Echinococcosis/genetics; Echinococcus/genetics; Genome, Protozoan; Animals; Argonaute Proteins/antagonists & inhibitors; Argonaute Proteins/genetics; Argonaute Proteins/metabolism; Comparative Genomic Hybridization; Contig Mapping; CpG Islands; DNA Methylation; Echinococcosis/parasitology; Echinococcosis/pathology; Echinococcus/classification; Echinococcus/metabolism; Humans; Interspersed Repetitive Sequences/genetics; Phylogeny; Polymorphism, Single Nucleotide; Protozoan Proteins/antagonists & inhibitors; Protozoan Proteins/genetics; Protozoan Proteins/metabolism
10.
BMC Genomics ; 18(1): 6, 2017 01 03.
Article in English | MEDLINE | ID: mdl-28049478

ABSTRACT

BACKGROUND: Wolbachia is a bacterial endosymbiont that naturally infects a wide range of insect species, and causes drastic changes to host biology. Stable infections of Wolbachia in mosquitoes can inhibit infection with medically important pathogens such as dengue virus and malaria-causing Plasmodium parasites. However, some native Wolbachia strains can enhance infection with certain pathogens, as is the case for the mosquito Aedes fluviatilis, where infection with Plasmodium gallinaceum is enhanced by the native wFlu Wolbachia strain. To better understand the biological interactions between mosquitoes and native Wolbachia infections, and to investigate the process of pathogen enhancement, we used RNA-Seq to generate the transcriptome of Ae. fluviatilis with and without Wolbachia infection. RESULTS: In total, we generated 22,280,160 Illumina paired-end reads from Wolbachia-infected and uninfected mosquitoes, and used these to make a de novo transcriptome assembly, resulting in 58,013 contigs with a median sequence length of 443 bp and an N50 of 2454 bp. Contigs were annotated through local alignments using BlastX, and associated with both gene ontology and KEGG orthology terms. Through baySeq, we identified 159 contigs that were significantly upregulated due to Wolbachia infection, and 98 that were downregulated. Critically, we saw no changes to Toll or IMD immune gene transcription, but did see evidence that wFlu infection altered the expression of several bacterial recognition genes, and immune-related genes that could influence Plasmodium infection. wFlu infection also had a widespread effect on a number of host physiological processes including protein, carbohydrate and lipid metabolism, and oxidative stress. We then compared our data set with transcriptomic data for other Wolbachia infections in Aedes aegypti, and identified a core set of 15 gene groups associated with Wolbachia infection in mosquitoes. 
CONCLUSIONS: While the scale of transcriptional changes associated with wFlu infection might be small, the scope is rather large, which confirms that native Wolbachia infections maintain intricate molecular relationships with their mosquito hosts even after lengthy periods of co-evolution. We have also identified several potential means through which wFlu infection might influence Plasmodium infection in Ae. fluviatilis, and these genes should form the basis of future investigation into the enhancement of Plasmodium by Wolbachia.


Subjects
Aedes/genetics; Aedes/microbiology; Gene Expression Profiling; Host-Pathogen Interactions/genetics; Transcriptome; Wolbachia; Animals; Computational Biology/methods; Contig Mapping; Gene Expression Regulation; Gene Ontology; High-Throughput Nucleotide Sequencing; Molecular Sequence Annotation