BENCHMARKING AND IMPROVING PLANT GENOME ANNOTATION THROUGH AN COMBINATION OF BIOINFORMATICS TOOLS

Published on 08/11/2023 - ISBN: 978-65-272-0061-1

Paper Title
BENCHMARKING AND IMPROVING PLANT GENOME ANNOTATION THROUGH AN COMBINATION OF BIOINFORMATICS TOOLS
Authors
  • Héctor Oberti
  • Paulo Aecyo Francisco da Silva
  • RAFAELY PANTOJA OLIVEIRA
  • João Victor dos Anjos Almeida
  • Francisco Prosdocimi
  • Alessandro Varani
Modality
Poster
Subject area
DNA and Genomics
Publishing Date
08/11/2023
Country of Publishing
Brazil
Language of Publishing
English
Paper Page
https://www.even3.com.br/anais/xmeeting2023/642609-benchmarking-and-improving-plant-genome-annotation-through-an-combination-of-bioinformatics-tools
ISBN
978-65-272-0061-1
Keywords
comparative genomics, transposable elements, gene structure, genome topography
Summary
Since the publication of the Arabidopsis thaliana genome sequence in 2000, plant genomes have become recognized for their complexity and size variability. Plant genomes often present varying degrees of ploidy due to distinct whole-genome duplication (WGD) events, frequently followed by fractionation, which poses significant challenges for assembly. This complexity, along with the presence of pseudogenes, gene family expansions, and extensive transposable elements (TEs), also makes accurate annotation a significant challenge. To address this, we developed a pipeline to improve the accuracy of structural and functional plant genome annotation. Our pipeline integrates and optimizes existing tools to create comprehensive annotations using protein and RNA-seq data (generated by either short- or long-read sequencing technologies). The initial phase uses a modified version of the Extensive de novo TE Annotator (EDTA) to annotate TEs. This step annotates autonomous elements (Gypsy and Copia, with a full characterization of their respective evolutionary lineages) and non-autonomous LTR elements (e.g., LARD, TRIM, TR-GAG, and BARE-2), and accurately detects TIR, Helitron, LINE, and SINE elements, resulting in enhanced TE annotation. The pipeline also dates the insertion of LTR elements in the genome, supports investigation of their evolutionary history through a phylogenetic approach, and generates a soft-masked genome for structural gene annotation. The second stage generates multiple lines of protein and transcriptome evidence using BRAKER, Exonerate, GALBA, GeMoMa, GenomeThreader, PASA, StringTie2 and TransDecoder, all benchmarked with BUSCO. This evidence is integrated using EVidenceModeler and further processed in PASA to add UTR annotations and models for alternatively spliced isoforms.
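As an illustration of the LTR insertion dating mentioned above, the sketch below uses a common approach: estimate the divergence K between the 5' and 3' LTRs of one element, then compute T = K / (2r). The abstract does not state which distance model or substitution rate the pipeline uses, so the Jukes-Cantor correction, the default rate of 1.3e-8 substitutions per site per year, and the function names are all assumptions made for this example.

```python
import math


def jukes_cantor_distance(ltr5: str, ltr3: str) -> float:
    """Jukes-Cantor (JC69) distance between two pre-aligned LTR sequences.

    Columns containing gaps or ambiguous bases are skipped.
    """
    pairs = [
        (a, b)
        for a, b in zip(ltr5.upper(), ltr3.upper())
        if a in "ACGT" and b in "ACGT"
    ]
    if not pairs:
        raise ValueError("no comparable sites in the alignment")
    p = sum(a != b for a, b in pairs) / len(pairs)  # observed mismatch fraction
    if p >= 0.75:
        raise ValueError("sequences too divergent for the JC69 correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def insertion_time_years(k: float, rate: float = 1.3e-8) -> float:
    """Insertion age T = K / (2r).

    `rate` is an assumed substitution rate per site per year; a
    lineage-specific estimate should be substituted where available.
    """
    return k / (2 * rate)


if __name__ == "__main__":
    # Hypothetical aligned LTR pair: 100 sites with a single mismatch.
    five_prime = "A" * 100
    three_prime = "A" * 99 + "G"
    k = jukes_cantor_distance(five_prime, three_prime)
    print(f"K = {k:.6f}, T = {insertion_time_years(k):,.0f} years")
```

The factor of 2 reflects that both LTRs are identical at insertion and accumulate substitutions independently afterwards.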
The structural annotation is complemented with gene topology information (e.g., singleton, dispersed, proximal, tandem-duplicated, WGD-derived, and transposed genes) and gene conversion events, generated by MCScanX, along with retrocopy identification by RetroScan. Functional annotation is enhanced through BLAST searches against public databases such as UniProt, NCBI RefSeq, PlantTFDB, NLR Atlas and PRGdb. EggNOG-mapper and InterProScan provide additional functional annotations, including Gene Ontology terms, Enzyme Commission (EC) numbers, Carbohydrate-Active enZYmes (CAZymes), and Pfam and InterPro IDs for each gene. Metabolic gene clusters are predicted using plantiSMASH. The final output includes annotated GFF3 files and nucleotide and protein FASTA files, which can be visualized in JBrowse 2. To benchmark our pipeline, we applied it to the genomes of two plants endemic to the Amazonian forest, Theobroma grandiflorum and Banisteriopsis caapi, both of significant cultural, commercial and biotechnological importance. Around 20% of each genome is composed of potentially active, full-length TEs, increasing to 60% when TE fragments are considered. We also determined the relative abundance of the different evolutionary lineages of LTR elements within the Gypsy and Copia families, and illustrated the positioning of Gypsy chromovirus (CRM) elements near pericentromeric regions. The structural gene annotation of both species achieved a BUSCO completeness score above 99%, indicating high-quality annotation. The development and successful application of our pipeline highlight its potential to improve the accuracy of plant genome annotation, addressing a significant challenge in plant genomics.
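The genome fractions reported above (about 20% full-length TEs, about 60% including fragments) are the kind of figure that can be derived from annotation coordinates by merging overlapping features and dividing the covered length by the sequence length. The sketch below shows that calculation for a single sequence; it is not the authors' method, and the function name, the example coordinates, and the assumption of 1-based inclusive coordinates (as in GFF3) are choices made for this illustration.

```python
from typing import Iterable, Tuple


def covered_fraction(intervals: Iterable[Tuple[int, int]], seq_length: int) -> float:
    """Fraction of one sequence covered by features, counting overlaps once.

    `intervals` holds (start, end) pairs in 1-based inclusive coordinates,
    all on the same sequence of length `seq_length`.
    """
    if seq_length <= 0:
        raise ValueError("seq_length must be positive")
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # The previous merged block is finished; add its length.
            if cur_end is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping feature: extend the current merged block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start + 1
    return covered / seq_length


if __name__ == "__main__":
    # Hypothetical TE features on a 100 bp sequence; the first two overlap.
    te_features = [(1, 10), (5, 20), (31, 40)]
    print(f"TE fraction: {covered_fraction(te_features, 100):.0%}")
```

Merging before summing matters because nested and overlapping TE insertions would otherwise be counted more than once.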
Title of the Event
X-Meeting / BSB 2023
City of the Event
Curitiba
Title of the Proceedings of the event
X-Meeting presentations
Name of the Publisher
Even3
Means of Dissemination
Digital media

How to cite

OBERTI, Héctor et al. BENCHMARKING AND IMPROVING PLANT GENOME ANNOTATION THROUGH AN COMBINATION OF BIOINFORMATICS TOOLS. In: X-Meeting presentations. Anais... Curitiba (PR): Campus da indústria, 2023. Available at: https://www.even3.com.br/anais/xmeeting2023/642609-BENCHMARKING-AND-IMPROVING-PLANT-GENOME-ANNOTATION-THROUGH-AN-COMBINATION-OF-BIOINFORMATICS-TOOLS. Accessed on: 22/05/2025.
