AN AUTOMATED AND INTEGRATIVE PIPELINE FOR SURVEILLANCE OF VIRUSES IN POLLINATORS USING PUBLIC DATA

Published on 08/11/2023 - ISBN: 978-65-272-0061-1

DOI
10.29327/1331270.2-4  
Paper Title
AN AUTOMATED AND INTEGRATIVE PIPELINE FOR SURVEILLANCE OF VIRUSES IN POLLINATORS USING PUBLIC DATA
Authors
  • Vinícius Castro Santos
  • Aristóteles Góes Neto
  • Eric Roberto Guimarães Rocha Aguiar
Modality
Poster
Subject area
RNA and transcriptomics
Publishing Date
08/11/2023
Country of Publishing
Brazil
Language of Publishing
English
Paper Page
https://www.even3.com.br/anais/xmeeting2023/635111-an-automated-and-integrative-pipeline-for-surveillance-of-viruses-in-pollinators-using-public-data
ISBN
978-65-272-0061-1
Keywords
assembly, bees, mapping, metatranscriptomics, phylogenetics, RNA-seq
Summary
Insects, particularly bees, play a critical role in pollinating 90% of angiosperms and supporting around 9.5% of global agricultural production. To better understand the interactions between pollinators and viruses or other members of their microbiome, several research projects have set out to identify the microbiome and RNA viruses of pollinators through bioinformatics analysis. However, although numerous pipelines are available for analyzing metatranscriptomic data, none of them provides tools to characterize novel viruses or to perform automated phylogenetic analyses in organisms other than humans. This creates difficulties for researchers who lack bioinformatics expertise. To address this issue, our project proposes the development of integrated, modular, and automated pipelines that can analyze large RNA-seq datasets generated by Illumina NGS of bees and other pollinators. Our strategy takes advantage of multiple software tools and parameters, allowing customized execution of critical steps for virus identification, such as mapping and assembly.

The pipeline is currently being developed in R, Python, and Bash and runs on GNU/Linux. The workflow consists of seven phases, including preprocessing, mapping, de novo assembly, taxonomic and functional annotation, quantification, and data visualization. The pipeline can automatically download public datasets from NCBI's Sequence Read Archive (SRA) or take users' own sequencing data as input. The quality of the FASTQ files is checked with FastQC, and reads are filtered with fastp, Cutadapt, or Trimmomatic to remove adapters and low-quality reads. Host transcripts are removed by mapping with Bowtie2, STAR, or HISAT2, and the unaligned reads are then assembled either with a single assembler (e.g., Trinity, SPAdes, or Oases) or with an integrative assembly strategy that combines different tools to improve the quality and completeness of the assembled transcripts. Users interested solely in viral sequences can also map their reads against the HoloBee-mop database of non-viral honeybee sequences, removing those reads to enhance the assembly of bee RNA-seq data; alternatively, the complete microbiome can be identified. The assembled transcripts are then clustered with CD-HIT-EST to remove chimeric and redundant sequences. Taxonomic annotation is performed with DIAMOND blastx against NCBI's non-redundant protein database (NR), and the full taxonomic lineage of each species is retrieved with TaxonKit. Candidate viral transcripts are further annotated with BLASTn against NCBI's non-redundant nucleotide database (NT), open reading frames (ORFs) are predicted with orfipy, and conserved domains are identified with HMMER against the Pfam database. The abundance of viral sequences is quantified with Salmon.

The outputs include TSV files with metagenomic and viral data, a FASTA file of viral sequences, and donut, bar, radar, and heatmap charts. The next steps of the project involve adding phylogenetic analyses of RNA viruses and functional annotation, and implementing a scalable, programmable module that actively searches NCBI's public next-generation sequencing data (SRA) from bee species for surveillance purposes. The pipeline will also be tested on public data, and different software and strategies will be compared for the mapping and assembly processes.
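The modular design summarized above, where sensitive steps such as mapping and assembly can be run with different tools, can be illustrated with a short dry-run sketch. It is not the authors' implementation: the `PipelineConfig` class, the `build_plan`, `host_removal`, and `assembly` helpers, all file names, the 95% clustering threshold, and the e-value cutoff are hypothetical choices. Only the tool names come from the summary; the flags are standard options of those tools, and fasterq-dump (SRA Toolkit) is an assumed choice for downloading runs. The sketch prints each command instead of executing it.

```python
import shlex
from dataclasses import dataclass


@dataclass
class PipelineConfig:
    accession: str               # SRA run accession to download
    host_index: str              # Bowtie2/HISAT2 index prefix or STAR genome dir
    mapper: str = "bowtie2"      # "bowtie2", "hisat2" or "star"
    assembler: str = "trinity"   # "trinity" or "spades"
    nr_db: str = "nr.dmnd"       # DIAMOND database built from NCBI NR (assumed name)
    threads: int = 8


def host_removal(cfg, r1, r2):
    """Map reads to the host; return the command and the unaligned read files."""
    t = str(cfg.threads)
    if cfg.mapper in ("bowtie2", "hisat2"):
        # --un-conc-gz keeps pairs that do not align; '%' becomes 1 and 2.
        cmd = [cfg.mapper, "-p", t, "-x", cfg.host_index, "-1", r1, "-2", r2,
               "--un-conc-gz", "nonhost_R%.fastq.gz", "-S", "/dev/null"]
        return cmd, ("nonhost_R1.fastq.gz", "nonhost_R2.fastq.gz")
    if cfg.mapper == "star":
        # STAR writes unmapped mates to Unmapped.out.mate1 / Unmapped.out.mate2.
        cmd = ["STAR", "--runThreadN", t, "--genomeDir", cfg.host_index,
               "--readFilesIn", r1, r2, "--readFilesCommand", "zcat",
               "--outReadsUnmapped", "Fastx"]
        return cmd, ("Unmapped.out.mate1", "Unmapped.out.mate2")
    raise ValueError(f"unsupported mapper: {cfg.mapper}")


def assembly(cfg, r1, r2):
    """Assemble non-host reads; return the command and the transcript FASTA."""
    t = str(cfg.threads)
    if cfg.assembler == "trinity":
        # Recent Trinity releases write <output>.Trinity.fasta next to the
        # output directory, whose name must contain "trinity".
        cmd = ["Trinity", "--seqType", "fq", "--left", r1, "--right", r2,
               "--max_memory", "20G", "--CPU", t, "--output", "trinity_out"]
        return cmd, "trinity_out.Trinity.fasta"
    if cfg.assembler == "spades":
        cmd = ["spades.py", "--rna", "-1", r1, "-2", r2, "-t", t, "-o", "spades_out"]
        return cmd, "spades_out/transcripts.fasta"
    raise ValueError(f"unsupported assembler: {cfg.assembler}")


def build_plan(cfg):
    """Return the ordered list of commands (argv lists) for one SRA run."""
    t = str(cfg.threads)
    raw1, raw2 = f"raw/{cfg.accession}_1.fastq", f"raw/{cfg.accession}_2.fastq"
    plan = [
        ["fasterq-dump", "--split-files", cfg.accession, "-O", "raw"],
        # Adapter and quality trimming with fastp (one of the listed options).
        ["fastp", "-i", raw1, "-I", raw2, "-o", "trim_R1.fastq.gz",
         "-O", "trim_R2.fastq.gz", "-w", t, "-j", "fastp.json", "-h", "fastp.html"],
    ]
    map_cmd, (nh1, nh2) = host_removal(cfg, "trim_R1.fastq.gz", "trim_R2.fastq.gz")
    asm_cmd, contigs = assembly(cfg, nh1, nh2)
    plan += [
        map_cmd,
        asm_cmd,
        # Cluster at 95% identity to drop redundant transcripts.
        ["cd-hit-est", "-i", contigs, "-o", "clustered.fasta",
         "-c", "0.95", "-n", "10", "-T", t, "-M", "0"],
        # staxids is only filled if the DIAMOND db was built with --taxonmap.
        ["diamond", "blastx", "-d", cfg.nr_db, "-q", "clustered.fasta",
         "-o", "diamond_nr.tsv", "-p", t, "--evalue", "1e-5",
         "--max-target-seqs", "1", "--outfmt", "6", "qseqid", "sseqid",
         "pident", "evalue", "bitscore", "staxids"],
        ["salmon", "index", "-t", "clustered.fasta", "-i", "salmon_idx"],
        ["salmon", "quant", "-i", "salmon_idx", "-l", "A", "-1", nh1, "-2", nh2,
         "-p", t, "-o", "salmon_quant"],
    ]
    return plan


if __name__ == "__main__":
    # "SRR0000000" and the index path are placeholders, not real datasets.
    for cmd in build_plan(PipelineConfig("SRR0000000", "host_index/apis_mellifera")):
        print(shlex.join(cmd))  # dry run; subprocess.run(cmd, check=True) would execute it
```

Setting `mapper="star"` or `assembler="spades"` changes only the affected commands; later steps pick up whichever output files the chosen tool writes, which is one simple way to keep stages interchangeable.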
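The viral TSV output can be pictured as a join of three results named in the summary: DIAMOND hits, TaxonKit lineages, and Salmon abundances. The following is a minimal sketch under stated assumptions, not the pipeline's own code: DIAMOND is assumed to have been run with `--outfmt 6 qseqid sseqid pident evalue bitscore staxids`, `taxonkit lineage` output is assumed to be already loaded into a taxid-to-lineage dictionary, and a contig is treated as viral when the lineage of its best hit starts with "Viruses". The function names and toy inputs are hypothetical.

```python
import csv
import io


def best_hits(diamond_tsv):
    """Map each contig to the taxid of its highest-bitscore DIAMOND hit.

    Assumes columns: qseqid sseqid pident evalue bitscore staxids.
    """
    best = {}
    for row in csv.reader(io.StringIO(diamond_tsv), delimiter="\t"):
        if not row:
            continue
        qseqid, bitscore, staxids = row[0], float(row[4]), row[5]
        if qseqid not in best or bitscore > best[qseqid][0]:
            # staxids may hold several ';'-separated taxids; keep the first.
            best[qseqid] = (bitscore, staxids.split(";")[0])
    return {contig: taxid for contig, (_, taxid) in best.items()}


def viral_abundance(diamond_tsv, lineages, quant_sf):
    """Join viral best hits with Salmon estimates from quant.sf.

    lineages: taxid -> lineage string, as produced by 'taxonkit lineage'.
    quant_sf: text with columns Name, Length, EffectiveLength, TPM, NumReads.
    """
    quant = {r["Name"]: r for r in csv.DictReader(io.StringIO(quant_sf), delimiter="\t")}
    rows = []
    for contig, taxid in sorted(best_hits(diamond_tsv).items()):
        lineage = lineages.get(taxid, "")
        # Viral lineages begin with the "Viruses" node (NCBI taxid 10239).
        if lineage.startswith("Viruses") and contig in quant:
            q = quant[contig]
            rows.append((contig, lineage, float(q["TPM"]), float(q["NumReads"])))
    return rows


def to_tsv(rows):
    """Serialize the viral table as TSV text with a header line."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["contig", "lineage", "TPM", "NumReads"])
    writer.writerows(rows)
    return out.getvalue()


# Toy inputs: contig names, taxids, lineages and counts are made up.
diamond = ("c1\tp1\t98.0\t1e-50\t210.0\t1001\n"
           "c1\tp2\t60.0\t1e-10\t80.0\t2001\n"
           "c2\tp3\t75.0\t1e-20\t150.0\t2001\n")
lineages = {"1001": "Viruses;Riboviria;toy_virus",
            "2001": "cellular organisms;Bacteria;toy_bacterium"}
quant_sf = ("Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
            "c1\t9000\t8800.0\t1200.5\t4000\n"
            "c2\t1500\t1300.0\t35.0\t90\n")
print(to_tsv(viral_abundance(diamond, lineages, quant_sf)))
```

Keeping only the best hit per contig is a simplification; a lowest-common-ancestor assignment over several hits would be a more conservative alternative.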
Title of the Event
X-Meeting / BSB 2023
City of the Event
Curitiba
Title of the Proceedings of the event
X-Meeting presentations
Name of the Publisher
Even3
Means of Dissemination
Digital media

How to cite

SANTOS, Vinícius Castro; NETO, Aristóteles Góes; AGUIAR, Eric Roberto Guimarães Rocha. AN AUTOMATED AND INTEGRATIVE PIPELINE FOR SURVEILLANCE OF VIRUSES IN POLLINATORS USING PUBLIC DATA. In: X-Meeting presentations. Proceedings... Curitiba (PR): Campus da Indústria, 2023. Available at: https://www.even3.com.br/anais/xmeeting2023/635111-AN-AUTOMATED-AND-INTEGRATIVE-PIPELINE-FOR-SURVEILLANCE-OF-VIRUSES-IN-POLLINATORS-USING-PUBLIC-DATA. Accessed on: 12/09/2025.
