UNRAVELING THE SHORT AND LONG-READS SEQUENCING TECHNOLOGIES INTO LONG NON-CODING RNA COMPUTATIONAL METHODS

Published on 26/04/2022 - ISBN: 978-65-5941-645-5

Paper Title
UNRAVELING THE SHORT AND LONG-READS SEQUENCING TECHNOLOGIES INTO LONG NON-CODING RNA COMPUTATIONAL METHODS
Authors
  • Alisson Gaspar Chiquitto
  • Lucas Otávio Leme Silva
  • Douglas Silva Domingues
  • Alexandre Rossi Paschoal
Modality
Xpress presentation
Subject area
Omics
Publishing Date
26/04/2022
Country of Publishing
Brazil
Language of Publishing
English
Paper Page
https://www.even3.com.br/anais/xmeetingxp2021/410181-unraveling-the-short-and-long-reads-sequencing-technologies-into-long-non-coding-rna-computational-methods
ISBN
978-65-5941-645-5
Keywords
Non-coding RNAs, high-throughput sequencing technologies, coding, methods, benchmarking
Summary
Next-generation sequencing (NGS) technologies opened up a new era in omics analysis. For example, novel long-read technologies have the potential to improve the quality of transcriptome annotation, and long non-coding RNAs (lncRNAs) are probably the class of transcripts that stands to benefit most from this improved annotation. However, few reports have evaluated the benefit of these novel NGS data for lncRNA identification. Our contribution was to address two real-world questions: can computational methods for lncRNA identification be applied directly to long reads, allowing a more direct identification of these transcripts? And can they efficiently identify plant lncRNAs, on which most computational approaches are not well trained? To answer them, we used 8 human genomic datasets, 17 plant datasets, and 15 lncRNA identification tools. For the datasets with an available lncRNA catalog, we calculated sensitivity (the proportion of lncRNAs correctly identified as lncRNAs). As lncRNA catalogs, we used GENCODE (v21 and v38) for human data and CANTATAdb 2.0 and NONCODE for plants. RNAmining and PLEK obtained the best sensitivity values on GENCODE v21 (short-read based), whereas RNAmining and LncADeep achieved the best sensitivity values on the GENCODE v38 (long-read based) dataset. Averaging sensitivity over the two GENCODE versions (v21 and v38), the five tools with the best average sensitivity were RNAmining, CNCI, LncADeep, lncRNAnet, and PLEK. Considering all analyses, RNAmining was the tool with the best sensitivity on GENCODE data. In the human long-read IsoDB (PacBio-based) and Nanopore datasets, CNCI and LncADeep, respectively, identified the highest number of lncRNA sequences. For the short-read plant lncRNA datasets (CANTATAdb 2.0 and NONCODE), RNAplonc, PlncPro, lncMachine, and CPC2 achieved the best results, with sensitivity above 90%.
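The sensitivity and averaging steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the benchmark's actual code: it assumes each tool's output has already been parsed into a mapping from transcript ID to a label ("lncRNA" or "coding"), and the function names, tool names, and toy values are hypothetical. Because every sequence in a catalog such as GENCODE or CANTATAdb is treated as a true lncRNA, sensitivity reduces to TP / (TP + FN), that is, the share of catalog sequences labelled lncRNA.

```python
from statistics import mean


def sensitivity(predictions: dict[str, str], catalog_ids: set[str]) -> float:
    """Share of catalog lncRNAs that a tool labelled as "lncRNA".

    Every ID in catalog_ids is assumed to be a true lncRNA, so
    sensitivity = TP / (TP + FN) = TP / len(catalog_ids).
    Catalog IDs absent from predictions count as false negatives.
    """
    if not catalog_ids:
        raise ValueError("catalog must contain at least one sequence")
    true_positives = sum(
        1 for seq_id in catalog_ids if predictions.get(seq_id) == "lncRNA"
    )
    return true_positives / len(catalog_ids)


def rank_by_mean_sensitivity(
    per_version: dict[str, dict[str, float]],
) -> list[tuple[str, float]]:
    """Average each tool's sensitivity over catalog versions, best first.

    Only tools scored on every version are ranked; ties are broken by name.
    """
    tools = set.intersection(*(set(scores) for scores in per_version.values()))
    averages = {
        tool: mean(scores[tool] for scores in per_version.values())
        for tool in tools
    }
    return sorted(averages.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical toy catalog and predictions (not results from the study).
    catalog = {"tx1", "tx2", "tx3", "tx4"}
    preds = {"tx1": "lncRNA", "tx2": "lncRNA", "tx3": "coding", "tx4": "lncRNA"}
    print(sensitivity(preds, catalog))  # 0.75

    # Hypothetical per-version sensitivities for two placeholder tools.
    scores = {
        "GENCODE_v21": {"tool_A": 0.75, "tool_B": 0.5},
        "GENCODE_v38": {"tool_A": 0.25, "tool_B": 1.0},
    }
    print(rank_by_mean_sensitivity(scores))
    # [('tool_B', 0.75), ('tool_A', 0.5)]
```

Counting catalog sequences that a tool failed to report as false negatives keeps the denominator fixed at the catalog size, so tools that skip or filter sequences are not rewarded. Sorting on the negated mean and then the tool name gives a deterministic order when two tools tie.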
In particular, CPC2 and RNAplonc identified the most lncRNAs in long-read data overall. Given their consistent results across a wide range of plant taxa, these results indicate that combining these two tools has considerable potential for applications with both short- and long-read technologies. Crema produced the least reliable results. In summary, we present a benchmark of lncRNA predictors on human and plant data that analyzes the impact of short- and long-read sequencing on lncRNA identification.
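The abstract does not specify how CPC2 and RNAplonc would be combined. As a hypothetical sketch only, the two simplest strategies are a union of the tools' lncRNA calls, which favours sensitivity, and a consensus (intersection), which favours specificity. The code assumes each tool's predictions were already parsed into a set of transcript IDs called lncRNA; the function names and toy IDs are illustrative assumptions, and no real CPC2 or RNAplonc output format is parsed here.

```python
def combine_calls(
    calls_a: set[str], calls_b: set[str], mode: str = "union"
) -> set[str]:
    """Combine the lncRNA IDs reported by two predictors (hypothetical helper).

    mode="union": keep a transcript if either tool calls it lncRNA
        (favours sensitivity).
    mode="consensus": keep it only if both tools agree
        (favours specificity).
    """
    if mode == "union":
        return calls_a | calls_b
    if mode == "consensus":
        return calls_a & calls_b
    raise ValueError(f"unknown mode: {mode!r}")


def coverage(calls: set[str], catalog_ids: set[str]) -> float:
    """Share of catalog lncRNAs recovered by a set of calls (sensitivity)."""
    if not catalog_ids:
        raise ValueError("catalog must contain at least one sequence")
    return len(calls & catalog_ids) / len(catalog_ids)


if __name__ == "__main__":
    catalog = {"tx1", "tx2", "tx3", "tx4", "tx5"}
    # Hypothetical call sets standing in for, e.g., parsed CPC2 and RNAplonc calls.
    calls_tool_a = {"tx1", "tx2", "tx3"}
    calls_tool_b = {"tx2", "tx3", "tx4"}
    for mode in ("union", "consensus"):
        merged = combine_calls(calls_tool_a, calls_tool_b, mode)
        print(mode, sorted(merged), coverage(merged, catalog))
    # union ['tx1', 'tx2', 'tx3', 'tx4'] 0.8
    # consensus ['tx2', 'tx3'] 0.4
```

On the toy data, the union recovers 4 of 5 catalog sequences and the consensus only 2 of 5, illustrating the sensitivity cost of requiring agreement between tools.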
Title of the Event
X-Meeting XPerience 2021
Title of the Proceedings of the event
X-Meeting presentations
Name of the Publisher
Even3
Means of Dissemination
Digital media

How to cite

CHIQUITTO, Alisson Gaspar et al. UNRAVELING THE SHORT AND LONG-READS SEQUENCING TECHNOLOGIES INTO LONG NON-CODING RNA COMPUTATIONAL METHODS. In: X-Meeting presentations. Proceedings... São Paulo (SP): AB3C, 2021. Available at: https://www.even3.com.br/anais/xmeetingxp2021/410181-UNRAVELING-THE-SHORT-AND-LONG-READS-SEQUENCING-TECHNOLOGIES-INTO-LONG-NON-CODING-RNA-COMPUTATIONAL-METHODS. Accessed on: 22/06/2025

