NANOPORE DIRECT RNA SEQUENCING REVEALS A MYRIAD OF ANTISENSE, INTERGENIC AND ALTERNATIVELY SPLICED ISOFORMS OF LONG NON-CODING RNAS

Published on 21/11/2024 - ISBN: 978-65-272-0843-3

Authors
  • Maria Bárbara Borges de Santana
  • Sebastián Urquiza-Zurich
  • Laura Shimohara Bradaschia
  • Lucas de Freitas Lacerda
  • Sergio Lavandero
  • Thaís Gaudencio do Rêgo
  • Vinicius Maracaja Coutinho
Modality
Poster
Subject area
RNA and transcriptomics
Publishing Date
21/11/2024
Country of Publishing
Brazil
Language of Publishing
English
Paper Page
https://www.even3.com.br/anais/xmeeting-2024/837271-nanopore-direct-rna-sequencing-reveals-a-myriad-of-antisense-intergenic-and-alternatively-spliced-isoforms-of-lo
ISBN
978-65-272-0843-3
Keywords
lncRNAs, Alternative Splicing, Nanopore, Bioinformatics.
Summary
Alternative splicing (AS) is a pervasive process in mammals that affects most multi-exonic genes and accounts for an immense diversity of RNA isoforms. Many regulatory aspects of AS networks and functions have been uncovered; however, the alternative splicing of non-coding RNAs remains largely unexplored. Within the non-coding transcriptome, long non-coding RNAs (lncRNAs) play crucial regulatory roles in the cell, including in the splicing process itself. Here, using a pilot Nanopore direct RNA-seq assay of neonatal rat cardiomyocyte samples, we characterize newly discovered lncRNAs according to a standard categorization (antisense, intergenic and intronic) and the AS events detected. The methodology employed a new bioinformatics pipeline written in Nextflow (github.com/lfreitasl/lncinference) to analyze Nanopore RNA-seq data. The process involved quality control, splice-aware mapping to the reference genome, de novo transcriptome assembly, and the discovery and characterization of new lncRNAs. Tools such as minimap2, StringTie, GffCompare and RNAmining, together with custom scripts in Python and R, were essential for efficient and accurate analysis. The early results reveal a total of 7,869 new lncRNAs, of which 7,136 are multi-exonic and 733 are mono-exonic. The majority of these lncRNAs were transcribed from protein-coding genes (6,897), whereas smaller numbers came from intergenic regions (506), antisense transcription (311), lncRNA genes (110) and pseudogenes (45). The antisense lncRNAs were transcribed from the opposite strand of protein-coding genes (263), lncRNA genes (45) and pseudogenes (3). Interestingly, protein-coding genes transcribed RNA isoforms that “lost” their coding potential after splicing, generating a disproportionate number of lncRNAs classified by GffCompare as “j” (multi-exon with at least one junction match) (5,575), a class that covers multiple AS events, such as exon skipping, mutually exclusive exons, alternative splice sites, and alternative first and last exons. Furthermore, 1,127 new lncRNAs expressed from protein-coding genes originated from intron retention, and 195 were intronic to a protein-coding transcript. Most pseudogene-derived lncRNAs involved intron retention events (27), while lncRNA genes generated mostly “j” lncRNAs (86). The newly identified lncRNAs vary in median transcript length, exon length and exon number across all assigned categories: intron-retaining lncRNAs were the longest transcripts (2,489-2,515 nt), while intronic lncRNAs were the shortest (775 nt). Overall, the new lncRNAs have a median length of 2,006 nt and 6 exons per transcript. Among the new lncRNAs, the multi-exonic transcripts are longer (2,060 nt) than the mono-exonic ones (1,427 nt). However, the single exon of a mono-exonic transcript is longer than the individual exons of multi-exonic transcripts (129 nt). These differences in length and exon number are also observed in reference mRNAs and lncRNAs from the Ensembl annotation mRatBN7.2, although the new lncRNAs are longer and have more exons than the reference lncRNAs (1,264 nt and 3 exons), and are similar to, but smaller than, the reference mRNAs (2,332 nt and 8 exons). Finally, the discovery of this vast array of new lncRNAs, along with their diverse splicing and transcription origins, underscores the complexity and regulatory potential within the neonatal rat cardiomyocyte non-coding transcriptome.
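The categorization described in the summary can be illustrated with a minimal Python sketch. It assumes a mapping from GffCompare class codes to the categories named in the summary ("j" for multi-exon isoforms with at least one junction match, "m" and "n" for retained introns, "i" for intronic, "x" for antisense, "u" for intergenic). The summary only confirms the "j" label, so the remaining assignments, the function names and the toy records below are hypothetical and are not taken from the lncinference pipeline. The final lines check that the counts reported in the summary are internally consistent.

```python
from collections import Counter

# ASSUMPTION: mapping from GffCompare class codes to the categories used in
# the summary. Only "j" is confirmed by the source text; the rest is a guess
# based on the standard GffCompare class-code meanings.
CLASS_CODE_TO_CATEGORY = {
    "j": "junction match (alternative splicing)",
    "m": "intron retention",
    "n": "intron retention",
    "i": "intronic",
    "x": "antisense",
    "u": "intergenic",
}


def categorize(class_code):
    """Return the category for one class code, or 'other' if unmapped."""
    return CLASS_CODE_TO_CATEGORY.get(class_code, "other")


def tally_categories(records):
    """Count categories over an iterable of (transcript_id, class_code)."""
    return Counter(categorize(code) for _transcript_id, code in records)


# Hypothetical toy input; real input would come from GffCompare output.
toy_records = [
    ("tx1", "j"), ("tx2", "m"), ("tx3", "n"), ("tx4", "x"),
    ("tx5", "u"), ("tx6", "i"), ("tx7", "j"), ("tx8", "="),
]
toy_counts = tally_categories(toy_records)

# Counts reported in the summary, used here only as a consistency check.
reported_total = 7869
by_exon_structure = {"multi-exonic": 7136, "mono-exonic": 733}
by_origin = {
    "protein-coding genes": 6897,
    "intergenic regions": 506,
    "antisense transcription": 311,
    "lncRNA genes": 110,
    "pseudogenes": 45,
}
antisense_by_target = {"protein-coding": 263, "lncRNA": 45, "pseudogene": 3}
protein_coding_breakdown = {
    "j": 5575,
    "intron retention": 1127,
    "intronic": 195,
}


def sums_to(parts, total):
    """True when the values of a breakdown add up to the stated total."""
    return sum(parts.values()) == total


checks = {
    "exon structure": sums_to(by_exon_structure, reported_total),
    "origin": sums_to(by_origin, reported_total),
    "antisense": sums_to(antisense_by_target,
                         by_origin["antisense transcription"]),
    "protein-coding": sums_to(protein_coding_breakdown,
                              by_origin["protein-coding genes"]),
}

print(dict(toy_counts))
print(checks)
```

Unmapped codes (such as "=" for an exact reference match) fall into "other" rather than raising an error, which keeps the tally usable on mixed input; a stricter pipeline might prefer to fail on unknown codes instead.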
Title of the Event
20º Congresso Brasileiro de Bioinformática: X-Meeting 2024
City of the Event
Salvador
Title of the Proceedings of the event
X-Meeting presentations
Name of the Publisher
Even3
Means of Dissemination
Digital media

How to cite

SANTANA, Maria Bárbara Borges de et al. NANOPORE DIRECT RNA SEQUENCING REVEALS A MYRIAD OF ANTISENSE, INTERGENIC AND ALTERNATIVELY SPLICED ISOFORMS OF LONG NON-CODING RNAS. In: X-Meeting presentations. Anais... Salvador (BA): Hotel Deville Prime, 2024. Available at: https://www.even3.com.br/anais/xmeeting-2024/837271-NANOPORE-DIRECT-RNA-SEQUENCING-REVEALS-A-MYRIAD-OF-ANTISENSE-INTERGENIC-AND-ALTERNATIVELY-SPLICED-ISOFORMS-OF-LO. Accessed on: 16/07/2025
