STANDARDIZED ANNOTATION OF LONG NON-CODING RNAS IN PLANTS

Published on 08/11/2023 - ISBN: 978-65-272-0061-1

Paper Title
STANDARDIZED ANNOTATION OF LONG NON-CODING RNAS IN PLANTS
Authors
  • Lucas Otávio Leme Silva
  • Douglas Silva Domingues
  • Wonder Alexandre Luz Alves
  • Alexandre Rossi Paschoal
  • Tatianne Costa Negri Rocha
Modality
Poster
Subject area
RNA and transcriptomics
Publishing Date
08/11/2023
Country of Publishing
Brazil
Language of Publishing
English
Paper Page
https://www.even3.com.br/anais/xmeeting2023/628948-standardized-annotation-of-long-non-coding-rnas-in-plants
ISBN
978-65-272-0061-1
Keywords
standardization, redundancy, reliability score, artificial intelligence, non-coding RNAs
Summary
Long non-coding RNAs (lncRNAs) are RNA molecules longer than 200 nucleotides that play key roles in gene regulation, such as chromatin modulation, protein interaction, and the stability and translation of cytoplasmic mRNAs. In plants, there are 11 publicly available lncRNA databases. However, lncRNA annotation lacks standardization across the more than 100 completely sequenced plants available today: each database annotates lncRNAs with its own methodology and pipeline. Standardization is therefore necessary. In this context, the objective of this work was to standardize lncRNA data in a single repository for plant genomes. Eleven databases were used: AlnC (804 species), CantataDB (39 species), EVLncRNAs (43 species), GreenC (94 species), LncPheDB (9 species), NONCODE (23 species), PNRD (150 species), PlncDB (80 species), PlncRNADB (4 species), TAIR (Arabidopsis thaliana), and lncRNAdb (10 species). Some databases also contain data from algae, animals, and fungi; these were excluded, leaving 838 unique plant species. From these species, 12,896,356 lncRNAs were collected. Next, we removed redundancy with the CD-HIT-EST-2D tool, which compares two sequence databases and reports which sequences are similar under the defined parameters. In this case, we used a 90% similarity threshold. The result was a unique, non-redundant set of lncRNA sequences, each with a reliability score. This score is simply the number of databases that corroborate each lncRNA. This standardization will support training and testing artificial intelligence models that identify long non-coding RNAs in plants.
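The reliability score described in the summary can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes the redundancy-removal step has already grouped similar sequences into clusters, that each sequence ID follows a hypothetical "<database>|<accession>" convention, and that the first member of each cluster is its representative. The function name and the toy IDs are invented for illustration.

```python
def reliability_scores(clusters):
    """Map each cluster representative to the number of distinct
    source databases found among the cluster's members."""
    scores = {}
    for members in clusters:
        # Hypothetical ID convention: "<database>|<accession>".
        databases = {member.split("|", 1)[0] for member in members}
        # Assumption: the first member represents the cluster.
        representative = members[0]
        scores[representative] = len(databases)
    return scores


# Toy clusters (invented IDs, not real database records).
clusters = [
    ["GreenC|toy_001", "CantataDB|toy_017", "PNRD|toy_009"],
    ["NONCODE|toy_042"],
    ["TAIR|toy_101", "TAIR|toy_102", "PlncDB|toy_003"],
]

scores = reliability_scores(clusters)
for representative, score in scores.items():
    print(representative, score)
```

Counting distinct databases rather than cluster members means that two near-identical entries from the same database do not inflate the score, which matches the summary's definition of the score as the number of corroborating databases.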
Title of the Event
X-Meeting / BSB 2023
City of the Event
Curitiba
Title of the Proceedings of the event
X-Meeting presentations
Name of the Publisher
Even3
Means of Dissemination
Digital media

How to cite

SILVA, Lucas Otávio Leme et al. STANDARDIZED ANNOTATION OF LONG NON-CODING RNAS IN PLANTS. In: X-Meeting presentations. Anais... Curitiba (PR) Campus da indústria, 2023. Available at: https://www.even3.com.br/anais/xmeeting2023/628948-STANDARDIZED-ANNOTATION-OF-LONG-NON-CODING-RNAS-IN-PLANTS. Accessed on: 06/10/2025
