Published on 26/04/2022 - ISBN: 978-65-5941-645-5

Paper Title
EVALUATION OF NON-INDEXED READINGS IN EXOME SEQUENCING
  • Romário Oliveira de Sales
  • Kelly Gomes Duarte
  • Bruno Toledo Piza Martinucci
  • Murilo Cervato
Xpress presentation
NGS, Bioinformatics tools, Taxonomic, Contaminants
Massively parallel sequencing methods, also known as next-generation sequencing (NGS), produce millions of short sequences (reads). In a typical bioinformatics pipeline, reads are aligned to a reference genome, which identifies their biological origin and allows correlation with the genomic, transcriptomic and epigenetic features of the individuals. Typically, between 70% and 90% of reads are correctly assigned to the reference genome, leaving a consistent fraction of unassigned sequences. These unaligned sequences can be attributed to low-quality bases or to sequence differences between the sample reads and the reference genome. Investigating the origin of unmapped reads is fundamental to better assess the quality of the entire experiment and to check for possible contamination by components external or internal to the reaction. In this work, we evaluated the diversity of microorganisms (bacteria and viruses) present in reads with undetermined barcodes from 36 NextSeq 500 runs. This analysis allowed us to investigate the presence of contaminating sequences in NGS data, which suggests the presence of contaminating organisms in the samples; these may derive from the sequencing reagents themselves, the biological source or the laboratory process. In such cases, a complementary investigation, experimental validation or intensification of the laboratory's decontamination procedures can be considered. Each run was characterized using its run metrics, available in the Illumina Sequencing Analysis Viewer. Unassigned reads were analyzed using bioinformatics tools such as the Kraken taxonomic sequence classification system. FASTQ files were classified against a pre-compiled database called Standard, which covers archaea, bacteria, viruses, plasmids, human and UniVec_Core, available at https://benlangmead.github.io/aws-indexes/k2.
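The classification step described above could be scripted roughly as follows. This is a minimal sketch, not the authors' pipeline: it assumes the Kraken 2 command-line tool (`kraken2`) and a locally downloaded copy of the Standard index. The function name `build_kraken2_command`, the FASTQ file name, the directory names and the thread count are all hypothetical. The function only assembles the argument list and runs nothing.

```python
from pathlib import Path


def build_kraken2_command(fastq, db_dir, out_dir, threads=4):
    """Return the kraken2 argument list for one FASTQ file (nothing is executed)."""
    fastq = Path(fastq)
    out_dir = Path(out_dir)

    # Derive a sample name by stripping .gz and then .fastq/.fq from the file name.
    sample = fastq.name
    for suffix in (".gz", ".fastq", ".fq"):
        if sample.endswith(suffix):
            sample = sample[: -len(suffix)]

    cmd = [
        "kraken2",
        "--db", str(db_dir),          # directory holding the pre-compiled index
        "--threads", str(threads),
        "--report", str(out_dir / f"{sample}.kreport"),  # per-taxon summary
        "--output", str(out_dir / f"{sample}.kraken"),   # per-read assignments
    ]
    if fastq.name.endswith(".gz"):
        cmd.append("--gzip-compressed")
    cmd.append(str(fastq))
    return cmd


# Hypothetical file and directory names, for illustration only.
command = build_kraken2_command(
    "Undetermined_S0_R1_001.fastq.gz", "k2_standard", "kraken_out"
)
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` once `kraken2` is installed and the database directory exists. Building the command separately makes it easy to inspect before launching one run per FASTQ file.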
The reports generated by Kraken for the 36 runs were combined using KrakenTools (https://github.com/jenniferlu717/KrakenTools). The Krona visualization tool was used to display the combined results. The comparative analysis between the two flow cell types, Mid and High, showed that Mid has a greater abundance of bacteria (19%) than High, with the genus Klebsiella (2%) being the most prevalent. Concerning viral diversity (2%), Heunggongvirae was the most dominant group, and 62% of the reads in Mid corresponded to Homo sapiens. In High, however, Homo sapiens was identified in 84% of the reads. Only 0.4% of the High reads were classified as bacteria, with the genus Mycobacterium (0.07%) being the most abundant, and 100x fewer reads mapped to viruses than in Mid. In the combined analysis of the 36 reports, 74% of the reads were assigned to Homo sapiens, while bacteria represented 9% of the reads; the families Enterobacteriaceae (2%) and Pseudomonadaceae (1%) were identified, and Klebsiella was the most frequent genus (0.9%). The different categories of viruses and bacteria found across runs suggest that the contaminants have different origins. Therefore, we suggest further investigation, involving clinical data from patients, to identify the origin of these contaminants and investigate their clinical implications.
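The percentages reported above are shares of reads per taxon after pooling the runs. The sketch below illustrates that arithmetic on Kraken-style reports. It is a plain-Python stand-in, not the KrakenTools implementation. It assumes the six-column, tab-separated Kraken 2 report layout (percentage, clade reads, directly assigned reads, rank code, NCBI taxonomy ID, indented name). The function names and the two toy reports are hypothetical, and the read counts are invented for illustration.

```python
from collections import Counter


def parse_kreport(text):
    """Map (rank_code, taxon_name) -> clade read count for one Kraken-style report."""
    counts = Counter()
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        clade_reads = int(fields[1])
        rank_code = fields[3].strip()
        name = fields[5].strip()  # names are indented by taxonomic depth
        counts[(rank_code, name)] += clade_reads
    return counts


def combine_reports(reports):
    """Sum clade read counts across several parsed reports."""
    combined = Counter()
    for report in reports:
        combined.update(report)
    return combined


def percent_of_total(counts, rank_code, name):
    """Share of all reads (unclassified 'U' plus root 'R') that fall in one clade."""
    total = sum(n for (rank, _), n in counts.items() if rank in ("U", "R"))
    return 100.0 * counts[(rank_code, name)] / total if total else 0.0


# Toy reports with invented read counts (not data from the study).
run_a = "\n".join([
    "10.00\t10\t10\tU\t0\tunclassified",
    "90.00\t90\t0\tR\t1\troot",
    "60.00\t60\t60\tS\t9606\t      Homo sapiens",
    "20.00\t20\t0\tD\t2\t  Bacteria",
    "5.00\t5\t5\tG\t570\t    Klebsiella",
])
run_b = "\n".join([
    "10.00\t20\t20\tU\t0\tunclassified",
    "90.00\t180\t0\tR\t1\troot",
    "75.00\t150\t150\tS\t9606\t      Homo sapiens",
    "5.00\t10\t0\tD\t2\t  Bacteria",
    "1.00\t2\t2\tG\t570\t    Klebsiella",
])

combined = combine_reports([parse_kreport(run_a), parse_kreport(run_b)])
for rank_code, name in [("S", "Homo sapiens"), ("D", "Bacteria"), ("G", "Klebsiella")]:
    print(name, round(percent_of_total(combined, rank_code, name), 2))
```

Pooling raw read counts before computing percentages weights each run by its number of reads. Averaging per-run percentages would instead give every run equal weight, so the two approaches can yield different combined figures.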
Title of the Event
X-Meeting XPerience 2021
Title of the Proceedings of the event
X-Meeting presentations
Name of the Publisher
Means of Dissemination
Digital medium

How to cite

SALES, Romário Oliveira de et al. EVALUATION OF NON-INDEXED READINGS IN EXOME SEQUENCING. In: X-Meeting presentations. Anais... São Paulo (SP): AB3C, 2021. Available at: https://www.even3.com.br/anais/xmeetingxp2021/410794-EVALUATION-OF-NON-INDEXED-READINGS-IN-EXOME-SEQUENCING. Accessed on: 23/04/2024


Even3 Publicacoes