Es deduplicator

4/30/2023

Maria Tsagiopoulou 1†, Maria Christina Maniou 1†, Nikolaos Pechlivanis 1,2, Anastasis Togkousidis 1, Michaela Kotrová 3, Tobias Hutzenlaub 4,5, Ilias Kappas 2, Anastasia Chatzidimitriou 1 and Fotis Psomopoulos 1*

1 Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, Greece
2 Department of Genetics, Development and Molecular Biology, School of Biology, Aristotle University of Thessaloniki, Thessaloniki, Greece
3 Unit for Hematological Diagnostics, Department of Internal Medicine II, University Medical Center Schleswig-Holstein, Kiel, Germany
4 Laboratory for MEMS Applications, IMTEK-Department of Microsystems Engineering, University of Freiburg, Freiburg, Germany

A recent refinement in high-throughput sequencing involves the incorporation of unique molecular identifiers (UMIs), which are random oligonucleotide barcodes, on the library preparation steps. A UMI adds a unique identity to different DNA/RNA input molecules through polymerase chain reaction (PCR) amplification, thus reducing bias of this step. Here, we propose an alignment-free framework serving as a preprocessing step of fastq files, called UMIc, for deduplication and correction of reads, building consensus sequences from each UMI. Our approach takes into account the frequency and the Phred quality of nucleotides and the distances between the UMIs and the actual sequences. We have tested the tool using different scenarios of UMI-tagged library data, having in mind the aspect of a wide application. UMIc is an open-source tool implemented in R and is freely available from.

The introduction of next-generation sequencing (NGS) has revolutionized genomic research and has tremendously impacted clinical applications (Lander et al., 2001; Shen et al., 2015). Through NGS technologies, researchers are able to study whole genomes (whole-genome sequencing) or smaller regions (exome sequencing) with an unparalleled depth and sensitivity compared to Sanger sequencing (Shen et al., 2015). However, detection of variants with low frequency (below ∼1–3%) remains a difficult task because of background noise (Fox et al., 2014). In clinical applications, the detection of true mutants in low-frequency alleles or rare subclones that may contribute to the disease at an early stage remains a big challenge for cancer studies. This is mainly due to the NGS library preparation process, which includes multiple rounds of PCR amplification, introducing PCR duplicates and artifacts in the output sequence. This limitation was overcome by the use of UMIs, facilitating detection and removal of PCR duplicates.

Sample preparation involves the introduction of a UMI to each target molecule before PCR amplification. A UMI is a short sequence (usually 8–16 nucleotides, but this can vary depending on the study) that is specific to a molecule and is generated by permutations of a string of randomized nucleotides (Kivioja et al., 2011; Islam et al., 2014). This method allows monitoring of each target molecule and, consequently, helps reduce PCR amplification bias and increase the accurate quantification and subsequent comparison of targets. UMIs can be used in different NGS methods (Kinde et al., 2011; Salk et al., 2018; Saunders et al., 2020) in a variety of approaches. The most common process to analyze these data is by aligning the sequences to a reference genome or transcriptome with the UMI tag attached to the header. Then, the reads with the same alignment coordinates and UMIs are deduplicated. More recently, tools that skip the alignment step have been developed with a gain in speed on larger datasets. In either scenario, the methods for grouping the reads by their UMIs are similar. The typical process keeps the read that has the highest UMI frequency and the highest quality score (Liu, 2019).
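To make the alignment-free idea concrete, here is a minimal Python sketch of grouping reads by UMI and collapsing each group into one consensus sequence using nucleotide frequency and Phred quality. It is an illustration only and not UMIc's actual implementation (UMIc is written in R). The function names, the assumption that the UMI sits at the start of each read, and the Phred+33 quality encoding are all hypothetical choices made for this example. Merging of near-identical UMIs by distance is left out.

```python
from collections import Counter, defaultdict

PHRED_OFFSET = 33  # assumption: Sanger / Illumina 1.8+ quality encoding


def group_by_umi(reads, umi_length):
    """Group (sequence, quality) pairs by their leading UMI.

    Assumption: the UMI is the first `umi_length` bases of each read.
    The UMI is trimmed from both the sequence and the quality string.
    """
    groups = defaultdict(list)
    for sequence, quality in reads:
        umi = sequence[:umi_length]
        groups[umi].append((sequence[umi_length:], quality[umi_length:]))
    return dict(groups)


def consensus(group):
    """Build one consensus sequence from reads that share a UMI.

    At each position the most frequent base wins; ties are broken by
    the higher mean Phred quality of the reads supporting that base.
    """
    length = min(len(sequence) for sequence, _ in group)
    bases = []
    for i in range(length):
        counts = Counter()
        quality_sum = Counter()
        for sequence, quality in group:
            base = sequence[i]
            counts[base] += 1
            quality_sum[base] += ord(quality[i]) - PHRED_OFFSET
        best = max(counts, key=lambda b: (counts[b], quality_sum[b] / counts[b]))
        bases.append(best)
    return "".join(bases)


def deduplicate(reads, umi_length):
    """Return one consensus sequence per UMI."""
    groups = group_by_umi(reads, umi_length)
    return {umi: consensus(group) for umi, group in groups.items()}


# Toy input: three PCR copies of one molecule (UMI "AA"), one of which
# carries a low-quality error, plus a second molecule (UMI "GG").
example_reads = [
    ("AACGTTGCA", "IIIIIIIII"),
    ("AACGTTGCA", "IIIIIIIII"),
    ("AACGTAGCA", "IIIII!III"),
    ("GGCGTTGCA", "IIIIIIIII"),
]
result = deduplicate(example_reads, umi_length=2)
```

In this toy input the erroneous `A` in the third read is outvoted by the two other copies, so each UMI yields a single corrected sequence. A real tool would also need to handle errors inside the UMI itself, which is where a distance measure between UMIs comes in.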
