The results of microbial genome sequencing projects often contain unwanted sequence data originating from contamination by cells or DNA of other organisms. It is nearly impossible to detect every contamination event, especially in partially sequenced genome assemblies. However, a robust phylogenetic marker that is not readily subject to lateral gene transfer may provide a way to detect clear-cut cases of contamination. Here, we present a new method named ContEst16S (Contamination Estimator by 16S), in which 16S rRNA gene fragments from the query genome assembly are screened to determine whether the assembly is contaminated.
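The screening idea can be illustrated with a minimal Python sketch. It is not the ContEst16S implementation. It assumes that the 16S rRNA fragments have already been extracted from the assembly and aligned to a common reference, with '-' marking positions a fragment does not cover. The 99% identity cutoff and the function names `pairwise_identity` and `is_contaminated` are hypothetical choices for illustration. An assembly whose overlapping 16S fragments disagree by more than the cutoff is flagged as possibly contaminated.

```python
from itertools import combinations
from typing import List, Optional

# Hypothetical cutoff: overlapping fragments below this identity are treated
# as coming from different organisms. This is NOT a value from ContEst16S.
IDENTITY_CUTOFF = 0.99


def pairwise_identity(a: str, b: str) -> Optional[float]:
    """Fraction of identical bases over columns covered by both fragments.

    Both fragments must be pre-aligned to the same length; '-' marks an
    uncovered column. Returns None if the fragments share no covered column.
    """
    if len(a) != len(b):
        raise ValueError("fragments must be aligned to the same length")
    shared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not shared:
        return None
    matches = sum(1 for x, y in shared if x == y)
    return matches / len(shared)


def is_contaminated(fragments: List[str], cutoff: float = IDENTITY_CUTOFF) -> bool:
    """Return True if any two overlapping 16S fragments fall below the cutoff."""
    for a, b in combinations(fragments, 2):
        identity = pairwise_identity(a, b)
        if identity is not None and identity < cutoff:
            return True
    return False


if __name__ == "__main__":
    frag1 = "ACGTACGT--"
    frag2 = "--GTACGTAC"  # identical to frag1 where they overlap
    frag3 = "--GTTCGAAC"  # 4 of 6 shared columns match frag1
    print(is_contaminated([frag1, frag2]))         # False
    print(is_contaminated([frag1, frag2, frag3]))  # True
```

In this sketch, fragment pairs that do not overlap are skipped rather than flagged, since they provide no evidence either way about whether they come from the same organism.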
Of 69,745 genomes in the GenBank database, 596 are confirmed to be contaminated.
Lee I, Chalita M, Ha SM, Na SI, Yoon SH, Chun J. (2017). ContEst16S: an algorithm that identifies contaminated prokaryotic genomes using 16S RNA gene sequences. Int J Syst Evol Microbiol. 67(6):2053-2057.