Next-generation sequencing and the accompanying bioinformatics have great potential for microbial systematics and taxonomy. Indeed, the number of genomes released into public databases is increasing rapidly, although their use is hampered by a lack of adequate bioinformatics tools. The EzGenome database is being developed to help microbiologists in various sub-disciplines by providing curated databases and efficient bioinformatics tools. In this talk, I will explain the bioinformatics background of the EzGenome database and a new algorithm for calculating average nucleotide identity for taxonomic use: the OrthoANI algorithm and its related bioinformatics tools.

Microbial taxonomy serves as a fundamental framework for all microbiological disciplines, and the species concept of Bacteria and Archaea is of prime importance. Species demarcation in Bacteria and Archaea has been based mainly on overall genome relatedness. The current practice for obtaining these values between two strains is shifting from experimentally determined similarity, usually obtained by DNA-DNA hybridization (DDH), to genome sequence-based similarity. Average nucleotide identity (ANI) is a simple algorithm that mimics DDH: the genome sequence of a query strain is divided into 1,020 bp fragments, which are compared using the BLASTN program against the whole, unfragmented genome sequence of a subject strain. Because of this asymmetry between query and subject, ANI between two genome sequences can be calculated in both directions, that is, reciprocally. The general practice is to take the average of the two reciprocal ANI calculations, although the two values may differ significantly from each other. We examined a large set of reciprocal ANI values of closely related species and found that 55% exhibited a discrepancy of more than 0.1% between the reciprocal values. Moreover, 1,101 pairs showed a discrepancy higher than 1%, with the highest being 4.15%. Given that ANI values of 95-96% are considered the species boundary, this level of discrepancy is large enough to affect taxonomic interpretation.

To resolve this problem, we have developed a new ANI algorithm, named "OrthoANI", that incorporates the concept of orthology. In a large-scale calculation, we found that reciprocal OrthoANI values are consistently almost identical: the average discrepancy is 0.00042%, with a maximum of 0.05%, which overcomes the limitation of the original ANI algorithm. The correlation between the original ANI and OrthoANI is very high, so the same range of OrthoANI values (95-96%) can be used as the species demarcation cutoff in place of the original ANI. It is therefore fair to say that the OrthoANI algorithm resolves the reciprocal inconsistency of the original ANI and provides a more robust way of calculating the similarity between two genome sequences for taxonomic use. Two different types of software for calculating OrthoANI are available at http://www.ezbiocloud.net/sw/oat.
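
The query fragmentation step and the reciprocal discrepancy discussed above can be illustrated with a minimal Python sketch. It is not the OrthoANI implementation and does not run BLASTN. It only shows how a query genome might be cut into 1,020 bp fragments and how two directional ANI values could be averaged and compared against the species boundary. The function names, the decision to discard a trailing fragment shorter than 1,020 bp, and the default cutoff of 95.0 are assumptions made for illustration.

```python
from typing import List, Tuple

FRAGMENT_LENGTH = 1020  # bp, the fragment length stated in the abstract


def fragment_genome(sequence: str, length: int = FRAGMENT_LENGTH) -> List[str]:
    """Cut a query genome into consecutive, non-overlapping fragments.

    Assumption: a trailing piece shorter than `length` is discarded.
    """
    return [sequence[i:i + length]
            for i in range(0, len(sequence) - length + 1, length)]


def reciprocal_summary(ani_a_to_b: float, ani_b_to_a: float) -> Tuple[float, float]:
    """Return (mean, absolute discrepancy) of two directional ANI values in percent."""
    mean = (ani_a_to_b + ani_b_to_a) / 2.0
    discrepancy = abs(ani_a_to_b - ani_b_to_a)
    return mean, discrepancy


def straddles_boundary(ani_a_to_b: float, ani_b_to_a: float,
                       cutoff: float = 95.0) -> bool:
    """True when the two directions fall on opposite sides of the cutoff.

    Assumption: the cutoff is a single value taken from the 95-96% range.
    """
    return (ani_a_to_b >= cutoff) != (ani_b_to_a >= cutoff)
```

With hypothetical directional values of 96.0% and 94.0%, `reciprocal_summary` gives a mean of 95.0% and a discrepancy of 2.0%, and `straddles_boundary` reports that the two directions disagree about the species boundary, which is the kind of ambiguity a symmetric measure avoids.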