Naive Bayes is a widely used content-based spam filtering algorithm. However, traditional naive Bayes filtering suffers from uncertainty in content-based judgments and from an incomplete representation of the message. This paper analyzes how the fields of the mail header behave differently in legitimate mail and in spam, extracts non-feature information from them, and improves the naive Bayes algorithm by combining feature information with non-feature information. The experimental results show that, compared with a method that uses feature information alone, the improved naive Bayes classifier achieves higher spam recall and accuracy, which highlights its advantages in covering more of the information in a message and in overcoming the weaknesses of judging by content alone.
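The abstract does not spell out which header attributes are used or how they enter the model, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes that feature information means body tokens (scored with a Laplace-smoothed multinomial model) and that non-feature information can be reduced to a few binary header attributes (scored with a Laplace-smoothed Bernoulli model). The attribute names in `header_attributes`, the class `CombinedNaiveBayes`, and the toy messages are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def header_attributes(headers):
    """Hypothetical binary attributes derived from header fields (an assumption)."""
    sender = headers.get("From")
    return {
        "missing_to": "To" not in headers,
        "reply_to_differs": headers.get("Reply-To", sender) != sender,
        "empty_subject": not headers.get("Subject", "").strip(),
    }


class CombinedNaiveBayes:
    """Naive Bayes over body tokens plus header attributes (illustrative only)."""

    def __init__(self):
        self.class_counts = Counter()
        self.word_counts = defaultdict(Counter)   # label -> token counts
        self.attr_counts = defaultdict(Counter)   # label -> (attribute, value) counts
        self.vocab = set()

    def fit(self, messages):
        for headers, body, label in messages:
            self.class_counts[label] += 1
            tokens = body.lower().split()
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
            for item in header_attributes(headers).items():
                self.attr_counts[label][item] += 1
        return self

    def log_scores(self, headers, body):
        total = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total)  # class prior
            n_words = sum(self.word_counts[label].values())
            # Content evidence: multinomial likelihood with add-one smoothing.
            for token in body.lower().split():
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (n_words + len(self.vocab)))
            # Header evidence: each binary attribute with add-one smoothing.
            for item in header_attributes(headers).items():
                score += math.log((self.attr_counts[label][item] + 1) / (n_docs + 2))
            scores[label] = score
        return scores

    def predict(self, headers, body):
        scores = self.log_scores(headers, body)
        return max(scores, key=scores.get)


# Made-up toy messages, for illustration only: (headers, body, label).
train = [
    ({"From": "a@x.com", "To": "b@y.com", "Subject": "Meeting notes"},
     "project meeting schedule", "ham"),
    ({"From": "c@x.com", "To": "b@y.com", "Subject": "Lunch"},
     "lunch tomorrow at noon", "ham"),
    ({"From": "promo@s.biz", "Reply-To": "other@z.biz", "Subject": ""},
     "win free money now", "spam"),
    ({"From": "deal@s.biz", "Reply-To": "x@z.biz", "Subject": "Free"},
     "free prize click now", "spam"),
]
clf = CombinedNaiveBayes().fit(train)
print(clf.predict({"From": "q@s.biz", "Reply-To": "r@z.biz"}, "free money"))
```

Because both kinds of evidence are added in log space, a message whose body looks harmless can still be flagged when its header attributes are typical of spam, which is one way to read the abstract's claim about overcoming the limits of content-only judgment.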