This paper formulates Chinese character recognition as the transmission of information from a discrete source over a memoryless channel. From this model, formulas for the correct recognition rate and the recognition speed are derived, and the factors that affect both are analyzed. A curve relating the correct recognition rate to the size of the character set to be recognized is given. The curve shows that the more probable a character is, the more it contributes to the correct recognition rate. Of the 6763 characters in the comprehensive Chinese character frequency table (汉字综合频度表), the 4081 most probable characters account for 99.9% of the correct recognition rate, while the remaining 2682 account for only 0.1%. The paper also discusses ways to increase recognition speed and reports a simulation experiment whose results are instructive.
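The relationship between character-set size and correct recognition rate can be illustrated with a minimal sketch. It is not the paper's formula, which the abstract does not reproduce. It assumes that a character outside the recognized set is always misrecognized and that every character inside it is recognized with the same accuracy, so the top-k characters contribute the sum of their occurrence probabilities times that accuracy. The Zipf-like distribution and the names `zipf_probabilities`, `cumulative_contribution`, and `smallest_domain_for` are hypothetical stand-ins; the real frequency table would replace them, and the printed split will not match the 4081/2682 figures above.

```python
from itertools import accumulate


def zipf_probabilities(n, s=1.0):
    """Hypothetical occurrence probabilities for n characters (rank 1 = most frequent).

    ASSUMPTION: a Zipf-like law stands in for the real frequency table.
    """
    weights = [1.0 / (rank ** s) for rank in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def cumulative_contribution(probs, per_char_accuracy=1.0):
    """Contribution of the top-k characters to the correct recognition rate, for each k.

    ASSUMPTION: characters outside the top-k set are always misrecognized, and
    every character inside it is recognized with the same accuracy.
    """
    return list(accumulate(p * per_char_accuracy for p in probs))


def smallest_domain_for(coverage, target):
    """Smallest k whose top-k contribution reaches `target`, or None if never reached."""
    for k, value in enumerate(coverage, start=1):
        if value >= target:
            return k
    return None


TABLE_SIZE = 6763  # size of the frequency table quoted in the abstract

probs = zipf_probabilities(TABLE_SIZE)
coverage = cumulative_contribution(probs)
k = smallest_domain_for(coverage, 0.999)

# Under the assumed distribution: how many characters reach 99.9%, and how many remain.
print(k, TABLE_SIZE - k)
```

Substituting measured probabilities for `probs` would give a curve of the kind the abstract describes, with each additional low-frequency character adding less to the total.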