In the practical work of digitizing newspapers from the Republic of China period, quality control is especially important, because high-quality data is the reliable foundation for delivering good services later on. Quality-control problems fall into three areas: the newspaper, the layout, and OCR text recognition. At the newspaper level, problems concern the record identifier, the newspaper title, the publication date, and the page (edition) number. At the layout level, they concern column boundaries, article tagging, and title tagging. OCR text-recognition problems consist mainly of extra characters, missing characters, symbol errors, and misrecognized character shapes (glyphs).
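To make the OCR error categories concrete, here is a minimal sketch. It is not the method used in the paper, and it assumes that a proofread reference transcription is available for the text being checked. It aligns the OCR output against the reference with a standard Levenshtein (edit-distance) alignment. Insertions count as extra characters, deletions as missing characters, and substitutions as symbol errors when either character is Unicode punctuation or a symbol, otherwise as glyph misrecognitions. The function names `align` and `classify_ocr_errors` and the category labels are hypothetical.

```python
import unicodedata
from collections import Counter


def is_symbol(ch: str) -> bool:
    # Unicode general categories starting with "P" (punctuation) or "S" (symbols)
    return unicodedata.category(ch)[0] in ("P", "S")


def align(ref: str, ocr: str):
    """Levenshtein alignment; returns a list of (op, ref_char, ocr_char)."""
    n, m = len(ref), len(ocr)
    # dp[i][j] = edit distance between ref[:i] and ocr[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == ocr[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion: character missing from OCR
                dp[i][j - 1] + 1,         # insertion: extra character in OCR
                dp[i - 1][j - 1] + cost,  # match or substitution
            )
    # Walk back from the bottom-right cell to recover one optimal alignment
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if ref[i - 1] == ocr[j - 1] else 1
            if dp[i][j] == dp[i - 1][j - 1] + cost:
                ops.append(("match" if cost == 0 else "sub", ref[i - 1], ocr[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            ops.append(("del", ref[i - 1], ""))
            i -= 1
        else:
            ops.append(("ins", "", ocr[j - 1]))
            j -= 1
    ops.reverse()
    return ops


def classify_ocr_errors(ref: str, ocr: str) -> Counter:
    """Count OCR errors by (hypothetical) category labels."""
    counts = Counter()
    for op, r, o in align(ref, ocr):
        if op == "ins":
            counts["extra"] += 1
        elif op == "del":
            counts["missing"] += 1
        elif op == "sub":
            counts["symbol" if is_symbol(r) or is_symbol(o) else "glyph"] += 1
    return counts


print(classify_ocr_errors("上海", "上海海"))
```

When several alignments share the same minimal cost, the backtrace picks one of them, so per-category counts can differ slightly on ambiguous passages even though the total number of edits does not.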