【Objective】To automatically extract full-text metadata from scientific journal articles and generate HTML files. 【Method】Taking Founder typesetting files as the source, and building on the existing ability to extract metadata such as the title and abstract, we convert the body text of each article into metadata as well. We propose a concept of generalized metadata that also covers figures, tables, and formulas, establish rules for extracting figure and table metadata, and convert Founder-typeset mathematical formulas into LaTeX expressions. We then wrote a program in VB that automatically extracts the generalized metadata and reassembles it into HTML files. 【Result】Because the extraction rules are built on the characteristics of the Founder BD typesetting language, they can effectively extract the full text and convert it into metadata, from which HTML files can be generated directly. 【Conclusion】Practical application demonstrates that generating HTML files from generalized metadata is effective and feasible.
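The pipeline summarized in the abstract (rule-based extraction of generalized metadata, conversion of formulas to LaTeX, and reassembly into HTML) can be sketched roughly as follows. This is a minimal Python illustration, not the authors' VB program. The line markers 〖T〗, 〖A〗, 〖FIG〗, 〖TAB〗, 〖EQ〗 and the math commands 〖FRAC(〗num〖|〗den〖FRAC)〗 and 〖SQRT(〗x〖SQRT)〗 are hypothetical stand-ins invented for this sketch, not the real Founder BD command set. The `\[ \]` display-math delimiters likewise assume a MathJax-style renderer.

```python
import html
import re

# HYPOTHETICAL line markers, not the real Founder BD command set.
# Each rule maps a metadata kind to a regex over one line of the typeset file.
LINE_RULES = {
    "title": re.compile(r"^〖T〗(.+)$"),
    "abstract": re.compile(r"^〖A〗(.+)$"),
    "figure": re.compile(r"^〖FIG〗([^|]+)\|(.+)$"),  # image path | caption
    "table": re.compile(r"^〖TAB〗([^|]+)\|(.+)$"),   # caption | "a,b;1,2" (rows ';', cells ',')
    "formula": re.compile(r"^〖EQ〗(.+)$"),           # display formula in hypothetical math commands
}

# HYPOTHETICAL math commands, rewritten innermost-first into LaTeX.
# [^〖]* keeps each match free of nested commands, so inner ones are converted first.
MATH_RULES = [
    (re.compile(r"〖FRAC\(〗([^〖]*)〖\|〗([^〖]*)〖FRAC\)〗"), r"\\frac{\1}{\2}"),
    (re.compile(r"〖SQRT\(〗([^〖]*)〖SQRT\)〗"), r"\\sqrt{\1}"),
]


def to_latex(expr: str) -> str:
    """Apply MATH_RULES repeatedly until no command is left to rewrite."""
    changed = True
    while changed:
        changed = False
        for pattern, repl in MATH_RULES:
            expr, n = pattern.subn(repl, expr)
            changed = changed or n > 0
    return expr


def extract(lines):
    """Turn each non-empty line into a (kind, fields) record; unmarked lines become paragraphs."""
    records = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        for kind, pattern in LINE_RULES.items():
            m = pattern.match(line)
            if m:
                records.append((kind, m.groups()))
                break
        else:
            records.append(("para", (line,)))
    return records


def to_html(records) -> str:
    """Reassemble the generalized metadata records into one HTML page."""
    esc = html.escape
    title = next((fields[0] for kind, fields in records if kind == "title"), "")
    body = []
    for kind, fields in records:
        if kind == "title":
            body.append(f"<h1>{esc(fields[0])}</h1>")
        elif kind == "abstract":
            body.append(f'<p class="abstract">{esc(fields[0])}</p>')
        elif kind == "figure":
            body.append(f'<figure><img src="{esc(fields[0])}" alt="">'
                        f"<figcaption>{esc(fields[1])}</figcaption></figure>")
        elif kind == "table":
            rows = "".join(
                "<tr>" + "".join(f"<td>{esc(c)}</td>" for c in row.split(",")) + "</tr>"
                for row in fields[1].split(";")
            )
            body.append(f"<table><caption>{esc(fields[0])}</caption>{rows}</table>")
        elif kind == "formula":
            # \[ ... \] display delimiters assume a MathJax-style renderer.
            body.append(f"<p>\\[{esc(to_latex(fields[0]))}\\]</p>")
        else:
            body.append(f"<p>{esc(fields[0])}</p>")
    head = f'<!DOCTYPE html>\n<html><head><meta charset="utf-8"><title>{esc(title)}</title></head>'
    return head + "<body>\n" + "\n".join(body) + "\n</body></html>"


if __name__ == "__main__":
    sample = [
        "〖T〗Generating HTML from typeset files",
        "〖EQ〗x=〖FRAC(〗-b〖|〗〖SQRT(〗D〖SQRT)〗〖FRAC)〗",
        "〖TAB〗Table 1|a,b;1,2",
        "Plain body text.",
    ]
    print(to_html(extract(sample)))
```

Keeping each kind of generalized metadata as a separate record means the HTML layout can be changed without touching the extraction rules, and a real implementation would swap in the actual BD commands and a much larger formula rule set.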