In data extraction, domain web pages are typically pages with distinctive characteristics, and they contain a large number of domain terms. By summarizing the features of domain web pages, we identify effective methods for eliminating “noise”, laying a solid foundation for data extraction.