Determining the Real Data Completeness of a Relational Dataset

Source: Journal of Computer Science and Technology (English Edition)

Authors: Yong-Nan Liu, Jian-Zhong Li, Zhao-Nian Zou

Affiliation: School of Computer Science and Engineering, Harbin Institute of Technology, Harbin 150001, China

Published in: Journal of Computer Science and Technology (English Edition)

Publication date: 2016, Issue 4

Keywords: data quality; data completeness; functional dependency; data completeness model; opt

Excerpt from the paper

Low data quality is a serious problem in the era of big data: it can severely reduce the usability of data, mislead or bias querying, analysis, and mining, and lead to huge losses. Incomplete data is a common form of low-quality data, so it is necessary to determine the completeness of a dataset in order to guide follow-up operations on it. Little existing work focuses on the completeness of a dataset, and that work treats every missing value as an unknown value. In this paper, we study how to determine the real data completeness of a relational dataset. By taking advantage of given functional dependencies, we aim to determine some missing attribute values from other tuples and thereby identify the attribute cells that are truly missing. We propose a data completeness model, formalize the problem of determining the real data completeness of a relational dataset, and give a lower bound on the time complexity of this problem. We propose two optimal algorithms that determine the data completeness of a dataset in different cases. We empirically show the effectiveness and scalability of our algorithms on both real-world and synthetic data.
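The idea of separating recoverable missing values from truly missing ones can be illustrated with a small sketch. This is not the paper's completeness model or either of its optimal algorithms, which the excerpt does not describe. It is a naive fixpoint loop written under stated assumptions: a relation is a list of dicts, `None` marks a missing cell, a functional dependency is a hypothetical `(lhs_attributes, rhs_attribute)` pair, and completeness is taken to be the fraction of non-missing cells. The function names and the sample relation are invented for illustration.

```python
def infer_missing(rows, fds):
    """Return a copy of rows in which None cells determined by an FD are filled.

    Assumption: each FD is (lhs, rhs), with lhs a tuple of attribute names.
    """
    rows = [dict(r) for r in rows]  # never mutate the caller's data
    changed = True
    while changed:  # repeat until no FD can fill another cell
        changed = False
        for lhs, rhs in fds:
            # Collect a known RHS value for every fully known LHS value.
            known = {}
            for r in rows:
                key = tuple(r[a] for a in lhs)
                if None not in key and r[rhs] is not None:
                    known.setdefault(key, r[rhs])
            # Fill missing RHS cells whose LHS matches a known tuple.
            for r in rows:
                key = tuple(r[a] for a in lhs)
                if r[rhs] is None and key in known:
                    r[rhs] = known[key]
                    changed = True
    return rows


def completeness(rows):
    """Fraction of cells that are not None (assumed measure, for illustration)."""
    total = sum(len(r) for r in rows)
    present = sum(v is not None for r in rows for v in r.values())
    return present / total if total else 1.0


# Hypothetical relation: zip -> city is the only functional dependency.
rows = [
    {"zip": "150001", "city": "Harbin", "name": "A"},
    {"zip": "150001", "city": None, "name": "B"},
    {"zip": None, "city": None, "name": None},
]
fds = [(("zip",), "city")]

print("naive completeness:", completeness(rows))
print("after FD inference:", completeness(infer_missing(rows, fds)))
```

In this toy relation the second tuple's `city` is recoverable from the first tuple, so it is not counted as truly missing. The third tuple has no known `zip`, so its cells stay missing.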
