IDEA: A Utility-Enhanced Approach to Incomplete Data Stream Anonymization

Source: Tsinghua Science and Technology (清华大学学报自然科学版(英文版)) | Cited by: 0
The prevalence of missing values in data streams collected in real environments makes them impossible to ignore when preserving the privacy of data streams. However, most privacy preservation methods are developed without considering missing values. A few studies allow missing values to participate in data anonymization, but they introduce considerable extra information loss. To balance utility and privacy preservation for incomplete data streams, we present a utility-enhanced approach for Incomplete Data strEam Anonymization (IDEA). In this approach, a sliding-window-based processing framework is introduced to anonymize data streams continuously, in which each tuple can be output either through clustering or through existing anonymized clusters. We measure similarity along both the attribute and the tuple dimensions, which enables incomplete records to be clustered with complete records and generates clusters with minimal information loss. To avoid missing-value pollution, we propose a generalization method based on maybe match for generalizing incomplete data. Experiments on real datasets show that the proposed approach can efficiently anonymize incomplete data streams while effectively preserving utility.
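The idea of generalizing incomplete tuples without letting missing values pollute a cluster can be illustrated with a small sketch. It assumes numeric quasi-identifier attributes, `None` for a missing value, interval generalization, and an information-loss measure equal to the average normalized interval width (a common choice in k-anonymity work, not necessarily the one used in IDEA). All function names are hypothetical, and the sketch does not reproduce the paper's sliding window or its attribute/tuple similarity measure. Here a missing value is simply skipped when a cluster's intervals are computed, so it cannot stretch them, and under maybe-match semantics it is treated as compatible with any interval.

```python
from typing import List, Optional, Sequence, Tuple

Value = Optional[float]                    # None marks a missing value
Interval = Optional[Tuple[float, float]]   # None: no member observed this attribute
Domain = Tuple[float, float]               # (min, max) of an attribute's full domain


def generalize_cluster(cluster: Sequence[Sequence[Value]]) -> List[Interval]:
    """Generalize each attribute to [min, max] over its observed values only.

    Missing values are skipped, so they never widen an interval
    (hypothetical reading of "avoiding missing-value pollution").
    """
    intervals: List[Interval] = []
    for column in zip(*cluster):
        observed = [v for v in column if v is not None]
        intervals.append((min(observed), max(observed)) if observed else None)
    return intervals


def maybe_matches(record: Sequence[Value], intervals: Sequence[Interval]) -> bool:
    """True if every observed value falls inside its interval.

    A missing value could be anything, so it maybe-matches any interval.
    """
    for value, interval in zip(record, intervals):
        if value is None:
            continue
        if interval is None or not (interval[0] <= value <= interval[1]):
            return False
    return True


def info_loss(intervals: Sequence[Interval], domains: Sequence[Domain]) -> float:
    """Average normalized interval width (assumed metric, not the paper's).

    An attribute missing in every member counts as fully generalized (1.0).
    """
    total = 0.0
    for interval, (lo, hi) in zip(intervals, domains):
        total += 1.0 if interval is None else (interval[1] - interval[0]) / (hi - lo)
    return total / len(domains)


def best_cluster(record: Sequence[Value],
                 clusters: Sequence[Sequence[Sequence[Value]]],
                 domains: Sequence[Domain]) -> int:
    """Greedy choice: index of the cluster whose loss grows least if record joins."""
    def growth(cluster: Sequence[Sequence[Value]]) -> float:
        before = info_loss(generalize_cluster(cluster), domains)
        after = info_loss(generalize_cluster(list(cluster) + [record]), domains)
        return after - before

    return min(range(len(clusters)), key=lambda i: growth(clusters[i]))


if __name__ == "__main__":
    domains = [(0.0, 100.0), (0.0, 120000.0)]  # hypothetical (age, income) domains
    cluster = [(25.0, 50000.0), (31.0, None), (None, 62000.0)]
    intervals = generalize_cluster(cluster)
    print(intervals)                               # [(25.0, 31.0), (50000.0, 62000.0)]
    print(maybe_matches((28.0, None), intervals))  # True
    print(round(info_loss(intervals, domains), 4))
```

For comparison, if the missing age in the example were imputed as 0 instead of skipped, the age interval would widen from [25, 31] to [0, 31], which is one plausible reading of the missing-value pollution mentioned above. Picking the cluster whose loss grows least is a greedy heuristic assumed here for illustration only.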
Other Articles
With the increasing use of cloud computing, high energy consumption has become one of the major challenges in cloud data centers. Virtual Machine (VM) consolidation has been proven to be an efficient way to optimize energy consumption in data centers, and ma
When the input signal is subject to interference and glitches occur, the power consumption of Double-Edge Triggered Flip-Flops (DETFFs) increases significantly. To effectively reduce the power consumption, this paper presents an anti-interference low-power DETF
This research discussed a deep learning method based on an improved generative adversarial network to segment the hippocampus. Different convolutional configurations were proposed to capture information obtained by a segmentation network. In addition, a gene
Event temporal relation extraction is an important part of natural language processing. With the development of deep learning, many models have been applied to this task. However, most existing methods cannot accurately obtain the degree of association be
Road pricing is an urban traffic management mechanism to reduce traffic congestion. Currently, most road pricing systems based on predefined charging tolls fail to consider the dynamics of urban traffic flows and travelers' demands on the arrival
Integer overflow is a common vulnerability in Ethereum Smart Contracts (ESCs) and often causes huge economic losses. Smart contracts cannot be changed once they are deployed on the blockchain and thus demand further testing. Mutation testing is a fault-based t
Lesion detection in Computed Tomography (CT) images is a challenging task in the field of computer-aided diagnosis. An important issue is to accurately locate the lesion area. As a branch of Convolutional Neural Networks (CNNs), 3D Context-Enhanced (3DCE)
In distributed training, increasing the batch size can improve parallelism, but it can also introduce many difficulties into the training process and cause training errors. In this work, we investigate the occurrence of training errors in theory and train ResNet-50 on
Identifying the associations between metabolites and diseases will help us understand the pathogenesis of diseases, which is of great significance for diagnosing and treating them. However, traditional biometric methods are time-consuming and expensive. Accor
N400 is an objective electrophysiological index of semantic processing in the brain. This study focuses on the sensitivity of the N400 effect during speech comprehension under uni- and bi-modality conditions. Varying the Signal-to-Noise Ratio (SNR) of speech si