Probabilistic outlier detection for sparse multivariate geotechnical site investigation data using B

Source: Geoscience Frontiers (地学前缘, English edition)
Various uncertainties arising during the acquisition of geoscience data may result in anomalous data instances (i.e., outliers) that do not conform to the expected pattern of regular data instances. With sparse multivariate data obtained from geotechnical site investigation, it is impossible to identify outliers with certainty, because the outliers distort the statistics of the geotechnical parameters and data sparsity adds statistical uncertainty to those statistics. This paper develops a probabilistic outlier detection method for sparse multivariate data obtained from geotechnical site investigation. The proposed approach quantifies the outlying probability of each data instance based on the Mahalanobis distance and identifies as outliers those data instances with outlying probabilities greater than 0.5. It tackles the distortion of statistics estimated from a dataset containing outliers by a re-sampling technique and accounts rationally for the statistical uncertainty by Bayesian machine learning. Moreover, the proposed approach suggests an exclusive method to determine the outlying components of each outlier. The proposed approach is illustrated and verified using simulated and real-life datasets. The results show that the proposed approach properly identifies outliers among sparse multivariate data, and their corresponding outlying components, in a probabilistic manner. It can significantly reduce the masking effect (i.e., missing some actual outliers due to the distortion of statistics by the outliers and statistical uncertainty). It is also found that outliers among sparse multivariate data instances significantly affect the construction of the multivariate distribution of geotechnical parameters for uncertainty quantification. This emphasizes the necessity of a data cleaning process (e.g., outlier detection) for uncertainty quantification based on geoscience data.
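The idea of an outlying probability can be illustrated with a minimal sketch. It is not the paper's method: a plain bootstrap stands in for both the re-sampling technique and the Bayesian treatment of statistical uncertainty, and the chi-square cut-off, the function name `outlying_probability`, and all parameter values are assumptions made for illustration. For each resample, the mean and covariance are re-estimated, the squared Mahalanobis distance $d_i^2 = (x_i - \mu)^\top \Sigma^{-1} (x_i - \mu)$ of every data instance is computed, and the outlying probability is taken as the fraction of resamples in which $d_i^2$ exceeds the cut-off.

```python
import numpy as np


def outlying_probability(X, n_resamples=2000, quantile=0.975, seed=0):
    """Fraction of bootstrap resamples in which each row of X exceeds a
    chi-square cut-off on its squared Mahalanobis distance.

    Hypothetical helper: a simplified stand-in, not the paper's algorithm.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n, d = X.shape

    # Assumption: the cut-off is the chi-square quantile with d degrees of
    # freedom, estimated here by Monte Carlo so that only NumPy is needed.
    threshold = np.quantile(rng.chisquare(d, size=200_000), quantile)

    exceed = np.zeros(n)
    for _ in range(n_resamples):
        # Bootstrap resample: rows drawn with replacement, so some resamples
        # leave out the suspicious points and give undistorted statistics.
        sample = X[rng.integers(0, n, size=n)]
        mean = sample.mean(axis=0)
        cov = np.atleast_2d(np.cov(sample, rowvar=False))
        cov_inv = np.linalg.pinv(cov)  # pseudo-inverse guards against singular cases

        # Squared Mahalanobis distance of every ORIGINAL data instance.
        diff = X - mean
        d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
        exceed += d2 > threshold

    return exceed / n_resamples


# Simulated example: 30 correlated regular instances plus one planted outlier.
data_rng = np.random.default_rng(42)
regular = data_rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=30)
X = np.vstack([regular, [6.0, -6.0]])

probs = outlying_probability(X)
outliers = np.flatnonzero(probs > 0.5)  # decision rule from the abstract
print("outlying probabilities:", np.round(probs, 2))
print("flagged indices:", outliers)
```

Averaging the decision over many resamples, rather than relying on one estimate of the mean and covariance, is what makes the result a probability instead of a hard label. The 0.5 threshold is the rule stated in the abstract, while every other setting here is an illustrative choice.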