Multivariate statistical analysis requires complete samples. To avoid discarding large numbers of incomplete fossil specimens, or greatly reducing the number of variables, we developed a method for restoring missing measurements of specimens. The method is based on linear regression theory. It assumes that individuals of the same kind differ only in size, with differences in shape negligible. Therefore, within the same kind of specimen, the known measurements of one specimen can be used to predict the missing measurements of another. When several specimens are available, the prediction of a given missing value of a given specimen is the weighted average of the values predicted separately from each of the other specimens. The weighting coefficients are chosen according to how completely each specimen is preserved. Data experiments on skull and limb-bone specimens of extant Equus show that the method is stable and insensitive to the kind of specimen, the number of specimens, and the number of missing values. Predictions are better for larger specimens and larger measurement values than for smaller specimens and smaller values. Traditional linear regression exploits the linear correlation between variables (i.e., measurement items); this method instead exploits the linear correlation between samples (i.e., specimens). In general, the linear correlation between samples is stronger than that between variables. The method is simple and practical. It has good application prospects in the statistical analysis of fossil specimens, especially in multivariate statistical analysis.
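The abstract does not give the exact regression form or the weighting formula. The sketch below is therefore only illustrative and rests on three assumptions, none of them taken from the paper.

- Each specimen is a row of measurements in the same units, with `NaN` marking a missing value.
- The regression between two specimens is a least-squares fit through the origin (`target ≈ k * donor`). This matches the "size differs, shape does not" assumption.
- Each donor's weight is the number of measurement items it shares with the target specimen. This is a hypothetical proxy for "degree of preservation".

The function names `fit_scale` and `restore` are hypothetical, and the numbers are toy values.

```python
import math

# A specimen is a list of measurements; math.nan marks a missing value (assumption).
Specimen = list[float]


def fit_scale(target: Specimen, donor: Specimen) -> tuple[float, int]:
    """Least-squares fit of target ≈ k * donor through the origin, using only the
    measurement items known in both specimens. Returns (k, number of shared items)."""
    pairs = [(t, d) for t, d in zip(target, donor)
             if not math.isnan(t) and not math.isnan(d)]
    sxx = sum(d * d for _, d in pairs)
    sxy = sum(t * d for t, d in pairs)
    return (sxy / sxx if sxx > 0 else math.nan), len(pairs)


def restore(specimens: list[Specimen]) -> list[Specimen]:
    """Fill each missing value with a weighted average of the predictions made from
    every other specimen that has that item measured. The weight is the number of
    shared known items (a hypothetical stand-in for degree of preservation)."""
    restored = [row[:] for row in specimens]  # copy; inputs are left untouched
    for i, target in enumerate(specimens):
        for j, value in enumerate(target):
            if not math.isnan(value):
                continue
            num = den = 0.0
            for k, donor in enumerate(specimens):
                if k == i or math.isnan(donor[j]):
                    continue  # a donor must have item j to predict it
                scale, shared = fit_scale(target, donor)
                if shared < 2 or math.isnan(scale):
                    continue  # too little overlap to fit a regression
                num += shared * scale * donor[j]
                den += shared
            restored[i][j] = num / den if den > 0 else math.nan
    return restored


# Toy data (not from the paper): the third specimen lacks its last measurement.
data = [
    [10.0, 20.0, 30.0],
    [20.0, 40.0, 60.0],
    [15.0, 30.0, math.nan],
]
print(restore(data)[2][2])  # 45.0
```

Each donor specimen is regressed against the target separately, so the method correlates samples rather than variables. In the toy data, both donors imply the same scale relative to the target and predict 45.0, so their weighted average is also 45.0. Predictions are computed only from originally known values, so restored values are never reused as inputs.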