Objective To compare the results of two commonly used statistical inference frameworks applied to the same problem, and to describe annual per capita inpatient medical expenditure in Shaanxi Province. Methods The Fifth National Health Services Survey used stratified multistage πPS sampling; both inference frameworks were applied to the Shaanxi Province survey data. Results The standard error from design-based inference was markedly larger than that from model-based inference: the estimates of the population mean after logarithmic transformation were (4.060840 ± 0.008588) and (4.060051 ± 0.004072), respectively. Conclusion Each framework has advantages and disadvantages when applied to large-sample data from complex sampling. Design-based inference requires a sufficient sampling fraction to ensure that the sample is representative, and it needs a large amount of auxiliary information beyond the sample data to compute the sampling weights. Model-based inference is sensitive to the population distribution of the dependent variable: when the population is skewed, a transformation is needed, which makes the model harder to fit, and categorical independent variables must enter the model as dummy variables, which adds to its complexity. The inference method should therefore be chosen by weighing the strengths and weaknesses of each framework against the needs of the analysis.
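The contrast between the two standard errors can be illustrated with a minimal Python sketch on synthetic data. It is not the authors' analysis. It assumes a single stratum, hypothetical sampling weights, a with-replacement approximation at the primary-sampling-unit (PSU) level for the design-based variance, and an intercept-only i.i.d. model as the simplest possible model-based estimator. All function names and simulation parameters are illustrative.

```python
import math
import random


def design_based_mean(y, w, psu):
    """Weighted (Hajek) mean with a with-replacement PSU-level linearization SE."""
    total_w = sum(w)
    mean = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    # Sum the linearized residuals within each primary sampling unit.
    z = {}
    for yi, wi, pi in zip(y, w, psu):
        z[pi] = z.get(pi, 0.0) + wi * (yi - mean) / total_w
    m = len(z)
    # The PSU totals of z sum to zero, so the between-PSU variance is sum(z^2).
    var = m / (m - 1) * sum(v * v for v in z.values())
    return mean, math.sqrt(var)


def model_based_mean(y):
    """Unweighted mean with the usual i.i.d. standard error s / sqrt(n)."""
    n = len(y)
    mean = sum(y) / n
    s2 = sum((yi - mean) ** 2 for yi in y) / (n - 1)
    return mean, math.sqrt(s2 / n)


def simulate(n_psu=40, n_per_psu=50, seed=0):
    """Synthetic log-expenditure data with a shared effect inside each PSU."""
    rng = random.Random(seed)
    y, w, psu = [], [], []
    for p in range(n_psu):
        effect = rng.gauss(0.0, 0.3)       # clustering: units in a PSU are alike
        weight = rng.uniform(50.0, 150.0)  # hypothetical sampling weight
        for _ in range(n_per_psu):
            y.append(4.0 + effect + rng.gauss(0.0, 1.0))
            w.append(weight)
            psu.append(p)
    return y, w, psu


y, w, psu = simulate()
d_mean, d_se = design_based_mean(y, w, psu)
m_mean, m_se = model_based_mean(y)
print(f"design-based: {d_mean:.4f} +/- {d_se:.4f}")
print(f"model-based:  {m_mean:.4f} +/- {m_se:.4f}")
```

Because observations within a simulated PSU share a common effect, the i.i.d. formula understates the uncertainty and the design-based standard error comes out larger, which is the same direction as the pattern reported in the Results. When every unit is its own PSU and all weights are equal, the two functions return identical standard errors.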