Background: Population structure (PS), including population stratification and admixture, is a significant confounder in genome-wide association studies (GWAS) because it can produce spurious associations. Random forest (RF) has been increasingly applied to GWAS data because it handles high-dimensional genetic data well. RF produces importance measures for SNPs that are useful for feature selection. However, if population structure is not appropriately corrected, RF tends to assign high importance to disease-unrelated SNPs whose allele or genotype frequencies differ among subpopulations, leading to inaccurate results.
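The confounding mechanism described above can be illustrated with a small simulation. This is a minimal sketch, not the paper's method. All parameters are hypothetical: two subpopulations "A" and "B" differ in both disease prevalence (0.1 vs 0.4) and the frequency of one SNP allele (0.2 vs 0.7), while the SNP is drawn independently of disease within each subpopulation. The sketch does not fit a random forest. It uses a simple allelic chi-square test to show the spurious case-control frequency difference that appears when the subpopulations are pooled. This is the kind of signal that, per the passage, an uncorrected RF can reward with high importance.

```python
import random

def simulate(n_per_pop, seed=0):
    """Simulate case-control data from two subpopulations.

    Hypothetical parameters, for illustration only:
      "A": disease prevalence 0.1, allele frequency 0.2
      "B": disease prevalence 0.4, allele frequency 0.7
    Genotype is drawn independently of disease status, so the SNP
    has no causal effect in either subpopulation.
    """
    rng = random.Random(seed)
    params = {"A": (0.1, 0.2), "B": (0.4, 0.7)}
    rows = []  # (subpopulation, is_case, genotype coded as 0/1/2 allele copies)
    for pop, (prevalence, p_allele) in params.items():
        for _ in range(n_per_pop):
            is_case = rng.random() < prevalence
            genotype = sum(rng.random() < p_allele for _ in range(2))
            rows.append((pop, is_case, genotype))
    return rows

def allele_freq(rows, case_status):
    """Frequency of the coded allele among cases (True) or controls (False)."""
    genotypes = [g for _, c, g in rows if c == case_status]
    return sum(genotypes) / (2 * len(genotypes))

def allelic_chi2(rows):
    """Pearson chi-square on the 2x2 table of allele counts (case/control x allele)."""
    a = sum(g for _, c, g in rows if c)          # coded allele, cases
    b = sum(2 - g for _, c, g in rows if c)      # other allele, cases
    c_ = sum(g for _, c, g in rows if not c)     # coded allele, controls
    d = sum(2 - g for _, c, g in rows if not c)  # other allele, controls
    n = a + b + c_ + d
    return n * (a * d - b * c_) ** 2 / ((a + b) * (c_ + d) * (a + c_) * (b + d))

if __name__ == "__main__":
    rows = simulate(20000)
    pooled = allele_freq(rows, True) - allele_freq(rows, False)
    print(f"pooled: freq diff={pooled:.3f}, chi2={allelic_chi2(rows):.1f}")
    # Stratifying by subpopulation removes the spurious association.
    for pop in ("A", "B"):
        stratum = [r for r in rows if r[0] == pop]
        diff = allele_freq(stratum, True) - allele_freq(stratum, False)
        print(f"{pop}: freq diff={diff:.3f}, chi2={allelic_chi2(stratum):.1f}")
```

The pooled difference can also be checked by hand. Cases come mostly from the high-prevalence subpopulation, so the expected case allele frequency is (0.1·0.2 + 0.4·0.7)/0.5 = 0.6, versus (0.9·0.2 + 0.6·0.7)/1.5 = 0.4 in controls. The result is a difference of about 0.2 even though the SNP is unrelated to disease. Within each subpopulation the expected difference is 0.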