To uncover the latent commonalities among best-selling sedans, and to address the low efficiency and incomplete coverage of manual analysis of data patterns, this paper proposes a systematic method consisting of data collection, database construction, and database mining. Parameter and configuration data covering many aspects of the 100 best-selling sedans of 2016 were collected, a parameter-configuration database of representative models was built, and data mining was carried out in the R data analysis environment. Using the ggplot2 package, a basic analysis of the database was made from three perspectives: vehicle class, manufacturer's suggested retail price, and brand. The arulesViz package was then combined with the Apriori algorithm to perform association analysis on the data, revealing latent commonalities in the parameter configurations of best-selling sedans as well as the mainstream preferences of sedan buyers.
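The association-analysis step can be illustrated with a minimal sketch of the Apriori idea: find configuration items that frequently occur together, then derive rules that meet a confidence threshold. The paper performs this step in R. The version below is a plain-Python stand-in written only for illustration. The `cars` records, the item labels, the function names, and the thresholds are all hypothetical and are not taken from the paper's database or results.

```python
from itertools import combinations

# Hypothetical toy data: each set holds the configuration items of one model.
# These values are invented for illustration, not drawn from the paper.
cars = [
    {"level=compact", "gearbox=automatic", "sunroof=yes"},
    {"level=compact", "gearbox=automatic", "sunroof=yes"},
    {"level=compact", "gearbox=manual", "sunroof=no"},
    {"level=midsize", "gearbox=automatic", "sunroof=yes"},
    {"level=compact", "gearbox=automatic", "sunroof=no"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def apriori(transactions, min_support):
    """Return {frozenset: support} for all frequent itemsets."""
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    frequent = {}
    k = 1
    while current:
        # Keep only the candidates that meet the support threshold.
        level = {c: support(c, transactions) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        # Join step: build (k+1)-item candidates from frequent k-itemsets.
        k += 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every k-item subset of a candidate must be frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples."""
    found = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs = frozenset(lhs)
                # confidence(lhs => rhs) = support(lhs and rhs) / support(lhs)
                conf = supp / frequent[lhs]
                if conf >= min_confidence:
                    found.append((lhs, itemset - lhs, supp, conf))
    return found


frequent_itemsets = apriori(cars, min_support=0.5)
for lhs, rhs, supp, conf in rules(frequent_itemsets, min_confidence=0.9):
    print(sorted(lhs), "=>", sorted(rhs),
          f"support={supp:.2f}", f"confidence={conf:.2f}")
```

Encoding each attribute as a `name=value` item turns a row of the configuration database into a transaction, which is the input form Apriori expects. Numeric attributes such as price would first need to be binned into ranges, and the thresholds would be tuned to the size of the real data set.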