The name "Data Mining", commonly used to describe a style of data analysis that makes a virtue of exploratory approaches, emerged from the computer science community. In a recent book, Ken Berk describes it as a "muscular" version of EDA (Exploratory Data Analysis). Statistical Learning and Machine Learning draw from similar streams of ideas and have similarly strong connections to computer science, but may pay more attention to the literature and traditions of probability theory and theoretical statistics. "Analytics", which focuses on applications in business and commerce, is another name that has come into wide use in recent years. This talk will offer a statistician's view of these different names for data analysis, with their differences in style, concepts, terminology and notation. It will comment on the challenges and innovations that they have fostered, on common deficiencies in their frameworks of understanding and theory, which arise in part from limited attention to insights from the statistical tradition, and on key ideas. Finally, it will comment on what R offers to these diverse communities: specific analysis tools, a unifying framework for the development of new abilities, and a means of access to a wide range of methodologies.