Comment by Yann LeCun of New York University: “Amazing how the “data science is just statistics” crowd completely misses the paramount importance of computation. Data Science is at the juncture of statistics, machine learning, optimization and a few areas of applied math, and a host of other areas of computer science. It is not “contained” in any of the above disciplines. The critiques we hear about data science today are similar to the ones that were addressed to computer science in the 60’s and 70’s, before it became a full-fledged academic discipline: “it’s just a branch of mathematics” or “it’s just a branch of electrical engineering”. What made it turn into a discipline was its size and importance to society.”
Science studies a particular domain, whether it be chemical, biological, physical or whatever. This gives us the sciences of chemistry, biology, physics, etc. Those who study such domains will gather data in one way or another, often by formulating experiments and taking readings. In other words, they gather data. If there were a particular activity devoted to studying data, then there might be some virtue in the term “data science.” And indeed there is such an activity, and it already has a name: it is a branch of mathematics called statistics.
Statistics Versus Data Science
So is data science just statistics by another name? Data scientists seem to view statistics as a tool they use to a greater or lesser degree in their work rather than, as Bloor suggests, the domain of their science. The relationship resembles the one between the theoretical courses in a computer science degree and what a working coder actually does day to day.
Data scientist Hilary Mason (formerly of Bitly, now at Accel Partners) made this comment about Silver’s claim: “I’m a computer scientist by training who explores data and builds algorithms, systems, and products around data. I use statistics in my practice, but would never claim to be an expert statistician.”
O’Reilly’s Analyzing the Analyzers report seems to confirm that statistics is one tool of data science rather than the focus of the field. The study showed that data science already involves a range of roles, from data businessperson to data researcher, with statistics featuring much more prominently in some roles than in others.
Statistics Versus Machine Learning
Commenters on Bloor’s post also pointed to the extensive use of machine learning, and not just statistics, in the data science world. The overlaps and differences between machine learning and statistics are themselves a contentious issue, as both fields are interested in learning from data. They just have different objectives and go about it in different ways. Data scientist and Machine Learning for Hackers author Drew Conway explains the difference this way: …