Data Science Paper

Authors:
Foster Provost and Tom Fawcett
Department of Information, Operations, and Management Sciences
Leonard N. Stern School of Business
New York University

Abstract
Companies have realized they need to hire data scientists, academic institutions are scrambling to put together data-science programs, and publications are touting data science as a hot (even “sexy”) career choice. However, there is confusion about what exactly data science is, and this confusion could lead to disillusionment as the concept diffuses into meaningless buzz.

In this article, we argue that there are good reasons why it has been hard to pin down exactly what data science is. One reason is that data science is intricately intertwined with other concepts of growing importance, such as big data and data-driven decision making. Another reason is the natural tendency to associate what a practitioner does with the definition of the practitioner’s field; this can result in overlooking the fundamentals of the field. We believe that trying to define the boundaries of data science precisely is not of the utmost importance.

We can debate the boundaries of the field in an academic setting, but in order for data science to serve business effectively, it is important (i) to understand its relationships to other important related concepts, and (ii) to begin to identify the fundamental principles underlying data science. Once we embrace (ii), we can much better understand and explain exactly what data science has to offer. Furthermore, only once we embrace (ii) should we be comfortable calling it data science. In this article, we present a perspective that addresses all these concepts. We close by offering, as examples, a partial list of fundamental principles underlying data science.

Paper’s Conclusion
Underlying the extensive collection of techniques for mining data is a much smaller set of fundamental concepts comprising data science. In order for data science to flourish as a field, rather than to drown in the flood of popular attention, we must think beyond the algorithms, techniques, and tools in common use. We must think about the core principles and concepts that underlie the techniques, and also the systematic thinking that fosters success in data-driven decision making.

These data science concepts are general and very broadly applicable. Success in today’s data-oriented business environment requires being able to think about how these fundamental concepts apply to particular business problems, that is, to think data-analytically. This is aided by conceptual frameworks that themselves are part of data science. For example, the automated extraction of patterns from data is a process with well-defined stages. Understanding this process and its stages helps structure problem solving, makes it more systematic, and thus less prone to error.

There is strong evidence that business performance can be improved substantially via data-driven decision making [3], big data technologies [4], and data science techniques based on big data [9,10]. Data science supports data-driven decision making (and sometimes allows making decisions automatically at massive scale) and depends upon technologies for “big data” storage and engineering. However, the principles of data science are its own and should be considered and discussed explicitly.

Address correspondence to:
F. Provost
Department of Information, Operations,
and Management Sciences
Leonard N. Stern School of Business
New York University
44 W. 4th Street, 8th Floor
New York, NY 10012
E-mail: fprovost@stern.nyu.edu

Blog Publisher / Head of Data Science Search

Founder & Head of Data Science Search at Starbridge Partners, LLC.