IN THIS FOURTH EDITION of the O’Reilly Data Science Salary Survey, we’ve analyzed input from 983 respondents working in the data space across a variety of industries, representing 45 countries and 45 US states. Through the results of our 64-question survey, we’ve explored which tools data scientists, analysts, and engineers use, which tasks they engage in, and, of course, how much they make.
Key findings include:
- Python and Spark are among the tools that contribute most to salary.
- Among those who code, the highest earners are the ones who code the most.
- SQL, Excel, R, and Python are the most commonly used tools.
- Those who attend more meetings earn more.
- Women make less than men for doing the same thing.
- Country and US state GDP serve as a decent proxy for geographic salary variation (not as a direct estimate, but as an additional input for a model).
- The most salient division in tool and task usage is between those who mostly use Excel, SQL, and a small number of closed source tools and those who use more open source tools and spend more time coding.
- R is used across this division: even people who don’t code much or use many open source tools use R.
- A secondary division emerges among the coding half, separating a younger, Python-heavy data scientist/analyst group from a more experienced data scientist/engineer cohort that tends to use a high number of tools and earns the highest salaries.
To see our complete model and input your own metrics to predict salary, see Appendix B: The Regression Model (but beware: the model uses a square root transform, so don’t forget to square the result!).
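The "square the result" step can be sketched in a few lines of Python. This is only an illustration of how a square root transform is undone: the intercept, the coefficient values, the feature names (`uses_python`, `uses_spark`), and the assumption that salary is measured in thousands of dollars are all hypothetical, not the figures from Appendix B.

```python
# Hypothetical values for illustration only; NOT the survey's actual model.
INTERCEPT = 8.0
COEFFICIENTS = {"uses_python": 0.5, "uses_spark": 1.0}


def predict_sqrt_salary(features):
    """Linear prediction on the square-root scale (the scale the model is fit on)."""
    return INTERCEPT + sum(
        COEFFICIENTS[name] * value for name, value in features.items()
    )


def predict_salary(features):
    """Undo the square root transform by squaring the linear prediction."""
    return predict_sqrt_salary(features) ** 2


respondent = {"uses_python": 1, "uses_spark": 1}
print(predict_sqrt_salary(respondent))  # 9.5 (square-root scale)
print(predict_salary(respondent))       # 90.25 (assumed unit: thousands of dollars)
```

Because the coefficients add up on the square-root scale, each one's effect on the final salary depends on the other inputs, which is why the sum has to be squared as a whole rather than term by term.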
FOR THE FOURTH YEAR RUNNING, we at O’Reilly Media have collected survey data from data scientists, engineers, and others in the data space, about their skills, tools, and salary. Across our four years of data, many key trends are more or less constant: median salaries, top tools, and correlations among tool usage. For this year’s analysis, we collected responses from September 2015 to June 2016, from 983 data professionals.
In this report, we provide some different approaches to the analysis, in particular conducting clustering on the respondents (not just tools). We have also adjusted the linear model for improved accuracy, using a square root transform and publicly available data on geographical variation in economies. The survey itself also included new questions, most notably about specific data-related tasks and any change in salary.
Salary: The Big Picture
The median base salary of the entire sample was $87K. This figure is slightly lower than in previous years (last year it was $91K), but this discrepancy is fully attributable to shifts in demographics: this year’s sample had a higher share of non-US respondents and respondents aged 30 or younger. Three-fifths of the sample came from the US, and these respondents had a median salary of $106K.
Understanding Interquartile Range
For a number of survey questions, we show graphs of answer shares and the median salaries of respondents who gave particular answers. While median salary is probably the best number to compare how much two groups of people make, it doesn’t say anything about the spread or variation of salaries. In addition to median, we also show the interquartile range (IQR)—two numbers that delineate salaries of the middle 50%. This range is not a confidence interval, nor is it based on standard deviations.
As an example, the IQR for US respondents was $80K to $138K, meaning one quarter of US respondents had salaries lower than $80K and one quarter had salaries higher than $138K. Perhaps more illustrative of the value of the IQR is comparing the US Northeast and Midwest: the Northeast has a higher median salary ($105K vs. $98K), but the third quartile cutoffs are $133K for the Northeast and $138K for the Midwest. This indicates that there is generally more variation in Midwest salaries and that, among top earners, salaries might be even higher in the Midwest than in the Northeast.
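For readers who want to compute these numbers themselves, here is a minimal sketch using Python's standard `statistics` module. The two salary lists are made-up values (in thousands of dollars), not survey data, and the "inclusive" quantile method is one of several reasonable conventions; the report does not say which one it uses.

```python
import statistics


def salary_summary(salaries):
    """Return the first quartile, median, and third quartile of a salary list."""
    q1, median, q3 = statistics.quantiles(salaries, n=4, method="inclusive")
    return {"q1": q1, "median": median, "q3": q3}


# Made-up salaries in $K, for illustration only.
group_a = [60, 80, 100, 120, 140]
group_b = [40, 70, 95, 130, 170]

summary_a = salary_summary(group_a)
summary_b = salary_summary(group_b)

print(summary_a)  # q1=80, median=100, q3=120
print(summary_b)  # q1=70, median=95, q3=130
```

In this toy data, group A has the higher median but group B has the higher third quartile and the wider IQR, which is the same pattern described above for the Northeast and the Midwest.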
How Salaries Change…get eBook for full results...
NOTE: As a specialist recruiter of data scientists and data engineers in the New York metro area, I see much higher salaries (than listed here) being offered in sectors where data science is the key driver of the business itself (Adtech, Ecommerce, Travel, Social, Mobile, Financial Services, etc.). – Ted O’Brien