This talk was given at Midwest.io.
Josh Wills, Director of Data Science at Cloudera, has a gift for making fairly complicated technology digestible to novice and intermediate techies. What I love most about this video is how clearly Josh explains the problem of taking a machine learning model built by analyzing a large set of data records (see: individuals) and making it work in a production environment on one individual at a time (think eCommerce). It’s about going from a SQL/R/SAS type of environment (pure analysis) to a Java, Scala, or C++ programming environment (the actual site), and how to deal with that transition effectively.
“The Data Science Team at Cloudera has a simple mission: build an analytics infrastructure so awesome that it makes Google’s Ads Quality Team seethe with jealousy. To that end, I’ll give an overview of Cloudera’s current data science tools, including Oryx and Spark for building and serving machine learning models, Gertrude for multivariate testing, and Impala for ludicrously high-performance SQL queries against HDFS.” – Josh
About the Speaker
Josh Wills is Cloudera’s Senior Director of Data Science, working with customers and engineers to develop Hadoop-based solutions across a wide range of industries. He is the founder and VP of the Apache Crunch project for creating optimized MapReduce pipelines in Java and the lead developer of Cloudera ML, a set of open-source libraries and command-line tools for building machine learning models on Hadoop. Prior to joining Cloudera, Josh was at Google, where he worked on the ad auction system and then led the development of the analytics infrastructure used in Google+.