Video | Strata + Hadoop World NYC 2016 | “The Evolution of Massive Scale Data Processing”

In this video, Tyler Akidau presents a whirlwind tour of the evolution of massive-scale data processing at Google, from the original MapReduce paradigm to the high-level pipelines of Flume to the streaming approach of MillWheel to the portable, unified streaming/batch model of Google Cloud Dataflow and Apache Beam (incubating).

Tyler also highlights similarities and differences with related open source systems such as Flink, Spark, Storm, and Gearpump, calling out ways in which they’re converging on and diverging from the Beam model and what that means when running Beam pipelines on their respective runners. Watch Video

Video | NYC Machine Learning Meetup 2016 | Dan Melamed “How To Learn From What Your Users Might Not See”

At the Machine Learning Meetup in NYC, Dan Melamed gave a talk titled “How To Learn From What Your Users Might Not See”. The talk focuses on contextual bandits and their applications.

In this tutorial, Dan will show how to learn from such data in a principled, efficient, and unbiased manner. The techniques that he will describe were largely responsible for a click-thru rate gain of over 25% on Watch Video

How Airbnb Scaled from 5 to 70+ Data Scientists in 2 Years (via Kaggle)

In 2013, Airbnb had a small, centralized team of five data scientists serving the data needs of the company. Since then, they have grown to become one of the largest, most innovative startup teams with over 70 data scientists now serving separate business units. In addition to setting a consistently high bar on new hires and focusing on technical mentorship from peers, the structure of the organization has been key to successful growth. Read More

Guide | How To Obtain a FREE “Open Source” Masters in Data Science

A FREE Masters in Data Science. More and more people are learning online through the flood of excellent “open source” resources: classes, ebooks, software, and more. Clare Corthell has created a website that lets anybody follow virtually the same curriculum offered for a Masters in Data Science, for free.

Will it be an official Masters? No, but an official Masters is not always what is needed. Often it’s the knowledge and experience of working with the tools and techniques that is necessary to actually do data science. For some, this free curriculum will allow business-line leaders, analysts, and programmers from other fields to fill in education gaps, get better at their jobs, and move one step closer to being an actual data scientist. Read More

Edu-Video | What Has Kaggle Learned from 2 Million Machine Learning Models?

Kaggle is a community of almost 450K data scientists who have built nearly 2 million machine learning models to participate in its competitions. Data scientists come to Kaggle to learn, collaborate and develop the state of the art in machine learning. This talk will cover some of the lessons on winning techniques we have learned from the Kaggle community. Watch Video

Edu-Videos | Learn All About Apache Spark (100x Faster than Hadoop MapReduce)

Apache Spark’s popularity as part of big data analytics solutions is exploding. Spark is an open-source data analytics cluster computing framework originally developed in the AMPLab at UC Berkeley. Spark fits into the Hadoop open-source community, building on top of the Hadoop Distributed File System (HDFS). However, Spark promises performance up to 100 times faster than Hadoop MapReduce for certain applications…and that’s why you should care!

Spark’s in-memory cluster computing is very well suited to machine learning algorithms. These videos give you a nice introduction to Spark, how it’s being used in business, and why you should care. Watch Spark Videos

Blog Publisher / Head of Data Science Search

Founder & Head of Data Science Search at Starbridge Partners, LLC.