Video | Strata + Hadoop World NYC 2016 | “The Evolution of Massive Scale Data Processing”

In this video, Tyler Akidau presents a whirlwind tour of the evolution of massive-scale data processing at Google, from the original MapReduce paradigm to the high-level pipelines of Flume to the streaming approach of MillWheel to the portable, unified streaming/batch model of Google Cloud Dataflow and Apache Beam (incubating).

Tyler also highlights similarities and differences with related open source systems such as Flink, Spark, Storm, and Gearpump, calling out ways in which they’re converging on and diverging from the Beam model and what that means when running Beam pipelines on their respective runners. Watch Video

Edu-Videos | Learn All About Apache Spark (100x Faster than Hadoop MapReduce)

Apache Spark’s popularity as part of big data analytics solutions is exploding. Spark is an open-source data analytics cluster computing framework originally developed in the AMPLab at UC Berkeley. Spark fits into the Hadoop open-source community, building on top of the Hadoop Distributed File System (HDFS). However, Spark promises performance up to 100 times faster than Hadoop MapReduce for certain applications…and that’s why you should care!

Spark’s in-memory cluster computing is very well suited to machine learning algorithms. These videos give you a solid introduction to Spark, how it’s being used in business, and why you should care. Watch Spark Videos…

Edu-Videos | 100 Most Popular Machine Learning Talks at VideoLectures.Net

Machine learning is a subfield of computer science and artificial intelligence that deals with the construction and study of systems that can learn from data, rather than follow only explicitly programmed instructions (Wikipedia).

If you are thinking of becoming a Data Scientist or Advanced Analytics professional, you will absolutely need to master Machine Learning. These 100 Most Popular Talks on Machine Learning topics are a great resource for learning. Review List

Live Roundtable Today Weds 8/21 at 2pm: NoSQL, Hadoop and MapReduce: Building a Modern Data Infrastructure

On Wednesday, Aug. 21st at 2pm EST, join Jeffrey Kelly, Wikibon; Joey Jablonski, Kitenga; Christopher Biow, 10gen; Ron Bodkin, Think Big Analytics; and John Akred, SVDS for a 60-minute live roundtable.

Discussion: In a whirlwind of big data tools like MapReduce, NoSQL, Hadoop, and their cousins and brothers, it’s difficult to understand the stack you need to make your data as useful as possible. How do you decide which tools to use, and once you do decide, how do you make the jump? Read More

Blog Publisher / Head of Data Science Search

Founder & Head of Data Science Search at Starbridge Partners, LLC.