50+ Open Source Tools for Big Data

Open source software tools have become all the rage, especially around big data, and that is a good thing. Open source lets many players work from the same code base to build add-on tools, and it makes it cheap and easy for the masses to get set up and use them. Hadoop, R, Cassandra, MongoDB, Neo4j and HBase are among the most popular, but there are many more.

I have accumulated three lists that are very popular. Please let me know if you see anything missing, and I’ll attempt to create one large master list and post it on the site.

Guide | How To Obtain a FREE “Open Source” Masters in Data Science

A free Masters in Data Science: more and more people are learning online via the flood of excellent “open source” resources such as classes, ebooks and software. Clare Corthell has created a website that lets anybody follow virtually the same curriculum offered for a Masters in Data Science, for free.

Will it be an official Masters? No, but an official Masters is not always what is needed. Often it’s the knowledge and experience of working with the tools and techniques that is necessary to actually do data science. This free curriculum will allow business-line leaders, analysts and programmers from other fields to fill in their education gaps, get better at their jobs and move one step closer to being an actual data scientist.

Edu-Video | What has Kaggle learned from 2 Million Machine Learning models?

Kaggle is a community of almost 450K data scientists who have built nearly 2 million machine learning models to participate in its competitions. Data scientists come to Kaggle to learn, collaborate and develop the state of the art in machine learning. This talk will cover some of the lessons on winning techniques we have learned from the Kaggle community.

LIST | 500+ Data Science Degrees and Certificates from Universities (via Data Science Central)

“This list of 500+ was started in 2012, updated in 2014 and also very recently according to the author. It was compiled by 101.datascience.community, and broken down by degree (master / bachelor / certificate / doctorate) and location (online / on-site.)” – Source: Data Science Central

Edu-Videos | Learn All About Apache Spark (100x Faster than Hadoop MapReduce)

Apache Spark’s popularity as part of big data analytics solutions is exploding. Spark is an open-source cluster computing framework for data analytics, originally developed in the AMPLab at UC Berkeley. Spark fits into the Hadoop open-source community and can run on top of the Hadoop Distributed File System (HDFS). However, Spark promises performance up to 100 times faster than Hadoop MapReduce for certain applications, and that’s why you should care!

Spark’s in-memory cluster computing is very well suited to machine learning algorithms. These videos will give you a nice introduction to Spark and how it’s being used in business.
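
To see why in-memory computing suits machine learning, here is a rough sketch in plain Python. It is not Spark code and it uses no Spark API. The CountingSource class and the fit_slope function are hypothetical names made up for this illustration. The idea it tries to show: many learning algorithms pass over the same data again and again, so loading the data once and keeping it in memory avoids repeating an expensive read on every pass.

```python
# Plain-Python illustration (not Spark code): iterative algorithms reread the
# same dataset on every pass, so keeping it in memory avoids repeated loads.


class CountingSource:
    """Hypothetical data source that counts how often it is read."""

    def __init__(self, points):
        self._points = list(points)
        self.reads = 0

    def load(self):
        # Stands in for an expensive disk or network read.
        self.reads += 1
        return list(self._points)


def fit_slope(get_points, iterations=50, learning_rate=0.01):
    """Fit y = w * x by gradient descent, fetching the data on every pass."""
    w = 0.0
    for _ in range(iterations):
        points = get_points()
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in points) / len(points)
        w -= learning_rate * grad
    return w


data = [(x, 3.0 * x) for x in range(1, 6)]  # points on the line y = 3x

# Reload from the source on every iteration (disk-bound style).
reloading = CountingSource(data)
w_reload = fit_slope(reloading.load)

# Load once, keep the result in memory, and reuse it (in-memory style).
cached_source = CountingSource(data)
cached = cached_source.load()
w_cached = fit_slope(lambda: cached)

print("reads when reloading:", reloading.reads)
print("reads when cached:", cached_source.reads)
print("fitted slopes:", round(w_reload, 3), round(w_cached, 3))
```

Both runs arrive at the same slope, but the cached run reads the source only once. On a real cluster the saving depends on data size, memory and the workload, so treat this as intuition only and not as a benchmark.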
