Data Science

February 9, 2014 · by Ted O’Brien · in Case Studies, Data Resources & Tools, Hadoop, News Articles, Top Ranked

By Jack Vaughan. Jack Vaughan is SearchDataManagement’s news and site editor. Email him at [email protected], and follow them on Twitter: @sDataManagement.


Jack writes: It was clear when 2013 began that open source Hadoop was entering a new phase. It had moved from its original roots in large-scale, Yahoo-style Web applications and was appearing in analytical pilot projects across a variety of enterprises. During the year, software companies worked to add features to the Hadoop data platform in order to enable its wider use in production. As the spotlight shone on the software often represented by a small elephant, SearchDataManagement endeavored to cut through the hype that can obscure the real trends.

Our editors have reviewed our most popular Hadoop-related stories this year, and taken together, they form a narrative of Hadoop in 2013. The content followed the path of Hadoop and related software tools, such as HBase, as they gained footholds in the enterprise. We also saw flurries of product activity, including new Hadoop distributions from major IT vendors. From mid-year to year’s end, a new version of the platform known as Hadoop 2.0 — complete with enterprise enhancements — gained attention.

Hadoop helps bring big data into a data warehouse. During the year, more Hadoop implementers began to place their systems into workflows attached to existing enterprise data warehouses. The effect has been particularly noticeable on established extract, transform and load (ETL) architectures. Hadoop’s ability to stage data with less reliance on full-scale up-front schemas has in some cases been a plus.

See these related stories:
Confronting MapReduce, Hadoop complexities
Hadoop’s move up from the developer’s sandbox
Expanded Hadoop use cases will drive need for new enterprise features

Big data, fast: Avoiding Hadoop performance bottlenecks. Experience with Hadoop in the field shows that more than just “some assembly is required.” In many shops, Hadoop needs tweaking and enhancements to meet enterprise needs. Our reporting also indicated that the Hadoop style of data processing that worked well at Google and Yahoo may not be the cure for every company’s problem.

See these related stories:
Google’s big data infrastructure: Don’t try this at home
Mind the hype in choosing Hadoop technology

EMC, Intel unveil new Hadoop distributions, but how many is too many? If Hadoop was wanting in some areas, there was no shortage of vendors ready to fill in with product improvements. Notably, the year witnessed IT heavyweights EMC and Intel entering the Hadoop Derby. Easier configuration was often a hallmark of the product enhancements.

See this related story:
Evolving Hadoop ecosystem presents new ways to program big data apps

Enterprise Hadoop will need to work with existing processes. In June, the Hadoop Summit in San Jose, Calif., was a coming-out party of sorts for Hadoop 2.0. Spotlighted in this new version is YARN (for Yet Another Resource Negotiator), whose offbeat name belies a significant upgrade that expands Hadoop’s application beyond the batch processing schemes to which it was formerly limited. This and other bells, whistles and add-ons further targeted Hadoop for use in the enterprise. New features bring new capabilities but also new challenges.

See these related stories:
Hadoop 2 release adds new issues to consider
Big data applications require new thinking on data integration
Where is Apache Hadoop heading?

Security services company uses HBase to calm data downpour. User experiences show that Hadoop’s use can go beyond analytics to include operations. For Omaha, Neb.-based managed security provider Solutionary Inc., Hadoop did a bit of both. As described by the company’s software engineering director, Hadoop and its companion, the HBase columnar database, proved to be a worthy alternative to the ever-expanding use of Oracle Database RAC. For another company, Hadoop and HBase were seen as high-horsepower open source alternatives to a proprietary rules-based system.

See this related story:
Teams work together to use Hadoop framework for DNA app

