Hadoop at Yahoo!

Introduction

Apache Hadoop* is an open source Java framework for processing and querying vast amounts of data on large clusters of commodity hardware. Hadoop is a top-level Apache project, initiated and led by Yahoo!. It relies on an active community of contributors from all over the world for its success.

With a significant technology investment by Yahoo!, Apache Hadoop has become an enterprise-ready cloud computing technology. It is becoming the industry's de facto framework for big data processing.

The Hadoop project is an integral part of the Yahoo! cloud infrastructure and is the heart of many of Yahoo!’s important business processes.

We run the world's largest Hadoop clusters, work with academic institutions and other large corporations on advanced cloud computing research, and our engineers are leading participants in the Hadoop community.

Yahoo! sponsors the Annual Hadoop Summit and the monthly Hadoop User Group.

What’s new from Yahoo!?

Hadoop with security

Hadoop with security is a significant update to Apache Hadoop. This update integrates Hadoop with Kerberos, a mature open source authentication standard.

Hadoop with security:

  • Prevents unauthorized access to data on Hadoop clusters
  • Authenticates users sharing business-sensitive data
  • Reduces operational costs by consolidating Hadoop clusters
  • Collocates data for new classes of applications

Hadoop with security is available for download.

Oozie – Yahoo!'s workflow engine for Hadoop

Oozie, Yahoo!'s workflow engine for Hadoop, is an open-source workflow solution to manage and coordinate jobs running on Hadoop, including HDFS, Pig and MapReduce.

Oozie was designed for Yahoo!’s complex workflows and data pipelines at global scale. It is integrated with the Yahoo! Distribution of Hadoop with security and is a primary mechanism to manage complex data analysis workloads across Yahoo!.

Oozie is available for download.

* Apache and Hadoop are trademarks of the Apache Software Foundation