Apache Beam

Apache Beam provides an advanced unified programming model for implementing batch and streaming data processing jobs that can run on any supported execution engine.

Apache Beam is:

UNIFIED - Use a single programming model for both batch and streaming use cases.
PORTABLE - Execute pipelines on multiple execution environments, including Apache Apex, Apache Flink, Apache Spark, and Google Cloud Dataflow.
EXTENSIBLE - Write and share new SDKs, IO connectors, and transformation libraries.
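
To make the "unified" idea concrete, here is a minimal sketch in plain Python. It deliberately does not use the Beam SDK; the names `count_words`, `batch_input`, and `endless_lines` are hypothetical and exist only for illustration. It shows the same transformation logic applied first to a bounded (batch) input and then to an unbounded (streaming) input.

```python
from itertools import count, islice
from typing import Dict, Iterable, Iterator, Tuple


def count_words(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield a running (word, count) pair for every word seen.

    The function only iterates over its input, so it works the same
    way on a finite list and on a stream that never ends.
    """
    totals: Dict[str, int] = {}
    for line in lines:
        for word in line.lower().split():
            totals[word] = totals.get(word, 0) + 1
            yield word, totals[word]


# Bounded ("batch") input: a fixed list that is read to the end.
batch_input = ["beam beam flink", "spark beam"]
# dict() keeps the last running count per word, i.e. the final totals.
batch_result = dict(count_words(batch_input))


def endless_lines() -> Iterator[str]:
    """Unbounded ("streaming") input: a generator that never terminates."""
    for i in count():
        yield "beam" if i % 2 == 0 else "flink"


# Consume only the first four running results from the endless stream.
stream_result = list(islice(count_words(endless_lines()), 4))

print(batch_result)
print(stream_result)
```

A real Beam pipeline expresses the same idea through its own SDK constructs and leaves execution to a runner; this sketch only illustrates why one piece of logic can serve both cases.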

Get Started

To use Beam for your data processing tasks, start by reading the Beam Overview and performing the steps in the Quickstart. Then dive into the Documentation section for in-depth concepts and reference materials for the Beam model, SDKs, and runners.

Contribute

Beam is an Apache Software Foundation project, available under the Apache License, Version 2.0. Beam is an open source community, and contributions are greatly appreciated! If you’d like to contribute, please see the Contribute section.

Blog

Feb 13, 2017 - Stateful processing with Apache Beam
Feb 1, 2017 - Media recap of the Apache Beam graduation
Jan 10, 2017 - Apache Beam established as a new top-level project
Jan 9, 2017 - Release 0.4.0 adds a runner for Apache Apex
Oct 20, 2016 - Testing Unbounded Pipelines in Apache Beam
Oct 11, 2016 - Strata+Hadoop World and Beam
Aug 3, 2016 - Apache Beam: Six Months in Incubation
