Apache Beam

Apache Beam unifies batch processing and stream processing in a single programming model for building your own data processing pipelines.

Apache Beam = Batch + Stream

The main components are:

  • PCollections - immutable, distributed datasets; a PCollection can be bounded (batch) or unbounded (streaming)
  • PTransforms - the main processing units; each PTransform receives one or more input PCollections, transforms the data, and outputs new PCollections that can feed further PTransforms
  • Pipeline Runners - the execution backends that run a pipeline, such as the Direct Runner (local), Apache Flink, Apache Spark, or Cloud Dataflow
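  • Pipeline - the complete data processing job, from reading input through the PTransforms to writing output. Depending on the runner, a Pipeline can be run on a local computer, in a virtual machine, in a data center, or in a cloud service such as Cloud Dataflow.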
