Building Big Data Processing Pipeline on Local Environment

Reading, validating, and writing a local Uber Eats restaurants file using the Apache Beam local runner

Suraj Mishra
4 min read · Sep 7, 2022

Introduction

  • In this blog we will set up a data processing pipeline on a local environment using Apache Beam.
  • We will use Kaggle’s Uber Eats dataset, validate it, and write the results to an output file.
  • Finally, we will run the pipeline on the local environment.

Step 1: Download the Template Project

  • Let’s download the template code for Apache Beam. The command below creates a directory for the word-count-beam example, on top of which we can write our own pipeline logic.
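A sketch of that command, using the official Beam Maven archetype; the version, groupId, and artifactId values shown here are examples — substitute the latest Beam release and your own coordinates:

```shell
# Generate the word-count-beam starter project from the Beam examples archetype.
# -DinteractiveMode=false runs the generation non-interactively.
mvn archetype:generate \
    -DarchetypeGroupId=org.apache.beam \
    -DarchetypeArtifactId=beam-sdks-java-maven-archetypes-examples \
    -DarchetypeVersion=2.41.0 \
    -DgroupId=org.example \
    -DartifactId=word-count-beam \
    -Dversion="0.1" \
    -Dpackage=org.apache.beam.examples \
    -DinteractiveMode=false
```

Running this produces a `word-count-beam/` directory with a ready-made `pom.xml` and example pipelines you can replace with your own.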

Step 2: Code the Pipeline

  • Create a LocalFileReader.java file and add a main method that will be the entry point for your code.
  • Add an Options interface that takes runtime parameters for your pipeline.
  • Finally, code your pipeline as per your business logic.
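The steps above can be sketched as follows. This is a minimal illustration, not the article’s exact code: the flag names (`--inputFile`, `--outputFile`), default paths, and the validation rule (drop blank lines and rows with an empty first column) are all assumptions about the Uber Eats CSV layout.

```java
package org.apache.beam.examples;

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.Default;
import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Filter;

public class LocalFileReader {

  // Runtime parameters for the pipeline, resolved from command-line flags.
  public interface Options extends PipelineOptions {
    @Description("Path of the input restaurants CSV file")
    @Default.String("restaurants.csv")
    String getInputFile();
    void setInputFile(String value);

    @Description("Path prefix for the validated output files")
    @Default.String("output/restaurants-valid")
    String getOutputFile();
    void setOutputFile(String value);
  }

  public static void main(String[] args) {
    Options options =
        PipelineOptionsFactory.fromArgs(args).withValidation().as(Options.class);

    Pipeline pipeline = Pipeline.create(options);

    pipeline
        .apply("ReadCsv", TextIO.read().from(options.getInputFile()))
        // Illustrative validation: keep rows that are non-blank and whose
        // first column (restaurant name) is not empty.
        .apply("ValidateRows",
            Filter.by(line -> !line.trim().isEmpty() && !line.startsWith(",")))
        .apply("WriteValid",
            TextIO.write().to(options.getOutputFile()).withSuffix(".csv"));

    pipeline.run().waitUntilFinish();
  }
}
```

With the template project from Step 1, you can run this on the local (direct) runner with something like `mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.LocalFileReader -Pdirect-runner -Dexec.args="--inputFile=restaurants.csv"` (profile name taken from the archetype’s pom.xml).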

Written by Suraj Mishra

Staff Software Engineer @PayPal ( All opinions are my own and not of my employer )
