Building a Big Data Processing Pipeline on a Local Environment
Reading, validating, and writing a local Uber Eats restaurants file using the Apache Beam local runner
4 min read · Sep 7, 2022
Introduction
- In this blog, we will set up a data processing pipeline on a local environment using Apache Beam.
- We will read Kaggle’s Uber Eats dataset, validate it, and write the results to an output file.
- Finally, we will run the pipeline on the local environment.
Step 1: Download the Template Project
- Let’s download the template code for Apache Beam. The command below creates a directory for the word-count-beam example, on top of which we can write our own pipeline logic.
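A sketch of the Maven archetype command from the Apache Beam Java quickstart that generates this template; the Beam version shown is an assumption (use the latest release available to you):

```
mvn archetype:generate \
    -DarchetypeGroupId=org.apache.beam \
    -DarchetypeArtifactId=beam-sdks-java-maven-archetypes-examples \
    -DarchetypeVersion=2.41.0 \
    -DgroupId=org.example \
    -DartifactId=word-count-beam \
    -Dversion="0.1" \
    -Dpackage=org.apache.beam.examples \
    -DinteractiveMode=false
```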
Step 2: Coding the Pipeline
- Create a LocalFileReader.java file and add a main method that will be the entry point for your code.
- Add an Options interface that takes runtime parameters for your pipeline.
- Finally, code your pipeline as per your business logic (read, validate, write), as shown in the sketch after this list.
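Below is a minimal sketch of what LocalFileReader.java could look like. The option names (inputFile, output) and the validation rule (at least five comma-separated fields) are assumptions for illustration; adapt them to the actual Uber Eats CSV schema and your own business logic.

```java
package org.apache.beam.examples;

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.Default;
import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;

/**
 * Sketch of a local pipeline: read the Uber Eats restaurants CSV,
 * keep rows that pass a simple validation, and write them back out.
 */
public class LocalFileReader {

  /** Runtime parameters for the pipeline (option names here are illustrative). */
  public interface LocalFileReaderOptions extends PipelineOptions {
    @Description("Path of the input restaurants CSV file")
    @Default.String("restaurants.csv")
    String getInputFile();
    void setInputFile(String value);

    @Description("Path prefix for the output files")
    @Default.String("validated-restaurants")
    String getOutput();
    void setOutput(String value);
  }

  /** Example validation: drop blank rows and rows without the expected number of columns. */
  static class ValidateRowFn extends DoFn<String, String> {
    @ProcessElement
    public void processElement(@Element String row, OutputReceiver<String> out) {
      if (row == null || row.trim().isEmpty()) {
        return; // skip empty rows
      }
      // Assumed rule: a valid row has at least 5 comma-separated fields.
      if (row.split(",").length >= 5) {
        out.output(row);
      }
    }
  }

  public static void main(String[] args) {
    // Parse --inputFile and --output from the command line.
    LocalFileReaderOptions options =
        PipelineOptionsFactory.fromArgs(args).withValidation().as(LocalFileReaderOptions.class);

    Pipeline pipeline = Pipeline.create(options);

    pipeline
        .apply("ReadRestaurants", TextIO.read().from(options.getInputFile()))
        .apply("ValidateRows", ParDo.of(new ValidateRowFn()))
        .apply("WriteValidated", TextIO.write().to(options.getOutput()).withSuffix(".csv"));

    // With no --runner flag, the Direct Runner executes the pipeline locally.
    pipeline.run().waitUntilFinish();
  }
}
```

With the template project in place, this class can be run locally in the usual Maven way (for example, mvn compile exec:java with -Dexec.mainClass pointing at LocalFileReader and -Dexec.args passing --inputFile and --output), and the Direct Runner will execute it on your machine.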