Building Big Data Processing Pipeline on Local Environment

Reading, validating, and writing a local Uber Eats restaurants file using the Apache Beam local runner

Suraj Mishra
4 min read · Sep 7, 2022

Introduction

  • In this blog we will set up a data processing pipeline on a local environment using Apache Beam.
  • We will use Kaggle’s Uber Eats dataset, validate it, and write the results to an output file.
  • Finally, we will run the pipeline on the local environment.

Step 1: Download the Template Project

  • Let’s download the template code for Apache Beam. The command below creates a directory for the word-count-beam example, on top of which we can write our own pipeline logic.
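A sketch of that command, using the official Beam Maven archetype; the version, groupId, and artifactId values shown here are examples — substitute the latest Beam release and your own coordinates:

```shell
# Generate the word-count-beam starter project from the Beam examples archetype.
# -DinteractiveMode=false runs the generation non-interactively.
mvn archetype:generate \
    -DarchetypeGroupId=org.apache.beam \
    -DarchetypeArtifactId=beam-sdks-java-maven-archetypes-examples \
    -DarchetypeVersion=2.41.0 \
    -DgroupId=org.example \
    -DartifactId=word-count-beam \
    -Dversion="0.1" \
    -Dpackage=org.apache.beam.examples \
    -DinteractiveMode=false
```

Running this produces a `word-count-beam/` directory with a ready-made `pom.xml` and example pipelines you can replace with your own.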

Step 2: Code the Pipeline

  • Create a LocalFileReader.java file and add a main method that will be the entry point for your code.
  • Add an Options interface that takes runtime parameters for your pipeline.
  • Finally, code your pipeline as per your business logic.
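The steps above can be sketched as follows. This is a minimal illustration, not the article’s exact code: the flag names (`--inputFile`, `--outputFile`), default paths, and the validation rule (drop blank lines and rows with an empty first column) are all assumptions about the Uber Eats CSV layout.

```java
package org.apache.beam.examples;

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.Default;
import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Filter;

public class LocalFileReader {

  // Runtime parameters for the pipeline, resolved from command-line flags.
  public interface Options extends PipelineOptions {
    @Description("Path of the input restaurants CSV file")
    @Default.String("restaurants.csv")
    String getInputFile();
    void setInputFile(String value);

    @Description("Path prefix for the validated output files")
    @Default.String("output/restaurants-valid")
    String getOutputFile();
    void setOutputFile(String value);
  }

  public static void main(String[] args) {
    Options options =
        PipelineOptionsFactory.fromArgs(args).withValidation().as(Options.class);

    Pipeline pipeline = Pipeline.create(options);

    pipeline
        .apply("ReadCsv", TextIO.read().from(options.getInputFile()))
        // Illustrative validation: keep rows that are non-blank and whose
        // first column (restaurant name) is not empty.
        .apply("ValidateRows",
            Filter.by(line -> !line.trim().isEmpty() && !line.startsWith(",")))
        .apply("WriteValid",
            TextIO.write().to(options.getOutputFile()).withSuffix(".csv"));

    pipeline.run().waitUntilFinish();
  }
}
```

With the template project from Step 1, you can run this on the local (direct) runner with something like `mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.LocalFileReader -Pdirect-runner -Dexec.args="--inputFile=restaurants.csv"` (profile name taken from the archetype’s pom.xml).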

Written by Suraj Mishra

Staff Software Engineer @PayPal ( All opinions are my own and not of my employer )
