
Closed
Paid on delivery
I need an experienced data engineer who can take full ownership of end-to-end pipeline work on GCP. The core stack is Java, Apache Spark, Google Cloud Storage and Airflow.

Your primary focus will be to design and build new pipelines that ingest data from both our transactional databases and files already landing in Cloud Storage, then transform and load it into our analytical layer. Along the way, I expect you to monitor, tune and refactor the existing Spark jobs so they keep performing efficiently as volumes grow. Because all orchestration happens in Airflow, you should be comfortable creating clear, idempotent DAGs, setting up alerts and handling retries gracefully. Strong knowledge of GCP services such as Cloud Composer, BigQuery and IAM roles will make your life, and mine, easier. I am based in Bangalore; if you are too, that's a plus for occasional whiteboard sessions. Telugu fluency is another nice extra, though not mandatory.

Deliverables
• Production-ready Spark jobs written in Java and deployed on GCP
• Airflow DAGs with parameterised configs, logging and alerting enabled
• Documentation covering pipeline design, a run-book for maintenance, and optimization decisions
• A handover session to walk through code and deployment steps

I'm ready to move quickly once I find the right fit, so please highlight similar projects you've delivered with this tool chain and any performance gains you achieved.
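For concreteness, here is a minimal sketch of the kind of Java Spark job the brief describes: reading files that land in Cloud Storage and writing a date-partitioned analytical layer. The bucket paths, the dt= directory layout, and the column names are all hypothetical, and the GCS connector is assumed to be on the classpath.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

import static org.apache.spark.sql.functions.col;

public class OrdersIngestJob {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("orders-ingest")
                .getOrCreate();

        // Raw files landing in Cloud Storage, assumed laid out as
        // gs://example-raw-bucket/orders/dt=YYYY-MM-DD/*.csv so Spark
        // picks up dt as a partition column automatically.
        Dataset<Row> raw = spark.read()
                .option("header", "true")
                .csv("gs://example-raw-bucket/orders/");

        // Type the columns and drop rows with a missing key.
        Dataset<Row> cleaned = raw
                .withColumn("amount", col("amount").cast("decimal(18,2)"))
                .filter(col("order_id").isNotNull());

        // Write the analytical layer, partitioned by date for pruning.
        cleaned.write()
                .mode("overwrite")
                .partitionBy("dt")
                .parquet("gs://example-analytics-bucket/orders/");

        spark.stop();
    }
}
```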
Project ID: 40325643
11 proposals
Remote project
Active 25 days ago
11 freelancers are bidding on average ₹20,307 INR for this job

I have 21 years of experience as a Principal Software Engineer at Microsoft, with extensive experience in Java and Spring Boot. I have developed scalable microservices-based applications with MVC and distributed system design, and have built systems using Apache Kafka, Flink and Spark. I can understand and improve the codebase, handle deployment on GCP, and create DAGs and alerting. I can also document the entire pipeline design and optimize it.
₹25,000 INR in 7 days
5.6

Hi, I am an IIT grad, an Oracle Certified Professional Java SE Developer, ex-BFSI, and have worked at Fortune 500 companies. I will make this a reality for you. As a Java Spark data pipeline engineer, I will design and build scalable pipelines using Java, Apache Spark, and Airflow to ingest data from transactional databases and Cloud Storage, transforming and loading it into analytical layers while monitoring performance and refactoring existing jobs for efficiency. Kindly click the chat button so we can discuss and get started; I will share my prior projects and my resume. I have been freelancing since 2019 and have worked at top MNCs in both the USA and India. Let's connect.
₹12,500 INR in 7 days
5.3

Your GCP data pipeline needs Java Spark jobs that can handle growing volumes while staying maintainable. I'd build this using modular Spark transformations with proper partitioning strategies, then wire everything through parameterized Airflow DAGs with built-in monitoring and retry logic. I built a similar high-volume system: a price aggregation engine that processes 800+ product feeds daily with automated error handling and performance optimization. The pattern of ingesting from multiple sources, transforming data, and maintaining pipeline health translates well to your transactional database and Cloud Storage setup. You can see more of my pipeline work at ffulb.com. Once I take a look at your current GCP setup and database schemas, I can start immediately. It should be straightforward, but I want to verify the optimal approach for your specific data volumes and transformation requirements first.
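As an illustration of the partitioning strategies this bid mentions (a sketch, not the bidder's code), here is one common Java Spark pattern; the table paths, the customer_id join key, and the partition count of 200 are all hypothetical.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

import static org.apache.spark.sql.functions.broadcast;
import static org.apache.spark.sql.functions.col;

public class JoinTuningSketch {
    // Enrich a large fact table with a small dimension without shuffling
    // both sides. Paths, the customer_id key, and the partition count are
    // illustrative.
    static Dataset<Row> enrich(SparkSession spark) {
        Dataset<Row> facts = spark.read().parquet("gs://example-bucket/facts/");
        Dataset<Row> dims = spark.read().parquet("gs://example-bucket/dims/");

        return facts
                // Key the large side so later aggregations on customer_id
                // can reuse this partitioning instead of reshuffling.
                .repartition(200, col("customer_id"))
                // Broadcast the small side to turn the shuffle join into a
                // map-side (broadcast hash) join.
                .join(broadcast(dims), "customer_id");
    }
}
```

Broadcasting the small dimension avoids shuffling both tables, and the explicit repartition keys the large side so downstream work on the same key avoids a second shuffle.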
₹12,844 INR in 7 days
2.3

Hello. As a senior backend/data engineer, I can build or fix your Java Spark data pipeline in 5–10 days for $400. The main risk in Spark projects is not writing transformations but keeping the pipeline reliable under real data volume. Most failures come from bad partitioning, memory-heavy joins, schema drift, slow jobs, and pipelines that work in dev but break in production. From the project title, this looks focused on Java + Apache Spark pipeline engineering, so I'd approach it as a performance and data-correctness job first. My practical approach:
• design the pipeline around clear ingest → transform → validate → write stages
• use DataFrame/Dataset patterns in Java, not messy job logic
• tune partitioning, caching, and join strategy based on data shape
• add schema validation, retry-safe writes, and job-level logging
• package it cleanly with Maven/Gradle for repeatable deployment
A useful idea here is to add a small data-quality layer before the final output, so null spikes, duplicate keys, and broken records are caught early instead of silently corrupting downstream data. Thank you.
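A minimal sketch of the data-quality gate this proposal suggests, catching empty batches, null spikes, and duplicate keys before the final write; the order_id key column is an assumed example.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

import static org.apache.spark.sql.functions.col;

public final class QualityGate {
    // Fail fast before the final write if key invariants are broken, so a
    // bad batch never reaches the analytical layer. The order_id key is an
    // illustrative choice.
    public static void check(Dataset<Row> batch) {
        long total = batch.count();
        if (total == 0) {
            throw new IllegalStateException("Empty batch: refusing to overwrite output");
        }

        long nullKeys = batch.filter(col("order_id").isNull()).count();
        if (nullKeys > 0) {
            throw new IllegalStateException(nullKeys + " rows with a null order_id");
        }

        long distinctKeys = batch.select("order_id").distinct().count();
        if (distinctKeys < total) {
            throw new IllegalStateException("Duplicate order_id values detected");
        }
    }
}
```

Called just before the write stage, a thrown exception fails the Spark job, which the orchestrator can then retry or alert on.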
₹25,000 INR in 7 days
0.0

Hello! From Jaipur, Ankit sends warm greetings to you. I hope you are doing well. I came across your project and it immediately caught my attention. As a Full-Stack Developer, I have solid experience building modern, scalable web applications and I would love to help you with this project. I focus on writing clean, efficient, and maintainable code, while ensuring smooth communication and timely delivery. I would be happy to discuss your requirements in more detail and suggest the best solution for your project. Looking forward to hearing from you soon. Best regards, Ankit
₹20,000 INR in 3 days
0.0

Hi,
How large are your current data volumes, and are you already facing performance bottlenecks with existing Spark jobs? I can take full ownership of your end-to-end data pipelines on GCP, ensuring they are scalable, efficient, and production-ready.
What I Can Deliver
✔ Design & build Spark (Java) pipelines for DB + GCS ingestion
✔ Optimized transformations and loading into BigQuery/analytical layer
✔ Well-structured Airflow DAGs (idempotent, parameterized, with alerts & retries)
✔ Performance tuning & refactoring of existing Spark jobs
✔ Proper IAM roles, logging, and monitoring setup on GCP
✔ Complete documentation + runbook + handover session
I have experience working with GCP data pipelines, Spark optimization, and Airflow orchestration, focusing on performance, reliability, and scalability at volume. Happy to share similar work and discuss your pipeline architecture in detail.
Best regards, Ashok
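On the idempotency point raised in this bid: in a Spark-on-GCP stack it usually comes down to retry-safe writes. One common pattern, sketched here with hypothetical paths and an assumed dt partition column, is Spark's dynamic partition overwrite, so a retried run replaces only the partition it produced.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

public class IdempotentWriteSketch {
    // Overwrite only the partitions present in this run's data, so an
    // Airflow retry of the same execution date replaces that day's output
    // instead of duplicating rows or wiping the whole table.
    static void writeDay(Dataset<Row> day) {
        day.write()
           .mode("overwrite")
           .option("partitionOverwriteMode", "dynamic")
           .partitionBy("dt")
           .parquet("gs://example-analytics-bucket/orders/");
    }
}
```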
₹25,000 INR in 7 days
0.0

Hi there, I'm a Senior Data Engineer and have been building and managing ETLs for five years now, which run with:
- Spark on Scala
- PySpark
- Load data from GCS (Parquet / JSON / CSV)
- Orchestrated using Airflow (GCE)
- Run ETL jobs using Dataproc on GCE / Serverless
- Tuning ETL performance
- Writing output to GCS / BigQuery
- Managing access using IAM
Let's discuss the details. I can start the project immediately. Looking forward to hearing from you.
Best regards, Charles
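For the GCS-to-BigQuery leg this bid lists, here is a minimal Java Spark sketch using the spark-bigquery connector, which must be on the job's classpath; the bucket, dataset, and table names are placeholders.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class GcsToBigQuerySketch {
    // Read a Parquet drop from GCS and append it to a BigQuery table via
    // the spark-bigquery connector. The connector stages the load through
    // a temporary GCS bucket; all names here are placeholders.
    static void load(SparkSession spark) {
        Dataset<Row> events = spark.read()
                .parquet("gs://example-landing-bucket/events/");

        events.write()
                .format("bigquery")
                .option("temporaryGcsBucket", "example-staging-bucket")
                .mode("append")
                .save("example_dataset.events");
    }
}
```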
₹20,000 INR in 7 days
0.0

You’re looking for a Java Spark Data Engineer to build and manage scalable data pipelines on GCP using Spark, Airflow (Cloud Composer), Cloud Storage, and BigQuery. I would design and implement production-ready Spark jobs in Java to ingest data from transactional databases and Cloud Storage, transform it, and load it into your analytics layer. I’ll also create idempotent Airflow DAGs with proper scheduling, retries, logging, and alerting to ensure reliable orchestration. The pipelines will be optimized for performance and scalability, with ongoing tuning of Spark jobs as data volume grows, along with secure GCP integration and proper IAM setup. Deliverables include working Spark pipelines, Airflow DAGs, documentation, and a handover walkthrough to ensure smooth deployment and maintenance. Syed
₹20,030 INR in 7 days
0.0

Hi, I read through your requirements—especially around DB + GCS ingestion and keeping Spark jobs efficient as data grows. That’s usually where pipelines start slowing down if they’re not structured well from the beginning. My background is mainly in Java backend systems, but I’ve worked on data-heavy applications where performance and reliability were critical. I can help structure the pipelines cleanly, keep Spark jobs maintainable, and ensure Airflow orchestration (retries/logging) stays stable. Quick question — are your Spark jobs currently under load, or is this more of a fresh build? Happy to discuss. — [Your Name]
₹18,000 INR in 5 days
0.0

As a seasoned backend software engineer, skilled in Java with a solid understanding of the core stack you've specified, I am well-suited to design, develop and deliver your end-to-end data pipeline solution. My expertise in building robust Spring Boot REST APIs, which aligns with your project needs, has spanned several years and this experience will be incredibly valuable in building your analytical layer. Moreover, I have significant exposure to Apache Kafka, having implemented intricate event-driven systems and delivered notable performance improvements in high-transaction Neo4j setups. Drawing from this experience could yield immense value for your project as it involves sophisticated data ingestion and transformation operations. I am ready to hit the ground running with your project, focusing on delivering clean, optimized code and ensuring all systems are efficient as volumes grow. A firm believer in clear documentation, I will not only design a robust run-book for maintaining the pipeline but also facilitate a comprehensive handover session to ensure seamless knowledge transition. Working from Bangalore is an added benefit as occasional whiteboard sessions can be accommodated easily.
₹25,000 INR in 7 days
0.0

Secunderabad, India
Payment method verified
Member since Oct 20, 2024