
Paid on delivery
Our nightly Databricks notebook that ingests structured data from Azure Data Lake has crept from a consistent 3-hour finish to nearly 8 hours. The slowdown is clearly happening during the data-reading phase; once the tables are in memory the transformations flow as expected.

I need you to dig into the cluster and the code, pinpoint the I/O bottlenecks, and implement the fixes that will bring the total wall-clock time back to (or better than) the original 3 hours. Spark partitioning strategy, file format selection, caching, Delta Lake optimisation, cluster sizing, Auto Loader configuration: whatever combination gets us there is fair game as long as it remains maintainable and cost-conscious on Azure.

Deliverables
• Revised notebook (or modular scripts) with all performance-related changes clearly commented
• Brief before-and-after report showing key metrics: input rows per second, stage/task durations, and total runtime
• Short hand-off note outlining any cluster-level settings or schedule tweaks applied

Acceptance criterion: a full test run on my workspace that completes in ≤ 3 hours while producing identical outputs.
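As a rough illustration of the before-and-after report requested above, the key figures could be collected and compared along these lines. This is only a sketch in plain Python: the `RunMetrics` structure, the function names, and every number in the example are hypothetical placeholders, not measurements from this workspace. In practice the row counts and durations would be read from the Spark UI or job run history.

```python
from dataclasses import dataclass

# Acceptance criterion from the brief: a full run completes in <= 3 hours.
TARGET_SECONDS = 3 * 3600


@dataclass(frozen=True)
class RunMetrics:
    """Figures for one full run (hypothetical structure for illustration)."""
    label: str
    input_rows: int        # rows ingested during the data-reading phase
    read_seconds: float    # wall-clock time spent reading
    total_seconds: float   # total wall-clock time of the run

    @property
    def rows_per_second(self) -> float:
        # Input throughput of the reading phase.
        return self.input_rows / self.read_seconds


def meets_target(run: RunMetrics, target_seconds: float = TARGET_SECONDS) -> bool:
    """True when the run finishes within the target wall-clock time."""
    return run.total_seconds <= target_seconds


def report(before: RunMetrics, after: RunMetrics) -> str:
    """Build a short plain-text comparison of two runs."""
    speedup = before.total_seconds / after.total_seconds
    lines = []
    for run in (before, after):
        lines.append(
            f"{run.label}: {run.rows_per_second:,.0f} rows/s, "
            f"total {run.total_seconds / 3600:.2f} h, "
            f"within target: {meets_target(run)}"
        )
    lines.append(f"Overall speedup: {speedup:.2f}x")
    return "\n".join(lines)


# Placeholder numbers only; replace with measured values from real runs.
before = RunMetrics("before", input_rows=900_000_000,
                    read_seconds=25_200, total_seconds=28_800)
after = RunMetrics("after", input_rows=900_000_000,
                   read_seconds=7_200, total_seconds=10_800)

print(report(before, after))
```

Stage and task durations are left out of the sketch; they could be added as extra fields on the same structure. The check for identical outputs is a separate step and is not covered here.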
Project ID: 40469891
4 proposals
Open for bidding
Remote project
Active 6 days ago
4 freelancers are bidding on average ₹19,125 INR for this job

SolutionzHere has strong experience optimizing Databricks + Azure Data Lake pipelines involving Spark tuning, Delta Lake optimization, partition strategy redesign, Auto Loader tuning, caching, cluster sizing, and large-scale ingestion bottleneck analysis. Since your transformations are already performant, we’d focus specifically on I/O profiling, file layout analysis, shuffle behavior, small-file issues, partition pruning, and cluster telemetry to restore sub-3-hour execution while keeping Azure costs controlled. For this troubleshooting + optimization engagement, the realistic estimate is around ₹35k–₹1.2L depending on data volume, current architecture, and access scope. Typical turnaround: 3–7 days. One quick question: are the source files primarily Parquet, Delta, CSV, or mixed formats inside ADLS?
₹50,000 INR in 3 days
6.0

With 8+ years of experience in data analytics and data science, focused on ETL and data engineering, I am confident I can quickly dig into your Databricks setup and resolve the performance issues you have been encountering. My proficiency with Apache Airflow, Talend, AWS Glue, and Google Cloud Dataflow, combined with a sharp eye for detail, helps me identify and fix bottlenecks such as the slow data reading you describe. I will also use Power BI to build a before-and-after report documenting the key metrics (input rows per second, stage/task durations, and total runtime) so the improved pipeline can be compared directly with its previous state. You will also receive a hand-off note outlining any cluster-level settings or schedule tweaks applied. I look forward to streamlining your Databricks notebook and bringing its runtime below the 3-hour target.
₹7,000 INR in 7 days
4.3

Bengaluru, India
Member since May 26, 2026