
Closed
This is a six-week, fully remote contract for someone who can jump in immediately and own the data-modelling stream of an ongoing Databricks implementation across India-based teams.

My environment already sits on Databricks with Delta Lake and Unity Catalog enabled, and I need a seasoned modeller who can translate business requirements into robust Star and Snowflake schemas, then bring them to life with PySpark and advanced SQL. You will refine our Medallion architecture (Bronze → Silver → Gold), implement both Type 1 and Type 2 SCD strategies, and tune the pipelines for speed through smart partitioning and other optimisation techniques. The datasets involved are large, structured and semi-structured, so hands-on experience handling such volumes in Databricks is essential.

Key deliverables
• Logical and physical data models documented and version-controlled
• PySpark notebooks / SQL scripts that create the Star and Snowflake tables in Delta Lake under Unity Catalog governance
• Proven SCD Type 1 & 2 routines integrated into the Medallion layers
• Performance benchmark report showing throughput gains from optimisation work
• A short hand-off session (recorded) walking the team through design decisions and next steps

If you have 5–10 years of data-engineering experience, can start right away, and have shipped at least one end-to-end Databricks project featuring Delta Lake and Unity Catalog, I’d love to review your profile and discuss how quickly we can get you onboarded.
Project ID: 40425618
13 proposals
Remote project
Active 5 days ago
13 freelancers are bidding on average ₹1,054 INR/hour for this job

Your Medallion architecture will fail performance SLAs if the Silver-to-Gold transformations aren't partitioned correctly - I've seen pipelines grind to a halt when Type 2 SCD logic scans full tables instead of using Z-ordering on surrogate keys. That bottleneck alone can push processing windows from 2 hours to 12.

Before I map out the modelling approach, two quick questions. First - what's your current data volume at the Gold layer, and are you seeing any specific query patterns that time out? Second - does your Unity Catalog setup enforce column-level masking, or are we working with table-level permissions only? This affects how I'll structure the dimensional hierarchies.

Here's the architectural approach:
- STAR & SNOWFLAKE SCHEMAS: Design fact tables with composite partition keys (date + region) and dimension tables with SCD Type 2 temporal columns, all documented in dbt-style YAML under version control so your India team can maintain it post-handoff.
- PYSPARK + DELTA LAKE: Build merge operations using Delta's MERGE INTO with predicate pushdown, reducing full-table scans by 70% through partition pruning and Z-order indexing on high-cardinality join keys.
- SCD TYPE 1 & 2: Implement CDC pipelines that flag row-level changes in Silver, then apply Type 1 overwrites for corrections and Type 2 inserts with effective-date ranges in Gold - I'll include idempotency checks so reruns don't duplicate history records.
- UNITY CATALOG GOVERNANCE: Tag PII columns at schema registration, set up lineage tracking through notebook metadata, and configure dynamic views that auto-mask sensitive fields based on user groups.
- PERFORMANCE BENCHMARKING: Deliver before/after Spark UI metrics showing shuffle reduction, partition skew elimination, and query runtime improvements - typically I achieve 3-5x throughput gains on large fact table joins.

I've led four Databricks migrations in the past 18 months, including one for a fintech client processing 2TB daily through a three-layer Medallion setup. I don't take six-week sprints unless the requirements are locked and the infrastructure is provisioned. Let's schedule a 20-minute technical call to walk through your current Bronze schema and confirm there aren't any upstream data-quality issues that'll block the Silver transformations.
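
For readers unfamiliar with the SCD handling this proposal describes, the following is a minimal sketch in plain Python, not the bidder's actual implementation. It assumes a hypothetical customer dimension in which `email` is a Type 1 attribute (overwritten in place) and `city` is a Type 2 attribute (history kept with effective dates); the class, function, and column names are illustrative only. On Databricks the same logic would more likely be expressed as a Delta Lake MERGE, but plain Python keeps the example runnable without a cluster.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class CustomerRow:
    """Hypothetical dimension row; column names are assumptions."""
    customer_id: int                  # business key
    email: str                        # Type 1 attribute: overwritten in place
    city: str                         # Type 2 attribute: history is kept
    valid_from: date
    valid_to: Optional[date] = None   # None marks the current version


def apply_change(dim: List[CustomerRow], customer_id: int, email: str,
                 city: str, as_of: date) -> List[CustomerRow]:
    """Return a new dimension list with one incoming change applied."""
    current = next(
        (r for r in dim if r.customer_id == customer_id and r.valid_to is None),
        None,
    )
    if current is None:
        # First time this business key is seen: insert the initial version.
        return dim + [CustomerRow(customer_id, email, city, as_of)]
    if current.city != city:
        # Type 2: close the current version and append a new one.
        closed = replace(current, valid_to=as_of)
        new = CustomerRow(customer_id, email, city, as_of)
        return [closed if r is current else r for r in dim] + [new]
    if current.email != email:
        # Type 1: overwrite the attribute on the current version, no history.
        updated = replace(current, email=email)
        return [updated if r is current else r for r in dim]
    # Nothing changed, so a rerun of the same input is a no-op.
    return dim


dim = apply_change([], 1, "a@example.com", "Pune", date(2024, 1, 1))
dim = apply_change(dim, 1, "b@example.com", "Pune", date(2024, 2, 1))    # Type 1
dim = apply_change(dim, 1, "b@example.com", "Mumbai", date(2024, 3, 1))  # Type 2
rerun = apply_change(dim, 1, "b@example.com", "Mumbai", date(2024, 3, 1))
print(len(dim), rerun == dim)
```

Comparing the incoming record against the current version before writing is what makes the rerun safe here: replaying the last change finds nothing different and leaves the history untouched.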
₹900 INR in 30 days
3.7
3.7

I’ve carefully reviewed your requirement and I understand it: you need someone who can step directly into an active Databricks environment, strengthen the Medallion architecture, and deliver production-grade data models with optimized Delta Lake performance.

Recently, I worked on a large-scale Databricks implementation involving Delta Lake, Unity Catalog, and PySpark ETL pipelines for structured and semi-structured datasets. I designed Star/Snowflake schemas, implemented SCD Type 1 & 2 pipelines, and optimized Silver-to-Gold transformations using partitioning, Z-ordering, and caching strategies, which significantly improved query throughput and processing time.

I can contribute immediately across modelling, PySpark engineering, Delta optimization, and governance-aligned implementation while maintaining clean documentation and version control.

A few quick questions:
1. Are pipelines currently orchestrated through Databricks Workflows, Airflow, or another tool?
2. Which business domains/data marts are highest priority for modelling?
3. Do you already have performance baselines to compare optimisation gains against?

Let me know when you’re available to discuss this further. I’d be happy to walk you through my approach or showcase examples relevant to this project.

Looking forward to hearing from you!
Best regards,
Mulayam
₹750 INR in 40 days
0.0
0.0

Hello, I am a Big Data Engineer with 9 years of experience in PySpark, Python, SQL, Hadoop, AWS, and AWS Glue. I specialize in building scalable and production-grade data pipelines, ETL workflows, and optimizing big data processing solutions. I can help deliver efficient, reliable, and high-quality solutions based on your project requirements. Looking forward to working with you. Best Regards, Bhargav
₹1,000 INR in 40 days
0.0
0.0

Dear Hiring Manager,

I am a Senior Data Engineer with **6 years of hands-on Databricks experience**, and your project is a strong match for my core skill set.

## What I Bring
- **Delta Lake & Unity Catalog**: Designed and managed production-grade Delta Lake environments with Unity Catalog governance, including fine-grained access controls and lineage tracking.
- **Data Modelling**: Deep expertise in translating complex business requirements into optimised **Star and Snowflake schemas** for both OLAP and reporting workloads.
- **Medallion Architecture**: Architected and refined **Bronze → Silver → Gold** pipelines handling large-scale structured and semi-structured datasets with high reliability.

## Key Deliverables I Will Provide
✅ Logical and physical data models, documented and version-controlled in Git
✅ PySpark notebooks and SQL scripts for Star/Snowflake table creation under Unity Catalog
✅ Proven SCD Type 1 & 2 routines integrated into Medallion layers
✅ Performance benchmark report with before/after throughput metrics

## Why Choose Me
I have delivered similar Databricks data modelling projects end-to-end, from initial schema design to optimised production pipelines. I communicate proactively, deliver clean and well-documented code, and can onboard quickly with minimal hand-holding.

I am happy to discuss the project scope in detail and share relevant work samples before we begin. Looking forward to working with you!

Best regards,
surekha lengare
₹1,000 INR in 40 days
0.0
0.0

Senior Data Engineer with experience in Python, PySpark, Advanced SQL, Data Warehousing, and Big Data technologies. Skilled in designing, building, and managing scalable ETL pipelines, data modeling, and warehouse architectures for large datasets. Hands-on experience with both batch and real-time streaming pipelines using Kafka. Worked extensively with DBT for data transformation and Airflow for workflow orchestration and scheduling. Strong expertise in developing reliable data pipelines, optimizing query performance, and handling structured and semi-structured data processing. Experienced in end-to-end data engineering workflows, pipeline automation, and delivering scalable data solutions for analytics and business intelligence needs.
₹1,000 INR in 40 days
0.0
0.0

Hi, I am interested in this job and would like to work with you. Please visit my website, www.vanecus.com. If you are interested, we can discuss the rate. Thanks and regards, Md. Abdul Latif, Dhaka, Bangladesh.
₹1,000 INR in 40 days
0.0
0.0

Hi,

This project aligns very closely with my recent experience building large-scale data platforms and production data pipelines at Uber and JPMorgan. Over the past several years, I’ve worked extensively on Spark-based batch and real-time systems, data lake architectures, large-scale analytics pipelines, schema design, and performance optimization for high-volume datasets. My experience includes building scalable ingestion and transformation systems using technologies such as Spark, Kafka, Hive, Flink, Hudi, and distributed cloud infrastructure.

I’m particularly comfortable with:
* Medallion-style data architectures
* Star and Snowflake schema design
* SCD Type 1 & Type 2 implementations
* PySpark-based transformation pipelines
* Delta Lake style storage patterns and optimization
* Performance tuning for large structured and semi-structured datasets

I also place strong emphasis on maintainability, documentation, and designing systems that downstream teams can reliably operate and extend. Given my current commitments, I’d be best aligned contributing as a senior data engineering lead/consultant with strong ownership around modelling, architecture, optimization, and implementation guidance.

Happy to discuss the current implementation state, expected scale, and how I can help accelerate delivery.

Best,
Pushpendra
₹1,200 INR in 20 days
0.0
0.0

Hello,

I am a Data Engineer with 5+ years of experience in SQL, PL/SQL, Python, Big Data technologies, and Azure cloud-based data engineering solutions. I specialize in building scalable ETL pipelines, cloud data architectures, and analytics solutions using Azure Databricks, Azure Data Factory, Azure Synapse Analytics, Azure Data Lake, and Azure SQL Database.

My expertise includes:
• Azure Databricks (PySpark, Spark SQL)
• Azure Data Factory (ADF) orchestration
• ETL development and data transformation
• Delta Lake and data modeling
• Batch and real-time data processing
• Performance optimization for Spark and SQL workloads
• CI/CD pipelines using Azure DevOps and Terraform
• Cloud migration and enterprise-scale data solutions

I have experience working closely with business analysts, data scientists, and engineering teams to deliver reliable, scalable, and cost-effective solutions aligned with business requirements.

I focus on:
✔ Clean and optimized code
✔ Scalable architecture design
✔ Timely delivery
✔ Clear communication
✔ High-quality solutions

I would be happy to discuss your project requirements and help deliver efficient data engineering solutions tailored to your business needs. Looking forward to collaborating with you.

Best Regards,
Varun
₹1,000 INR in 40 days
0.0
0.0

Navi Mumbai, India
Member since Apr 27, 2026