
Completed
Posted
Paid on delivery
I need an end-to-end ML experiment to predict duplicate customer records in a financial dataset from Kaggle. The goal is to build a proactive classification model that flags likely duplicates before they reach reporting, analytics, or risk pipelines. The workflow should include data loading, EDA, synthetic duplicate labelling (since labels won’t exist), feature engineering, model training, and evaluation. Duplicate pairs will be created using techniques like exact duplication, small perturbations, and formatting inconsistencies. Features should include exact matches, numeric differences (age, income, spending), and similarity measures. Models to test include Logistic Regression, Random Forest, Gradient Boosting, XGBoost, or similar, but deliver one final tuned model. Evaluation should focus on F1-score (target ≥0.85), with a balance between precision and recall. Deliverables: reproducible notebook, clean code, short report, and README.
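The synthetic labelling and pair features described above could be sketched roughly as follows. This is only a minimal illustration using the standard library, not the client's or any bidder's method. The `name` field, the sample records, the 1 percent perturbation size, and the helper names `perturb`, `pair_features`, and `build_pairs` are all assumptions made for the example; only age, income, and spending come from the brief.

```python
import random
from difflib import SequenceMatcher


def perturb(record, rng):
    """Return a synthetic duplicate of `record` using one randomly chosen technique."""
    dup = dict(record)
    kind = rng.choice(["exact", "numeric", "format"])
    if kind == "numeric":
        # Small perturbation: shift income by up to 1 percent (assumed size).
        dup["income"] = round(record["income"] * (1 + rng.uniform(-0.01, 0.01)), 2)
    elif kind == "format":
        # Formatting inconsistency: change case and pad with whitespace.
        dup["name"] = "  " + record["name"].upper() + " "
    return dup  # "exact" leaves the copy unchanged


def pair_features(a, b):
    """Exact-match flag, absolute numeric differences, and a name similarity ratio."""
    name_a, name_b = a["name"].strip().lower(), b["name"].strip().lower()
    return {
        "name_exact": int(a["name"] == b["name"]),
        "age_diff": abs(a["age"] - b["age"]),
        "income_diff": abs(a["income"] - b["income"]),
        "spending_diff": abs(a["spending"] - b["spending"]),
        "name_sim": SequenceMatcher(None, name_a, name_b).ratio(),
    }


def build_pairs(records, seed=0):
    """Label perturbed copies as 1 and pairs of distinct records as 0."""
    rng = random.Random(seed)  # fixed seed keeps the labelling reproducible
    rows = []
    for i, rec in enumerate(records):
        rows.append((pair_features(rec, perturb(rec, rng)), 1))
        j = rng.choice([k for k in range(len(records)) if k != i])
        rows.append((pair_features(rec, records[j]), 0))
    return rows


# Hypothetical records standing in for the Kaggle dataset.
records = [
    {"name": "Alice Smith", "age": 34, "income": 52000.0, "spending": 1800.0},
    {"name": "Bob Jones", "age": 45, "income": 61000.0, "spending": 2400.0},
    {"name": "Carla Diaz", "age": 29, "income": 48000.0, "spending": 1500.0},
]
pairs = build_pairs(records)
```

Each element of `pairs` is a `(features, label)` tuple that could then be fed to any of the classifiers named in the brief.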
Project ID: 40330729
5 proposals
Remote project
Active 20 days ago

Hey, I have worked in the fintech domain as an Applied ML Engineer and Data Scientist for the last 6+ years. I can complete your task and also provide you with the report in less than 1 day.
$70 USD in 1 day
2.9
5 freelancers are bidding an average of $44 USD for this project

Hello, With over 7 years of experience in Excel, Data Science, Data Visualization, Statistical Analysis, and Statistics, I have the expertise to handle your project efficiently. I have carefully reviewed the requirements for the project. To address the predictive data quality modeling for financial customer data using machine learning, I will begin by loading the dataset from Kaggle and performing exploratory data analysis (EDA). Synthetic duplicate labeling will be implemented due to the absence of labels. Feature engineering will involve creating features based on exact matches, numeric differences, and similarity measures. The workflow will include model training and evaluation using techniques like Logistic Regression, Random Forest, Gradient Boosting, XGBoost, or similar algorithms to develop a tuned model. Evaluation will focus on achieving an F1-score of ≥0.85, balancing precision and recall. The deliverables will include a reproducible notebook, clean code, a concise report, and a README file detailing the project setup. I would like to discuss this project further with you. Please connect with me via chat for a detailed conversation. You can visit my profile at https://www.freelancer.com/u/HiraMahmood4072 Thank you.
$36 USD in 2 days
6.4
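The precision/recall balance and the F1 ≥ 0.85 target mentioned in this proposal can be made concrete with a small sketch. It assumes a model that already outputs duplicate probabilities; the labels, scores, and candidate thresholds below are invented purely for illustration and are not results from the project, and the helper names are hypothetical.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for binary labels (1 = duplicate)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def best_threshold(y_true, scores, candidates):
    """Pick the candidate decision threshold with the highest F1."""
    best = None
    for thr in candidates:
        y_pred = [int(s >= thr) for s in scores]
        p, r, f1 = precision_recall_f1(y_true, y_pred)
        if best is None or f1 > best[3]:
            best = (thr, p, r, f1)
    return best


# Illustrative labels and predicted probabilities (made up for this example).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.95, 0.80, 0.40, 0.35, 0.10, 0.60, 0.70, 0.20]
thr, p, r, f1 = best_threshold(y_true, scores, [0.3, 0.5, 0.7])
meets_target = f1 >= 0.85
```

On this toy data the lowest threshold gives perfect recall but weaker precision, while the highest gives perfect precision at lower recall, which is the trade-off the F1 target is meant to balance.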

Hi there, I am A.R.M. MASUD, with a strong Data Science background. As a Python developer, I have extensive experience building robust, scalable, and efficient solutions that address various business needs. I understand the importance of delivering high-quality, well-architected code, and I am committed to working closely with you to ensure the success of this project. I implement core functionality using Python, utilizing relevant libraries and frameworks such as Pandas, NumPy, GUI, SciPy, Matplotlib, Seaborn, Plotly, Scikit-learn, TensorFlow, Keras, PyTorch, spaCy, Flask, Django, FastAPI, OpenCV, and Jupyter. I am a professional responsible for extracting actionable insights and knowledge from large volumes of data through Machine Learning models, including CNNs, RNNs, LSTMs, GANs, Transformers, FNNs, ANNs, and DNNs. I conduct comprehensive unit, integration, and performance testing to ensure the solution is error-free and optimized. https://www.freelancer.com/u/MZITSERVICES I appreciate the opportunity to submit this proposal and am excited about the possibility of working with you to bring your project to life. Thanks A.R.M MASUD
$40 USD in 7 days
4.7

Your duplicate detection challenge needs synthetic labeling since real financial datasets won't have duplicate flags. I'd start by loading your Kaggle dataset, creating controlled duplicates through exact matches and small perturbations (typos, formatting changes), then engineer similarity features like Levenshtein distance, numeric differences, and exact match indicators. XGBoost typically performs well for this type of classification with proper hyperparameter tuning. I built a price aggregation engine that tracks 800+ products across multiple stores, handling fuzzy matching and duplicate detection for similar products with slight naming variations. The pattern recognition work translates directly to customer record deduplication. You can see my automation projects at ffulb.com. Can deliver the complete notebook, tuned model hitting your F1≥0.85 target, and documentation within a week. Ready to start immediately.
$28 USD in 2 days
1.5
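The Levenshtein-based similarity feature this proposal mentions could look something like the sketch below. It is a plain-Python version of the standard dynamic-programming edit distance; the `similarity` normalisation (one minus distance over the longer length) is one common choice assumed here, not something the bidder specified.

```python
def levenshtein(a, b):
    """Edit distance: minimum insertions, deletions, and substitutions to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Normalised similarity in [0, 1]; 1.0 means identical strings (assumed scaling)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


# A one-character typo in a hypothetical customer name.
typo_distance = levenshtein("Jon Smith", "John Smith")
typo_similarity = similarity("Jon Smith", "John Smith")
```

A feature like `typo_similarity` could sit alongside the numeric differences and exact-match indicators as input to the classifier.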

South Africa
Payment method verified
Member since Mar 20, 2026