
Closed
Paid on delivery
I need someone who can go to the 350k Home Depot links below and extract the product links shown in each page's Frequently Bought Together section.
Sample Link: [login to view URL]
While on this page, notice that there is a section called Frequently Bought Together. It includes the current item plus ~5 other items. What I want is the URL for each of the Frequently Bought Together items.
The CSV file result should have 6 columns:
Column 1: Original Product URL
Column 2: 1st Product URL
Column 3: 2nd Product URL
Column 4: 3rd Product URL
Column 5: 4th Product URL
Column 6: 5th Product URL
NOTE: Not every product will have a Frequently Bought Together section. That is OK. For those that do, often the first item listed is the current item.
Example:
Column 1: [login to view URL]
Column 2: [login to view URL]
Column 3: [login to view URL]
Column 4: [login to view URL]
Column 5: [login to view URL]
Column 6: [login to view URL]
I do not need pricing or any other details.
If you need a zipcode to use, use 40511.
Once you finish, check a few of the items that have no Frequently Bought Together items to confirm that the section was not missed.
Do not bid on this until you have reviewed the attached file. If you do bid, put at the top of the bid submission, "I have reviewed the notes and file and I can do this." Let me know how soon you can do it.
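The requested row shape can be sketched in Python as follows. This is only an illustration, assuming the links for one product have already been collected into a list. The names `to_row` and `write_rows` and the decision to emit a header row are assumptions, not part of the brief, and the URLs are placeholders.

```python
import csv
import io

# Column names taken from the brief; writing them as a header row is an assumption.
HEADER = [
    "Original Product URL",
    "1st Product URL",
    "2nd Product URL",
    "3rd Product URL",
    "4th Product URL",
    "5th Product URL",
]

def to_row(original_url, fbt_urls):
    """Return a 6-cell row: the original URL plus up to five FBT URLs.

    Missing positions become empty strings, so a product without a
    Frequently Bought Together section still yields a full-width row.
    """
    urls = list(fbt_urls)[:5]
    return [original_url] + urls + [""] * (5 - len(urls))

def write_rows(rows, out):
    """Write the header and the rows to any file-like object."""
    writer = csv.writer(out)
    writer.writerow(HEADER)
    writer.writerows(rows)

# Placeholder URLs, not real product pages.
buffer = io.StringIO()
write_rows(
    [
        to_row("https://example.com/p/1",
               ["https://example.com/p/1", "https://example.com/p/2"]),
        to_row("https://example.com/p/3", []),
    ],
    buffer,
)
```

Padding every row to six cells keeps the file rectangular, which makes the later check of "empty" products a simple filter on columns 2 to 6.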
Project ID: 40477509
33 proposals
Remote project
Active 2 days ago
33 freelancers are bidding on average $284 USD for this job

I have reviewed the notes and file and I can do this. Hi, I can extract the “Frequently Bought Together” product URLs from all 350k Home Depot product pages and deliver the results in the exact CSV structure you requested. I have extensive experience with large-scale scraping projects using Python, Selenium, Requests, BeautifulSoup, and rotating proxies. I regularly handle high-volume datasets and dynamic ecommerce pages while maintaining clean, structured outputs.
For this project I will:
* Process all provided product URLs
* Extract up to 5 Frequently Bought Together product links per page
* Preserve blank columns where the section is missing
* Validate samples from “empty” rows to ensure nothing was missed
* Deliver a clean CSV with the required 6-column format
I can also normalize/remove tracking parameters if preferred, though I can keep the full URLs exactly as shown in your example. To avoid blocking/rate limits on a dataset this large, I’ll use controlled concurrency, retries, session handling, and proxy rotation where needed. I’ve completed many similar ecommerce extraction projects involving related/recommended products, category relationships, and large-scale URL collection. Ready to start immediately. Best regards
$350 USD in 2 days
8.1
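The optional normalization mentioned in this bid could look roughly like the sketch below, which drops the whole query string and fragment from a URL. Which parameters count as tracking is not stated anywhere in the listing, so removing all of them is an assumption, and the function name `strip_query` and the example URL are illustrative only.

```python
from urllib.parse import urlsplit, urlunsplit

def strip_query(url):
    """Drop the query string and fragment, keeping scheme, host, and path."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

# Placeholder URL; the parameter name is made up for illustration.
example = "https://example.com/p/some-product/123?tracking=abc#reviews"
cleaned = strip_query(example)
```

Since the brief's example appears to keep the full URLs, this step would only apply if the client asks for it.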

I have reviewed the notes and file and I can do this. Extracting your 350k Home Depot links... I see you need a clean 6-column CSV containing the 'Frequently Bought Together' URLs from a massive dataset, completely ignoring pricing details and localized to zipcode 40511. Standard scrapers will get IP-banned trying to ping Home Depot 350,000 times. Here is how I will actually execute this for you: Extraction & Proxies: I will build a custom Python extraction script utilizing a network of rotating residential proxies to completely bypass Home Depot's aggressive anti-bot security. Data Parsing & Localization: The script will inject the 40511 zipcode session cookie and specifically target the 'Frequently Bought Together' DOM elements. QA & Delivery: I will process the output into your exact 6-column CSV format. As requested, I will manually QA a sample of the empty rows to guarantee no data was missed during the extraction. Quick question: Could you share a tiny 10-link sample from your file in the chat so I can run a quick, free extraction test and verify the 6-column output format for you?
$275 USD in 5 days
6.2

I have reviewed the notes and file and I can do this. Hi, ★★★ Python Expert ★★★ 8+ Years of Experience ★★★ I can extract the Frequently Bought Together product links with a structured CSV output. This will include: - Scraping the specified Home Depot links. - Compiling the URLs into a CSV format with the required columns. - Validating the extraction for accuracy. I will use Python with libraries like BeautifulSoup and Pandas for efficient data handling. Ready to start once you provide the access details. Thanks!
$400 USD in 7 days
5.5

Hi, I have reviewed the notes and file and I can do this. I can build a high-speed extraction workflow to process all 350k Home Depot URLs and capture the “Frequently Bought Together” product links into the exact 6-column CSV structure you described.
My approach:
- Python + Playwright/Scrapy hybrid for scalable extraction
- Request throttling, retry handling, and rotating sessions to avoid blocking
- Parallel processing with checkpoint/resume support for large-volume runs
- Validation pass on products with empty results to reduce false negatives
- Clean CSV output with fixed column formatting
I can start immediately and provide a sample extraction early in the process for verification.
$150 USD in 2 days
5.2
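Checkpoint/resume support, as listed in this bid, can be approximated by treating the output CSV itself as the checkpoint. The sketch below is one possible design, not the bidder's actual method: `load_done`, `run`, and `fake_scrape` are hypothetical names, and `fake_scrape` stands in for whatever really fetches a page.

```python
import csv
import os
import tempfile

def load_done(output_path):
    """Return the set of original URLs already present in the output CSV."""
    if not os.path.exists(output_path):
        return set()
    with open(output_path, newline="", encoding="utf-8") as f:
        return {row[0] for row in csv.reader(f) if row}

def run(urls, output_path, scrape):
    """Append one row per URL, skipping URLs finished in an earlier run."""
    done = load_done(output_path)
    with open(output_path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for url in urls:
            if url in done:
                continue
            writer.writerow([url] + scrape(url))
            f.flush()  # keep completed rows on disk if the run is interrupted
            done.add(url)

# Hypothetical stand-in for the real page scraper.
def fake_scrape(url):
    return [url + "/related"]

path = os.path.join(tempfile.mkdtemp(), "out.csv")
run(["u1", "u2"], path, fake_scrape)
run(["u1", "u2", "u3"], path, fake_scrape)  # only u3 is processed this time
```

The second call simulates a restart: rows for `u1` and `u2` are already on disk, so only `u3` is scraped and appended.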

I have reviewed the notes and file and I can do this. Hi, I understand the task clearly: extract Frequently Bought Together product URLs from the supplied Home Depot product links and return a clean CSV with the original URL plus up to five associated product URLs. I have experience with large scale web scraping, product page parsing, pagination and dynamic content handling, proxy aware crawling, CSV processing, and QA checks. For this project, I would first test the sample links, identify where the Frequently Bought Together URLs are loaded from, then build a scraper that processes the full file efficiently while respecting failures, retries, and missing sections. Since there are around 350k links, I would include batching, progress logging, error handling, duplicate protection, and periodic output saves so no data is lost during the run. I will also use zipcode 40511 if location context is required and spot check products without the section to confirm they were not missed. The final CSV will contain exactly the six requested columns and no extra pricing or product details. Best, Justin
$500 USD in 7 days
4.4

I have reviewed the provided information, and I'm confident in my ability to extract the Frequently Bought Together links from the Home Depot URLs. As a highly experienced Data Entry & Web Research Expert, I've worked on similar tasks involving data extraction and web scraping before. Additionally, I have advanced skills in Excel and will ensure that your CSV file contains 6 accurate columns with all the relevant product URLs. As an Excel Specialist, I understand the value of clean and organized data, so you can rest assured that the output will be streamlined, easily accessible, and free from errors. Another crucial aspect of this project is being able to adapt on the go, since not every product will have a Frequently Bought Together section. That adaptability is reflected in past projects where I handled such variations while keeping the expected quality intact. I'm ready to begin immediately and ensure timely delivery without compromising the quality or accuracy of my work. If you choose to work with me, you'll receive dependable service, consistent updates about the project progress, and reliable results. Let's get started!
$275 USD in 1 day
4.5

Yo! I have reviewed the notes and file and I can do this. I can extract the “Frequently Bought Together” product URLs for all 350k Home Depot product pages, organize them into the required 6-column CSV format, and perform validation checks on products without recommendations to ensure data accuracy. I have experience with large-scale web scraping, handling dynamic content, and delivering clean, verified datasets. Please let me know your preferred timeline, and I can provide an estimated completion schedule immediately.
$380 USD in 2 days
4.8

I have reviewed the notes and file and I can do this. For each of the 350,000 URLs I will parse the __NEXT_DATA__ JSON blob embedded in the page source. Home Depot injects the full page state including Frequently Bought Together product data directly into the HTML, so I do not need a headless browser. This keeps the scraper fast and lightweight. I will use rotating residential proxies with per request IP cycling to avoid blocks, and a coordinator worker architecture that saves progress to a checkpoint file after every batch so the job can resume from any crash without reprocessing completed URLs. The output will be a CSV with six columns as specified. Original URL in column one, then up to five FBT product URLs in columns two through six. Rows with no FBT section will have the original URL in column one and empty cells for the rest. Once complete I will spot check a sample of the empty rows by manually verifying a few pages to confirm nothing was missed. Main technical challenges are IP blocking and rate limits, which I handle with residential proxy rotation, and scale management across 350K URLs, which I handle with batched processing and streaming CSV writes to avoid memory issues. Timeline is approximately 3 to 4 days depending on proxy throughput.
$275 USD in 2 days
4.1
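The `__NEXT_DATA__` approach described in this bid can be sketched as below. Whether Home Depot pages actually carry such a blob is the bidder's claim and is not verified here. The sample HTML is synthetic, and the `props` / `fbt` key path is purely hypothetical, since the real JSON layout is not given in the listing.

```python
import json
import re

# Matches a script tag whose id is __NEXT_DATA__ and captures its body.
NEXT_DATA_RE = re.compile(
    r'<script[^>]*\bid="__NEXT_DATA__"[^>]*>(.*?)</script>',
    re.DOTALL,
)

def extract_next_data(html):
    """Return the parsed __NEXT_DATA__ JSON, or None if the tag is absent."""
    match = NEXT_DATA_RE.search(html)
    if match is None:
        return None
    return json.loads(match.group(1))

# Synthetic page; the "fbt" key is hypothetical, not a known site schema.
sample_html = (
    '<html><body><script id="__NEXT_DATA__" type="application/json">'
    '{"props": {"fbt": ["/p/1", "/p/2"]}}'
    "</script></body></html>"
)
data = extract_next_data(sample_html)
fbt = data["props"]["fbt"] if data else []
```

Returning `None` when the tag is missing lets the caller distinguish "no embedded state on this page" from "state present but no FBT items", which matters for the empty-row check the client asked for.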

I have reviewed the notes and file and I can do this. With expertise in Web Scraping and Data Extraction using Python, I specialize in navigating complex data structures and delivering efficient results and scalable solutions. Let’s connect to discuss further.
$250 USD in 3 days
4.2

"I have reviewed the notes and file and I can do this." Hello, I understand the operation: process a list of 350,000 Home Depot URLs, visit each page to locate the "Frequently Bought Together" section, and extract the URLs for up to five associated products. The system will handle cases where this section is missing and compile the data into a 6-column CSV, mapping each source URL to its corresponding related product URLs.
Technical approach: I will build a resilient Python scraper using Scrapy and BeautifulSoup. The script will leverage a pool of rotating residential proxies and user agents to navigate Home Depot's bot detection effectively. I'll configure requests to use the 40511 zip code for session consistency. The process will be resumable to handle interruptions.
Core modules:
- URL Queue Manager: Ingests and manages the 350k URL list.
- Scraping Worker: Executes parallel HTTP requests with appropriate headers and proxies.
- HTML Parser: Isolates the target section using precise CSS selectors and extracts the product URLs.
- CSV Writer: Structures and appends the results to the final output file, handling rows with missing data correctly.
Implementation strategy: First, I will build and validate the parser on a small batch of URLs, including cases with and without the required section. Once confirmed, I will scale the operation to process the full list. I estimate the entire extraction and data validation will take 3-4 days to complete. A final QA check will be performed on random samples to ensure accuracy. Regards, Rohit
$150 USD in 4 days
3.4

I have reviewed the notes and file and I can do this. I’ve built large-scale product URL extraction pipelines for ecommerce sites, including dynamic sections rendered by JavaScript, recommendation widgets, pagination, retries, and CSV exports. I’ve also worked on similar scraping jobs where the output needed only related product URLs, with validation checks for pages where the target module was missing. For this Home Depot task, I can process the 350k product URLs, extract up to five Frequently Bought Together product links per original URL, use zipcode 40511 where location context is needed, and return a clean CSV with the exact six-column structure requested. I’ll include retry handling, rate limiting, proxy/browser automation if required, and sample QA checks to confirm that pages marked as having no Frequently Bought Together section were not missed. Best regards, George
$250 USD in 7 days
3.6

I have reviewed the notes and file and I can do this. Hello, I understand you need approximately 350,000 Home Depot product URLs processed to extract all URLs appearing in the "Frequently Bought Together" section and return them in a clean 6-column CSV structure. I have experience with large-scale web data extraction, product-page parsing, and automated data collection. I can build a reliable extraction workflow that:
• Processes all supplied Home Depot product URLs
• Uses ZIP code 40511 when required
• Captures up to 5 Frequently Bought Together product URLs per item
• Preserves the original product URL in Column 1
• Leaves blank cells when no Frequently Bought Together section exists
• Removes duplicate outputs where appropriate
• Performs quality-control checks on records with no matches to verify nothing was missed
• Delivers a clean CSV ready for import or analysis
Before final delivery, I will manually verify a sample of products both with and without Frequently Bought Together sections to ensure extraction accuracy. Estimated turnaround depends on the final file size and access restrictions, but after reviewing the dataset I can provide a precise completion schedule immediately. I am ready to begin as soon as the source file is shared. Best regards.
$150 USD in 2 days
3.5

I have reviewed the notes and file and I can do this. Hi, I reviewed the attached file and understand the requirement clearly. You have approximately 350,000 Home Depot product URLs, and for each URL I will extract the associated “Frequently Bought Together” product links and return them in the required CSV format:
* Column 1: Original Product URL
* Column 2: 1st Frequently Bought Together URL
* Column 3: 2nd Frequently Bought Together URL
* Column 4: 3rd Frequently Bought Together URL
* Column 5: 4th Frequently Bought Together URL
* Column 6: 5th Frequently Bought Together URL
My approach:
✔ Build an automated extraction workflow to process the URLs at scale
✔ Use the provided ZIP code (40511) when required
✔ Capture only the associated product URLs as requested (no pricing or extra data)
✔ Handle products that do not contain a Frequently Bought Together section without generating false matches
✔ Perform validation checks on samples where no recommendations are found to ensure nothing was missed
✔ Deliver a clean CSV ready for immediate use
I have experience with large-scale web data extraction, structured CSV generation, data validation, and quality assurance workflows where accuracy is more important than raw scraping speed. Given the size of the dataset, I can begin immediately. I would be happy to discuss batching, validation requirements, and delivery expectations before starting.
$255 USD in 3 days
2.4

I have reviewed the notes and file and I can do this. Hello, I can extract the Frequently Bought Together product URLs from the 350k Home Depot links using Python, web scraping, data processing, CSV handling, and validation checks. I have experience building scraping and data extraction workflows for large URL lists, including handling missing sections, dynamic page content, retries, rate limits, proxies, and clean CSV output. For this task, I’ll use zipcode 40511, visit each product page, detect the Frequently Bought Together section, capture up to 5 related product URLs, and leave blanks where no items exist. I’ll also add logging so failed or blocked pages can be retried, and I’ll manually/sample-check several products with no FBT results to confirm the section was truly missing and not skipped by the scraper. Depending on site blocking and request speed, I can complete an initial run in a few days and provide clean final CSV files. Best, Smit
$300 USD in 5 days
2.0
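The logging and retry behavior this bid describes might be sketched as follows. It is a generic retry loop with exponential backoff, not the bidder's code: `fetch_with_retry` and `flaky_fetch` are hypothetical names, and the fetcher is passed in as a parameter so the example runs without network access.

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("fbt")

def fetch_with_retry(fetch, url, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying with exponential backoff on any exception.

    Returns the fetch result, or None once every attempt has failed.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except Exception as exc:
            log.warning("attempt %d/%d failed for %s: %s",
                        attempt, attempts, url, exc)
            if attempt < attempts:
                sleep(base_delay * 2 ** (attempt - 1))
    return None

# Hypothetical fetcher that fails twice and then succeeds.
calls = {"n": 0}

def flaky_fetch(url):
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("blocked")
    return "<html>ok</html>"

failed = []
result = fetch_with_retry(flaky_fetch, "https://example.com/p/1",
                          sleep=lambda seconds: None)  # skip real waiting here
if result is None:
    failed.append("https://example.com/p/1")  # queue for a later retry pass
```

Keeping failed URLs in a separate list, rather than writing them as blank rows, avoids confusing "blocked" pages with products that genuinely have no Frequently Bought Together section.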

I have reviewed the notes and file and I can do this. Hi, I've worked on Home Depot extractions before, so I'm familiar with how their pages load. The Frequently Bought Together block isn't in the static HTML — it's pulled from their product API/GraphQL endpoint, which is exactly where the MERCH=REC--fbt_test--[productID]-_-[position] links you showed in your example come from. I'll hit that data layer directly using the zip 40511 for store context, so the extraction is fast and reliable across all 350k links rather than rendering every page in a browser. My output will match your format precisely: Column 1 the original product URL, Columns 2–6 the FBT item URLs in their listed order, including the current item when it appears first. Where a product has no FBT section, those cells stay blank — no errors, no skipped rows. I'll also do the verification step you asked for: after the run, I'll manually pull a sample of the "blank" rows and confirm the section genuinely doesn't exist on those pages, so nothing is missed due to a parsing gap. I'll run a small test batch (a few hundred links) first so you can confirm the format and accuracy before I commit the full 350k. For that volume I'd plan throttling and rotation to stay clean and avoid blocks, then deliver one final CSV. Timeline: test batch within 24 hours of starting, and the complete 350k file within 4–6 days depending on rate limits. Happy to adjust the columns or naming if you need anything tweaked. Regards, Diego
$200 USD in 7 days
1.4
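Taking this bid's description of the link pattern at face value, a filter for links tagged as Frequently Bought Together could be sketched as below. The `MERCH` parameter name and the `fbt` marker come from the pattern quoted in the bid. Everything else, including `is_fbt_link` and the synthetic URLs, is an assumption for illustration.

```python
from urllib.parse import parse_qs, urlsplit

def is_fbt_link(url, marker="fbt"):
    """True if the URL has a MERCH query parameter containing `marker`."""
    query = parse_qs(urlsplit(url).query)
    return any(marker in value for value in query.get("MERCH", []))

# Synthetic links shaped like the pattern quoted in the bid.
links = [
    "https://example.com/p/1?MERCH=REC--fbt_test--111-_-1",
    "https://example.com/p/2?MERCH=REC--other--111-_-1",
    "https://example.com/p/3",
]
fbt_links = [u for u in links if is_fbt_link(u)]
```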

I have reviewed the notes and file and I can do this. Hi there, I can develop and execute a high-performance web scraping and data engineering solution to extract the Frequently Bought Together links for your 350k Home Depot URLs. With over fifteen years of commercial experience as a Senior Data Analyst and Engineer, I specialize in large-scale data extraction, custom ETL pipelines, and structural automation. My technical expertise includes building efficient Python-based scrapers using tools like Asyncio, Playwright, and BeautifulSoup, which are carefully optimized to handle anti-bot frameworks, geographical constraints (using your specific 40511 zip code), and large volumes of data without crashing. For a massive dataset of 350k URLs, I will build a multi-threaded or asynchronous extraction pipeline that parses each product page, isolates the Frequently Bought Together carousel container, and maps the associated links into your exact 6-column CSV structure. I will also incorporate strict validation scripts to re-verify entries with empty outputs, ensuring that pages marked as having no associated items are genuinely blank rather than missed due to a timeout. I can deliver the complete, verified 6-column CSV file alongside a summary report within 4 to 5 business days. Let's get in touch to discuss details. Solution Vector Roman Khakhula
$170 USD in 7 days
0.8
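The asynchronous pipeline mentioned in this bid could bound its concurrency with a semaphore, roughly as sketched here. This is a minimal illustration using only `asyncio`: `scrape_all` and `fake_scrape` are hypothetical names, and `fake_scrape` replaces the real fetch-and-parse step.

```python
import asyncio

async def scrape_all(urls, scrape_one, limit=10):
    """Run scrape_one over urls with at most `limit` requests in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(url):
        async with semaphore:
            return url, await scrape_one(url)

    # gather returns results in the same order as the input URLs
    return await asyncio.gather(*(guarded(u) for u in urls))

# Hypothetical stand-in for a real page fetch and parse.
async def fake_scrape(url):
    await asyncio.sleep(0)
    return [url + "/a", url + "/b"]

results = asyncio.run(scrape_all(["u1", "u2", "u3"], fake_scrape, limit=2))
```

For a list of 350k URLs it would likely be better to feed this function in batches, since `gather` creates one task per URL up front.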

I have reviewed the notes and file and I can do this. I checked the sample URL and I see the Frequently Bought Together block is often populated client side and includes MERCH query strings you want preserved. That means a simple HTML fetch will miss items on some pages. If the FBT is loaded via an XHR or script, the reliable approach is to hit the same API or render the page headlessly and capture the six product links, or read the FBT payload if available. I built the CrowdAxis ETL that pulls and normalizes ten external sources into a unified schema, so I am used to reliable scraping and deobfuscating client-side payloads.
Planned approach:
1. Confirm source CSV and sample 100 rows
2. Detect whether FBT comes from an XHR endpoint or page render
3. If an endpoint exists, call it directly; otherwise render with a headless browser
4. Extract up to five FBT URLs per product and output a CSV with six columns, plus a 100-row sample for approval
5. Spot check items with no FBT to confirm accuracy
I will use zipcode 40511 and validate negatives. I can deliver the full CSV in five days for 275 USD. Please confirm where the 350k URL file is and whether you want the MERCH query strings kept exactly as in the example.
$275 USD in 7 days
0.0
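The spot check of items with no FBT, which both this bid and the brief call for, could start from a reproducible random sample of the empty rows. The sketch below assumes rows in the six-column layout from the brief. The function name `sample_empty_rows`, the sample size, and the fixed seed are illustrative choices.

```python
import random

def sample_empty_rows(rows, k=20, seed=0):
    """Pick up to k original URLs whose five FBT columns are all blank."""
    empty = [row[0] for row in rows
             if not any(cell.strip() for cell in row[1:6])]
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    return rng.sample(empty, min(k, len(empty)))

# Placeholder rows in the six-column layout.
rows = [
    ["u1", "a", "b", "", "", ""],
    ["u2", "", "", "", "", ""],
    ["u3", "", "", "", "", ""],
]
to_check = sample_empty_rows(rows, k=5)
```

The returned URLs would then be opened by hand to confirm that the section is really absent.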

Hello there, I’ve read your note about extracting Frequently Bought Together links from Home Depot product pages. I’m an independent developer with strong hands-on experience in data extraction, web scraping, and data transformation. I specialize in turning messy product data into clean, structured outputs that teams can immediately reuse for analytics and catalogs. I’ve tackled similar tasks, scraping product pages, identifying cross-sell sections, and exporting multi-column CSVs with consistency and accuracy using Python, BeautifulSoup/Scrapy, and robust error handling. I will programmatically visit each URL, detect the Frequently Bought Together block, and extract up to five related URLs when present, preserving the original URL as Column 1 and filling subsequent columns with the discovered links. If a page lacks that section, I’ll leave the corresponding cells blank. I can handle the work based on my experience, deliver a clean CSV with the exact 6-column format, and ensure accuracy through spot checks. I can start quickly and target a 2-4 day turnaround depending on the final list size. Please feel free to contact me so we can discuss more details. I am looking forward to the chance of working together. Best regards, Billy Bryan
$305 USD in 1 day
0.0

Lexington, United States
Payment method verified
Member since Apr 6, 2011