
Closed
Posted
Paid on delivery
Hey! I’m looking to hire an experienced developer to build a universal product-detail scraping pipeline that takes a product URL (any website) and returns a complete structured product record.

This is not a “simple HTML parse.” Many target sites are React/Next/Vue, load content via XHR/GraphQL, hide details behind tabs/accordions/modals, and lazy-load images/PDFs. The solution needs to reliably extract everything a human can see on the page, plus the underlying data used to render it.

What the scraper must do (high level)

Given a product URL, the pipeline should:
- Load the page like a real user (handle cookies/overlays).
- Capture all content from multiple sources (DOM + network + interactions).
- Use GPT API strategically to increase accuracy (field mapping, variant extraction, doc classification, completeness checks).
- Output a strict, validated JSON record + optional Excel export.

Data fields I need to extract (core)

Required output fields:
- Product name
- manufacturer / brand
- description (clean, human-readable)
- images[] (high quality URLs, deduped; include context/alt when possible)
- documents[] (PDF/spec sheets/install guides/warranties/BIM/etc., classified)
- options[] / variants (SKUs if available; option dimensions like color/size/material; availability if available)
- attributes{} (everything else: specs, dimensions, sustainability/certifications, compliance info, finish codes, etc.)
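To make the core fields concrete, here is a minimal sketch of the record skeleton and image deduplication, assuming Python. The key names (`name`, `brand`, and so on) and both helper functions are illustrative assumptions, not a fixed schema. The dedupe step also assumes that two image URLs differing only in query string point at the same image, which will not hold on every CDN.

```python
from urllib.parse import urlsplit, urlunsplit


def empty_record(url: str) -> dict:
    """Return a record with every core field present before extraction starts."""
    return {
        "url": url,
        "name": None,
        "brand": None,
        "description": None,
        "images": [],       # each item: {"url": ..., "alt": ...}
        "documents": [],    # each item: {"url": ..., "type": ...}
        "options": [],      # variants, with SKU/availability when known
        "attributes": {},   # everything else, as key/value pairs
    }


def dedupe_images(images: list[dict]) -> list[dict]:
    """Keep the first image seen for each URL, ignoring query and fragment.

    Assumption: query strings only carry resize/cache parameters.
    """
    seen = set()
    unique = []
    for image in images:
        parts = urlsplit(image["url"])
        key = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
        if key not in seen:
            seen.add(key)
            unique.append(image)
    return unique
```

This version keeps the first-seen variant of each image. A real pipeline would probably prefer the largest rendition instead.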
Additionally (for completeness & auditability):
- Full rendered page text: pageText
- Sectioned text (headings/paragraphs/lists/tables): pageTextSections
- Structured data capture: JSON-LD + embedded state blobs (e.g., __NEXT_DATA__) + meta tags
- Network payload evidence: selected API responses that contain product truth (saved with URL + snippet/hash)
- Provenance per field: source + confidence + evidence snippet

Universal extraction approach I want (technical requirements)

Tech stack (preferred):
- Playwright (preferred) or Puppeteer for browser automation
- Node.js or Python acceptable
- GPT API integration for: mapping, variants, document classification, and completeness audit loops

Must-have capabilities:
- JS-rendered content support (wait for hydration; not just raw HTML)
- Network interception: capture JSON/GraphQL responses during load + interactions
- Interaction replay:
  - scroll for lazy loads
  - expand accordions (“See more”, “Specs”, “Downloads”)
  - click tabs
  - open modals/drawers (e.g., availability, downloads)
  - attempt variant selection and record deltas
- Asset harvesting: harvest images & PDFs from DOM and network responses (not only <a href> / <img src>)
- Anti-fragility:
  - robust waiting (not only networkidle)
  - retry logic
  - consistent error reporting
- Output validation:
  - JSON schema validation
  - deterministic structure even when fields are missing (nulls, empty arrays)

How GPT should be used (important)

I have a GPT API key and want it used heavily but intelligently:
- Decide page type and extraction plan (product vs category vs doc page)
- Identify which network payloads contain product data
- Normalize messy specs into key/value
- Reconstruct variants/options from partial signals
- Classify documents (spec sheet vs install vs warranty vs BIM)
- Run a completeness audit and suggest the next actions (click this / expand that) until the record is complete

Rule: GPT must not hallucinate. If uncertain, output null + evidence + recommended next action.
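As a rough illustration of the structured data capture and per-field provenance items, the sketch below pulls JSON-LD blocks and the `__NEXT_DATA__` blob out of already-rendered HTML using only the Python standard library. The class name, the `with_provenance` helper, and the provenance keys are assumptions made for this example. In the real pipeline the HTML would come from the browser automation layer after hydration.

```python
import hashlib
import json
from html.parser import HTMLParser


class StructuredDataParser(HTMLParser):
    """Collect JSON-LD blocks and the __NEXT_DATA__ blob from rendered HTML."""

    def __init__(self):
        super().__init__()
        self.json_ld = []
        self.next_data = None
        self._kind = None   # which kind of script we are currently inside
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        attr = dict(attrs)
        if attr.get("type") == "application/ld+json":
            self._kind = "json_ld"
        elif attr.get("id") == "__NEXT_DATA__":
            self._kind = "next_data"
        self._buffer = []

    def handle_data(self, data):
        if self._kind:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != "script" or not self._kind:
            return
        try:
            payload = json.loads("".join(self._buffer))
        except json.JSONDecodeError:
            payload = None  # malformed blob: skip it rather than guess
        if payload is not None:
            if self._kind == "json_ld":
                self.json_ld.append(payload)
            else:
                self.next_data = payload
        self._kind = None


def with_provenance(value, source: str, confidence: float, evidence: str) -> dict:
    """Wrap a value with its source, a confidence score, and hashed evidence."""
    return {
        "value": value,
        "source": source,
        "confidence": confidence,
        "evidence": evidence[:200],  # short snippet for human review
        "evidence_sha256": hashlib.sha256(evidence.encode("utf-8")).hexdigest(),
    }
```

A value that cannot be backed by evidence would stay `None` in this scheme, in line with the no-hallucination rule.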
Deliverables

- A runnable scraper (CLI or small service) that accepts a product URL and outputs:
  - [login to view URL] (structured)
  - optional [login to view URL]
- A “self-healing” completeness loop with logs:
  - what interactions were performed
  - what was missing
  - what sources were used (DOM/network/GPT/OCR if used)
- Documentation:
  - setup instructions
  - how to add sites / tune extraction
  - how to run in headless mode
- Basic test set: run on ~10 diverse product URLs (Shopify + custom + [login to view URL] + heavy JS) and show outputs

Nice-to-haves

- Dockerfile
- Queue/scheduler support for batch runs
- Proxy support (only if needed)
- Optional OCR fallback using screenshot + vision for hard edge cases
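The “self-healing” completeness loop could look roughly like the sketch below. It is only an outline under stated assumptions: `suggest` stands in for the GPT audit call, `perform` stands in for the browser interaction layer, and the field names and log shape are hypothetical.

```python
from typing import Callable

REQUIRED = ("name", "brand", "description", "images",
            "documents", "options", "attributes")


def missing_fields(record: dict) -> list[str]:
    """A field counts as missing when it is None or an empty container."""
    return [f for f in REQUIRED if record.get(f) in (None, "", [], {})]


def completeness_loop(
    record: dict,
    suggest: Callable[[list[str]], list[str]],   # stand-in for the GPT audit
    perform: Callable[[str, dict], None],        # stand-in for browser actions
    max_rounds: int = 3,
) -> tuple[dict, list[dict]]:
    """Audit the record, perform suggested actions, and log every round."""
    log = []
    for round_no in range(1, max_rounds + 1):
        missing = missing_fields(record)
        if not missing:
            break  # record is complete
        actions = suggest(missing)
        if not actions:
            # Nothing left to try: leave the gaps as nulls and stop.
            log.append({"round": round_no, "missing": missing, "actions": []})
            break
        for action in actions:
            perform(action, record)
        log.append({"round": round_no, "missing": missing, "actions": actions})
    return record, log
```

The `max_rounds` cap keeps the loop bounded on pages where a field genuinely does not exist, so the record ends with nulls instead of retrying forever.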
Project ID: 40183249
129 proposals
Remote project
Active 16 days ago
129 freelancers are bidding an average of $146 USD for this project

Hello, I understand that you're seeking a developer who can navigate complex React/Next/Vue sites to extract all visible and hidden product details, generate structured JSON output, and integrate GPT API for accuracy enhancement. With over 5 years of experience in web development and an understanding of JavaScript frameworks like React, I am confident about delivering the solution you need. I have hands-on experience with Puppeteer for browser automation, Node.js for efficient processing, and deep knowledge of JSON. Though my strengths primarily lie in PHP and JSON, learning your preferred tools like Playwright with a bit of Python will be an exciting challenge that I am willing to take up. Apart from raw HTML parsing, I specialize in comprehensive content scraping by mimicking human interaction, handling cookies/overlays, capturing content from diverse sources (DOM + network + interactions), and handling lazy-loaded images/PDFs. Integrating GPT API intelligently to enhance extraction accuracy is one of my key competencies and something I look forward to applying in your project. I understand your desire for a “self-healing” completeness loop with comprehensive logs, robust waiting logic for anti-fragility, consistent error reporting, and output validation with JSON Schema, among other things. These are areas where I thrive due to my meticulous attention to detail and willingness to experiment until we get it right. Thanks!
$130 USD in 2 days
8.6

Hi there, We’ve built similar web scrapers that extract product data from multiple sources, including hidden fields, and deliver structured JSON outputs. We also developed a product classification engine that uses AI to classify products based on images and descriptions, which could be useful for your project. For this project, we’d use a combination of web scraping and browser automation to extract data from JavaScript-rendered pages. We can also integrate with the GPT API to enhance the accuracy of extracted data. Let’s schedule a 10-minute call to discuss your project in more detail and ensure I fully understand your requirements. I’m eager to learn more about this exciting project. Best, Adil
$137.02 USD in 7 days
7.2

I can build a universal product scraper using Playwright that fully handles React/Next/Vue sites, XHR/GraphQL, lazy loads, tabs, modals, and variants. It will extract human-visible + underlying data, validate strict JSON, export Excel, and use GPT intelligently for mapping, variants, docs, and completeness—no hallucinations, full provenance.
$250 USD in 7 days
7.3

Greetings! You're looking to create a robust scraping pipeline that can handle complex, JavaScript-rendered sites to extract detailed product information. This involves not just grabbing the visible text but also navigating through dynamic content, interacting with elements like tabs and modals, and capturing underlying data. My approach would leverage tools like Playwright or Puppeteer for effective browser automation, ensuring we can mimic user interactions and gather all necessary details. Additionally, integrating the GPT API will enhance accuracy by helping to classify documents, normalize specifications, and provide a completeness check to ensure we’re capturing everything we need. With my experience in web scraping and data processing, I’m confident I can deliver a solution that meets your needs and is reliable. Best regards, Saba Ehsan
$70 USD in 4 days
6.4

As an experienced developer, my skill set aligns perfectly with the project you've outlined. Over my 10+ years in the field, I've handled complex web scraping projects, building pipelines that incorporate browser automation (like Playwright and Puppeteer) with specific needs such as JS-rendered content support, network interception, and interaction replay. Additionally, I have proven proficiency with tools like BeautifulSoup and Selenium that are widely recognized for effective and seamless data extraction. Moreover, a project of this complexity demands not just technical expertise but strategic leveraging of processes as well. My AI and Machine Learning skills put me in the perfect position to utilize GPT API effectively, from field mapping to variant extraction, document classification, and completeness audit loops. Lastly, client satisfaction is paramount to me. For this reason, in addition to delivering a structured JSON output as per your specifications, I assure you of clear communication throughout the process. I will also provide comprehensive documentation for ease of future use. With Regards! Abhi
$250 USD in 7 days
6.6

Hi there! I’m excited about the opportunity to develop a universal product-detail scraping pipeline tailored to your needs. With extensive experience in building complex web scrapers using Playwright and Node.js, I understand the intricacies of handling JS-rendered content and network interactions. My approach will ensure reliable extraction of all visible data and underlying structures, leveraging the GPT API for intelligent data mapping and classification. I propose a timeline of 10 days to develop and thoroughly test the scraper on diverse product URLs, ensuring it meets your specifications. I will ensure robust error handling and output validation for quality assurance. Best regards,
$110 USD in 3 days
6.5

Hello Bhoomika S., I checked your project, and it looks interesting. This is something we already work on, so the requirements are clear from the start. We mainly work on PHP, Python, Data Processing, Web Scraping, Software Architecture, JSON, Scrapy, Data Extraction, BeautifulSoup, and Selenium. We focus on making things simple, reliable, and actually useful in real life, not overcomplicated stuff. Let’s connect in chat and see if we’re a good fit for this. Best Regards, Ali nawaz
$129 USD in 4 days
6.3

Hey! I'm excited about the opportunity to develop a universal product detail scraping pipeline that tackles JS-rendered sites effectively. With experience in web scraping and data extraction, I'll ensure a comprehensive structured product record extraction from various sources with GPT-enhanced accuracy. I plan to utilize Playwright for browser automation, Node.js for versatility, and strategically integrate GPT for smarter extraction. The scraper will deliver strict JSON records and offer Excel exports for convenience. Let's discuss further details. How can I assist you with this project?
$155 USD in 1 day
5.9

Hi Bhoomika S., I specialize in PHP, Python, Data Processing, Web Scraping, Software Architecture, JSON, Scrapy, Data Extraction, BeautifulSoup, and Selenium. I understand your need for a universal product-detail scraping pipeline for JS-rendered sites with GPT-assisted extraction and structured JSON output. I have extensive experience in handling complex web scraping projects, including those involving React/Next/Vue sites and dynamic content loading. My approach involves using Playwright or Puppeteer for browser automation, Node.js/Python, and strategically integrating the GPT API for accuracy improvement. I ensure reliable data extraction from various sources, structured JSON output, and thorough documentation for seamless usage. Let's discuss how I can efficiently solve your scraping requirements. Looking forward to further discussing your project details. Best regards,
$30 USD in 7 days
5.8

⭐Hi, I’m ready to assist you right away!⭐ I believe I’d be a great fit for your project since I have extensive experience in building complex data scraping pipelines, especially for JavaScript-rendered sites. My technical expertise in Playwright and Puppeteer, combined with my proficiency in integrating GPT API for data extraction, perfectly aligns with the requirements of your project. This project requires a comprehensive approach that goes beyond simple HTML parsing to capture all product details from diverse websites efficiently. By leveraging my skills in network interception, asset harvesting, and anti-fragility mechanisms, I can ensure reliable extraction of all visible and underlying data. The structured JSON output and Excel export functionalities, as well as the use of GPT API for strategic accuracy enhancement, will be implemented meticulously to meet your specific requirements. I am dedicated to delivering a robust and self-healing scraping pipeline that provides accurate and complete product records. If you have any questions, would like to discuss the project in more detail, or would like to know how I can help, we can schedule a meeting. Thank you. Maxim
$50 USD in 4 days
5.5

I can build a robust, universal product scraping pipeline using Playwright (Node.js or Python) that captures fully rendered pages—including React/Next/Vue content, lazy-loaded assets, tabs, accordions, modals, and XHR/GraphQL payloads. The scraper will produce strict JSON + optional Excel output with product name, brand, description, images, PDFs, variants, attributes, and field-level provenance. GPT API will be integrated for variant reconstruction, doc classification, spec normalization, and completeness audits—without hallucination. Deliverables include a runnable CLI/service, logs of interactions & sources, setup documentation, and tests on 10 diverse sites. Optional Docker, batch queue, and proxy support can be included.
$140 USD in 7 days
5.3

Hi there! I thoroughly understand your need for a robust product-detail scraping pipeline capable of interacting with modern, JS-rendered sites. My extensive experience in developing web scrapers using tools like Playwright and integrating APIs, including the GPT suite, allows me to deliver a solution that precisely meets your requirements. In my previous projects, I successfully created scrapers that extracted data from complex sites, overcoming challenges like XHR loading and dynamic content rendering. This resulted in structured datasets that were both accurate and comprehensive.

✅My Plan:
- Utilize Playwright for effective browser automation to mimic user behavior.
- Implement GPT API for intelligent extraction and classification while ensuring null outputs for uncertain data.
- Execute thorough validation of the output against JSON schema.
- Develop comprehensive error reporting and retry mechanisms to ensure reliability.
- Conduct tests on various product URLs to verify robustness across different platforms.

Could you clarify if there are specific types of products or websites you want me to prioritize for initial testing? Best regards, Hongqiang Chen
$190 USD in 2 days
4.9

I’ve built Playwright + GPT scraping pipelines for JS-heavy sites with network interception, variant extraction, doc classification, and strict JSON outputs. Your universal product scraper with self-healing completeness loops and Laravel-ready exports fits my workflow approach perfectly.
$100 USD in 1 day
5.0

Hello there, I understand that you are looking for an experienced developer to create a universal product-detail scraping pipeline that can extract structured product records from any website, including those with complex rendering technologies like React, Next, and Vue. My proposed solution involves building a robust scraper that can load pages like a real user, capture content from various sources, utilize the GPT API for accuracy enhancement, and output structured JSON records with optional Excel exports.

Key Deliverables:
- Development of a scraper that can handle complex rendering technologies
- Integration of GPT API for accuracy improvement
- Output of structured JSON records and optional Excel exports
- Documentation for setup, tuning extraction, and running in headless mode

I bring expertise in Web Scraping, JavaScript, Python, and GPT integration to ensure the quality and reliability of the solution. I'll share my portfolio with you in the DM. Kindly ping me there. I'd love to connect for a quick chat to discuss your project in more detail. Best regards, Bilal
$140 USD in 7 days
4.9

Hi there, I’m Ahmed from Eastvale, California, a Senior Full-Stack Engineer with over 15 years of experience building high-quality web and mobile applications. After reviewing your job posting, I’m confident that my background and skill set make me an excellent fit for your project: Hiring Developer: Universal Product Page Scraper (JS-rendered sites) + GPT-assisted extraction + structured JSON output. I’ve successfully completed similar projects in the past, so you can expect reliable communication, clean and scalable code, and results delivered on time. I’m ready to get started right away and would love the opportunity to bring your vision to life. Looking forward to working with you. Best regards, Ahmed Hassan
$120 USD in 2 days
5.0

Hi there, I am excited about the opportunity to build your universal product-detail scraping pipeline. With over 7 years of experience in software development and a strong focus on web scraping, I have successfully developed solutions that handle complex JS-rendered websites, ensuring comprehensive data extraction. I am proficient in using Playwright for browser automation, enabling me to navigate dynamic content and capture essential details reliably. My approach will involve integrating the GPT API to enhance extraction accuracy, ensuring that you receive structured JSON records complete with all required data fields. Next, I will promptly create a proof of concept with a few sample URLs to validate the scraper's functionality.
$200 USD in 1 day
4.5

Hi, As an experienced Full Stack Developer and Automation Expert, I bring multiple valuable skills to the table that make me a perfect fit for your universal product-detail scraping project. My extensive experience in web scraping, including JavaScript-rendered content, will ensure your requirements for Playwright/Puppeteer, Node.js/Python, and GPT API integration are met meticulously. With my automation workflow know-how using n8n, Make, and Zapier, I can build you a self-healing completeness loop with detailed logs for all aspects of data extraction. I'm also adept at handling complex, dynamic website structures like those you described: details hidden behind tabs and accordions, lazy-loaded images/PDFs, and content captured from multiple sources (DOM + network + interactions). Having worked with React/Next/Vue websites extensively, I am well-versed in managing XHR/GraphQL requests and network interception to capture relevant JSON/GraphQL responses. Let's have a quick chat to discuss your project in detail. Warm regards, Usama
$120 USD in 7 days
4.5

Hello, there! I can build a universal product-detail extraction pipeline that behaves like a real user, captures “product truth” from DOM + network payloads, and outputs a strict, validated JSON record (plus optional Excel) with field-level provenance. I’ll implement Playwright-driven rendering with robust hydration waits, cookie/overlay handling, and interaction replay (scroll/lazy-load, tabs, accordions, modals, and variant selection with delta capture). On each run, the system will intercept and persist key JSON/GraphQL/XHR responses, extract embedded state (JSON-LD, __NEXT_DATA__/state blobs, meta tags), and harvest assets from both DOM and network. GPT will be used as a constrained assistant for field mapping, spec normalization, variant reconstruction, document classification, and a self-healing completeness audit loop; it will never invent values: unknowns stay null with confidence + evidence + recommended next action. You’ll get a CLI or small service, structured logs of interactions/sources, JSON schema validation, and a basic test suite over ~10 diverse URLs; Docker and batch/queue hooks can be included. Best regards, Ian Brown
$150 USD in 7 days
4.5

As an accomplished Data Analyst and Scientist, I possess the technical acuity to build a robust, all-encompassing web scraping solution tailored specifically for your project. Over my 8 years of experience in data analysis and science, I have gained deep proficiency in Python and SQL, which will be crucial for constructing the scraper with an emphasis on Playwright/Puppeteer, Node.js/Python integration. Importantly, my expertise extends to comprehensive ETL & Data Engineering tasks, so I'm well-versed in handling complex data structures as specified in your project description. In addition to my technical skills, I bring a strategic mindset for problem-solving that will be pivotal for leveraging GPT API optimally throughout the extraction process. My broad-based experience with various data visualization tools including Power BI and Tableau, combined with my skills in Python's NumPy and BigQuery, are significant assets that will enable me to seamlessly identify the network payloads containing product data while also transmuting messy specs into key-value pairs, crucial tasks to ensure output accuracy.
$120 USD in 4 days
4.6

Hello, I have solid experience building robust product-detail scraping pipelines for modern JS-heavy sites (React, Next, Vue) using Playwright with Node.js/Python. I can load pages like real users, capture DOM and network data, replay interactions, and reliably extract all visible and underlying product data into a strict, validated JSON structure. I’m also experienced in integrating GPT API carefully for mapping, variant reconstruction, document classification, and completeness checks without hallucination, with clear provenance and logs. I can deliver a runnable CLI/service, optional Excel export, documentation, and tested outputs on diverse sites as requested. Feel free to message me to discuss details or see relevant examples. Kind Regards, Md Ahsan
$140 USD in 2 days
4.4

Saint Augustine, United States
Payment method verified
Member since Jun 21, 2024