I’m looking to build a crawler to perform a few meta and performance checks across multiple similar websites. This crawler should be built with Puppeteer and with clear unit tests.
The intent is to crawl the homepage of a site and its sitemap (well defined), and then a long list of subpages which are listed on that sitemap (and on linked sub-sitemaps). These pages fall into 3 well-defined categories, and pages within a category are generally structured the same way. We will then run a specific list of checks for each of these page types, and another set of checks across every page globally.
The checks that I have defined and will share at the beginning of the project are quite simple, such as “does a <meta> description tag exist” and “does the page title contain some specific string”. The reason I am interested in Puppeteer (or Foxr) specifically is that I’d also like to measure page load performance (for example, Largest Contentful Paint) and track heavy resources (such as images and scripts).
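A minimal sketch of how checks like these might be structured, assuming each page is first reduced to a small snapshot object. The `PageSnapshot` shape, the check names, the "product" category and the "Acme" string are all hypothetical placeholders, not part of the actual specification. In a real crawler the snapshot fields would be collected through Puppeteer, for instance with `page.title()` and `page.$eval()`.

```typescript
// Hypothetical snapshot of one crawled page (an assumption for illustration).
interface PageSnapshot {
  url: string;
  category: string; // assumed category label, e.g. "product"
  title: string;
  metaDescription: string | null; // null when no <meta name="description"> exists
}

// A check returns either a boolean or an integer, as described above.
type CheckResult = boolean | number;

interface Check {
  name: string;
  // When omitted, the check is global and runs on every page.
  category?: string;
  run: (page: PageSnapshot) => CheckResult;
}

// Example checks; names and the "Acme" string are placeholders.
const checks: Check[] = [
  {
    name: "has-meta-description",
    run: (p) => p.metaDescription !== null && p.metaDescription.trim().length > 0,
  },
  {
    name: "title-contains-string",
    category: "product",
    run: (p) => p.title.includes("Acme"),
  },
  {
    name: "title-length",
    run: (p) => p.title.length,
  },
];

// Runs every global check plus the checks that match the page's category.
function runChecks(page: PageSnapshot, all: Check[]): Record<string, CheckResult> {
  const results: Record<string, CheckResult> = {};
  for (const check of all) {
    if (check.category === undefined || check.category === page.category) {
      results[check.name] = check.run(page);
    }
  }
  return results;
}
```

Keeping the checks as plain functions over a snapshot means they can be unit tested without launching a browser; only the snapshot collection step would need Puppeteer.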
To summarize, I need a web crawler that, given an initial sitemap URL, will crawl all sub-sitemaps and then the pages listed within those (strictly ~3 levels deep). Once on each of those pages, we will run a series of checks (some global, some category-specific) and return details on the result (sometimes a boolean, sometimes an integer).
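The sitemap traversal described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the documented design: the function names are hypothetical, `maxDepth` is interpreted as the number of sitemap levels to follow, and `<loc>` values are pulled out with a regular expression for brevity (a real XML parser would handle entities and CDATA more robustly). The fetch step is passed in as a function so the traversal can be unit tested without network access.

```typescript
// Pulls every <loc> value out of a sitemap or sitemap index document.
// Simplification: regex-based, does not decode XML entities such as &amp;.
function extractLocs(xml: string): string[] {
  const locs: string[] = [];
  const pattern = /<loc>\s*([^<\s]+)\s*<\/loc>/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(xml)) !== null) {
    locs.push(match[1]);
  }
  return locs;
}

// Hypothetical fetcher type: takes a URL and resolves to the response body.
type FetchText = (url: string) => Promise<string>;

// Follows sitemap index files recursively and returns the page URLs found.
// maxDepth (assumed default 3) caps how many sitemap levels are followed;
// the seen set guards against sitemaps that reference each other.
async function collectPageUrls(
  sitemapUrl: string,
  fetchText: FetchText,
  maxDepth: number = 3,
  seen: Set<string> = new Set<string>(),
): Promise<string[]> {
  if (maxDepth <= 0 || seen.has(sitemapUrl)) {
    return [];
  }
  seen.add(sitemapUrl);

  const xml = await fetchText(sitemapUrl);
  const locs = extractLocs(xml);

  // A plain sitemap (<urlset>) lists pages directly.
  if (!/<sitemapindex[\s>]/.test(xml)) {
    return locs;
  }

  // A sitemap index lists further sitemaps; descend into each one.
  const pages: string[] = [];
  for (const child of locs) {
    const childPages = await collectPageUrls(child, fetchText, maxDepth - 1, seen);
    pages.push(...childPages);
  }
  return pages;
}
```

On Node 18 or later, the fetcher could be as simple as `(u) => fetch(u).then((r) => r.text())`; in unit tests it can be replaced with a function that returns canned XML.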
I have fully documented this internally and will share upon beginning of the project. I’m also technical and will be reviewing your code and unit tests. To show that you’ve read this description entirely, please include the word “dinosaur” in your response. I will be happy to jump on a call with you during development to answer any questions you may have. Thank you!
7 freelancers are bidding an average of $583 for this job
I've built several projects with Puppeteer, including a Twitter bot and automated tests, and I can confidently accept this project! By the way, I'm not a bot, I'm a "dinosaur".
dinosaur. I have more than 4.5 years of relevant experience in the IT industry. Please feel free to chat with me for further details. I have worked with Puppeteer before and can work on this project.