Custom Scraper/Crawler for specific list of sites

This is my first time hiring a freelancer, and I noticed you are skilled in Python and crawlers/scrapers. If you can take on this task, I will have no problem convincing my partners to employ you as a steady programmer, if you are interested.

Please disregard the number of hours and the time length we chose in the dropdown menu below. I don't know how long this would take you, but from my experience with Python and scraping, it should not take more than several hours.

I have attempted to code this myself using Scrapy, but my knowledge of Python is limited, so I feel it is best to hire an expert.

The project is simple, and separated into a couple of layers.

1. I need a crawler which takes a retail website URL and crawls the entire site. The crawler must identify which pages on the site are "product" pages, perhaps by matching a variable such as the presence of an "add to cart" button. This variable may differ from site to site, so I must be able to define it before the crawl starts.

2. The crawler then outputs the list of URLs that are "product pages", and the scraper program goes to work on them.

- The scraper must pull a set of predefined fields from each product page. I understand this is done with XPaths, and I assume that if the scraper accepts a list of pre-selected XPaths, it will pull the correct data fields. The XPaths are site specific, so just as the crawler accepts a definition variable, the scraper should accept pre-selected variables defining the XPaths to pull. I will choose these XPaths manually, and the scraper should be able to accept a varying number of them.

- The scraper pulls the plain text from these data fields, and outputs everything into a CSV file.

- Each output value must have an internal label assigned to it, for us as developers to know what each value is.

- Examples of the data fields would be: product title, price, meta keywords, meta description, product description, and whatever "attributes" are in the table or div located on the product page.

- I suspect that these XPaths may change from page to page within a site. How do you think we can keep the scrape consistent if they do?
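
To make the requirements above more concrete, here is a rough sketch of the kind of per-site configuration I have in mind. It is only an illustration, not a finished design. Every name in it (`SITE_CONFIG`, `product_marker`, `is_product_page`, `extract_fields`, `scrape_to_csv`) is hypothetical. It uses only the Python standard library, whose `xml.etree.ElementTree` module supports a limited subset of XPath and needs well-formed markup, so a real version would probably rely on Scrapy selectors or a similar HTML parser instead. Listing several candidate XPaths per label, tried in order, is one possible answer to the question of XPaths changing from page to page.

```python
import csv
import xml.etree.ElementTree as ET

# Hypothetical per-site configuration, defined by hand before the run.
SITE_CONFIG = {
    # A page counts as a product page if this XPath matches something.
    "product_marker": ".//button[@id='add-to-cart']",
    # Internal label -> candidate XPaths, tried in order (fallbacks).
    "fields": {
        "title": [".//h1[@class='product-title']", ".//h1"],
        "price": [".//span[@class='price']"],
        "meta_description": [".//meta[@name='description']"],
    },
}


def is_product_page(root, config):
    """Return True if the page contains the site's product marker."""
    return root.find(config["product_marker"]) is not None


def extract_fields(root, config):
    """Return {label: plain text}, using the first XPath that matches."""
    row = {}
    for label, xpaths in config["fields"].items():
        row[label] = ""  # left empty when no candidate XPath matches
        for xpath in xpaths:
            node = root.find(xpath)
            if node is not None:
                # Meta tags keep their value in the "content" attribute;
                # other elements contribute their text content.
                text = node.get("content") or "".join(node.itertext())
                row[label] = text.strip()
                break
    return row


def scrape_to_csv(pages, config, out):
    """Write one labelled CSV row per product page.

    `pages` maps URL -> well-formed markup (fetching is out of scope here).
    """
    writer = csv.DictWriter(out, fieldnames=["url", *config["fields"]])
    writer.writeheader()
    for url, markup in pages.items():
        root = ET.fromstring(markup)
        if is_product_page(root, config):
            writer.writerow({"url": url, **extract_fields(root, config)})
```

The CSV header row doubles as the internal labels, so each column can be identified later. Adding or removing a field for a site would only mean editing the `fields` entry in its configuration.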

This is all I will describe for now. I would like to know whether you are interested and how long this would take. If you are looking to work with us, these two initial items are part of a much larger project.

I await your reply and look forward to hearing from you.

Best Regards


Skills: Python


About the Employer:
(0 reviews) Rego Park, United States

Project ID: #1727714

Awarded to:


Hired by the Employer

$30 USD / hour
(49 reviews)