I need a simple Scrapy crawler.
The crawler will hit the homepage of a domain and look for several different things on the front page, such as:
1. Is there a phone number on the home page? (RuleName in the Google Sheet below: 'Has Phone Number')
2. Is there a careers page? (RuleName in the Google Sheet below: 'Has Careers Page')
3. Is there a contact page? If yes, navigate to the contact page and then check whether there is a phone number on the contact page. (RuleName in the Google Sheet below: 'Contact Page Has Phone Number')
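To make the three example rules concrete, here is a minimal sketch using only the standard library. It is not the required implementation: the phone pattern (North American style numbers), the keyword matching, and the helper names `find_phone` and `find_link` are all assumptions for illustration; the actual rule definitions are in the Google Sheet. In a Scrapy spider, helpers like these could be called from a callback on the response body.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumption: North American style numbers such as (555) 123-4567 or 555.123.4567.
PHONE_RE = re.compile(r"(?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}")


def find_phone(text):
    """Return the first phone-like string in text, or "" if none is found."""
    match = PHONE_RE.search(text)
    return match.group(0) if match else ""


class _LinkCollector(HTMLParser):
    """Collects (href, link text) pairs for every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def find_link(html, base_url, keyword):
    """Return the absolute URL of the first link whose href or text contains keyword."""
    parser = _LinkCollector()
    parser.feed(html)
    for href, text in parser.links:
        if keyword in href.lower() or keyword in text.lower():
            return urljoin(base_url, href)
    return ""
```

For 'Contact Page Has Phone Number', the URL returned by `find_link(html, base_url, "contact")` would be fetched next and `find_phone` run on that page.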
In total, there are 18 rules the crawler needs to check. The 18 rules are in the Google Sheet file below; see the tab called "Rules to Implement".
Every rule needs to run on each domain and output a boolean indicating whether the item/link was found on the domain.
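One way to guarantee that every rule produces a result for every domain, including rules that find nothing, is to keep the rules in a table and loop over it. The sketch below is illustrative only: it covers two of the 18 rules, and the patterns and the name `run_rules` are guesses, not taken from the Google Sheet.

```python
import re

# Hypothetical rule table: two of the 18 rules, with assumed patterns.
RULES = {
    "Has Phone Number": re.compile(r"\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}"),
    "Has Careers Page": re.compile(r"careers?|jobs", re.IGNORECASE),
}


def run_rules(page_text):
    """Run every rule and return (rule name, 1 or 0, match string) for each one."""
    results = []
    for name, pattern in RULES.items():
        match = pattern.search(page_text)
        results.append((name, 1 if match else 0, match.group(0) if match else ""))
    return results
```

Because the loop never skips a rule, a page with no matches still yields one entry per rule with a result of 0.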
As for the output, the spider will output a CSV file. The structure of the CSV file will match the tab called "Example CSV Output" in the Google Sheet linked below.
The CSV format is as follows:
Domain - the domain name
URL - the URL; for most of the rules, this is the same value as the Domain column. However, when you have to navigate to another page (for example, the contact page), this column will have the full URL of that page.
RuleName - the name of the rule that was run
RuleResult - a boolean column indicating if the information was found on that page or not (1=yes; 0=no).
RuleMatch - if RuleResult=1, this column contains the match string that made RuleResult true; otherwise, this column is blank.
CrawlDateTime - a yyyy-mm-dd hh:MM:ss timestamp of when the page was crawled
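The column layout above could be produced along these lines. This is a sketch under assumptions: the helper names `make_row` and `rows_to_csv` are hypothetical, and whether the file should start with a header row depends on the "Example CSV Output" tab.

```python
import csv
import io
from datetime import datetime

# Column order as listed in the CSV format description.
FIELDS = ["Domain", "URL", "RuleName", "RuleResult", "RuleMatch", "CrawlDateTime"]


def make_row(domain, url, rule_name, match, crawled_at):
    """Build one output row; an empty match string means the rule did not fire."""
    return {
        "Domain": domain,
        "URL": url,
        "RuleName": rule_name,
        "RuleResult": 1 if match else 0,
        "RuleMatch": match or "",
        "CrawlDateTime": crawled_at.strftime("%Y-%m-%d %H:%M:%S"),
    }


def rows_to_csv(rows):
    """Serialize rows to CSV text with a header row (assumed to be wanted)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

In a Scrapy project, dicts shaped like the output of `make_row` could instead be yielded as items, with the `FEED_EXPORT_FIELDS` setting fixing the column order of the exported CSV.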
As a courtesy, I have also provided a third tab in the Google Sheet called "Input URLs from Example CSV sheet". It is a list of domains to test against, and it is the same list of domains that appears on the example CSV tab.
[login to view URL]
The deliverable of this project is the Scrapy project code. This should be done in Python 3.6+.
34 freelancers are bidding an average of $139 for this job
Hello, how are you? A Python crawling expert is here. I have done many scraping tasks, so this task can be completed within this weekend. Your requirements are clear to me. Hope to hear from you soon. Regards.
Only 100 USD for a day. Hi, I am a Python web scraping expert. I can use requests, BeautifulSoup, and Selenium for the project. I can start the job now and finish in 10 hours. If that is OK, let's go ahead.