Google Scrape: Download up to 100,000 documents from Google search.
For each day in 2018, you need to run a Google search for "safety data sheet" with the document type set to PDF and the date range limited to that single day, then download all of the PDF documents returned.
An example of the search URL for January 16th, 2018 is:
…:pdf&num=1000&lr=&as_qdr=all&tbs=cdr:1,cd_min:1/16/2018,cd_max:1/16/2018&filter=0&biw=1666&bih=900
The following link, for January 1, 2018, may display ALL files, including the omitted ones:
…:pdf&num=1000&lr=&as_qdr=all&biw=1666&bih=900&tbs=cdr:1,cd_min:1/1/2018,cd_max:1/1/2018&filter=0
After each day’s search, Google will display the following message at the bottom of the search results:
“In order to show you the most relevant results, we have omitted some entries very similar to the 34 already displayed.
If you like, you can repeat the search with the omitted results included.”
You must click on the link to display all of the documents.
You will need to download every PDF document on the page. Google will very quickly attempt to block your usage, so you may need to rotate your proxies often.
You will create a separate folder for each date and put each of that day's PDF documents in it, for example:
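A minimal sketch of the per-day bookkeeping, assuming Python 3.9+. It walks every day of 2018, formats each date the way the example URLs do (unpadded M/D/YYYY in cd_min and cd_max), and creates one folder per day. The YYYY-MM-DD folder names and all helper names are hypothetical, since the posting's own folder example is not shown; searching and downloading are not covered here.

```python
import tempfile
from datetime import date, timedelta
from pathlib import Path


def days_in_year(year: int):
    """Yield every calendar date in the given year."""
    day = date(year, 1, 1)
    while day.year == year:
        yield day
        day += timedelta(days=1)


def date_range_param(day: date) -> str:
    """Format a day as the unpadded M/D/YYYY string seen in the example URLs."""
    return f"{day.month}/{day.day}/{day.year}"


def tbs_value(day: date) -> str:
    """Build a single-day tbs value that mirrors the example URL (cd_min == cd_max)."""
    d = date_range_param(day)
    return f"cdr:1,cd_min:{d},cd_max:{d}"


def make_day_folders(root: Path, year: int) -> list[Path]:
    """Create one folder per day under root, using a hypothetical YYYY-MM-DD name."""
    folders = []
    for day in days_in_year(year):
        folder = root / day.isoformat()
        folder.mkdir(parents=True, exist_ok=True)
        folders.append(folder)
    return folders


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        folders = make_day_folders(Path(tmp), 2018)
        print(len(folders))                    # 365 (2018 is not a leap year)
        print(tbs_value(date(2018, 1, 16)))    # cdr:1,cd_min:1/16/2018,cd_max:1/16/2018
```

Keeping the date formatting in one helper means the folder name and the search date for a given day cannot drift apart.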
You will upload the files to our server via FTP, which we will provide upon award.
You may complete this process by hand, or with any web scraping tools you possess.
ALL WORK WILL BE COMPLETELY CHECKED FOR EVERY DAY BY HAND.
We anticipate that you will find between 50,000 and 100,000 files.
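Since every day is checked by hand, a quick local tally before upload can catch days that came back empty. The sketch below assumes one subfolder per date under a single root directory and counts files ending in .pdf (case-insensitive); the function names are hypothetical.

```python
from pathlib import Path


def count_pdfs_per_day(root: Path) -> dict[str, int]:
    """Return {folder name: number of .pdf files} for each subfolder of root."""
    counts = {}
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        counts[folder.name] = sum(
            1 for f in folder.iterdir() if f.is_file() and f.suffix.lower() == ".pdf"
        )
    return counts


def summarize(counts: dict[str, int]) -> tuple[int, list[str]]:
    """Return the total PDF count and the names of folders that are still empty."""
    total = sum(counts.values())
    empty = [name for name, n in counts.items() if n == 0]
    return total, empty
```

The total can then be compared against the expected range of 50,000 to 100,000 files, and any empty folders re-checked before uploading.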
You must post one day's files as a sample with your bid.
Your bid will be for up to 100,000 files.
We will deposit 25% of the project budget into escrow for the first three months (January, February, and March 2018) and will release it after the files are confirmed. We will then deposit the next 25%, and so on, until the project is completed.
Compliance Publishing has completed more than 125 projects on Freelancer.