Required skills: writing code to scrape data from PDFs. This is not a manual task!
Project goal: I have media reports in PDF format, and I want to extract the pages that contain zip-code-level information. There are about 1,174 PDFs (some are duplicates). The median length is about 60 pages (10th percentile 31 pages, 90th percentile 218 pages). Most of the pages are irrelevant; I only need the information on the specific pages described below.
Part 1: For each media institution's report, scrape the identifier items on the first page.
Part 2: Scrape all the variables on the page that contains zip-code-level information, and then merge them with the identifier items scraped in part 1.
Part 3: Scrape all the variables on the page that contains county-level information, and then merge them with the identifier items scraped in part 1. This is almost exactly the same as part 2 and should not require much additional coding.
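For bidders who want a starting point, the three parts above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the required approach: it assumes the PDFs have a text layer that the pypdf library can extract, that identifier items on the first page look like "Label: value" lines, and that a zip-code page can be recognized by containing several distinct 5-digit tokens. The function names, the `min_zips` threshold, and the row layout are all hypothetical; the real rules are in the Scraping Note.

```python
import re
from typing import Dict, List

# Assumption: a zip code appears as a standalone 5-digit token.
ZIP_RE = re.compile(r"\b\d{5}\b")


def load_page_texts(pdf_path: str) -> List[str]:
    """Return the text of every page, using pypdf (assumed to be installed)."""
    from pypdf import PdfReader  # imported here so the rest runs without pypdf

    reader = PdfReader(pdf_path)
    return [page.extract_text() or "" for page in reader.pages]


def parse_identifiers(first_page: str) -> Dict[str, str]:
    """Part 1 (hypothetical layout): read 'Label: value' lines from page 1."""
    ids: Dict[str, str] = {}
    for line in first_page.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            ids[key.strip()] = value.strip()
    return ids


def find_zip_pages(pages: List[str], min_zips: int = 3) -> List[int]:
    """Return 0-based indexes of pages with at least min_zips distinct zip-like tokens."""
    return [
        i for i, text in enumerate(pages)
        if len(set(ZIP_RE.findall(text))) >= min_zips
    ]


def parse_zip_rows(page_text: str) -> List[dict]:
    """Part 2 (hypothetical layout): keep lines whose first token is a zip code."""
    rows = []
    for line in page_text.splitlines():
        parts = line.split()
        if parts and ZIP_RE.fullmatch(parts[0]):
            rows.append({"zip": parts[0], "values": parts[1:]})
    return rows


def scrape_report(pages: List[str]) -> List[dict]:
    """Merge the first-page identifiers into every zip-level row of one report."""
    if not pages:
        return []
    ids = parse_identifiers(pages[0])
    merged = []
    for i in find_zip_pages(pages):
        for row in parse_zip_rows(pages[i]):
            merged.append({**ids, "page": i + 1, **row})
    return merged
```

Calling `scrape_report(load_page_texts(path))` on each file would give one list of merged rows per report. Part 3 could reuse the same structure with a county-page detector in place of `find_zip_pages`. Scanned PDFs without a text layer would need OCR first, which this sketch does not cover.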
Notes: The template is based on a report on page 8 of hr_hi. hr_hi is NOT one of the files you need to work with: it is organized by state, and each state contains many different reporting institutions. This is what I did before. To make your life easier, I have separated the data into individual reports. "pweq5gqydmnsitx..." is the kind of file you are going to get, and it is about the median size.
The three examples show what kind of information I need. They are based on the descriptions in the Scraping Note, and the template gives similar information as an Excel file.
Hi sir, I have taken a detailed look at your project, and I have strong skills in PDF processing. I'm sure I can complete your project. My price and timeline are negotiable. We can discuss the details via chat. Thanks.
20 freelancers have bid an average of $490 for this job
Hi, I am a serious developer who aims to provide high-quality services. If you contact me, we can discuss the details further and work out how to meet each other's goals. Good luck with your business…
I use Excel professionally. I can finish the project in a few days. If you want to see an example before work starts, I have no objection.