This is a web scraping project that can be done in either Python or Perl.
I am developing a social shopping comparison site, and have authorisation from the
sites we are scraping to crawl and extract data daily.
Scraping needs to be set up for 10 e-commerce sites, for example Firebox.com. That is about 10,000
URLs in total, as each site has about 1,000 products.
The data that needs to be extracted includes:
Product page URL
This data needs to be stored in a database that will be hosted on Amazon AWS. The image also needs to be
downloaded, renamed, and stored on our local server, with the local image path also added to the DB.
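To make the storage requirement concrete, here is a minimal sketch in Python. It is only an illustration under assumptions: sqlite3 stands in for whatever database ends up hosted on AWS, the table and column names are hypothetical, and naming the image after a SHA-1 hash of the product URL is just one possible renaming scheme.

```python
import hashlib
import sqlite3
from pathlib import Path


def image_filename(product_url: str, extension: str = ".jpg") -> str:
    """Derive a stable file name from the product URL (hypothetical naming scheme)."""
    digest = hashlib.sha1(product_url.encode("utf-8")).hexdigest()
    return digest + extension


def save_product(conn: sqlite3.Connection, image_dir, product_url: str,
                 image_bytes: bytes) -> Path:
    """Write the image to disk under its new name and record the path in the DB."""
    image_dir = Path(image_dir)
    image_dir.mkdir(parents=True, exist_ok=True)
    image_path = image_dir / image_filename(product_url)
    image_path.write_bytes(image_bytes)

    # Hypothetical schema: one row per product page URL.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS products ("
        "product_url TEXT PRIMARY KEY, image_path TEXT NOT NULL)"
    )
    # INSERT OR REPLACE keeps a single row per URL across daily re-scrapes.
    conn.execute(
        "INSERT OR REPLACE INTO products (product_url, image_path) VALUES (?, ?)",
        (product_url, str(image_path)),
    )
    conn.commit()
    return image_path
```

Because the file name depends only on the URL, a daily re-scrape overwrites the old image instead of piling up duplicates.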
There can be a very basic scraping UI in which we can set the interval between scrapes.
It should also be easy to set up new sites to scrape, but we can also pay you to add these if needed.
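One way the scrape interval and the "easy to add new sites" requirement could fit together is a small per-site configuration registry, sketched below in Python. All names here (SiteConfig, register_site, due_sites) and the example URLs are hypothetical, and the per-site extraction rules are deliberately left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional


@dataclass
class SiteConfig:
    """Settings for one site; a basic UI would edit these values."""
    name: str
    start_url: str
    interval_hours: float = 24.0          # daily by default
    last_scraped: Optional[datetime] = None

    def is_due(self, now: datetime) -> bool:
        """A site is due if it was never scraped or its interval has elapsed."""
        if self.last_scraped is None:
            return True
        return now - self.last_scraped >= timedelta(hours=self.interval_hours)


# Registry of configured sites, keyed by name.
SITES: Dict[str, SiteConfig] = {}


def register_site(config: SiteConfig) -> SiteConfig:
    """Adding a new site is a single call with its settings."""
    SITES[config.name] = config
    return config


def due_sites(now: datetime) -> List[SiteConfig]:
    """Return the sites whose next scrape should start now."""
    return [site for site in SITES.values() if site.is_due(now)]
```

A scheduler (cron, or a simple loop) would call due_sites periodically and update last_scraped after each run.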
If needed, you should be familiar with routing the scraper through Tor.
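For reference, routing requests through Tor from Python usually means pointing the HTTP client at Tor's local SOCKS proxy. The sketch below assumes a Tor daemon listening on its default SOCKS port 9050 on the same machine, and the third-party requests library installed with SOCKS support; the function names are hypothetical.

```python
def tor_proxies(host: str = "127.0.0.1", port: int = 9050) -> dict:
    """Build a proxies mapping for a local Tor SOCKS listener (assumed address)."""
    # "socks5h" (rather than "socks5") makes the proxy resolve hostnames,
    # so DNS lookups also go through Tor.
    proxy = f"socks5h://{host}:{port}"
    return {"http": proxy, "https": proxy}


def fetch_via_tor(url: str, timeout: float = 30.0) -> str:
    """Fetch a page through Tor and return its body as text."""
    # Imported here so tor_proxies() works even without requests installed.
    # SOCKS support needs the extra: pip install "requests[socks]"
    import requests

    response = requests.get(url, proxies=tor_proxies(), timeout=timeout)
    response.raise_for_status()
    return response.text
```

Tor Browser's bundled Tor listens on port 9150 instead, so the port would need to be passed explicitly in that setup.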
After the URLs are scraped, additional social media data needs to be added to each URL in the DB, for example for Facebook,
with this JSON API:
Facebook*: [url removed, login to view]
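As a rough illustration of the enrichment step, the Python sketch below parses a JSON response body and stores a share count against the product URL. The API URL is not shown in this posting, so the payload shape used here (a top-level "shares" field) is purely a placeholder and not the real Facebook response; sqlite3 again stands in for the actual database, and the table and function names are hypothetical.

```python
import json
import sqlite3


def store_share_count(conn: sqlite3.Connection, product_url: str,
                      response_body: str, field: str = "shares") -> int:
    """Parse a JSON body and save the count found under `field` for this URL."""
    payload = json.loads(response_body)
    # Placeholder field name: adjust to whatever the real API returns.
    count = int(payload.get(field, 0))

    conn.execute(
        "CREATE TABLE IF NOT EXISTS social_stats ("
        "product_url TEXT PRIMARY KEY, facebook_shares INTEGER NOT NULL)"
    )
    # One row per URL; a later run overwrites the earlier count.
    conn.execute(
        "INSERT OR REPLACE INTO social_stats (product_url, facebook_shares) "
        "VALUES (?, ?)",
        (product_url, count),
    )
    conn.commit()
    return count
```

Fetching the response body is left out on purpose, since it depends on the endpoint and on any authentication it requires.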
In addition to designing this scraping program, I will require ongoing support and maintenance of the code, and I have
many similar projects in mind that I would like to work on with the successful applicant.
I want to work with someone who is responsive via Skype and can get back to me very quickly regarding issues.
PLEASE NOTE: Please apply stating your experience in web scraping, your advice on whether to use
Python or Perl, and your experience with Tor.
If you cannot be available to discuss issues on Skype text chat, please do not bid.
Hello. I am an experienced web developer and web automation specialist working with Python. I can help you with this project. Check PMB for further questions and details. Regards, -Arthur.
I'm quite intrigued by your project as we have done similar tasks in the past, and I'm interested in learning further details. Please review your inbox for further details.