Hi there.

We require the crawling and analysis of a number of websites (fewer than 10) to grab specific information and drop it into a database. I have included a file which lists these URLs, as I did not wish to put them directly in here. They are all car-related websites.

What we need the crawler to do is:
- visit the site
- crawl the pages to pull out information related to all cars for sale. This *may* require selecting form fields or filling in fields for a search
- remove cars that are not of a particular type (e.g. Mercedes)
- look for new entries (based on data already captured at the last crawl)
- add new entries to the database
- possibly send out a standard email to new entries (dependent upon having an email address)

Each site is different and displays the information in a different way. We wish to standardise our database, so you will need to create a mapping of each site to our fields. Ideally we are looking for a crawler that is flexible, where we can create a 'profile' of each site we need to crawl, so that it will be easy to add new sites as we find them; i.e. I don't want several site-specific crawlers to be created.

If this is the kind of thing you have done before and you have good knowledge of crawlers and scrapers, then we'd be very interested in talking to you, as this is part of a wider project. We also REQUIRE this to be done in Python, because the larger project is based in Python.

I have included the list of sites in the attached .txt file (zipped for rentacoder) and would welcome any questions or suggestions you may have.

We are ONLY looking for people who have experience, so please demonstrate your experience to us. If you can come back and describe generally how you would do it, this will give us a comfort factor in choosing you.

This project will lead to more work.

Best regards
Sergey
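One way to read the 'profile' requirement above is that all site-specific knowledge lives in data, and a single generic routine applies it. The sketch below assumes a hypothetical `SiteProfile` holding a regex for one listing block plus a regex per standard field; `extract_listings` and `keep_make` are likewise illustrative names. It uses only the standard library, so it relies on regular expressions; a real crawler would more likely use a proper HTML parser (such as lxml or BeautifulSoup), and page fetching and search-form submission are left out entirely.

```python
import re
from dataclasses import dataclass


@dataclass
class SiteProfile:
    """Hypothetical per-site profile: all site-specific knowledge lives here."""
    name: str
    # Regex matching one complete car listing on a results page.
    listing_pattern: str
    # Maps each of *our* standard fields to a regex with one capture group
    # that pulls that value out of a single listing block.
    field_patterns: dict


def extract_listings(profile, html):
    """Apply a profile to one page of HTML; return records in our schema."""
    records = []
    for listing in re.finditer(profile.listing_pattern, html, flags=re.S):
        block = listing.group(0)
        record = {"source": profile.name}
        for std_field, pattern in profile.field_patterns.items():
            match = re.search(pattern, block, flags=re.S)
            # A field the site does not show becomes "" instead of raising.
            record[std_field] = match.group(1).strip() if match else ""
        records.append(record)
    return records


def keep_make(records, make):
    """Drop cars that are not of the wanted make (case-insensitive)."""
    wanted = make.casefold()
    return [r for r in records if r.get("make", "").casefold() == wanted]
```

With this split, adding a new site means writing a new `SiteProfile` (the mapping of that site to our fields) rather than a new crawler.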
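The "look for new entries" and "add new entries to the database" steps can be combined if each listing has a stable key. The sketch below assumes that key is the pair (source site, the site's own listing ID), and that records arrive as dicts already mapped to the standard fields; the `cars` table, its columns and `add_new_entries` are hypothetical. SQLite's `INSERT OR IGNORE` skips rows whose key is already stored from an earlier crawl, and the cursor's `rowcount` shows which inserts actually happened.

```python
import sqlite3

# Hypothetical standard schema; the real field list would come from the
# existing database that the crawled sites are mapped onto.
SCHEMA = """
CREATE TABLE IF NOT EXISTS cars (
    source     TEXT NOT NULL,
    listing_id TEXT NOT NULL,
    make       TEXT,
    model      TEXT,
    price      TEXT,
    email      TEXT,
    PRIMARY KEY (source, listing_id)
)
"""

COLUMNS = ("source", "listing_id", "make", "model", "price", "email")


def add_new_entries(conn, records):
    """Insert records not captured by an earlier crawl and return only those.

    A listing is identified by (source, listing_id). INSERT OR IGNORE skips
    rows whose key already exists, so rowcount is 1 only for new listings.
    """
    conn.execute(SCHEMA)
    placeholders = ", ".join("?" for _ in COLUMNS)
    sql = f"INSERT OR IGNORE INTO cars ({', '.join(COLUMNS)}) VALUES ({placeholders})"
    new = []
    with conn:  # commit all inserts from this crawl as one transaction
        for record in records:
            cur = conn.execute(sql, tuple(record.get(c) for c in COLUMNS))
            if cur.rowcount == 1:
                new.append(record)
    return new
```

Returning only the new rows gives the later email step exactly the set of entries it needs, without a separate comparison pass against the previous crawl.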
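For the optional standard email, the condition in the requirements ("dependent upon having email address") maps to a simple filter before composing messages. This sketch uses the standard library's `email.message.EmailMessage`; the template wording, the sender address and `build_emails` are placeholders. It only builds the messages; actually sending them would typically go through `smtplib.SMTP(...).send_message(msg)` against whatever mail server is available.

```python
from email.message import EmailMessage

# Hypothetical template; the real wording would be supplied by the buyer.
TEMPLATE = (
    "Hello,\n\n"
    "We noticed your {make} {model} listed on {source} and would like to "
    "get in touch.\n\n"
    "Best regards\n"
)


def build_emails(new_records, sender):
    """Compose one standard email per new entry that has an email address."""
    messages = []
    for record in new_records:
        address = record.get("email")
        if not address:
            continue  # no address captured, so nothing to send
        msg = EmailMessage()
        msg["From"] = sender
        msg["To"] = address
        msg["Subject"] = f"Your {record.get('make', '')} listing"
        msg.set_content(
            TEMPLATE.format(
                make=record.get("make", ""),
                model=record.get("model", ""),
                source=record.get("source", ""),
            )
        )
        messages.append(msg)
    return messages
```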
1) Complete and fully functional working program(s) in executable form, as well as complete source code of all work done.
2) Deliverables must be in ready-to-run condition, as follows (depending on the nature of the deliverables):
a) For web sites or other server-side deliverables intended to only ever exist in one place in the Buyer's environment: Deliverables must be installed by the Seller in ready-to-run condition in the Buyer's environment.
b) For all others, including desktop software or software the Buyer intends to distribute: A software installation package that will install the software in ready-to-run condition on the platform(s) specified in this bid request.
3) All deliverables will be considered "work made for hire" under U.S. Copyright law. Buyer will receive exclusive and complete copyrights to all work purchased. (No GPL, GNU, third-party components, etc., unless all copyright ramifications are explained AND AGREED TO by the Buyer on the site per the coder's Seller Legal Agreement.)