We need a web crawler built that extracts structured information from tables on specific websites and stores it in a MongoDB database. The crawler should use a plugin architecture so that we can easily add more source websites. It should also revisit URLs after a configurable number of days and update the information in our DB if it has changed on the source website. Finally, it should discover new pages on the sites we're interested in (using each site's sitemap) and crawl them automatically.
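For bidders, here is a rough sketch of how those three requirements (plugins, sitemap discovery, revisit interval) could fit together. It is only an illustration under assumptions: Python is chosen here because the preferred language isn't visible in this listing, and `SourcePlugin`, `register`, `PLUGINS`, `urls_from_sitemap`, `is_due` and `ExampleSite` are hypothetical names, not part of our spec. Sitemap parsing follows the sitemaps.org XML format and uses only the standard library.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod
from datetime import datetime, timedelta
from typing import Optional

# XML namespace defined by the sitemaps.org protocol
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical registry: plugin name -> plugin class
PLUGINS: dict[str, type[SourcePlugin]] = {}


def register(cls):
    """Class decorator that adds a source plugin to the registry."""
    PLUGINS[cls.name] = cls
    return cls


class SourcePlugin(ABC):
    """Hypothetical base class: one subclass per source website."""

    name: str = ""
    sitemap_url: str = ""

    @abstractmethod
    def parse_tables(self, html: str) -> list[dict]:
        """Turn a fetched page into a list of structured records."""


def urls_from_sitemap(xml_text: str) -> list[str]:
    """Return every <loc> value in a sitemap (urlset or sitemapindex)."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.iterfind(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


def is_due(last_crawled: Optional[datetime], revisit_days: int, now: datetime) -> bool:
    """A URL is due if it was never crawled or its revisit interval has elapsed."""
    if last_crawled is None:
        return True
    return now - last_crawled >= timedelta(days=revisit_days)


@register
class ExampleSite(SourcePlugin):
    # Hypothetical plugin; a real one would parse the site's actual table markup
    name = "example"
    sitemap_url = "https://example.com/sitemap.xml"

    def parse_tables(self, html: str) -> list[dict]:
        return []
```

Adding a new source would then mean writing one more decorated subclass. Note that a `sitemapindex` lists child sitemaps rather than pages, so a real crawler would fetch those recursively before queuing page URLs.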
The data needs to be stored in MongoDB. We will share the schema and some of the source websites with you. We prefer [url removed, login to view], but crawlers written in other languages are also acceptable, as long as the database backend is MongoDB.
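One possible way to handle "update the information if it changes" on the MongoDB side is to store a hash of each record and only rewrite the data when the hash differs. The sketch below assumes Python and builds the filter and update documents for an upsert. The field names (`url`, `data`, `content_hash`, `last_crawled`, `last_changed`) are placeholders until the real schema is shared. With pymongo, the result would be passed to `collection.update_one(filter, update, upsert=True)`, with the stored hash read beforehand via `find_one`.

```python
from __future__ import annotations

import hashlib
import json
from datetime import datetime
from typing import Optional


def content_hash(record: dict) -> str:
    """Stable SHA-256 of a record, independent of key order."""
    payload = json.dumps(record, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def build_update(url: str, record: dict, stored_hash: Optional[str], now: datetime):
    """Return (filter, update) documents for a MongoDB-style upsert.

    Field names are hypothetical placeholders for the real schema.
    If the content is unchanged, only `last_crawled` is touched.
    """
    new_hash = content_hash(record)
    update = {"$set": {"last_crawled": now}}
    if new_hash != stored_hash:
        # New or changed content: store the data and record when it changed
        update["$set"].update(
            {"data": record, "content_hash": new_hash, "last_changed": now}
        )
    return {"url": url}, update
```

Keeping `last_changed` separate from `last_crawled` makes it easy to see which pages actually changed between visits.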