Crawler for Gadget Wiki

We need a web crawler that extracts structured information from tables on specific websites and stores it in a MongoDB database. The crawler should use a plugin architecture so that we can add more source websites easily. It should re-visit URLs after a configurable number of days and update the information in our DB if it has changed on the source website. It should also discover new pages on the sites we're interested in (using the sitemap page) and crawl them automatically.
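
To make the three requirements above concrete, here is a minimal Node.js sketch of one way they could fit together. Everything in it is an assumption for illustration: the plugin shape (`name`, `matches`, `parse`), the helper names `registerPlugin`, `findPlugin`, `isDue` and `extractSitemapUrls`, the host `gadgets.example.com`, and the idea that the sitemap is an XML file with `<loc>` entries. The actual schema and source sites will be shared separately.

```javascript
// Hypothetical plugin shape: { name, matches(url), parse(html) }.
// One plugin per source website; adding a site means registering one more.
const plugins = [];

function registerPlugin(plugin) {
  plugins.push(plugin);
}

// Return the first plugin that claims the URL, or null if none does.
function findPlugin(url) {
  return plugins.find((p) => p.matches(url)) || null;
}

// True when a page was never crawled, or when its last crawl is at least
// `revisitDays` days old. `revisitDays` is the configurable interval.
function isDue(lastCrawledAt, revisitDays, now = new Date()) {
  if (!lastCrawledAt) return true;
  const ageMs = now.getTime() - lastCrawledAt.getTime();
  return ageMs >= revisitDays * 24 * 60 * 60 * 1000;
}

// Pull the <loc> entries out of an XML sitemap string.
// A regex is enough for a sketch; a real crawler may want an XML parser.
function extractSitemapUrls(xml) {
  return [...xml.matchAll(/<loc>\s*([^<\s]+)\s*<\/loc>/g)].map((m) => m[1]);
}

// Example plugin for a made-up site (assumption, not a real source).
registerPlugin({
  name: "example-gadgets",
  matches: (url) => new URL(url).hostname === "gadgets.example.com",
  parse: (html) => ({ title: (html.match(/<h1>(.*?)<\/h1>/) || [])[1] || null }),
});
```

A crawl loop under these assumptions would read the sitemap, keep the URLs for which `isDue` is true, pick a plugin with `findPlugin`, and hand the fetched HTML to that plugin's `parse`.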

The data must be stored in MongoDB. The schema and some source websites will be shared with you. We prefer Node.js, but crawlers in other languages are also acceptable; the DB backend, however, must be MongoDB.
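
As a rough illustration of the "update if it changes" requirement on the MongoDB side, the sketch below builds the arguments for an upsert without connecting to a database. The field names `sourceUrl`, `data` and `lastCrawledAt` and the helpers `canonical` and `buildUpsert` are assumptions, since the real schema has not been shared yet. It also assumes scraped records are plain JSON values (strings, numbers, arrays, nested objects).

```javascript
// Serialise with sorted keys so that key order alone never counts as a change.
// Assumes plain JSON-like values (no Dates, functions, etc.).
function canonical(value) {
  if (Array.isArray(value)) return "[" + value.map(canonical).join(",") + "]";
  if (value && typeof value === "object") {
    return "{" + Object.keys(value).sort()
      .map((k) => JSON.stringify(k) + ":" + canonical(value[k]))
      .join(",") + "}";
  }
  return JSON.stringify(value);
}

// Build filter/update/options for an upsert keyed on the source URL.
// `data` is rewritten only when the scraped record differs from the stored
// one; `lastCrawledAt` is always refreshed so re-visit scheduling still works.
function buildUpsert(existingDoc, sourceUrl, scraped, now = new Date()) {
  const changed = !existingDoc || canonical(existingDoc.data) !== canonical(scraped);
  const fields = changed
    ? { data: scraped, lastCrawledAt: now }
    : { lastCrawledAt: now };
  return {
    changed,
    filter: { sourceUrl },
    update: { $set: fields },
    options: { upsert: true },
  };
}
```

With the official MongoDB Node.js driver, the result could be passed to `collection.updateOne(op.filter, op.update, op.options)`; the `changed` flag is only there for logging or statistics.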

Skills: node.js, NoSQL Couch & Mongo

See more: wiki websites, web wiki software, web crawler wiki, web crawler architecture, schema update, js for, architecture of web crawler, node.js mongo, mongodb c++, couch database, website crawler, web crawlers, Node js website, mongo, database crawler, data crawler, crawler, couch, web data crawler, structured database, mongodb add, crawl sites, node crawler, information crawler, data website using web crawlers

About the employer:
(0 reviews) Bangalore, India

Project ID: #1712005

2 freelancers have bid an average of ₹16250 for this job


I can develop this tool with Node.js and MongoDB.

₹20000 INR in 4 days
(0 reviews)

Sir, Pls check the PM.

₹12500 INR in 2 days
(0 reviews)