We are looking for a custom crawler/spider capable of continuously crawling 1,000-2,000 sites per week. The following is a summary of our requirements:
1. Crawl up to 2,000 sites.
2. Set the crawl frequency for each site individually.
3. Option to crawl a whole site or only selected folders of a site.
4. Option to supply a username and password for a site where cookies, user authentication, or form submission is required.
5. Crawl URLs and parameters are to be managed in an external SQL database.
6. Collect and store the content, metadata, and other information for each URL, including whether the URL is new or has changed since the previous crawl.
7. Display results in a tree structure for each site crawled.
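To make requirements 5 and 6 concrete, the sketch below shows one possible shape for the external database and for new/changed detection, using SQLite and a content hash. All table and column names (`crawl_site`, `crawl_url`, `content_hash`, etc.) are illustrative assumptions, not part of the requirements; a production system would likely use a server-based SQL database and store credentials securely rather than in plain columns.

```python
import hashlib
import sqlite3

# Illustrative schema only -- names and types are assumptions, not a spec.
SCHEMA = """
CREATE TABLE crawl_site (
    site_id              INTEGER PRIMARY KEY,
    base_url             TEXT NOT NULL,
    crawl_interval_hours INTEGER NOT NULL DEFAULT 168, -- per-site frequency (req. 2)
    include_paths        TEXT,                         -- optional folder restriction (req. 3)
    username             TEXT,                         -- optional auth credentials (req. 4)
    password             TEXT
);
CREATE TABLE crawl_url (
    url          TEXT PRIMARY KEY,
    site_id      INTEGER NOT NULL REFERENCES crawl_site(site_id),
    content_hash TEXT,                                 -- hash of last fetched content (req. 6)
    last_crawled TEXT
);
"""

def record_fetch(conn, site_id, url, content):
    """Store a fetched page and report whether it is new, changed, or
    unchanged since the previous crawl (req. 6)."""
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    row = conn.execute(
        "SELECT content_hash FROM crawl_url WHERE url = ?", (url,)
    ).fetchone()
    if row is None:
        status = "new"
    elif row[0] != digest:
        status = "changed"
    else:
        status = "unchanged"
    # Upsert the latest hash and crawl timestamp for this URL.
    conn.execute(
        "INSERT INTO crawl_url (url, site_id, content_hash, last_crawled) "
        "VALUES (?, ?, ?, datetime('now')) "
        "ON CONFLICT(url) DO UPDATE SET content_hash = excluded.content_hash, "
        "last_crawled = excluded.last_crawled",
        (url, site_id, digest),
    )
    return status

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO crawl_site (site_id, base_url) VALUES (1, 'https://example.com')"
    )
    print(record_fetch(conn, 1, "https://example.com/page", "hello"))
    print(record_fetch(conn, 1, "https://example.com/page", "hello v2"))
```

A full hash comparison is the simplest change-detection strategy; HTTP validators such as `ETag` or `Last-Modified` could avoid re-downloading unchanged pages, at the cost of trusting the server to report them correctly.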