I need an ASP.NET application written in C# that crawls a preset list of web pages and extracts specific content from the source of each page.
The application needs a web page with a textbox and a Submit button. When the user clicks Submit, the application crawls the sites (I will provide the URLs) and extracts the results. There are 6 pages to read per search. The source code of each web page must be read using HttpWebRequest and HttpWebResponse, and the HTML source of each page must be saved on the server. The crawling must be multithreaded, and results should be displayed as each page is processed, with a processing/wait icon shown while results are still pending. Once a page has been read, the required content must be extracted using rules specific to that page. The results must be cached on the server for 24 hours, and a URL rewrite engine must be configured so that cached pages are served at URLs in this format: [url removed]
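A minimal sketch of the fetch-save-extract step, assuming each of the 6 pages has one extraction rule expressed as a regular expression whose first capture group holds the wanted content. The class name `PageCrawler`, the regex-based rules, the file-naming scheme and the `onResult` callback are all hypothetical choices for illustration; the real rules would depend on the provided URLs.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

// Hypothetical sketch: class name, regex rules and file naming are assumptions.
public static class PageCrawler
{
    // Reads the raw HTML source of one page with HttpWebRequest/HttpWebResponse.
    public static string FetchSource(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "GET";
        request.Timeout = 30000; // milliseconds
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream(), Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    // Applies a page-specific rule: returns the trimmed first capture group of every match.
    public static List<string> Extract(string html, Regex rule)
    {
        var results = new List<string>();
        foreach (Match m in rule.Matches(html))
        {
            results.Add(m.Groups[1].Value.Trim());
        }
        return results;
    }

    // Crawls every URL in parallel. Each page's HTML is saved under saveDir, and
    // onResult (if given) is invoked on a worker thread as soon as that page is done,
    // so it must be thread-safe. A failed request surfaces as an AggregateException.
    public static IDictionary<string, List<string>> CrawlAll(
        IDictionary<string, Regex> rulesByUrl,
        string saveDir,
        Action<string, List<string>> onResult)
    {
        var results = new ConcurrentDictionary<string, List<string>>();
        Directory.CreateDirectory(saveDir);
        Parallel.ForEach(rulesByUrl, pair =>
        {
            string html = FetchSource(pair.Key);
            // Hypothetical file name: the escaped URL, assuming URLs are short
            // enough to be valid file names.
            string fileName = Uri.EscapeDataString(pair.Key) + ".html";
            File.WriteAllText(Path.Combine(saveDir, fileName), html, Encoding.UTF8);
            List<string> extracted = Extract(html, pair.Value);
            results[pair.Key] = extracted;
            onResult?.Invoke(pair.Key, extracted);
        });
        return results;
    }
}
```

In the ASP.NET page, the `onResult` callback is one possible place to record each page's results so that the browser (for example, via periodic AJAX requests) can show them as they arrive and hide the wait icon when all 6 are done. Note that `WebRequest.Create` is marked obsolete in .NET 6 and later, so this sketch targets the requirement to use HttpWebRequest rather than current best practice.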
I would prefer that you build on an existing crawler such as Searcharoo or Zeta Web Spider ([url removed]) or this one: [url removed]. Please explain in your PMB how you plan to do this. You will need an online test server to demonstrate the work before I make the payment.