I need a developer to create a very simple web crawler for me. It should crawl a designated website for links (excluding image links, .js files, etc.), then crawl those links for further links, and so on. My goal is to crawl and list 120 million links. The links should be recorded in a series of text files (one million links per text file), and each link should be recorded only once.
This can be done as either a web-based or a desktop-based application.
Hi, I'm working on a similar project now (a search engine) and have finished its web crawler using Java. Please let me know the content of the websites you want to crawl, thanks.
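For reference, the requirements in the original post (follow links breadth-first, skip image and script links, record each link once, start a new text file every million links) could be sketched roughly as below. This is only a minimal Python illustration, not the Java crawler mentioned above. The names `SKIP_EXTENSIONS`, `extract_links`, `ShardedWriter`, `crawl`, and the `links_0001.txt` file naming are all hypothetical, and `fetch` is assumed to be any function that takes a URL and returns the page's HTML as a string.

```python
from collections import deque
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urldefrag, urljoin, urlparse

# Assumed list of extensions to ignore; extend as needed.
SKIP_EXTENSIONS = (".js", ".css", ".png", ".jpg", ".jpeg", ".gif", ".svg", ".ico")


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(base_url, html):
    """Return absolute http(s) links on a page, minus images, scripts, etc."""
    parser = LinkParser()
    parser.feed(html)
    links = []
    for href in parser.hrefs:
        # Resolve relative links and drop "#fragment" so duplicates collapse.
        url, _fragment = urldefrag(urljoin(base_url, href))
        parts = urlparse(url)
        if parts.scheme not in ("http", "https"):
            continue
        if parts.path.lower().endswith(SKIP_EXTENSIONS):
            continue
        links.append(url)
    return links


class ShardedWriter:
    """Writes one link per line, starting a new file every `per_file` links."""

    def __init__(self, out_dir, per_file=1_000_000):
        self.out_dir = Path(out_dir)
        self.out_dir.mkdir(parents=True, exist_ok=True)
        self.per_file = per_file
        self.count = 0
        self.handle = None

    def write(self, url):
        if self.count % self.per_file == 0:
            self.close()
            index = self.count // self.per_file + 1
            path = self.out_dir / f"links_{index:04d}.txt"
            self.handle = open(path, "w", encoding="utf-8")
        self.handle.write(url + "\n")
        self.count += 1

    def close(self):
        if self.handle is not None:
            self.handle.close()
            self.handle = None


def crawl(start_url, fetch, writer, limit):
    """Breadth-first crawl; records each newly found link once, up to `limit`."""
    seen = {start_url}  # the start URL itself is not recorded
    queue = deque([start_url])
    while queue and writer.count < limit:
        page = queue.popleft()
        for link in extract_links(page, fetch(page)):
            if link in seen:
                continue
            seen.add(link)
            writer.write(link)
            queue.append(link)
            if writer.count >= limit:
                break
    writer.close()
    return writer.count
```

A real run would pass a `fetch` built on an HTTP client with timeouts, error handling, and politeness delays, and would call something like `crawl(start_url, fetch, ShardedWriter("out"), limit=120_000_000)`. Note that the sketch follows links to any host and keeps the whole `seen` set in memory; at 120 million links that set would likely need to move to a database or an on-disk structure.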