I have an existing crawler (see the bottom of this post for more details).
This crawler crawls webpages/directories and collects links together with their meta tag description text, storing the results in a text file.
I want the crawler modified to search a list of directories, find my submitted domains on those directories, and write the results to a text file in this format:
TARGETDOMAIN - DIRECTORYDOMAIN - DIRECTORYPAGE (where the targetdomain is listed)
[url removed, login to view] - [url removed, login to view] - [url removed, login to view]
Once the crawler has found the target domain (or target domains) in a directory, it should stop crawling that directory and move on to the next one, and so on.
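The search-and-stop behaviour described above could look roughly like the sketch below. It is written in Python purely for illustration (the existing crawler is Perl and its source is not shown here), and every name in it (`LinkParser`, `host_of`, `find_listing`, the `fetch` callback, the `max_pages` limit) is an assumption, not part of the existing script. It handles a single target domain, walks one directory breadth-first, and returns the TARGETDOMAIN - DIRECTORYDOMAIN - DIRECTORYPAGE line as soon as a page links to the target.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def host_of(url):
    """Lower-cased host name without a leading 'www.' ('' if there is none)."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def find_listing(directory_url, target_domain, fetch, max_pages=1000):
    """Crawl one directory breadth-first.

    `fetch` is a hypothetical callback: it takes a URL and returns the page
    HTML as a string, or None if the page could not be loaded.
    Returns the report line for the first page that links to the target
    domain, or None if no such page is found within `max_pages` pages.
    """
    directory_host = host_of(directory_url)
    target = target_domain.lower()
    queue = deque([directory_url])
    seen = {directory_url}  # pages already queued, so none is fetched twice
    pages_fetched = 0
    while queue and pages_fetched < max_pages:
        page = queue.popleft()
        pages_fetched += 1
        html = fetch(page)
        if html is None:
            continue
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(page, href).split("#")[0]
            host = host_of(link)
            if host == target or host.endswith("." + target):
                # Found: stop crawling this directory immediately.
                return f"{target} - {directory_host} - {page}"
            if host == directory_host and link not in seen:
                seen.add(link)
                queue.append(link)
    return None
```

Passing `fetch` in as a parameter keeps the sketch independent of any particular HTTP library; the real crawler would plug in its own page loader there.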
I do a lot of directory submissions, and those bastards never send you a confirmation email when they list you (well, some do). With this crawler I will be able to see which directories have listed me and which never listed me at all!
THINGS TO CONSIDER:
- Directories usually have a high number of pages, up to 100,000 and more. The crawler should remember which pages it has already crawled, to save time and resources and speed up the process!
- The crawler should be able to crawl a huge number of directories in batches (read from a .txt file)!
- The crawler should be able to pick up from its last state in case it has to be restarted!
- At the moment the browser window has to stay open; it refreshes itself after a given amount of time. It would be nice if the script could do the job without the browser window being open and just send an email when the job is done!
- Is it better to check all the directories for only one target domain, or is it more economical to search for several target domains at once (in case I have already submitted several domains to the directories)?
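As a rough illustration of the batch, "remember crawled pages" and restart points above, here is a minimal Python sketch (again not the crawler's own Perl). The file layout and all function names (`load_directories`, `load_state`, `save_state`, `pending_directories`, `mark_visited`) are assumptions made up for this example: directories are read one per line from a .txt file, and progress is kept in a small JSON state file that is reloaded after a restart.

```python
import json
import os


def load_directories(path):
    """Read one directory URL per line, skipping blank lines and '#' comments."""
    with open(path, encoding="utf-8") as handle:
        lines = (line.strip() for line in handle)
        return [line for line in lines if line and not line.startswith("#")]


def load_state(path):
    """Load saved progress, or start fresh if no state file exists yet."""
    if os.path.exists(path):
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)
    return {"done": [], "visited": {}}


def save_state(path, state):
    """Write to a temp file first, then swap it in, so a crash mid-write
    does not leave a half-written state file behind."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as handle:
        json.dump(state, handle)
    os.replace(tmp_path, path)


def pending_directories(directories, state):
    """Directories from the batch file that are not finished yet."""
    done = set(state["done"])
    return [d for d in directories if d not in done]


def mark_visited(state, directory, page):
    """Record a crawled page; returns False if it was already crawled."""
    pages = state["visited"].setdefault(directory, {})
    if page in pages:
        return False
    pages[page] = True
    return True
```

A crawl loop would call `mark_visited` before fetching each page, append a directory to `state["done"]` when it is finished, and call `save_state` every so often; after a restart, `load_state` plus `pending_directories` gives the remaining work.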
The crawler is written in Perl and was designed to crawl webpages/directories, collect links together with their meta tag description text, store the results in a text file, and let you use this text and these links via "includes" as content on your pages. Basically it's a kind of text/content scraper.
ALREADY BUILT IN FEATURES:
- The script already has built-in features which could be beneficial, such as:
Max. Number of Parallel Requests:
Max. CPU Time (seconds):
Delay Between Requests (seconds):
Password Protected Admin Area!
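To show how settings like these might be carried into the modified crawler, here is a small Python sketch. It is only an illustration: the names `CrawlSettings` and `Throttle` and the default values are assumptions, not taken from the existing Perl script, and only the delay between requests is actually enforced here (the other two settings are just stored).

```python
import time
from dataclasses import dataclass


@dataclass
class CrawlSettings:
    # Default values are placeholders, not the existing script's values.
    max_parallel_requests: int = 4
    max_cpu_seconds: float = 30.0
    delay_seconds: float = 1.0


class Throttle:
    """Spaces out requests so they start at least delay_seconds apart."""

    def __init__(self, settings, clock=time.monotonic, sleep=time.sleep):
        self.delay = settings.delay_seconds
        self.clock = clock  # injectable so the class can be tested without waiting
        self.sleep = sleep
        self.last = None    # time of the previous request, None before the first

    def wait(self):
        """Call right before each request; sleeps only if the last one was too recent."""
        now = self.clock()
        if self.last is not None:
            remaining = self.delay - (now - self.last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last = now
```

In a crawl loop, `throttle.wait()` would be called immediately before every page fetch.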
Money will be paid after I have done a successful test with 100 directories!
Important: please put CRAWLER in your reply, so I know you actually read the description ;-)))
NOTE: I can grant you access to the crawler so you can check the source code and then decide if you are able to modify it successfully! Just send me a pm!
Budget is around $150.
Greetings, we can complete the Perl scraper modification for you, no problem. It will probably only take a few hours, but we bid more just to be safe. Please let us help you.