333809 URL harvesting script

Completed. Posted Jul 8, 2009. Paid on delivery.

I need a script or desktop application (for Windows Vista) to harvest website addresses (URLs) for me. My preference is a script that runs on PHP and MySQL on a Linux server.

I want to enter a list of keyword phrases like "cheap hosting" and "custom furniture". Typically there will be a few hundred of these at a time, and I want to be able to add and delete phrases.

When I run the script (let's call that a scan), it must get website addresses from the following sources for me (using "cheap hosting" as an example):

1) [url removed, login to view] - all the results (not just the first page of results)

2) The first 100 results from [url removed, login to view], filtered to show only the URLs that actually have one or more of the keywords in the domain name itself, but not as part of a subdomain. (So [url removed, login to view] and [url removed, login to view] are OK, but not [url removed, login to view], and not [url removed, login to view].)

3) The first 100 results from Google for [url removed, login to view]
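The domain-name filter in item 2 could be sketched roughly as follows. This is only an illustration in Python (the post itself prefers PHP), and the function names `registered_domain` and `keyword_in_domain` are hypothetical. It assumes the registered domain is simply the last two labels of the host, which is wrong for suffixes such as .co.uk; a real implementation would need a public-suffix list.

```python
from urllib.parse import urlparse


def registered_domain(url: str) -> str:
    """Return the last two labels of the host, e.g. 'example.com'.

    Naive assumption: ignores multi-part suffixes such as .co.uk.
    """
    # urlparse only fills in the host when a scheme or '//' is present.
    host = urlparse(url if "://" in url else "//" + url).hostname or ""
    labels = host.lower().split(".")
    return ".".join(labels[-2:])


def keyword_in_domain(url: str, phrase: str) -> bool:
    """True if any word of the phrase appears in the domain name itself.

    Subdomains and the URL path are deliberately not inspected.
    """
    name = registered_domain(url).split(".")[0]
    return any(word in name for word in phrase.lower().split())


if __name__ == "__main__":
    for candidate in (
        "http://www.cheaphosting.com/plans",  # keyword in the domain name
        "http://cheap.example.com/",          # keyword only in a subdomain
        "http://example.com/cheap-hosting",   # keyword only in the path
    ):
        print(candidate, keyword_in_domain(candidate, "cheap hosting"))
```

Matching is by substring, so a domain such as hosting-deals.net passes for "cheap hosting" because it contains one of the two words.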

These results must go into a database in the form of [url removed, login to view] (NOT [url removed, login to view]), separated into the 3 categories above and stored with the date that the script was run.
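A minimal sketch of the storage step, under several assumptions: the exact URL form is elided in the post, so this guesses it is the bare host (the "website.com" form mentioned for manual entry); Python's built-in sqlite3 stands in for MySQL; and the table layout, the `used` flag, the source labels such as `source_1`, and the function names are all hypothetical.

```python
import sqlite3
from datetime import date
from urllib.parse import urlparse


def normalize(url: str) -> str:
    """Reduce a URL to a bare host without scheme, 'www.' prefix, or path."""
    host = urlparse(url if "://" in url else "//" + url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the (assumed) results table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS results (
               keyword   TEXT NOT NULL,
               url       TEXT NOT NULL,
               source    TEXT NOT NULL,   -- one of the 3 categories
               scan_date TEXT NOT NULL,   -- ISO date of the scan
               used      INTEGER NOT NULL DEFAULT 0,
               UNIQUE (keyword, url, source)
           )"""
    )
    return conn


def store(conn, keyword, raw_url, source, scan_date=None) -> bool:
    """Insert one result; return True only if it was not stored before."""
    scan_date = scan_date or date.today().isoformat()
    cur = conn.execute(
        "INSERT OR IGNORE INTO results (keyword, url, source, scan_date) "
        "VALUES (?, ?, ?, ?)",
        (keyword, normalize(raw_url), source, scan_date),
    )
    conn.commit()
    return cur.rowcount == 1


if __name__ == "__main__":
    db = open_db()
    print(store(db, "cheap hosting", "http://www.website.com/page", "source_1"))
    print(store(db, "cheap hosting", "https://website.com/other", "source_1"))
```

The unique constraint means a URL found again in a later scan keeps its first scan date; whether that is the desired behavior is not stated in the post.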

There will be a function where, from time to time, I can enter multiple URLs (anything from 10 to 1000 at a time) in the form of website.com for a specific keyword phrase. If these URLs match existing URLs for that keyword phrase, they must be marked as "used". I also need to be able to reset URLs to unused (the default).
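The mark/reset behavior might look something like the sketch below. It again uses Python with sqlite3 as a stand-in for PHP and MySQL, and the table layout and the names `mark_used` and `reset_used` are assumptions, not part of the post.

```python
import sqlite3

# Assumed minimal table: one row per collected URL and keyword phrase.
SCHEMA = """CREATE TABLE IF NOT EXISTS results (
    keyword TEXT NOT NULL,
    url     TEXT NOT NULL,
    used    INTEGER NOT NULL DEFAULT 0
)"""


def mark_used(conn, keyword, urls):
    """Mark pasted URLs as used for one keyword phrase; return rows changed."""
    changed = 0
    for url in urls:
        url = url.strip().lower()
        if not url:
            continue  # skip blank lines in the pasted list
        cur = conn.execute(
            "UPDATE results SET used = 1 "
            "WHERE keyword = ? AND url = ? AND used = 0",
            (keyword, url),
        )
        changed += cur.rowcount
    conn.commit()
    return changed


def reset_used(conn, keyword, urls=None):
    """Reset URLs to unused; with urls=None, reset the whole keyword phrase."""
    if urls is None:
        cur = conn.execute(
            "UPDATE results SET used = 0 WHERE keyword = ? AND used = 1",
            (keyword,),
        )
        changed = cur.rowcount
    else:
        changed = 0
        for url in urls:
            cur = conn.execute(
                "UPDATE results SET used = 0 "
                "WHERE keyword = ? AND url = ? AND used = 1",
                (keyword, url.strip().lower()),
            )
            changed += cur.rowcount
    conn.commit()
    return changed
```

URLs that do not match anything already collected for that phrase are silently ignored here; the post does not say whether they should be reported.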

Then I need a function to produce a report of all the unique results gathered for a specific keyword between date X and date Y (I will enter these values) that are not marked as used. The report must be in CSV format with these fields:

keyword phrase

url

source
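One way the report query and CSV output could fit together is sketched below, in Python with sqlite3 standing in for PHP and MySQL. The table layout, the ISO date strings, and the name `unused_report` are assumptions made for the example.

```python
import csv
import io
import sqlite3

# Assumed table layout; scan_date holds ISO dates such as '2009-07-08',
# which compare correctly as plain strings.
SCHEMA = """CREATE TABLE IF NOT EXISTS results (
    keyword   TEXT NOT NULL,
    url       TEXT NOT NULL,
    source    TEXT NOT NULL,
    scan_date TEXT NOT NULL,
    used      INTEGER NOT NULL DEFAULT 0
)"""


def unused_report(conn, keyword, date_from, date_to):
    """Return CSV text of unique, unused results in the inclusive date range."""
    rows = conn.execute(
        "SELECT DISTINCT keyword, url, source FROM results "
        "WHERE keyword = ? AND used = 0 AND scan_date BETWEEN ? AND ? "
        "ORDER BY url, source",
        (keyword, date_from, date_to),
    )
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["keyword phrase", "url", "source"])
    writer.writerows(rows)
    return buffer.getvalue()
```

"Unique" is interpreted here as unique per URL and source, so a URL found by two sources appears twice; the post leaves that choice open.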

That is the basic functionality of the script. Other features will include:

1) There will be a general filter, applicable to all keyword phrases, where I can enter URLs that I do not want collected.

2) I want to run the scans in bulk by selecting which keyword phrases to scan. At the end of the scan, the script must report the number of NEW results per keyword phrase, meaning results collected during that scan that have not been collected before.

3) The script must have a setting to specify a random delay in seconds between searches to avoid being blocked by Google.
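The three extra features could be combined in a single scan loop along these lines. This is a hedged Python sketch, not a scraper: `search` is a hypothetical callable supplied by the caller that returns normalized URLs for a phrase, `seen` is an in-memory stand-in for what the database already holds, and the delay bounds are arbitrary example values.

```python
import random
import time


def run_scan(phrases, search, seen, blocklist,
             min_delay=5.0, max_delay=15.0, sleep=time.sleep):
    """Scan the selected phrases; return the count of new URLs per phrase.

    search    -- hypothetical callable: phrase -> iterable of URLs
    seen      -- dict: phrase -> set of URLs collected in earlier scans
    blocklist -- URLs that must never be collected (the general filter)
    """
    new_counts = {}
    for index, phrase in enumerate(phrases):
        if index > 0:
            # Random pause between searches to reduce the risk of blocking.
            sleep(random.uniform(min_delay, max_delay))
        known = seen.setdefault(phrase, set())
        fresh = {
            url for url in search(phrase)
            if url not in blocklist and url not in known
        }
        known |= fresh  # remember them for later scans
        new_counts[phrase] = len(fresh)
    return new_counts


if __name__ == "__main__":
    canned = {"cheap hosting": ["a.com", "b.com", "spam.com"]}
    history = {"cheap hosting": {"a.com"}}
    print(run_scan(["cheap hosting"], canned.__getitem__, history, {"spam.com"}))
```

Passing `sleep` as a parameter keeps the loop testable without real waiting; in production the default `time.sleep` applies.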

MySQL Odd Jobs PHP

Project ID: #2079619

About the project

1 proposal. Remote project. Active Jul 11, 2012.

Awarded to:

synccoder

I can help you. Please see my PMB for details.

$200 USD in 3 days
(0 Reviews)
0.0