We're looking to do some research on a list of domain names. For each domain name we want to know the following:
Domain, Company, Industry, Country, State, City, Zip/Postal, title tag, meta description, meta keywords
That's it.
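To give a rough idea of the page-level fields (title tag, meta description, meta keywords), here is a minimal sketch, assuming PHP's bundled DOM extension is available. The function name extract_page_fields is made up for illustration; use whatever approach you prefer.

```php
<?php
// Hypothetical helper: pull the title tag and the meta description/keywords
// out of a home page's raw HTML. Missing fields come back as empty strings.
function extract_page_fields(string $html): array
{
    $fields = ['title' => '', 'description' => '', 'keywords' => ''];
    if (trim($html) === '') {
        return $fields; // nothing to parse
    }

    $doc = new DOMDocument();
    // Real-world HTML is often malformed, so collect parser warnings
    // internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $titles = $doc->getElementsByTagName('title');
    if ($titles->length > 0) {
        $fields['title'] = trim($titles->item(0)->textContent);
    }

    foreach ($doc->getElementsByTagName('meta') as $meta) {
        // Attribute values like "Description" and "KEYWORDS" both occur.
        $name = strtolower($meta->getAttribute('name'));
        if ($name === 'description' || $name === 'keywords') {
            $fields[$name] = trim($meta->getAttribute('content'));
        }
    }
    return $fields;
}
```

Pages in unusual character encodings may need extra handling that this sketch does not attempt.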
The input will be the list of domain names pasted into a text area on a web page. The output should be a downloadable CSV or TAB delimited file I can load into Excel. There should also be visible output on the web page while running so we can see progress.
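For the pasted input, something along these lines would be fine. It is only a sketch: the function name parse_domain_list is hypothetical, and it assumes entries are separated by whitespace, commas, or semicolons and may include a scheme or path that should be dropped.

```php
<?php
// Hypothetical helper: turn the raw textarea contents into a clean,
// de-duplicated list of lowercase host names.
function parse_domain_list(string $raw): array
{
    $domains = [];
    foreach (preg_split('/[\s,;]+/', $raw, -1, PREG_SPLIT_NO_EMPTY) as $entry) {
        $entry = strtolower($entry);
        // Drop a leading scheme such as "http://".
        $entry = preg_replace('~^[a-z][a-z0-9+.-]*://~', '', $entry);
        // Drop any path, query string, fragment, or port after the host.
        $entry = preg_replace('~[/?#:].*$~', '', $entry);
        if ($entry !== '') {
            $domains[] = $entry;
        }
    }
    // Keep the first occurrence of each domain, in input order.
    return array_values(array_unique($domains));
}
```

On the page itself, echoing one line per domain as it finishes and calling flush() after each is one simple way to show progress while the script runs.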
Your script will have a list of lists containing the industry information, formatted as specified below or in whatever way is easiest for you. We should be able to add, edit, and delete entries in this list as much as we want; hardcoding it within the program is OK.
The INDUSTRY "List of lists" could look like this:
Industry,tag1,tag2,tag3,...
Basically, if the domain name, home page title, meta description, or meta keywords ("the data fields") contain either the industry name or any of its tags, then that is the industry the domain should be assigned.
Here's an example of what the industry lists might look like, but you can format them any way that works best for you.
$legalwords = array("legal", "law", "lawyer", "attorney", "advoca");
$consultantwords = array("consultant", "consult", "advisor");
$medicalwords = array("medical", "medicine", "doctor", "surgic", "stem cell", "scienc", "research", "laborat");
$contractorwords = array("contractor", "construction");
And since it is possible for a company to be in more than one of the industries, I'd like some logic that determines the most appropriate industry, maybe by counting how many matches there are in the data we are looking at for each category.
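The match-counting idea could be sketched roughly like this. The function name classify_industry and the industry => tags array shape are my own assumptions, not a required design, and counting plain substring hits is just one possible scoring rule.

```php
<?php
// Hypothetical helper: pick the industry whose name and tags occur most
// often in the data fields. $industries maps an industry name to its tags;
// $fields holds the domain, title, meta description, and meta keywords.
function classify_industry(array $industries, array $fields): string
{
    $haystack = strtolower(implode(' ', $fields));
    $best = '';
    $bestScore = 0;

    foreach ($industries as $industry => $tags) {
        // The industry name itself counts as a tag.
        $terms = array_map('strtolower', array_merge([(string) $industry], $tags));
        $score = 0;
        foreach (array_unique($terms) as $term) {
            if ($term === '') {
                continue; // substr_count() rejects an empty needle
            }
            // Note: overlapping tags ("law" inside "lawyer") each count,
            // so one word can contribute more than one hit.
            $score += substr_count($haystack, $term);
        }
        // Strictly greater: on a tie, the industry listed first wins.
        if ($score > $bestScore) {
            $best = (string) $industry;
            $bestScore = $score;
        }
    }
    return $best; // empty string when nothing matched
}
```

For example, with the legal and consultant lists above, a domain like smithlaw.com titled "Smith Law Attorneys" scores several hits for legal and none for consultant, so it would be assigned legal.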
The location information should come from the WHOIS database.
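As a rough illustration of the WHOIS step, the sketch below only parses a raw WHOIS text response that you have already retrieved. The function name parse_whois_location is hypothetical, and the "Registrant ..." labels are an assumption: they are common in gTLD records, but labels vary by registry and registrant details are often redacted, so expect empty values.

```php
<?php
// Hypothetical helper: pull company and location fields out of raw WHOIS
// text made of "Label: value" lines. Unknown or missing fields stay empty.
function parse_whois_location(string $raw): array
{
    // Assumed labels; adjust for the registries you actually query.
    $labels = [
        'company' => 'registrant organization',
        'country' => 'registrant country',
        'state'   => 'registrant state/province',
        'city'    => 'registrant city',
        'zip'     => 'registrant postal code',
    ];
    $result = array_fill_keys(array_keys($labels), '');

    foreach (preg_split('/\R/', $raw) as $line) {
        $parts = explode(':', $line, 2);
        if (count($parts) !== 2) {
            continue; // not a "Label: value" line
        }
        $key = array_search(strtolower(trim($parts[0])), $labels, true);
        // Keep the first value seen for each field.
        if ($key !== false && $result[$key] === '') {
            $result[$key] = trim($parts[1]);
        }
    }
    return $result;
}
```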
We need error checking built into the program so that if a domain no longer exists, or if it redirects elsewhere, the script does not crash, but continues to the next URL.
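By "does not crash" I mean something like the following sketch, which assumes the cURL extension for fetching. The function names fetch_home_page and process_domains, the timeouts, and the redirect limit are all placeholders; following redirects and recording the final URL is just one reasonable way to handle them.

```php
<?php
// Hypothetical fetcher: returns the home page HTML and the final URL after
// redirects, or throws if the domain is dead or answers with an error.
function fetch_home_page(string $domain): array
{
    $ch = curl_init('http://' . $domain . '/');
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects...
        CURLOPT_MAXREDIRS      => 5,    // ...but not forever
        CURLOPT_CONNECTTIMEOUT => 5,
        CURLOPT_TIMEOUT        => 15,
    ]);
    $body = curl_exec($ch);
    $error = curl_error($ch);
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    $finalUrl = (string) curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);

    if ($body === false) {
        throw new RuntimeException('fetch failed: ' . $error);
    }
    if ($status >= 400) {
        throw new RuntimeException('HTTP status ' . $status);
    }
    return ['html' => $body, 'final_url' => $finalUrl];
}

// Run $fetch for every domain; a failure is recorded and the loop moves on.
function process_domains(array $domains, callable $fetch): array
{
    $results = [];
    foreach ($domains as $domain) {
        try {
            $results[$domain] = ['ok' => true] + $fetch($domain);
        } catch (Throwable $e) {
            $results[$domain] = ['ok' => false, 'error' => $e->getMessage()];
        }
    }
    return $results;
}
```

In the real script the call would be process_domains($domains, 'fetch_home_page'); passing the fetcher in as a callable also makes the loop easy to test without touching the network.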
The output should be a TAB delimited file that we can easily load into EXCEL to do some analysis.
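A minimal sketch of the tab-delimited output, assuming one line per domain with the columns in the order listed at the top of this spec. The function name tsv_line is made up, and replacing embedded tabs and line breaks with spaces is my assumption about how to keep each record on one row in Excel.

```php
<?php
// Hypothetical helper: build one tab-delimited line from a row of values.
// Tabs and line breaks inside a value are collapsed to a single space so
// they cannot break the column or row structure.
function tsv_line(array $values): string
{
    $clean = [];
    foreach ($values as $value) {
        $clean[] = trim(preg_replace('/[\t\r\n]+/', ' ', (string) $value));
    }
    return implode("\t", $clean) . "\n";
}

// Column order taken from the field list in this spec.
$header = tsv_line([
    'Domain', 'Company', 'Industry', 'Country', 'State', 'City',
    'Zip/Postal', 'Title', 'Meta Description', 'Meta Keywords',
]);
```

To make the file downloadable, the script could write these lines to a file on the server and link to it, or send them with a Content-Disposition: attachment header.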
That's the whole project.
Once the project is awarded to you, I will send you a list of sample domains and a more complete list of industries and tags.
When you reply, put the word "orange" in the subject line of your PM or BID. If you don't do that, I'll know you didn't read this spec completely, and I won't read your bid or PM.
I need this done in the next 12 hours, but that should be easy as it's an extremely small and simple project for someone who knows PHP even reasonably well. And if you're an expert, this is probably an hour or less.
Thanks.
Mark