We need a tool that offers functions similar to (and going beyond) the tools referenced in each step below:
Step 1: A tool like [login to view URL] that checks the page status of each URL and returns only those with real 404 errors. It must be able to process up to 500 URLs at a time, or accept an attached file and work through it in batches, and finally create an output file to feed into Step 2.
Step 2: A process to check the DA/PA of up to 500 URLs at a time (like this site: [login to view URL]), or to take a larger file from Step 1 and process it in batches until complete. It should strip out all URLs that are under PA 20 and create a file to feed into Step 3.
Step 3: A process that uses an API for [login to view URL] to check the TF/CF details for all domains that come out of Step 2. For Web 2.0 sites such as Tumblr, check the TF/CF with both the http:// and non-http:// values; for actual domains, check both the www. and non-www. values. When this process has finished, create a file for further processing.
Step 4: For actual domain names that Step 3 processes, use the Majestic API to check the incoming anchor links to the domain for both the www. and non-www. versions.
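As a rough illustration of the batching and filtering in Steps 1 and 2 above, here is a minimal Python sketch. It is not a finished tool. The function names (`batches`, `http_status`, `find_404s`, `keep_strong`) are made up for this example, the status check uses a plain HEAD request from the standard library, and the PA values are assumed to have been fetched already, because the posting does not name the service that supplies them.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, Iterator, List, Tuple

BATCH_SIZE = 500  # the posting asks for up to 500 URLs at a time
MIN_PA = 20       # Step 2 drops every URL under PA 20


def batches(items: Iterable[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` items."""
    batch: List[str] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for `url`, or 0 if no response arrived.

    Assumption: a HEAD request is enough. Some servers answer HEAD
    differently from GET, so a real tool may need to fall back to GET.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx and 5xx responses arrive as exceptions
    except (urllib.error.URLError, OSError, ValueError):
        return 0  # DNS failure, timeout, malformed URL, and so on


def find_404s(urls: Iterable[str],
              status_of: Callable[[str], int] = http_status) -> List[str]:
    """Step 1: keep only the URLs whose status code is exactly 404."""
    dead: List[str] = []
    for batch in batches(urls):
        dead.extend(url for url in batch if status_of(url) == 404)
    return dead


def keep_strong(rows: Iterable[Tuple[str, int]],
                min_pa: int = MIN_PA) -> List[Tuple[str, int]]:
    """Step 2: drop every (url, pa) pair whose PA is below `min_pa`."""
    return [(url, pa) for url, pa in rows if pa >= min_pa]
```

Passing the status lookup in as `status_of` keeps the batching logic testable without network access. The output of each function could be written to a text or CSV file to feed the next step.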
Your programming skills need to be good, and you also need a good understanding of what the 4 steps above are doing. We don't want to have to explain what all these mean. If you want to work on this project, you will already understand what they mean and why we want to do this project.
The end result does not need to look polished or have a flashy GUI. It just needs to work properly.
When you bid on this project, we want to be sure that you have bothered to read our requirements. Please start your opening remarks with "Grab the strong stats easier", followed by anything else you want to tell us about your actual experience. If you fail to do this basic step, we will know you did not read the project and you will be ignored - we promise.
We look forward to your bids.
Grab the strong stats easier
Do you want this to be a command-line script or a web service? The description is slightly vague on that point. I recommend Node.js or Python/Tornado if it is a web service.
I recommend Python if you want command-line utilities. Node's client-side scripting is not mature enough, even though Node's async model is an asset here.
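To give an idea of the command-line route, here is a small Python sketch of the URL handling that Steps 3 and 4 ask for: producing the http:// and non-http:// forms for Web 2.0 sites, and the www. and non-www. forms for actual domains. The helper names `bare_host` and `variants` are my own, and I am assuming the caller already knows which entries are Web 2.0 sites. The Majestic lookup itself is left out, since it depends on your API plan and key.

```python
from typing import List


def bare_host(value: str) -> str:
    """Reduce a URL or host name to a lower-case host without scheme, path, or www."""
    host = value.strip().lower()
    for scheme in ("http://", "https://"):
        if host.startswith(scheme):
            host = host[len(scheme):]
    host = host.split("/", 1)[0]  # drop any path after the host
    if host.startswith("www."):
        host = host[len("www."):]
    return host


def variants(value: str, web20: bool = False) -> List[str]:
    """Return the two forms of a site that should be checked for TF/CF.

    web20=True  -> host without and with http:// (for example a Tumblr blog)
    web20=False -> host without and with www.    (an actual domain)
    """
    host = bare_host(value)
    if web20:
        return [host, "http://" + host]
    return [host, "www." + host]


if __name__ == "__main__":
    print(variants("http://www.Example.com/page"))
    print(variants("https://myblog.tumblr.com/", web20=True))
```

Each returned pair would then be sent to the TF/CF check, and the results written to the output file for the next step.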