A programme for assessing the content of web portals.
$30-250 USD
Closed
Posted over 3 years ago
Paid on delivery
Description:
I would like a tool that evaluates how many e-mail addresses are present on classified-ad portals. The program should crawl the whole portal and then report the number of e-mail addresses found on it. The program must be universal and work on different websites.
The objective:
To assess whether it is worthwhile to undertake the scraping of a given portal in terms of obtaining e-mail addresses (e.g. by writing a dedicated scraper).
Example of action:
As a user, I enter two input variables:
portal address (e.g. [login to view URL], [login to view URL], [login to view URL], etc.).
The program itself finds subpages with company/person/product data.
It then checks each found subpage for an e-mail address.
When it has scanned all the subpages, it reports what percentage of the pages contain an e-mail address and how many e-mails it has found in total across the whole portal.
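The assessment loop described above can be sketched as follows. This is a minimal illustration, not a finished implementation: the "portal" here is an in-memory dict of URL to HTML, whereas a real run would fetch each page over HTTP, and the regexes are deliberately simple.

```python
# Minimal sketch of the assessment loop: BFS over the portal's subpages,
# counting pages that contain an e-mail address. The portal is simulated
# as a dict of URL -> HTML so the example runs without network access.
import re
from collections import deque

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
LINK_RE = re.compile(r'href="([^"]+)"')

def assess(portal, start):
    """Breadth-first scan; returns (pages scanned, pages with e-mail, unique e-mails)."""
    seen, queue = {start}, deque([start])
    pages_with_email, emails = 0, set()
    while queue:
        url = queue.popleft()
        html = portal.get(url, "")
        found = set(EMAIL_RE.findall(html))
        if found:
            pages_with_email += 1
            emails |= found
        for link in LINK_RE.findall(html):
            if link in portal and link not in seen:  # stay on this portal only
                seen.add(link)
                queue.append(link)
    return len(seen), pages_with_email, emails

# Tiny fake portal with two ad subpages, one of which exposes an address.
portal = {
    "/": '<a href="/ad/1">ad 1</a><a href="/ad/2">ad 2</a>',
    "/ad/1": "Contact: seller1@example.com",
    "/ad/2": "No contact given",
}
scanned, with_email, emails = assess(portal, "/")
print(f"{with_email}/{scanned} pages ({100 * with_email / scanned:.0f}%) "
      f"contain an e-mail; {len(emails)} unique address(es) found")
```

The `seen` set also covers the "no subpage scanned twice" condition listed below.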
Conditions to be met:
The program cannot scan the same subpage more than once.
The program must not count e-mail addresses in the subpage footer that belong to the administration of the service.
The program may be written in any language.
The program need not have a GUI.
The program should be as universal as possible, i.e. it should require as few changes as possible to work across different portals.
The program should be accompanied by an instruction manual with screenshots.
I would like the program to be deployed on my VM (the choice of OS is up to you).
The program and its working process should be lighter than standard scraping tools or dedicated scrapers, as the aim is only a preliminary assessment of potential.
Please include in your offer:
Initial valuation
Estimated implementation time
Method of order execution
Hey,
-To make it as universal as possible, the best approach is Selenium.
-It is possible to make a really fast scraper, but a deliberately slow one ensures you won't have to reconfigure your VM every time, simply because most sites will block a bot that makes a huge number of requests.
-Most of the requirements are clear, but a few things are unclear, so I have a few questions as well.
PS: include in your offer:
Initial valuation - $200
Estimated implementation time - 10 to 15 days (making sure things work on a number of sites)
Method of order execution => root URL > seed authentication if required > regex for mails (maintain a store of mails and pages scraped) > evaluation after all links are scraped
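The store mentioned in the pipeline above could look like the sketch below: a regex for mails plus a record of pages already scraped, with a configurable delay between requests so the bot stays slow on purpose. The class name, URLs, and timing are illustrative assumptions, not part of the offer.

```python
# Sketch of the "store for mails and pages scraped" from the pipeline,
# with a polite delay so the bot makes requests slowly (hypothetical design).
import re
import time

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

class MailStore:
    """Keeps scraped pages and mails; refuses to scan the same page twice."""

    def __init__(self, delay=1.0):
        self.delay = delay   # seconds to wait between requests (be slow on purpose)
        self.pages = set()   # URLs already scraped
        self.mails = set()   # unique addresses found so far

    def record(self, url, html):
        if url in self.pages:  # never scan the same subpage twice
            return False
        self.pages.add(url)
        self.mails |= set(EMAIL_RE.findall(html))
        time.sleep(self.delay)  # throttle before the next request
        return True

store = MailStore(delay=0.0)  # no delay needed in this offline demo
store.record("/ad/1", "mail me: bob@example.com")
store.record("/ad/1", "mail me: bob@example.com")  # duplicate page, ignored
print(len(store.pages), sorted(store.mails))
```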