Create a PHP/MySQL application that crawls a list of URLs in priority order using a master/slave system. The master and the slaves will most likely run on Ubuntu/Debian EC2 instances with a LAMP stack and php5-curl installed (to make the requests). The code has to work with that setup: it can be developed on Windows, but it must work with the Linux filesystem.
The main server/database (let's call it MAIN) will have a MySQL database with a few tables:
Urls - (Url, Priority, SlaveId)
Slaves - (SlaveId, ServerIP, QueueSize, State)
State options: Online, Offline
Priorities will be 1-5.
Each slave reports its state to MAIN every 5 minutes, confirming that it is 'Online'. If MAIN doesn't hear from a slave for more than 5 minutes, it marks that slave's state as 'Offline'.
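A minimal sketch of the Offline check, for illustration only. It assumes MAIN records a Unix timestamp for each slave's last report; that value (called last seen here) is not part of the Slaves table listed above and would have to be added. The function name slaveStates is hypothetical.

```php
<?php
// Assumption: a slave is Offline when its last report is more than 5 minutes old.
define('HEARTBEAT_TIMEOUT', 300); // 5 minutes, in seconds

/**
 * $lastSeen maps SlaveId => Unix timestamp of the slave's last report
 * (an assumed value; the brief does not define where it is stored).
 * Returns SlaveId => 'Online' or 'Offline'.
 */
function slaveStates(array $lastSeen, $now)
{
    $states = array();
    foreach ($lastSeen as $slaveId => $seenAt) {
        $states[$slaveId] = ($now - $seenAt > HEARTBEAT_TIMEOUT) ? 'Offline' : 'Online';
    }
    return $states;
}
```

MAIN could run a check like this from cron and write the result back to the State column.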
URLs will be removed once completed by the slave (the slave will run a SQL DELETE to remove the record from MAIN).
URLs will be added to the Urls table and can be assigned to the slaves randomly (it doesn't need to be balanced, but if there are 5 new URLs then they should be spread across slave1, slave2, slave3, etc.).
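One possible way to spread new URLs across the slaves is a simple round-robin, sketched here. The function name assignRoundRobin is hypothetical, and the sketch assumes the URLs are unique strings and that only Online slaves are passed in.

```php
<?php
/**
 * Assign each new URL to the next slave in turn.
 * $newUrls: list of URL strings (assumed unique).
 * $onlineSlaveIds: list of SlaveId values that are currently Online.
 * Returns Url => SlaveId.
 */
function assignRoundRobin(array $newUrls, array $onlineSlaveIds)
{
    $assignments = array();
    $onlineSlaveIds = array_values($onlineSlaveIds);
    $count = count($onlineSlaveIds);
    if ($count === 0) {
        return $assignments; // no slave online: leave the URLs unassigned
    }
    foreach (array_values($newUrls) as $i => $url) {
        $assignments[$url] = $onlineSlaveIds[$i % $count];
    }
    return $assignments;
}
```

With 5 new URLs and 3 slaves, the first three URLs go to slaves 1, 2 and 3, and the remaining two wrap around to slaves 1 and 2.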
The balance algorithm needs to run immediately when a slave goes offline or comes online, and also every 1 minute.
The MAIN server's job is to assign slaves to the URLs and to balance the workload between all slaves as much as possible. If a slave gets marked as Offline, or a new slave comes online, all queued URLs get redistributed evenly, making sure that not only the number of URLs assigned to each slave is even but the average priority is about the same.
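The rebalancing step could be approached as follows. This is only a sketch of one heuristic, not a required design: sort the queued URLs by priority, then deal them out to the Online slaves in alternating ("snake") order, so queue sizes differ by at most one and high and low priorities are mixed on every slave. The function name balanceUrls and the row layout are assumptions.

```php
<?php
/**
 * $urls: list of rows like array('Url' => string, 'Priority' => int 1..5).
 * $slaveIds: list of SlaveId values that are currently Online.
 * Returns SlaveId => list of Urls assigned to that slave.
 */
function balanceUrls(array $urls, array $slaveIds)
{
    $slaveIds = array_values($slaveIds);
    $n = count($slaveIds);
    $queues = array();
    foreach ($slaveIds as $id) {
        $queues[$id] = array();
    }
    if ($n === 0) {
        return $queues; // nothing to balance onto
    }

    // Highest priority first; ties broken by Url so the result is deterministic.
    usort($urls, function ($a, $b) {
        if ((int) $a['Priority'] !== (int) $b['Priority']) {
            return (int) $b['Priority'] - (int) $a['Priority'];
        }
        return strcmp($a['Url'], $b['Url']);
    });

    foreach ($urls as $i => $row) {
        $round = (int) floor($i / $n);
        $pos = $i % $n;
        if ($round % 2 === 1) {
            $pos = $n - 1 - $pos; // reverse direction on every other round
        }
        $queues[$slaveIds[$pos]][] = $row['Url'];
    }
    return $queues;
}
```

The returned map would then be written back to the SlaveId column of the Urls table. The averages come out close rather than exactly equal, which matches the "about the same" requirement.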
The SLAVE's job is to process its assigned URLs in order of priority (5 is the highest priority). The slave will use php5-curl to make a request to the URL and then save the contents of the response to a file on the hard drive. Then it will report to MAIN that its queue is 1 less, and it will delete the URL record it just processed.
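A rough sketch of the slave's fetch-and-save step, under a few assumptions: the output file is named after the SHA-1 hash of the URL, redirects are followed, and a 30 second timeout is used. None of these are specified in the brief, and the function names orderQueue, outputPath and fetchToFile are hypothetical. Reporting back to MAIN and deleting the record are left out.

```php
<?php
/** Sort rows (array('Url' => ..., 'Priority' => ...)) so priority 5 comes first. */
function orderQueue(array $rows)
{
    usort($rows, function ($a, $b) {
        if ((int) $a['Priority'] !== (int) $b['Priority']) {
            return (int) $b['Priority'] - (int) $a['Priority'];
        }
        return strcmp($a['Url'], $b['Url']);
    });
    return $rows;
}

/** Assumed naming scheme: <dir>/<sha1 of url>.html */
function outputPath($dir, $url)
{
    return rtrim($dir, '/') . '/' . sha1($url) . '.html';
}

/** Fetch $url with cURL and save the body under $dir. Returns the path, or false on failure. */
function fetchToFile($url, $dir)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // assumption: follow redirects
        CURLOPT_TIMEOUT        => 30,   // assumption: 30 second limit per request
    ));
    $body = curl_exec($ch);
    curl_close($ch);
    if ($body === false) {
        return false;
    }
    $path = outputPath($dir, $url);
    return file_put_contents($path, $body) !== false ? $path : false;
}
```

A slave loop would call orderQueue() on its assigned rows, then fetchToFile() for each URL, and only report completion to MAIN when a path is returned.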