Website data mining/ripping


I need a lot of data ripped/mined from 2 different websites for archival purposes. One website has 15,000 pages, and the other website has 11,000 pages.

After you rip the individual pages, I need you to write a script that extracts/parses the data from the HTML files and places it in a delimited text database file (for example, tab-delimited). You will also need to remove all non-standard characters (control characters, stray tabs and line breaks, odd encodings) from the data so that the delimited file can be imported without errors.

The following fields need to be extracted from each page. If you can't extract every field from every record, simply collect what you can; I'm fine with a few incomplete records.


Company Name
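A minimal sketch of pulling one field out of a saved page, using only Python's standard-library `html.parser`. The real sites are not named in this posting, so the target markup (`<h1 class="company-name">`) is purely a hypothetical assumption and would need to be adjusted per site. A page without the field returns `None`, which matches the tolerance for incomplete records stated above.

```python
import glob
import os
from html.parser import HTMLParser


class CompanyNameParser(HTMLParser):
    """Collects the text of the first element matching TARGET_TAG / TARGET_CLASS.

    TARGET_TAG and TARGET_CLASS are hypothetical; the actual sites' markup is unknown.
    """

    TARGET_TAG = "h1"
    TARGET_CLASS = "company-name"

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode entities like &amp; automatically
        self.inside = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done or self.inside:
            return
        # attribute values can be None (e.g. a bare `class`), hence the `or ""`
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.TARGET_TAG and self.TARGET_CLASS in classes:
            self.inside = True

    def handle_endtag(self, tag):
        if self.inside and tag == self.TARGET_TAG:
            self.inside = False
            self.done = True  # only the first match is used

    def handle_data(self, data):
        if self.inside:
            self.parts.append(data)  # includes text inside nested tags such as <b>


def extract_company_name(html_text):
    """Return the company name found in html_text, or None if the field is missing."""
    parser = CompanyNameParser()
    parser.feed(html_text)
    parser.close()
    name = " ".join("".join(parser.parts).split())  # collapse whitespace
    return name or None


def extract_directory(pattern):
    """Yield (file name, company name or None) for each saved page matching a glob pattern."""
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8", errors="replace") as fh:
            yield os.path.basename(path), extract_company_name(fh.read())
```

Usage might look like `rows = list(extract_directory("pages/*.html"))`, where `pages/` is a hypothetical folder of ripped pages; any `None` values can then be written out as empty columns.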
Now for the hard part: unfortunately, one of these websites has several protections in place against data miners. They use some form of firewall connection limiting that allows only a few connections from the same IP within a short time period before that IP is filtered. In addition, they block known proxies. This is a difficult job unless you are very tricky.

I was going to use a script similar to this to gather the data using a list of good proxies.

[url removed, login to view]

I'm just too busy and don't have the patience to finish this job myself. The data collection could take days, possibly weeks, depending on how quickly you are able to gather the data from the more protected website.

I don't believe the other website has the same data protections, but I have not tested it in quite some time.

Send me a private message for the names of the sites I'm trying to mine, and feel free to message me with any other questions. I did not set a budget because I'm not sure how difficult this job will turn out to be. Don't worry, I'm able to pay a fair amount for the work. Bid accordingly.



Skills: Anything Goes, Data Entry


About the employer:
(134 reviews) Mission Viejo, United States

Project ID: #2160931

Awarded to:


Dear Sir, Please check PM for details. Thanks.

$700 USD in 7 days
(0 reviews)