415068 website data mining/ripping

In progress. Posted May 19, 2010. Paid on delivery.

Hi,

I need a lot of data ripped/mined from 2 different websites for archival purposes. One website has 15,000 pages, and the other website has 11,000 pages.

After you rip the individual pages, I need you to write a script to extract/parse the data from the HTML files and place it in a delimited text database file. You will also need to remove all strange characters (control characters, stray line breaks, and anything that collides with the delimiter) from the data so that the delimited file can be imported without errors.
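As a rough illustration of the cleanup and export step described above, here is a minimal Python sketch. The post does not say which delimiter or encoding is wanted, so the tab delimiter, the `clean_field` and `write_records` names, and the decision to replace control characters with spaces are all assumptions.

```python
import csv
import unicodedata


def clean_field(value: str) -> str:
    """Normalize a field so it cannot break a delimited import.

    Assumption: "strange characters" means control/format characters
    (tabs, newlines, NULs, zero-width marks) and odd whitespace.
    """
    # NFKC folds compatibility forms, e.g. a non-breaking space becomes a space.
    normalized = unicodedata.normalize("NFKC", value)
    # Unicode categories starting with "C" are control, format, surrogate,
    # private-use, and unassigned code points; replace each with a space.
    printable = "".join(
        " " if unicodedata.category(ch).startswith("C") else ch
        for ch in normalized
    )
    # Collapse runs of whitespace and trim the ends.
    return " ".join(printable.split())


def write_records(records, fieldnames, out, delimiter="\t"):
    """Write dict records as delimited text; missing fields become empty."""
    writer = csv.DictWriter(out, fieldnames=fieldnames, delimiter=delimiter)
    writer.writeheader()
    for record in records:
        writer.writerow(
            {name: clean_field(record.get(name, "")) for name in fieldnames}
        )
```

Because tabs and newlines are stripped from every value before writing, each record stays on one line and the delimiter never appears inside a field.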

Each page will need the following fields extracted. If you can't extract every field from every record, simply collect what you can. I'm fine with a few incomplete records.

Domain

Company Name

Description

Street

City

State

Zip

Country

Telephone
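A sketch of how the fields listed above might be pulled from one saved page, using only Python's standard library. The markup of the target sites is not given in this post, so the idea that each field sits in an element with a class such as `company-name` or `zip` is purely hypothetical, as are the names `CLASS_TO_FIELD`, `RecordParser`, and `extract_record`. Fields that are not found are left empty, matching the tolerance for incomplete records.

```python
from html.parser import HTMLParser

FIELDS = ["Domain", "Company Name", "Description", "Street", "City",
          "State", "Zip", "Country", "Telephone"]

# Hypothetical mapping: class="company-name" -> "Company Name", and so on.
CLASS_TO_FIELD = {name.lower().replace(" ", "-"): name for name in FIELDS}

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class RecordParser(HTMLParser):
    """Collect the text inside elements whose class maps to a known field."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.record = {}
        self._field = None   # field currently being captured, if any
        self._depth = 0      # nesting depth inside the captured element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._field is not None:
            self._depth += 1
            return
        for cls in (dict(attrs).get("class") or "").split():
            if cls in CLASS_TO_FIELD:
                self._field = CLASS_TO_FIELD[cls]
                self._depth = 1
                self.record.setdefault(self._field, "")
                break

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self._field is None:
            return
        self._depth -= 1
        if self._depth == 0:
            self._field = None

    def handle_data(self, data):
        if self._field is not None:
            self.record[self._field] += data


def extract_record(html: str) -> dict:
    """Return one dict with every field present; missing ones are empty."""
    parser = RecordParser()
    parser.feed(html)
    parser.close()
    return {name: parser.record.get(name, "").strip() for name in FIELDS}
```

For real pages the class mapping would need to be replaced with whatever selectors the actual sites use.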

Now for the hard part: unfortunately, one of these websites has many protections in place to guard its data against data miners. It uses some type of firewall connection limiting that allows only a few connections from the same IP in a short time period before you are filtered. In addition, it blocks known proxies. This is a difficult job unless you are very tricky.
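Since the limit described above is on connections per time period, one conservative approach is simply to space requests out. The sketch below is a minimal pacer in Python; the post does not state the actual threshold, so the interval is an assumed placeholder, and `RequestPacer` is a hypothetical name.

```python
import time


class RequestPacer:
    """Enforce a minimum gap between consecutive requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self._min_interval = min_interval  # seconds; assumed, not from the post
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, None before the first

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self._min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `pacer.wait()` before each page fetch keeps the request rate at or below one per `min_interval` seconds. At one request every 5 seconds, for example, 15,000 pages would take roughly 21 hours, which is consistent with the "days" estimate given later in the post.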

I was going to use a script similar to this to gather the data using a list of good proxies.

[url removed, login to view]

I'm just too busy and don't have the patience to finish this job. The data collection could take days, if not weeks, depending on how fast you are able to gather the data from the website that makes it difficult.

I don't believe the other website has the same data protections, but I have not tested it in quite some time.

Private message me for the names of the sites I'm trying to data mine. Also, feel free to message me with any other questions. I did not set a budget because I'm not really sure how difficult this job will end up being. Don't worry, I'm able to pay a fair amount for the work. Bid accordingly.

Regards,

Wyatt

Data Entry, Odd Jobs

Project ID: #2160931

About the project

1 proposal. Remote project. Active Jul 11, 2012.

Awarded to:

dataconversion2

Dear Sir, Please check PM for details. Thanks.

$700 USD in 7 days

(0 Reviews)

0.0
