69979 extractor


In progress
Posted almost 20 years ago


Paid on delivery
I have 2 sample programs that were never completed, one written in C++ and one in Java, and can provide both as examples. The job can be done from scratch or by finishing the coding on either one of these. The C++ one is the more complete of the 2. The price would vary depending on whether you were just finishing this code or had to start over. Some of the live links below may be dead, because this was an old writeup of mine; I will add new links as soon as I get some bids.

Here is the project. The site would not allow my html tags, so I modified them; I think you should still get the picture.

MULTIPLE STEP MULTITHREADED DATA EXTRACTOR / POST/GET METHOD - WIN32 APPLICATION

Overview
The extractor must be able to visit given/loaded/generated URLs and extract *strings (sometimes for use in the next step) from the response and/or save the complete page, *based on rules. If it was just a small string extracted, it is appended to the same dat/xls/csv/txt file; otherwise pages are saved as individual files to a directory (file001, file002, file003, etc.), creating subdirectories as the number of files increases, i.e. create a new directory within the main result directory every 10,000 saves.

Must support redirects/302 errors and others, and process js, asp and php. Must be able to access SSL pages, support logins, accept/reject cookies, use keep-alive, and also be able to re-connect as if you closed your browser and reopened a new one (probably just deleting cookies and clearing the cache on every request and reconnecting). Must support the use of SOCKS proxies.

*Strings - First there should be a set of predefined strings, i.e. html body / form values / mailto tags / table values, etc.
The extractor must then also be capable of creating my own custom strings. Example 1: I could tell it we wanted to extract all data between [open html string] AND [close html string] in the response. Or there could be multiple strings:
- string 1 would start at name="firstname"value= and would extract all data from that starting string to the ending string, which would be whatever came after the data I wanted (could be / or [ or a space or line break or line feed or return character).
- string 2 would start at name="lastname"value= and again take all data until the ending string.
- string 3 etc., until I had configured the starting/ending tags for all parts I wished to extract.

*Rules - Page saving rules: do not save / save only if the page includes [userdefined], or the page does not include [userdefined], or the response size is less than / greater than / equal to [userdefined].

AUTOSAVE - I would like to always save, in another defined file, the URL and variables that the extractor is pulling, and all usernames that created a match, or all number/number-letter combos that created a match.

*Match - finding what it was looking for: file saved, data extracted, etc.

UrlLists - loaded lists
Must be capable of accepting large file lists, i.e. point to the file list and grab lines as needed, as opposed to loading the entire list in memory. These could be lists of usernames, or a dictionary list to run against the site, etc.

UrlLists - program generated
Must be able to create fixed variables, i.e. the main part of the domain, e.g. [login to view URL] [fixed, doesn't change].
Must be able to create number sequences, i.e. 1000, 1001, 1002, 1003, by range/increment amount.
Must be able to create letter-generating combinations, i.e. every 2-letter combo, plus additional changing/rotating/incrementing parts with *exceptions, e.g.
[login to view URL][VARIABLE1][VARIABLE2]
Variable 1 might create every possible 2-letter combination, while variable 2 may increment by [userdefined] however many numbers, starting at say 1000.

*Exceptions - In the case above there could be exceptions that would say: do not change the letter combo unless page found / string found. I.e. aa1000, aa1001, aa1002, aa1003; let's say here that it finds what it was looking for, so it resets the number set to 1000 and increments the letter combo by one letter, i.e. now it would be ab1000, ab1001, ab1002, etc.

In some cases there would be no exceptions, but we would still need to tell the program how to generate the URL sections, i.e.
- do not change the number until all letter combos have been tried, or
- run through the range of numbers before you change the letter combo, or
- randomly choose 2-letter combos until all have been tried, or
- randomly choose numbers and keep the letter combos the same.

In some letter/number cases there is only 1 "match", so to speak, meaning that it found what it was looking for, so it ran the extraction as programmed; now move on. I.e. aa1000 through cc1000 did not contain anything that it was looking for to make the extraction happen, but when it got to cd1000 it found it. In these cases I would have already tested the sites and found that there is only 1 possible match for each 2 letters for each number, meaning that there would be no other 1000 site besides the cd1000 site that it found, so the letter combos would go back to "aa" and the number would increment by 1.

MULTI-STEP EXTRACTIONS INVOLVING GET/POST/LOGIN
First off, there may be a login/password required at some of these sites. This is usually accomplished with the test feature I have listed below, but there are exceptions, so we will need an ONLY AS NEEDED STEP, which would be: only log in if you find / do not find whatever we would specify, e.g. if the page is less than ? bytes, or a 404 error, or a forbidden error, etc.
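
As a rough illustration of the ONLY AS NEEDED step described above, the decision could look something like the following C++ sketch. The names LoginRule and needs_login, and all field names, are assumptions made for illustration only; they do not come from either sample program, and the real rule set would be whatever gets configured per site.

```cpp
#include <cstddef>
#include <string>

// Hypothetical rule set for the "only as needed" login step.
// Field names are illustrative, not taken from the existing code.
struct LoginRule {
    std::size_t min_bytes = 0;      // log in if the page is smaller than this
    std::string must_contain;       // log in if this text is missing (empty = ignore)
    std::string must_not_contain;   // log in if this text is present (empty = ignore)
};

// Returns true when the response suggests that a login is required
// before the extractor can continue with this site.
bool needs_login(int http_status, const std::string& body, const LoginRule& rule) {
    // Forbidden and not-found responses are treated as "session lost".
    if (http_status == 403 || http_status == 404) return true;
    // Page is smaller than the configured minimum size.
    if (body.size() < rule.min_bytes) return true;
    // Expected marker text is missing from the page.
    if (!rule.must_contain.empty() &&
        body.find(rule.must_contain) == std::string::npos) return true;
    // Unwanted marker text (e.g. a login prompt) is present.
    if (!rule.must_not_contain.empty() &&
        body.find(rule.must_not_contain) != std::string::npos) return true;
    return false;
}
```

A worker thread would call needs_login on each response and run the login request only when it returns true, so sites that stay logged in through keep-alive are not logged into again on every request.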
In some cases the main function of the extractor is not the extraction. Sometimes I will need to post data to a page that I do not have the link for yet. On the page that we do know the link for, there may for example be an id number present that would be an extracted string and used in the next step.

Example: I extract from a site [login to view URL]. On the page it has a contactme link that could be something like [login to view URL]. Now on that contact page there would be a form with variables. The variables could be whatever: my name / my phone / my email / my message. The extraction would be the id number, so we would look for maybe "cgi?id=", extract everything after that until a space maybe, and use that extracted string in the next step. The next step/request would be a post to whatever the fixed URL is plus the variable it just extracted, i.e. post *DATA to [login to view URL]

*DATA - The data for this particular case would be name/phone/email/message. They will all be the same in this case, so they can be fixed values just like the URL, but there needs to be a feature to load csv/comma-delimited text for these entries, i.e. maybe I want to rotate through a list of email addresses or a few different messages. Maybe I want to post 2 messages to the same person; that would most likely be a step 3 of the extractor.

AUTOSAVE - Again, I would want to save the URL and the list of usernames/id numbers for each successful post/get.

TEST FEATURE
Will need to include a built-in browser for testing purposes and to log in to sites that require login. When using keep-alive this initial login is acceptable for most sites unless they time out; then we would use the AS NEEDED step mentioned above.

LIVE EXAMPLES
[login to view URL] - simple one, just numbers
[login to view URL][VARIABLE] - number generating, sequential

[login to view URL] ractorPassword= - another easy one, usernames load from files
[login to view URL][VARIABLE]&EntryPassword=danger&Admin=&ContractorUserName =&ContractorPassword=

http://www.juiceplus.com/+rm65565 - letter/number combos, has a 302 redirect too
http://www.juiceplus.com/+[VARIABLE1][VARIABLE2]

2-step example:
[login to view URL] - verify KM4809, to use in the next step
then post to [login to view URL]
data to post: pagename=KM4809&user=KM4809&uemail=&uphone=&problem=

Another 2-step, SSL this time. Easy though, all numbers, all in order. Really only a 1-step, but we will use 2 just to verify.
[login to view URL] - if page has [string], move on
then post to [login to view URL]
with data: RepEmail=&EmailAddr=&Subject=&Body=Dear+Barb+Williams%2C+

I will get you some login examples, but this should get you started.
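
To make the letter/number generation rules concrete for a pattern like the juiceplus example above, here is a minimal C++ sketch of a suffix generator. It covers only two of the behaviours described (numbers change first with a reset on match, and letters change first with a reset on match); the random-order modes are left out. The class name SuffixGenerator, the Vary enum and the constructor parameters are all assumptions made for illustration, not part of either sample program.

```cpp
#include <string>

// Which part of the URL suffix changes on every request.
enum class Vary { NumberFirst, LettersFirst };

// Hypothetical generator for suffixes such as "aa1000".
class SuffixGenerator {
public:
    SuffixGenerator(Vary vary, int first, int last, int step)
        : vary_(vary), first_(first), last_(last), step_(step), number_(first) {}

    // True once every combination allowed by the rules has been used.
    // Callers should stop calling current() after this returns true.
    bool done() const { return done_; }

    // Current suffix: two lowercase letters followed by the number.
    std::string current() const {
        std::string s;
        s += static_cast<char>('a' + letters_ / 26);
        s += static_cast<char>('a' + letters_ % 26);
        return s + std::to_string(number_);
    }

    // Move to the next suffix. `matched` says whether the last request found
    // what it was looking for; a match (or running out of the fast-changing
    // part) resets the fast part and advances the slow part.
    void advance(bool matched) {
        if (vary_ == Vary::NumberFirst) {
            if (!matched && number_ + step_ <= last_) { number_ += step_; return; }
            number_ = first_;
            if (++letters_ >= kCombos) done_ = true;
        } else {
            if (!matched && letters_ + 1 < kCombos) { ++letters_; return; }
            letters_ = 0;
            number_ += step_;
            if (number_ > last_) done_ = true;
        }
    }

private:
    static constexpr int kCombos = 26 * 26;  // "aa" through "zz"
    Vary vary_;
    int first_, last_, step_;
    int number_;
    int letters_ = 0;
    bool done_ = false;
};
```

With Vary::NumberFirst and a range starting at 1000, the sequence runs aa1000, aa1001, and so on, and jumps to ab1000 after a match. With Vary::LettersFirst it runs aa1000, ab1000, and so on, and jumps to aa1001 after a match, which corresponds to the "only 1 match per number" case. Each generated suffix would be appended to the fixed part of the URL.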
Project ID: 1817955

About the project

1 proposal
Remote project
Active 12 years ago

Awarded to:
Similar to WGET, huh but with lots of rules, I guess. I would prefer C++ coz it runs without any extra installations and threading is very robust. Want to make it run on all platforms?
$350 USD in 30 days
0.0 (0 reviews)

About the client

4.4
6
Joined May 30, 2004

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)