I need a script or program to parse a huge "webcrawler index file", extract some information, and save it into my MySQL database.
Data to extract:
-link relations: linking site (link from) and linked site (link to)
-link info: text link or image link, link text, alt text, title text
-link type: nofollow, meta tag nofollow, or follow link
-some page info: content encoding, title, domain name...
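To make the list above concrete, here is a minimal sketch of how those fields could be pulled out of one page's HTML with Python's standard library. It assumes each record in the index file yields a page URL plus raw HTML, which the posting does not confirm. The names `LinkExtractor` and `extract_links` and the dictionary keys are hypothetical, not part of any given database layout.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkExtractor(HTMLParser):
    """Collects the page title, meta robots nofollow, and per-link details."""

    def __init__(self, page_url):
        super().__init__(convert_charrefs=True)
        self.page_url = page_url
        self.title = ""
        self.meta_nofollow = False
        self.links = []
        self._current = None   # link currently open, if any
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (a.get("name") or "").lower() == "robots":
            if "nofollow" in (a.get("content") or "").lower():
                self.meta_nofollow = True
        elif tag == "a" and a.get("href"):
            target = urljoin(self.page_url, a["href"])
            self._current = {
                "from_host": urlsplit(self.page_url).hostname,
                "to_host": urlsplit(target).hostname,
                "to_url": target,
                "kind": "text",          # switched to "image" if an <img> follows
                "text": "",
                "alt": "",
                "title": a.get("title") or "",
                "nofollow": "nofollow" in (a.get("rel") or "").lower().split(),
            }
        elif tag == "img" and self._current is not None:
            self._current["kind"] = "image"
            self._current["alt"] = a.get("alt") or ""

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "a" and self._current is not None:
            self._current["text"] = self._current["text"].strip()
            self.links.append(self._current)
            self._current = None


def extract_links(page_url, html):
    """Return (title, links); a meta robots nofollow marks every link nofollow."""
    parser = LinkExtractor(page_url)
    parser.feed(html)
    parser.close()
    for link in parser.links:
        link["nofollow"] = link["nofollow"] or parser.meta_nofollow
    return parser.title.strip(), parser.links
```

Each returned dictionary corresponds to one row of link data; how those map onto tables depends on the database layout the client provides.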
The index files are between 15 GB and 100 GB in size; please make sure your script can handle files this large.
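A file of that size cannot be loaded into memory at once, so it would have to be read as a stream. The sketch below shows one way to do that in Python. The record format is not described in this posting, so the blank-line delimiter and the name `iter_records` are pure assumptions for illustration; the real delimiter would come from the index file documentation.

```python
def iter_records(stream, delimiter=b"\n\n", chunk_size=1 << 20):
    """Yield delimiter-separated records from a binary stream.

    Only one chunk plus the current unfinished record is held in memory.
    The delimiter is an assumed placeholder, not the real index format.
    """
    buffer = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buffer += chunk
        parts = buffer.split(delimiter)
        # The last piece may be an incomplete record; keep it for the next chunk.
        buffer = parts.pop()
        for part in parts:
            if part:
                yield part
    if buffer:
        yield buffer
```

With a real file this would be used as `with open(path, "rb") as f: for record in iter_records(f): ...`, so memory use stays roughly constant whether the file is 15 GB or 100 GB.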
The script or program should run on our Linux root server.
I'll give you a MySQL database layout.
You can find all the information about the index file here: [url removed, login to view]
Please send me a PM if you have any questions.
35 freelancers are bidding on average $115 for this job
Hi, I am an information extraction specialist and would be happy to help you with this project. All I require is a small sample of the file to be parsed to proceed.