Web Page Downloader/Parser

First of all: this should be written in ANSI C, compile with GCC, and be cross-platform.

We need a function that takes a web URL and downloads the page's HTML contents. (It should not download pictures or any other external files.) It should then derive a title, description and keywords from the meta tags. If there are no meta tags, the title, keywords and description should be worked out the way Google or Yahoo do it, ignoring common words such as 'a', 'the' and many others. It should also drop words that are repeated too many times (more than 7, I think). It should also attempt to determine when the page was last modified. If it can't, it should compare it with an internal date in the database, and store it in the database only if newer. The URL, title, description and keywords should be saved in a database called "sites.dat" using a database function we have had developed for us.
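As a rough illustration of the fallback keyword step, here is a minimal sketch in C89-style C. It counts lower-cased words, skips a stop-word list, and drops any word seen more than 7 times. The function names, the buffer limits and the tiny stop-word list are all assumptions made for this example; the real list and the exact threshold are still to be decided.

```c
#include <ctype.h>
#include <string.h>

#define MAX_WORDS    256  /* assumed cap on distinct words tracked */
#define MAX_WORD_LEN  32  /* assumed cap on word length, including NUL */
#define MAX_REPEATS    7  /* tentative threshold taken from the brief */

struct word_count {
    char word[MAX_WORD_LEN];
    int  count;
};

/* Illustrative stop-word list only; a real one would be much longer. */
static const char *stop_words[] = {
    "a", "an", "and", "the", "of", "to", "in", "is", "it", NULL
};

int is_stop_word(const char *w)
{
    int i;
    for (i = 0; stop_words[i] != NULL; i++)
        if (strcmp(w, stop_words[i]) == 0)
            return 1;
    return 0;
}

/* Count lower-cased words in text, skipping stop words.
   Returns the number of distinct words stored in table. */
int count_words(const char *text, struct word_count *table, int max)
{
    int n = 0;
    while (*text != '\0') {
        char w[MAX_WORD_LEN];
        int len = 0, i;
        while (*text != '\0' && !isalnum((unsigned char)*text))
            text++;
        while (*text != '\0' && isalnum((unsigned char)*text)) {
            if (len < MAX_WORD_LEN - 1)
                w[len++] = (char)tolower((unsigned char)*text);
            text++;
        }
        w[len] = '\0';
        if (len == 0 || is_stop_word(w))
            continue;
        for (i = 0; i < n; i++)
            if (strcmp(table[i].word, w) == 0)
                break;
        if (i < n) {
            table[i].count++;
        } else if (n < max) {
            strcpy(table[n].word, w);
            table[n].count = 1;
            n++;
        }
    }
    return n;
}

/* Write a comma-separated keyword list to out, leaving out any word
   that occurred more than MAX_REPEATS times. Not reentrant, because
   the word table is static. */
void build_keywords(const char *text, char *out, size_t out_size)
{
    static struct word_count table[MAX_WORDS];
    size_t used = 0;
    int n, i;

    if (out_size == 0)
        return;
    out[0] = '\0';
    n = count_words(text, table, MAX_WORDS);
    for (i = 0; i < n; i++) {
        size_t len = strlen(table[i].word);
        if (table[i].count > MAX_REPEATS)
            continue;
        if (used + len + 2 > out_size)  /* room for ',' and NUL */
            break;
        if (used > 0)
            out[used++] = ',';
        strcpy(out + used, table[i].word);
        used += len;
    }
}
```

Keywords come out in order of first appearance. Ranking them by frequency or position, as a search engine would, is left open here.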

If at any point it receives a 301 (or any other kind of redirect), it should follow the link and then update the URL that was passed in.
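One possible way to handle the redirect case, sketched under the assumption that the raw HTTP response headers are already in memory as a NUL-terminated string. The names `http_status` and `follow_redirect` are made up for this example. The sketch only rewrites the URL; it does not resolve relative Location values or limit the number of hops, both of which a real crawler would need.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Parse the status code from a line such as "HTTP/1.1 301 Moved".
   Returns -1 if the status line is malformed. */
int http_status(const char *response)
{
    const char *p;
    if (strncmp(response, "HTTP/", 5) != 0)
        return -1;
    p = strchr(response, ' ');
    if (p == NULL || !isdigit((unsigned char)p[1]))
        return -1;
    return atoi(p + 1);
}

/* Case-insensitive prefix test (header names are case-insensitive). */
static int starts_with_nocase(const char *s, const char *prefix)
{
    while (*prefix != '\0') {
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix))
            return 0;
        s++;
        prefix++;
    }
    return 1;
}

/* If the response is a 3xx with a Location header, copy the target
   into url and return 1. Otherwise leave url untouched and return 0. */
int follow_redirect(const char *response, char *url, size_t url_size)
{
    int status = http_status(response);
    const char *line = response;

    if (status < 300 || status > 399)
        return 0;
    /* Walk header lines until the blank line that ends the headers. */
    while (line != NULL && *line != '\0' && *line != '\r' && *line != '\n') {
        if (starts_with_nocase(line, "location:")) {
            const char *v = line + 9;
            size_t len = 0;
            while (*v == ' ' || *v == '\t')
                v++;
            while (v[len] != '\0' && v[len] != '\r' && v[len] != '\n')
                len++;
            if (len == 0 || len >= url_size)
                return 0;
            memcpy(url, v, len);
            url[len] = '\0';
            return 1;
        }
        line = strchr(line, '\n');
        if (line != NULL)
            line++;
    }
    return 0;
}
```

The caller would re-request the updated URL and repeat until a non-redirect status comes back, ideally with a small cap on the number of hops to avoid redirect loops.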

If there is a 404 or any other error that prevents the page from being downloaded, it should return all blank values.
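A small sketch of what "all blank values" could look like in C. The `page_info` struct, its field sizes and the function names are assumptions for illustration only, and the check assumes any redirects have already been followed before it runs.

```c
/* Hypothetical result record; field sizes are arbitrary. */
struct page_info {
    char title[256];
    char description[512];
    char keywords[512];
};

/* Reset every field to the empty string. */
void page_info_clear(struct page_info *info)
{
    info->title[0] = '\0';
    info->description[0] = '\0';
    info->keywords[0] = '\0';
}

/* Returns 1 when status is a 2xx success. For anything else
   (404, 500, a failed connection reported as -1, ...) the record
   is blanked and 0 is returned. */
int page_info_check_status(int status, struct page_info *info)
{
    if (status >= 200 && status <= 299)
        return 1;
    page_info_clear(info);
    return 0;
}
```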

Any links it finds should be stored in the file "links.dat" using a database function that we are having developed.

This function should obey all ROBOT tags, as well as [url removed, login to view] files.
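Assuming "ROBOT tags" here means the robots meta tag, the sketch below shows one way to interpret its content value (for example "noindex, nofollow"). The struct and function names are invented for this example, and only the common directives `noindex`, `nofollow` and `none` are handled; anything else is treated as permission.

```c
#include <ctype.h>
#include <string.h>

/* 1 means allowed, 0 means forbidden by the page. */
struct robots_flags {
    int index;   /* may the page be stored in sites.dat? */
    int follow;  /* may its links be stored in links.dat? */
};

/* Case-insensitive comparison of a token of length len with a
   lower-case directive name. */
static int token_is(const char *tok, size_t len, const char *name)
{
    size_t i;
    if (strlen(name) != len)
        return 0;
    for (i = 0; i < len; i++)
        if (tolower((unsigned char)tok[i]) != name[i])
            return 0;
    return 1;
}

/* Parse a comma-separated robots content value. */
void parse_robots_content(const char *content, struct robots_flags *f)
{
    const char *p = content;

    f->index = 1;
    f->follow = 1;
    while (*p != '\0') {
        const char *start;
        size_t len;
        while (*p == ',' || isspace((unsigned char)*p))
            p++;
        start = p;
        while (*p != '\0' && *p != ',')
            p++;
        len = (size_t)(p - start);
        while (len > 0 && isspace((unsigned char)start[len - 1]))
            len--;  /* trim trailing whitespace */
        if (token_is(start, len, "noindex"))
            f->index = 0;
        else if (token_is(start, len, "nofollow"))
            f->follow = 0;
        else if (token_is(start, len, "none")) {
            f->index = 0;
            f->follow = 0;
        }
    }
}
```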

When coding this, be aware that not all sites have perfect HTML; some tags will be wrong or full of errors. Count on this function having to handle badly formed HTML.
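To give an idea of what tolerating bad HTML might involve, here is a minimal sketch of a forgiving meta tag reader. It accepts any letter case, single, double or missing quotes, extra whitespace around '=', either attribute order, and a tag that never closes. The function names are hypothetical, and this is a simple scanner rather than a full parser: for instance, it can be fooled by a '>' or by text like " name=" inside an attribute value.

```c
#include <ctype.h>
#include <string.h>

/* Case-insensitive comparison of the first n characters. */
static int same_nocase(const char *a, const char *b, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 0;
    return 1;
}

/* Copy the value of attribute attr, searched for inside the tag
   spanning [tag, end), into out. Returns 1 if the attribute exists.
   out_size must be at least 1. */
static int get_attr(const char *tag, const char *end, const char *attr,
                    char *out, size_t out_size)
{
    size_t alen = strlen(attr), len = 0;
    const char *p;

    for (p = tag + 1; p + alen < end; p++) {
        const char *v;
        char quote = '\0';
        if (!isspace((unsigned char)p[-1]) || !same_nocase(p, attr, alen))
            continue;
        v = p + alen;
        while (v < end && isspace((unsigned char)*v))
            v++;
        if (v >= end || *v != '=')
            continue;
        v++;
        while (v < end && isspace((unsigned char)*v))
            v++;
        if (v < end && (*v == '"' || *v == '\''))
            quote = *v++;
        /* Quoted: read to the matching quote. Unquoted: read to
           whitespace. Either way stop at the end of the tag. */
        while (v < end && len + 1 < out_size &&
               (quote ? *v != quote : !isspace((unsigned char)*v)))
            out[len++] = *v++;
        out[len] = '\0';
        return 1;
    }
    return 0;
}

/* Find <meta name="NAME" content="..."> and copy the content value
   into out. Returns 1 on success; otherwise out is set to "" and 0
   is returned. */
int meta_content(const char *html, const char *name,
                 char *out, size_t out_size)
{
    const char *p = html;
    char found[64];

    while ((p = strchr(p, '<')) != NULL) {
        const char *end = strchr(p, '>');
        if (end == NULL)
            end = p + strlen(p);  /* unterminated tag: use end of input */
        if (same_nocase(p + 1, "meta", 4) && isspace((unsigned char)p[5]) &&
            get_attr(p, end, "name", found, sizeof found) &&
            strlen(found) == strlen(name) &&
            same_nocase(found, name, strlen(name)) &&
            get_attr(p, end, "content", out, out_size))
            return 1;
        p++;
    }
    if (out_size > 0)
        out[0] = '\0';
    return 0;
}
```

A call such as `meta_content(html, "description", buf, sizeof buf)` would be tried first, with the word-based fallback used only when it returns 0.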

In most cases, this should behave no differently from Googlebot, though when downloading a page it should identify itself as 'dCrawler'.
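Identifying as 'dCrawler' would normally be done through the User-Agent request header. The sketch below builds such a request; the function name is hypothetical, and the choice of HTTP/1.0 with "Connection: close" is an assumption made to keep the response simple to read (no chunked encoding), not something the brief specifies.

```c
#include <stdio.h>
#include <string.h>

/* Build a GET request that identifies the client as dCrawler.
   Returns the request length, or -1 if out is too small. */
int build_request(const char *host, const char *path,
                  char *out, size_t out_size)
{
    static const char fmt[] =
        "GET %s HTTP/1.0\r\n"
        "Host: %s\r\n"
        "User-Agent: dCrawler\r\n"
        "Connection: close\r\n"
        "\r\n";

    /* Safe upper bound: the format text plus both arguments
       (the two "%s" markers are counted but never written). */
    if (strlen(fmt) + strlen(host) + strlen(path) + 1 > out_size)
        return -1;
    return sprintf(out, fmt, path, host);
}
```

The resulting buffer would then be written to a connected socket. The socket code itself is platform-specific and is left out of this sketch.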

Skills: C Programming


About the employer:
(3 reviews) Brantford, Canada

Project ID: #15679

11 freelancers have bid on this job


I have similar code now - see PM

$250 USD in 5 days
(4 reviews)

Hi, Crawling is our first choice. We have developed so many crawlers in PHP/MySQL and we are very much confident that we can develop a crawler in C/C++ also in GNU/Linux environment. For demo and discussion please se...

$150 USD in 15 days
(6 reviews)

Check pm for more details pls

$300 USD in 21 days
(1 review)

I already worked on a similar project. (downloading/smart parsing). I may have to tune my code, since it worked under windows and in C++. Still I put 10 days in order to have time to test the app completely & carefull...

$250 USD in 10 days
(0 reviews)

we could do it.

$300 USD in 5 days
(0 reviews)

Dear Sir/ Madam, If you are looking for top quality and quick turnaround then we will be delighted to work up the required downloader for you. We are an IT company specializing in web technologies and programming...

$300 USD in 15 days
(0 reviews)

We are Web development,Search Engine Optimisation and BPO company from India . Kindly go through our url [login to view URL] . We are interested in your project. Thanks. Regards, Anshu

$300 USD in 20 days
(0 reviews)

Hi there, Niftysoft Solution is a leading IT services company providing solutions across the globe. A large team of extremely professionals staffs Niftysoft Solution with a strong background in IT field and having...

$290 USD in 15 days
(0 reviews)

Dear sir, I will complete this program within 15 days to suit all your requirements. Thank you.

$125 USD in 15 days
(0 reviews)

We are a group of software professionals from India with expertise in ASP, ASPx, HTML, XML, Java, C, C++, VB, Oracle, SQL Server, PHP, My SQL Professionals ranging from 1 yr to 20 yr of experience We are sure to...

$250 USD in 15 days
(0 reviews)

Dear Sir/Madam We are group of software engineer having expertise in web technology, windows desktop application development, security and mobile technologies. Recently we have developed a project in which we are pars...

$250 USD in 15 days
(0 reviews)