Google Tag Website Spider / Scraper

The goal is to design a web scraper that takes a CSV list of web domains and scans the pages and sub-pages of each domain for the presence of the Google Website Optimizer JavaScript tag.

Further details:

- the application should take a CSV list of web domains as input and scan all pages and sub-pages for the presence of the Google Website Optimizer content generation tag. This tag is available by registering at [url removed, login to view] and setting up a dummy test, or I can provide an example

- the proposed means of detecting the tag must ensure that all cases of the tag are detected; I will defer to your expert technical view on this matter

- the output should be a list of those domains which include the specified tag, specifying the pages where the tag was found

- programming language used does not matter for this project

- application / script must be able to run on a Windows XP PC

- application must be capable of working from a list of 1 million domains (Alexa top 1m sites list, too large to be attached but can be downloaded / supplied if interested)

- I anticipate that the scan will take some time, so this application must be able to run in an unattended mode and have a pause function. In case of error it should quit without losing progress.
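
To illustrate the detection requirement above, here is a minimal Python sketch of one possible approach: search the raw HTML of each page for marker strings and collect same-domain links to follow. The marker strings `utmx_section` and `siteopt.js` are assumptions about what the Website Optimizer snippet contains and should be replaced with markers taken from a real example tag. The helper names `has_optimizer_tag` and `same_site_links` are hypothetical.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse

# ASSUMPTION: strings believed to occur in the Website Optimizer snippet.
# Replace them with markers taken from a real example of the tag.
TAG_MARKERS = ("utmx_section", "siteopt.js")
_TAG_RE = re.compile("|".join(re.escape(m) for m in TAG_MARKERS), re.IGNORECASE)


def has_optimizer_tag(html):
    """Return True if any assumed marker string occurs in the page source."""
    return _TAG_RE.search(html) is not None


class _LinkCollector(HTMLParser):
    """Collects the href value of every <a> element fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_site_links(page_url, html):
    """Return absolute http(s) links on the page that stay on the same host."""
    host = urlparse(page_url).netloc.lower()
    collector = _LinkCollector()
    collector.feed(html)
    links = set()
    for href in collector.hrefs:
        # Resolve relative links and drop the #fragment part.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parsed = urlparse(absolute)
        if parsed.scheme in ("http", "https") and parsed.netloc.lower() == host:
            links.add(absolute)
    return links
```

A plain text search of this kind only sees tags present in the HTML as served. A tag injected at run time by an external script would be missed, so the final detection method may need to go further than this sketch.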

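The unattended, pause and no-lost-progress requirements could be met along the following lines. This is a sketch under stated assumptions, not a finished design: the domain is assumed to be in the last column of each CSV row (so both a plain list and a rank,domain list work), pausing is signalled by creating a flag file, and `scan_domain` is a hypothetical callback that returns the pages on which the tag was found.

```python
import csv
import os


def load_done(progress_path):
    """Return the set of domains already recorded in the progress file."""
    if not os.path.exists(progress_path):
        return set()
    with open(progress_path, newline="", encoding="utf-8") as fh:
        return {row[0] for row in csv.reader(fh) if row}


def run(domains_csv, progress_path, scan_domain, pause_flag="PAUSE"):
    """Scan every domain not yet recorded; return False if paused, True if done.

    One row is appended and flushed per finished domain, so an error or a
    pause loses at most the domain that was in progress at that moment.
    """
    done = load_done(progress_path)
    with open(domains_csv, newline="", encoding="utf-8") as src, \
            open(progress_path, "a", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        for row in csv.reader(src):
            if not row:
                continue
            # ASSUMPTION: the domain is the last column of the row.
            domain = row[-1].strip()
            if domain in done:
                continue
            # ASSUMPTION: creating the flag file is how the user requests a pause.
            if os.path.exists(pause_flag):
                return False
            # An exception here ends the run; earlier rows are already on disk.
            pages = scan_domain(domain)
            writer.writerow([domain] + list(pages))
            out.flush()
            done.add(domain)
    return True
```

Running the script again after a pause or a crash resumes with the first domain that has no row in the progress file. That same file doubles as the output: each row is a domain followed by the pages where the tag was found.
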
Skills: C programming, online advertising, search engine optimization, Visual Basic, website design

About the employer:
(1 review) Edinburgh, United Kingdom

Project ID: #410332