Cancelled

Web Crawler!

Hello everyone, we are seeking a well-tailored web crawler for our vertical search needs in the Chinese market. Here are the details:

1. Able to run multiple instances simultaneously so that web pages are downloaded in parallel.

2. Able to build an index in the form of word-count statistics for each page. The hyperlink structure of each page must also be maintained for programmatic access.

3. Proper archiving with suitable backup and recovery.

4. Able to scale to a large cluster of computers.

5. The distributed technology should automatically optimize both parallel and serial data mining algorithms, regardless of the details of the algorithm itself. If that function is too hard, you can leave an interface for us and we will do the rest of the job. Details such as which interfaces to leave remain to be negotiated if you accept the contract.

6. Able to provide some preprocessing and postprocessing to filter out unused data. We can provide the interface for the details of the algorithm.

7. Extensible to our potential future uses through loosely coupled interfaces, for example redundant page filtering, indexing using text summarization rather than word counts, distributed workload scheduling, and possibly others.

8. Good user experience, with a good look and feel, and fully manageable. Manageability shall cover function details such as how many instances run at the same time, the total workload or number of pages downloaded, and where to back up and restore the data. These shall be delivered through a web page view such as ASP or PHP; ASP is currently our preference.

9. Remote access is highly preferred.

10. Good security in design and coding, particularly if you use a language like C or C++. Other industry-wide security best practices shall be used; data privacy, integrity, and authorization are essential.

11. Integration with Google MapReduce or Bigtable is highly preferred but not essential.
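
To make requirement 2 more concrete for bidders, here is a minimal sketch of a per-page index holding word-count statistics plus the page's outgoing links. It is only an illustration under assumptions: the names `PageIndexer` and `index_page` and the shape of the returned record are hypothetical, not part of the brief, and the simple `\w+` tokenizer does not segment Chinese text, so a real implementation for the Chinese market would need a proper word segmenter.

```python
import re
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageIndexer(HTMLParser):
    """Collects visible words and outgoing links from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.word_counts = Counter()
        self.links = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            # Naive tokenizer (assumption): lowercase runs of word characters.
            self.word_counts.update(re.findall(r"\w+", data.lower()))


def index_page(url, html):
    """Return a hypothetical index record for one already-downloaded page."""
    parser = PageIndexer(url)
    parser.feed(html)
    parser.close()
    return {
        "url": url,
        "word_counts": dict(parser.word_counts),
        "links": parser.links,
    }
```

Calling `index_page(url, html)` on each downloaded page would give one record per page; the `links` lists together form the hyperlink graph that the brief asks to keep accessible from code.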

Best wishes

Skills: .NET, Linux, Social Networking, Windows Desktop

See more: wide search, web structure design, web page language, web page best design, web crawler job search, vertical web page design, us algorithm, text search algorithm, technology web design, set algorithm, remote php contract, remote coding php, php technology market, net algorithm, market web design, language web, job web crawler, job search web design, job search optimization, job design web pages, job contract asp net, industrial design web pages, index search algorithm, index data structure, much good web design

About the Employer:
( 121 reviews ) Belfast, Ireland

Project ID: #497996

2 freelancers are bidding an average of $525 for this job

pgcoding

please check pmb.

$700 USD in 15 days
(27 reviews)
6.6
jarryhancy82

Hi, I can do it. I have made many crawlers like this.

$350 USD in 7 days
(2 reviews)
0.0