Web Crawler!

Hello everyone, we are looking for a custom-built web crawler for our vertical search business in China. Here are the details:

1. Able to run multiple instances simultaneously to download many web pages in parallel.

2. Able to build an index of word-count statistics for each page; the hyperlink structure of each page must also be preserved for programmatic access.

3. Proper archiving, with suitable backup and recovery.

4. Able to scale to a large cluster of computers.

5. The distributed technology should automatically optimize both parallel and serial data-mining algorithms, regardless of the details of the algorithms themselves. If that is too hard, you can leave an interface for us and we will do the rest of the job instead; details such as which interfaces to leave are to be negotiated if you accept the contract.

6. Able to provide some preprocessing and postprocessing to filter out unused data; we can provide the interface for the details of the algorithm.

7. Extensible for our potential future uses through loosely coupled interfaces, for example redundant-page filtering, indexing with text summarization rather than word counts, distributed workload scheduling, and possibly others.

8. A good user experience with a good look and feel, and fully manageable. Manageability should cover functional details such as how many instances run at the same time, the total workload or number of pages downloaded, and where data is backed up to and restored from. These controls should be delivered through web pages such as ASP or PHP; ASP is currently our preference.

9. Remote access is highly preferred.

10. Good security in design and coding, particularly if you use a language like C or C++, and other industry-wide security best practices should be followed. Issues such as data privacy, integrity, and authorization are essential.

11. Integration with Google MapReduce or Bigtable is highly preferred but not essential.
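Requirement 2 (a per-page word-count index plus the page's hyperlink structure, both accessible from code) could be met by a structure along these lines. This is a minimal sketch, not part of the original request: the names `CrawlIndex`, `add_page`, and `in_links` are hypothetical, the index lives in memory rather than in a distributed store, and words are split with a simple regular expression. Chinese text has no spaces between words, so a real crawler for this market would need a proper word-segmentation step in place of the regex.

```python
import re
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin


class _LinkAndTextParser(HTMLParser):
    """Collects visible text and absolute href targets from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.text_parts = []
        self._skip = 0  # nesting depth inside <script>/<style>, whose text is ignored

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:  # attributes without a value arrive as None
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.text_parts.append(data)


class CrawlIndex:
    """Hypothetical in-memory index: word counts per page plus the link graph."""

    def __init__(self):
        self.word_counts = {}  # url -> Counter of lowercase words
        self.out_links = {}    # url -> list of absolute URLs the page links to

    def add_page(self, url, html):
        parser = _LinkAndTextParser(url)
        parser.feed(html)
        parser.close()
        # Assumption: naive regex tokenization; Chinese text needs real segmentation.
        words = re.findall(r"\w+", " ".join(parser.text_parts).lower())
        self.word_counts[url] = Counter(words)
        self.out_links[url] = parser.links

    def in_links(self, target):
        """Pages that link to `target` (a reverse lookup over the stored graph)."""
        return [src for src, dsts in self.out_links.items() if target in dsts]
```

For example, after `idx.add_page("http://example.com/a", '<p>Hello hello world</p><a href="/b">b</a>')`, the count for `"hello"` on that page is 2 and its out-links are `['http://example.com/b']`. Keeping out-links per page makes the reverse lookup a linear scan; at cluster scale a separate in-link table would be the usual choice.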
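Requirements 1 and 7 (parallel downloads and redundant-page filtering) can be illustrated together. This sketch assumes a caller-supplied `fetch(url)` function that returns the page body as text (hypothetical, so the example needs no network access), uses threads in one process as a stand-in for multiple crawler instances, and drops only exact duplicates by SHA-256 hash of the content; catching near-duplicate pages would need a different technique such as shingling or SimHash.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def crawl_batch(urls, fetch, max_workers=8):
    """Download `urls` concurrently and drop pages whose content was already seen.

    `fetch` is a hypothetical callable: fetch(url) -> page body as str.
    Only byte-identical bodies are treated as redundant.
    """
    seen = set()  # SHA-256 digests of bodies kept so far
    pages = {}    # url -> body, first URL with a given body wins
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map runs fetch in worker threads but yields results in input order.
        for url, body in zip(urls, pool.map(fetch, urls)):
            digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
            if digest in seen:
                continue  # redundant page: same content as an earlier URL
            seen.add(digest)
            pages[url] = body
    return pages
```

Because `ThreadPoolExecutor.map` yields results in input order and the duplicate check runs in the calling thread, the outcome is deterministic and no locking is needed around `seen`.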

Best wishes

Skills: .NET, Linux, Social Networking, Windows Desktop


About the employer:
(121 reviews) Belfast, Ireland

Project ID: #497996

2 freelancers have bid on this job.


Please check PMB.

$700 USD in 15 days
(27 reviews)

Hi, I can do it; I have made many crawlers like this.

$350 USD in 7 days
(2 reviews)