Language Recognition

I am looking for a utility that guesses the language and encoding of plain-text documents.

It should work like the 'Auto-detect' function found in some browsers. I've heard about some N-gram-based methods, but other approaches may also be available.
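As a rough illustration of the N-gram idea, the sketch below builds a ranked profile of the most frequent character n-grams in a text and compares two profiles with the "out-of-place" rank distance used in Cavnar and Trenkle's N-gram text categorization work. The class name `NGramProfile`, the n-gram lengths, and the profile size are illustrative assumptions, not part of the request.

```java
import java.util.*;
import java.util.stream.Collectors;

/** Hypothetical helper: rank-ordered character n-gram profiles and an out-of-place distance. */
public class NGramProfile {

    /** Returns up to {@code limit} n-grams (lengths 1..maxN), most frequent first. */
    public static List<String> rankedNGrams(String text, int maxN, int limit) {
        Map<String, Integer> counts = new HashMap<>();
        // Collapse whitespace and pad with spaces so word-initial and word-final n-grams are counted.
        String padded = " " + text.toLowerCase(Locale.ROOT).replaceAll("\\s+", " ") + " ";
        for (int n = 1; n <= maxN; n++) {
            for (int i = 0; i + n <= padded.length(); i++) {
                counts.merge(padded.substring(i, i + n), 1, Integer::sum);
            }
        }
        // Sort by descending count, then alphabetically so equal counts rank deterministically.
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
                .limit(limit)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    /**
     * Sums, for each n-gram in the document profile, how far its rank is from its rank in the
     * reference profile. N-grams absent from the reference cost the reference size. Lower is closer.
     */
    public static int outOfPlaceDistance(List<String> doc, List<String> reference) {
        Map<String, Integer> refRank = new HashMap<>();
        for (int i = 0; i < reference.size(); i++) {
            refRank.put(reference.get(i), i);
        }
        int maxPenalty = reference.size();
        int distance = 0;
        for (int i = 0; i < doc.size(); i++) {
            Integer r = refRank.get(doc.get(i));
            distance += (r == null) ? maxPenalty : Math.abs(r - i);
        }
        return distance;
    }

    public static void main(String[] args) {
        // Tiny reference profiles; real ones would be built from large training corpora.
        List<String> english = rankedNGrams("the cat sat on the mat with the other cats", 3, 300);
        List<String> spanish = rankedNGrams("el gato come en la casa con los otros gatos", 3, 300);
        List<String> sample = rankedNGrams("the cats sat on the mat", 3, 300);
        int dEn = outOfPlaceDistance(sample, english);
        int dEs = outOfPlaceDistance(sample, spanish);
        System.out.println(dEn <= dEs ? "English" : "Spanish");
    }
}
```

The language whose reference profile gives the smallest distance would be the guess; with realistic corpora a few hundred n-grams per profile is a common order of magnitude.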

The utility must accept a file or a string as an argument and return the language and encoding. If the document contains two or more languages, it should return the most heavily used one, e.g. 'Mostly English' or 'Mostly Russian'.
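One possible way to approach the encoding and "Mostly X" requirements: decode the raw bytes strictly with each candidate charset to rule out encodings the bytes cannot be in, then guess the language per chunk of text and report the majority. `DetectionHelpers`, `decodableCharsets`, and `dominantLanguage` are hypothetical names; the decoding itself uses only the standard `java.nio.charset` API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.util.*;

/** Hypothetical helpers for narrowing encodings and summarizing mixed-language documents. */
public class DetectionHelpers {

    /** Returns the candidates that decode the bytes without any malformed or unmappable input. */
    public static List<Charset> decodableCharsets(byte[] data, List<Charset> candidates) {
        List<Charset> ok = new ArrayList<>();
        for (Charset cs : candidates) {
            CharsetDecoder decoder = cs.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT);
            try {
                decoder.decode(ByteBuffer.wrap(data));
                ok.add(cs);
            } catch (CharacterCodingException e) {
                // The bytes are not a valid sequence in this charset; drop it.
            }
        }
        return ok;
    }

    /**
     * Turns per-chunk language guesses into a single label: the plain language name if every
     * chunk agrees, "Mostly X" if X wins a majority vote, or "Unknown" for empty input.
     */
    public static String dominantLanguage(List<String> chunkLanguages) {
        Map<String, Integer> votes = new HashMap<>();
        for (String lang : chunkLanguages) {
            votes.merge(lang, 1, Integer::sum);
        }
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : votes.entrySet()) {
            // Highest count wins; ties are broken alphabetically so the result is deterministic.
            if (e.getValue() > bestCount
                    || (e.getValue() == bestCount && e.getKey().compareTo(best) < 0)) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        if (best == null) {
            return "Unknown";
        }
        return bestCount == chunkLanguages.size() ? best : "Mostly " + best;
    }
}
```

Single-byte charsets such as ISO-8859-1 accept any byte sequence, so strict decoding only narrows the list; choosing among the survivors would still need scoring, for example with N-gram profiles trained per language and encoding pair. Returning the bare language name when all chunks agree is a design choice of this sketch.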

It must be able to 'learn' new languages and encodings.
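The "learn" requirement could be sketched as a guesser that accumulates character trigram counts per label from sample text and classifies new text by cosine similarity. This swaps in frequency vectors instead of a rank-based comparison, chosen here only for brevity; `TrainableLanguageGuesser`, `learn`, and `guess` are hypothetical names.

```java
import java.util.*;

/** Hypothetical sketch of a language guesser that can learn new labels from sample text. */
public class TrainableLanguageGuesser {

    // Label (e.g. a language name) -> accumulated trigram counts.
    private final Map<String, Map<String, Integer>> profiles = new HashMap<>();

    /** Adds the sample's trigram counts to the profile for {@code label}, creating it if needed. */
    public void learn(String label, String sample) {
        Map<String, Integer> profile = profiles.computeIfAbsent(label, k -> new HashMap<>());
        trigrams(sample).forEach((gram, count) -> profile.merge(gram, count, Integer::sum));
    }

    /** Returns the label most similar to {@code text}, or null if nothing has been learned yet. */
    public String guess(String text) {
        Map<String, Integer> doc = trigrams(text);
        String best = null;
        double bestScore = -1;
        for (Map.Entry<String, Map<String, Integer>> e : profiles.entrySet()) {
            double score = cosine(doc, e.getValue());
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }

    /** Counts overlapping 3-character substrings after lower-casing and collapsing whitespace. */
    private static Map<String, Integer> trigrams(String text) {
        Map<String, Integer> counts = new HashMap<>();
        String padded = "  " + text.toLowerCase(Locale.ROOT).replaceAll("\\s+", " ") + "  ";
        for (int i = 0; i + 3 <= padded.length(); i++) {
            counts.merge(padded.substring(i, i + 3), 1, Integer::sum);
        }
        return counts;
    }

    /** Cosine similarity of two sparse count vectors; 0 if either vector is empty. */
    private static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            normA += (double) e.getValue() * e.getValue();
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
        }
        for (int v : b.values()) {
            normB += (double) v * v;
        }
        return (normA == 0 || normB == 0) ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

Under this sketch, learning a new language and encoding pair would mean calling `learn` with a label such as `"Russian/windows-1251"` on sample text decoded with that charset; more and longer samples give more reliable profiles.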

It must be written in Java and encapsulated as a separate class, so it can easily be plugged into any Java program. Detailed JavaDoc is required.

## Deliverables

1) Complete and fully-functional working program(s) in executable form as well as complete source code of all work done.

2) Exclusive and complete copyrights to all work purchased. (No GPL, 3rd party components, etc. unless all copyright ramifications are explained AND AGREED TO by the buyer on the site).

## Platform


Skills: engineering, Java, MySQL, PHP, software architecture, software testing, web hosting, website management, website testing


About the employer:
(7 reviews) Bulgaria

Project ID: #3012812

Awarded to:


See private message.

$170 USD in 30 days
(2 reviews)

2 freelancers have bid on this job.


See private message.

$106.25 USD in 30 days
(2 reviews)