In progress

3433 Simple PHP Text Script

I would like a simple PHP script written that reads in an HTML string and performs word stemming (removing the endings of words), stop word removal, and word occurrence frequency analysis. The exact analysis performed should be determined by the function parameters passed in.

There is a nice PHP stemmer package that can be used to do the stemming (link removed from the listing). The stop word list will be supplied as an external text file. The stemmed text will need to be filtered to remove any term in the stop word file.

For the word frequency analysis, I would like to generate the top x (e.g. top 10) individual words, word pairs, and word triplets that occur in the stemmed and filtered text. I would also like the frequencies for the pairs and triplets to be counted with the words placed in alphabetical order (e.g. "red mat" and "mat red" would be counted as two occurrences of "mat red").

Example of Required Text Processing

Original Text: "The fat cats sat on the reddish mat with the red fat cat"
Stemmed Text: "The fat cat sat on the red mat with the red fat cat"
Stop Word Filtered: "fat cat sat red mat red fat cat"

Frequency Analysis

Single Words
cat 2
red 2
fat 2
sat 1
mat 1

Word Pairs
fat cat 2
cat sat 1
sat red 1
red mat 1
mat red 1
red fat 1

Word Triplets
fat cat sat 1
cat sat red 1
sat red mat 1
red mat red 1
mat red fat 1

Alphabetical Frequency Counts

Word Pairs
cat fat 2
cat sat 1
red sat 1
mat red 2
fat red 1

Word Triplets
cat fat sat 1
cat red sat 1
mat red sat 1
mat red red 1
fat mat red 1

PM if you have any questions.
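The processing described above could be sketched roughly as follows. This is only an illustration under several assumptions, not the delivered script: `toyStem` is a hypothetical stand-in for the stemmer package (it maps only the two inflected words from the example), the function names `tokenize`, `loadStopWords`, `ngramCounts`, and `analyze` are invented here, text is lowercased before counting, and the stop words `the`, `on`, and `with` are inferred from the example rather than taken from the real stop word file.

```php
<?php
declare(strict_types=1);

/** Hypothetical stand-in for the real stemmer; covers the example words only. */
function toyStem(string $word): string
{
    $map = ['cats' => 'cat', 'reddish' => 'red'];
    return $map[$word] ?? $word;
}

/** Read one stop word per line from an external text file (assumed layout). */
function loadStopWords(string $path): array
{
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    return $lines === false ? [] : array_map('strtolower', array_map('trim', $lines));
}

/** Strip HTML tags, lowercase, and split into alphanumeric word tokens. */
function tokenize(string $html): array
{
    $text = strtolower(html_entity_decode(strip_tags($html)));
    preg_match_all('/[a-z0-9]+/', $text, $matches);
    return $matches[0];
}

/**
 * Count n-grams with a sliding window. When $alphabetical is true, the words
 * of each n-gram are sorted first, so "red mat" and "mat red" share one key.
 */
function ngramCounts(array $words, int $n, bool $alphabetical = false, int $top = 10): array
{
    $counts = [];
    for ($i = 0; $i + $n <= count($words); $i++) {
        $gram = array_slice($words, $i, $n);
        if ($alphabetical) {
            sort($gram, SORT_STRING);
        }
        $key = implode(' ', $gram);
        $counts[$key] = ($counts[$key] ?? 0) + 1;
    }
    arsort($counts);                            // highest counts first
    return array_slice($counts, 0, $top, true); // keep only the top x entries
}

/** Stem (optional), drop stop words, then build all five frequency tables. */
function analyze(string $html, array $stopWords, ?callable $stem = null, int $top = 10): array
{
    $words = tokenize($html);
    if ($stem !== null) {
        $words = array_map($stem, $words);
    }
    $words = array_values(array_diff($words, $stopWords));

    return [
        'words'                 => ngramCounts($words, 1, false, $top),
        'pairs'                 => ngramCounts($words, 2, false, $top),
        'triplets'              => ngramCounts($words, 3, false, $top),
        'pairs_alphabetical'    => ngramCounts($words, 2, true, $top),
        'triplets_alphabetical' => ngramCounts($words, 3, true, $top),
    ];
}

// Example from the posting; the stop word list here is an assumption.
$result = analyze(
    '<p>The fat cats sat on the reddish mat with the red fat cat</p>',
    ['the', 'on', 'with'],
    'toyStem',
    10
);
print_r($result['pairs_alphabetical']);
```

With these assumptions, the alphabetical pair table gives "cat fat" and "mat red" a count of 2 each, as in the example. Note that a sliding window over the eight filtered words produces six triplets; the sixth, "red fat cat" ("cat fat red" alphabetically), does not appear in the example lists above.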

Skills: Anything Goes, PHP


About the employer:
(34 reviews) Woollahra, Australia

Project ID: #1754302

Awarded to:

jexpert

:)

$50 USD in 2 days
(1 review)
1.0