60 - 70% in the project
See the attached files for full details.
Develop a standalone, single-user Java application that displays the concordance of a given corpus. The application must work with texts in UTF-8 encoding. The texts are drawn from various British novels and poetry, and all of them are in the folder corpus, supplied as a zip archive.
Your task is to use J2SDK 6 to design and implement this concordance application so that it works with the given text files.
The folder corpus contains seven text files totalling 29,361 words:
1. a christmas carol [url removed, login to view]
2. a christmas carol [url removed, login to view]
3. emma [url removed, login to view]
4. emma [url removed, login to view]
5. pride and prejudice [url removed, login to view]
6. pride and prejudice [url removed, login to view]
7. spirits in [url removed, login to view]
Each of the first six texts is a chapter from a famous novel, and paragraphs within a
text are separated by a blank line. The last text, spirits in [url removed, login to view], is made up of
verses, with each line of a verse on a separate line.
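A minimal sketch of the loading step, assuming the seven files sit directly inside a folder named corpus (already extracted from the zip) and end in `.txt`; the real file names are elided above, so the extension filter is an assumption. Each file is decoded explicitly as UTF-8, since relying on the platform default charset can garble characters such as the right single quote ’. The class `CorpusLoader` and its methods are hypothetical, and the code avoids NIO.2 and try-with-resources so that it also compiles under Java 6.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Reads every .txt file in a folder as UTF-8 text (Java 6 compatible). */
public class CorpusLoader {

    /** Returns file name -> full text, in alphabetical file-name order. */
    public static Map<String, String> loadCorpus(File dir) throws IOException {
        File[] files = dir.listFiles();
        if (files == null) {
            throw new IOException("Not a readable directory: " + dir);
        }
        Arrays.sort(files); // deterministic order across platforms
        Map<String, String> texts = new LinkedHashMap<String, String>();
        for (File f : files) {
            // Assumption: corpus files use the .txt extension.
            if (f.isFile() && f.getName().endsWith(".txt")) {
                texts.put(f.getName(), readUtf8(f));
            }
        }
        return texts;
    }

    /** Reads a whole file, decoding it explicitly as UTF-8 rather than the platform default. */
    static String readUtf8(File f) throws IOException {
        BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(f), "UTF-8"));
        try {
            StringBuilder sb = new StringBuilder();
            String line;
            while ((line = in.readLine()) != null) {
                sb.append(line).append('\n'); // normalise \r\n and \n to \n
            }
            return sb.toString();
        } finally {
            in.close();
        }
    }
}
```

Typical use at start-up: `Map<String, String> texts = CorpusLoader.loadCorpus(new File("corpus"));`. Blank lines survive as empty lines, so paragraph and verse boundaries remain available if needed later.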
The application will display concordances in the KWIC (Key Word In Context) view (cf. Figure 1). The context is defined as the 10 words to the left and the 10 words to the right of the search word.
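The 10-word context and the aligned keyword column asked for in the non-functional requirements can be produced together. The sketch below is one possible approach, assuming the text has already been split into a list of words: it right-aligns the left context in a fixed-width field so that every keyword starts in the same column, and keeps only the characters nearest the keyword if the left context is too long. `KwicFormatter`, `line` and the two-space separator are illustrative choices, not part of the specification.

```java
import java.util.List;

/** Builds KWIC (Key Word In Context) lines with the keyword aligned in one column. */
public class KwicFormatter {

    /** Context size from the specification: 10 words on either side. */
    public static final int CONTEXT = 10;

    /** Joins tokens[from, to) with single spaces (Java 6 has no String.join). */
    static String join(List<String> tokens, int from, int to) {
        StringBuilder sb = new StringBuilder();
        for (int i = from; i < to; i++) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(tokens.get(i));
        }
        return sb.toString();
    }

    /**
     * Formats one hit: the left context right-aligned in a field of leftWidth
     * characters, then the keyword, then the right context. The keyword therefore
     * always starts at column leftWidth + 2.
     */
    public static String line(List<String> tokens, int hit, int leftWidth) {
        String left = join(tokens, Math.max(0, hit - CONTEXT), hit);
        String right = join(tokens, hit + 1, Math.min(tokens.size(), hit + 1 + CONTEXT));
        if (left.length() > leftWidth) {
            // Keep the characters nearest the keyword so the column stays fixed.
            left = left.substring(left.length() - leftWidth);
        }
        StringBuilder sb = new StringBuilder();
        for (int i = left.length(); i < leftWidth; i++) sb.append(' ');
        sb.append(left).append("  ").append(tokens.get(hit)).append("  ").append(right);
        return sb.toString();
    }
}
```

Choosing `leftWidth` as the longest left context among all hits of the current query avoids truncation while still lining every keyword up in one column.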
1. At start-up, all text files in the given data folder (i.e. corpus) are read into the
system. To speed up lookups, an index of every word in the text files and its contexts is
also built.
Hint: A word is defined as a sequence of characters delimited by spaces. These characters may be:
- letters of the English alphabet, upper or lower case: a, b, c, ..., y, z
- apostrophes: ’
- hyphens: -
For example, each of the following is a single word:
Scrooge, Bennet’s, I’ll, new-born, brother-in-law
However, ways--with, London--his and altogether--Mr are each treated as two
words: ways and with, London and his, and altogether and Mr, respectively.
Each entry in the word-context index must be unique: if a word occurs more than once in the
corpus, it still appears in the index only once, as a single entry.
2. A user can look up a word, and the system displays its concordance in the KWIC view.
3. A user can exit the application.
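One way to implement the word rule and the unique-entry index is sketched below, as an illustration rather than the required design. Runs of `--` are first turned into spaces, then maximal runs of letters, apostrophes and hyphens are taken as words. Several details are assumptions not stated in the specification: the ASCII apostrophe ' is accepted alongside ’, apostrophes and hyphens at the edges of a token are trimmed (so a closing quote mark is not glued to a word), and index keys are lower-cased so that The and the share one entry. A `HashMap` gives average constant-time lookups, which suits the requirement that searches be fast, and an empty result can drive the "no item found" message. All class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Tokenises texts with the specification's word rule and builds a word -> occurrences index. */
public class ConcordanceIndex {

    /** Runs of English letters, apostrophes (U+2019 and, as an assumption, ASCII ') and hyphens. */
    private static final Pattern WORD = Pattern.compile("[A-Za-z\u2019'-]+");

    /** One occurrence: which text, and the token position inside it. */
    public static final class Hit {
        public final int text;
        public final int pos;
        Hit(int text, int pos) { this.text = text; this.pos = pos; }
    }

    private final List<List<String>> texts = new ArrayList<List<String>>();
    private final Map<String, List<Hit>> index = new HashMap<String, List<Hit>>();

    /** Splits raw text into words; "--" separates words, single hyphens stay inside them. */
    public static List<String> tokenize(String raw) {
        List<String> words = new ArrayList<String>();
        Matcher m = WORD.matcher(raw.replaceAll("-{2,}", " "));
        while (m.find()) {
            // Assumption: trim quote marks and stray hyphens at the edges of a token.
            String w = m.group().replaceAll("^[\u2019'-]+|[\u2019'-]+$", "");
            if (w.length() > 0) words.add(w);
        }
        return words;
    }

    /** Adds one text; each distinct (lower-cased) word gets exactly one index entry. */
    public void addText(String raw) {
        List<String> tokens = tokenize(raw);
        int t = texts.size();
        texts.add(tokens);
        for (int i = 0; i < tokens.size(); i++) {
            String key = tokens.get(i).toLowerCase(Locale.ENGLISH);
            List<Hit> hits = index.get(key);
            if (hits == null) {
                hits = new ArrayList<Hit>();
                index.put(key, hits);
            }
            hits.add(new Hit(t, i));
        }
    }

    /** Average O(1) lookup; an empty list means "no item found". */
    public List<Hit> lookup(String word) {
        List<Hit> hits = index.get(word.toLowerCase(Locale.ENGLISH));
        return hits == null ? Collections.<Hit>emptyList() : hits;
    }

    /** Tokens of one text, for cutting out the 10-word contexts around a hit. */
    public List<String> tokensOf(int text) { return texts.get(text); }

    /** Number of distinct index entries. */
    public int entryCount() { return index.size(); }
}
```

Typical use: call `addText` once per file at start-up, then `lookup(word)` for each query; every `Hit` identifies the text and token position from which the left and right contexts can be taken via `tokensOf`.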
>>> Non-functional Requirements
1. The processing time of each search operation must be short.
2. A user will interact with the system through a Text-based User Interface (TUI).
3. The results of each query must be displayed clearly and be easy to understand, with the
search word of every result aligned in the same column.
4. The application must be robust and display appropriate messages if any run-time error (including no item found) occurs.
5. The application must include the use of appropriate confirm windows for obtaining