Java PDF Extract Sentence or Paragraph Text - Enhancements
1. Change the program parameters to “java -jar PdfExtractor-1.0.0 <paragraph or sentence> <keyword> <filePath>”. If no filePath is provided, process all the PDFs in the current directory.
2. Send the output to a text file called <keyword>.txt as well as to the console. For each PDF, print the name of the PDF.
3. Can you output the entire sentence/paragraph without the line breaks? This will also mean dealing with hyphenated words (i.e. words that are broken up between lines).
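To make item 3 concrete, here is a minimal sketch of joining the lines of one extracted sentence/paragraph and repairing words split by a hyphen at a line end. The class name LineJoiner and the method joinLines are hypothetical and not part of the existing PdfExtractor code. The rule used to detect a split word (a letter plus '-' at the end of a line, followed by a lowercase letter on the next line) is an assumed heuristic, not something the extractor is known to do already.

```java
import java.util.Arrays;
import java.util.List;

final class LineJoiner {

    // Joins the physical lines of one sentence or paragraph into a single line.
    // Assumed heuristic: a line ending in letter + '-' followed by a line that
    // starts with a lowercase letter is treated as one word split across lines.
    static String joinLines(List<String> lines) {
        StringBuilder out = new StringBuilder();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue; // skip blank lines inside the block
            }
            if (out.length() == 0) {
                out.append(line);
                continue;
            }
            int last = out.length() - 1;
            boolean splitWord = last > 0
                    && out.charAt(last) == '-'
                    && Character.isLetter(out.charAt(last - 1))
                    && Character.isLowerCase(line.charAt(0));
            if (splitWord) {
                out.setLength(last); // drop the trailing hyphen
                out.append(line);    // glue the two halves together
            } else {
                out.append(' ').append(line);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("The extrac-", "tion tool reads", "each page.");
        System.out.println(joinLines(lines)); // The extraction tool reads each page.
    }
}
```

One known limitation of this heuristic: a genuine compound that happens to break at its hyphen (for example "well-" / "known") would be joined as "wellknown". A dictionary check could reduce that, at the cost of extra complexity.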
Span Pages Enhancements:
4. Sometimes a paragraph will span 2 pages. Can you add an algorithm that gets the entire paragraph? (Note: it will also need to deal with footer information.)
5. Sometimes the extracted paragraph contains footer or header information. Can this be removed?
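One possible approach to items 4 and 5, sketched below, is to treat a line as header or footer text when it appears as the first or last line on most pages, and to drop it before paragraphs are assembled. This is only an assumed heuristic. The class HeaderFooterFilter, its methods, and the "at least half the pages" threshold are all hypothetical choices, and each page is assumed to be already available as a list of text lines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class HeaderFooterFilter {

    // Replace digit runs so that "Page 3" and "Page 4" compare as equal.
    static String normalize(String line) {
        return line.trim().replaceAll("\\d+", "#");
    }

    // Returns a copy of the pages with repeated first/last lines removed.
    static List<List<String>> strip(List<List<String>> pages) {
        // Count on how many pages each normalized edge line occurs.
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> page : pages) {
            if (page.isEmpty()) {
                continue;
            }
            Set<String> edges = new HashSet<>();
            edges.add(normalize(page.get(0)));
            edges.add(normalize(page.get(page.size() - 1)));
            for (String edge : edges) {
                counts.merge(edge, 1, Integer::sum);
            }
        }
        // Assumed threshold: seen on at least half the pages, and at least twice.
        int threshold = Math.max(2, (pages.size() + 1) / 2);

        List<List<String>> result = new ArrayList<>();
        for (List<String> page : pages) {
            List<String> kept = new ArrayList<>(page);
            if (!kept.isEmpty()
                    && counts.getOrDefault(normalize(kept.get(0)), 0) >= threshold) {
                kept.remove(0); // repeated header
            }
            if (!kept.isEmpty()
                    && counts.getOrDefault(normalize(kept.get(kept.size() - 1)), 0) >= threshold) {
                kept.remove(kept.size() - 1); // repeated footer
            }
            result.add(kept);
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> pages = Arrays.asList(
                Arrays.asList("Acme Report", "First paragraph starts", "and continues on", "Page 1"),
                Arrays.asList("Acme Report", "the next page.", "Page 2"));
        System.out.println(strip(pages));
    }
}
```

Once the header and footer lines are gone, the last remaining line of one page and the first remaining line of the next are adjacent, so a paragraph that spans the page break can be collected as a single run of lines. Deciding whether the paragraph really continues (for example, the previous page does not end with sentence-ending punctuation) would still need its own rule.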
Bookmark Project Enhancements:
6. If you take on the Freelancer project about extracting the text between Bookmarks, I would like the ability to search only for a sentence/paragraph that appears within a Bookmark section.
7. If you take on the Freelancer project about extracting the text between Bookmarks, I would also like the ability to extract paragraphs up to the end of the Bookmark section.
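As a rough illustration of how a search could be limited to one Bookmark section, the sketch below works out which pages a section covers. It assumes the bookmarks have already been read from the PDF into a flat list sorted by page. The Bookmark class, the pageRange method, and the idea of a section ending where the next bookmark starts are all assumptions made for illustration, not part of any existing code.

```java
import java.util.Arrays;
import java.util.List;

final class BookmarkRanges {

    // Hypothetical model: a bookmark title and the 1-based page it points to.
    static final class Bookmark {
        final String title;
        final int startPage;

        Bookmark(String title, int startPage) {
            this.title = title;
            this.startPage = startPage;
        }
    }

    // Returns {firstPage, lastPage} (1-based, inclusive) for the named section,
    // or null if no bookmark has that title. The next bookmark's start page is
    // included because the next section may begin part-way down that page.
    static int[] pageRange(List<Bookmark> bookmarks, String title, int pageCount) {
        for (int i = 0; i < bookmarks.size(); i++) {
            if (bookmarks.get(i).title.equals(title)) {
                int first = bookmarks.get(i).startPage;
                int last = (i + 1 < bookmarks.size())
                        ? bookmarks.get(i + 1).startPage
                        : pageCount;
                return new int[] {first, Math.max(first, last)};
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Bookmark> bookmarks = Arrays.asList(
                new Bookmark("Introduction", 1),
                new Bookmark("Methods", 4),
                new Bookmark("Results", 9));
        int[] range = pageRange(bookmarks, "Methods", 12);
        System.out.println(range[0] + "-" + range[1]); // 4-9
    }
}
```

With a page range in hand, the keyword search would run only over those pages, and "to the end of the Bookmark section" would mean collecting paragraphs until the last page of the range. Because the range includes the page where the next bookmark starts, text from the following section on that page would still need to be trimmed off.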
I will include some example PDFs to illustrate what I mean.