I'm looking for software that can take a PDF, pass it through an OCR tool such as SimpleOCR, and let me specify multiple sets of zones that can be assigned to database fields. It would be similar to Cardiff Teleforms but need not be so big. I would like to tie this to a MySQL database so that, with a set-up parameter file (preferably in XML), I could run multiple reports of the same layout through it and extract the data from each OCR zone into a database field. A processed report might have as many as 40 OCR zones to be scanned. The XML file would contain the coordinates.
1. Be able to automatically process PDF files in a directory into the application
2. Ability to set up a set of x,y coordinates for each zone OCR window
3. Ability to save operational parameters for each type of report in an XML file
4. XML file would contain the field name and table name for each x,y coordinate
5. MySQL table would also contain a field for the PDF image, so the image can be pulled up later
6. After processing, PDF file would be moved to processed directory
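To make items 2 to 4 more concrete, here is a rough sketch of what one report's parameter file might look like and how it could be read. The element and attribute names (`report`, `zone`, `field`, `table`, `x`, `y`, `width`, `height`) and the sample values are only my assumptions for illustration, as is giving each zone a width and height in addition to its x,y position. It uses Python's standard `xml.etree.ElementTree` and nothing from SimpleOCR.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical parameter file for one report type. A zone may name its own
# table; otherwise it falls back to the table given on <report>.
SAMPLE_CONFIG = """\
<report type="invoice" table="invoices">
  <zone field="invoice_number" x="400" y="50" width="150" height="30"/>
  <zone field="customer_name" table="customers" x="50" y="120" width="300" height="30"/>
</report>
"""


@dataclass
class Zone:
    table: str       # MySQL table that receives the value
    field_name: str  # column that receives the OCR text
    x: int           # left edge of the zone
    y: int           # top edge of the zone
    width: int
    height: int


def load_zones(xml_text):
    """Parse the parameter file and return one Zone per <zone> element."""
    root = ET.fromstring(xml_text)
    default_table = root.get("table")
    zones = []
    for el in root.iter("zone"):
        zones.append(
            Zone(
                table=el.get("table", default_table),
                field_name=el.get("field"),
                x=int(el.get("x")),
                y=int(el.get("y")),
                width=int(el.get("width")),
                height=int(el.get("height")),
            )
        )
    return zones


if __name__ == "__main__":
    for zone in load_zones(SAMPLE_CONFIG):
        print(zone)
```

A report with 40 zones would simply have 40 `<zone>` lines, and each report type would get its own file.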
In other words, I could drop 20 PDFs into a directory. The application would sequentially open each file, treat it as though it had been scanned, find the text at each set of x,y coordinates, store it in the field named in the XML file, and move the file to a different directory.
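The workflow I have in mind could be sketched roughly like this. The function name `process_directory` and the two callbacks are hypothetical: `ocr_zone` stands in for whatever the OCR engine provides for reading one zone of a PDF, and `save_row` stands in for the MySQL insert. Neither is a real SimpleOCR or MySQL call. The zones are assumed to be a mapping from field name to an (x, y, width, height) box.

```python
import shutil
from pathlib import Path


def process_directory(inbox, processed, zones, ocr_zone, save_row):
    """Run every PDF in `inbox` through zone OCR, then move it to `processed`.

    zones    -- dict of field name -> (x, y, width, height)
    ocr_zone -- hypothetical callable (pdf_path, box) -> recognised text
    save_row -- hypothetical callable (pdf_name, {field: text}) that stores the row
    """
    inbox, processed = Path(inbox), Path(processed)
    processed.mkdir(parents=True, exist_ok=True)

    handled = []
    # Sorted so the files are handled one after another in a stable order.
    for pdf in sorted(inbox.glob("*.pdf")):
        row = {field: ocr_zone(pdf, box) for field, box in zones.items()}
        save_row(pdf.name, row)
        # Only move the file once its data has been stored.
        shutil.move(str(pdf), str(processed / pdf.name))
        handled.append(pdf.name)
    return handled
```

Passing the OCR and database steps in as callables keeps the directory handling separate from whichever OCR engine and database driver end up being used.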
For reference see:
SimpleOCR: SDK into Your Application