I am looking for a small company (3-8 people) that can do OCR and data conversion.
The work consists of converting books into a highly standardized XHTML format. The books are Christian titles in many languages, most of them written in the Latin script.
I have several hundred hours of work to do.
There are two main ways of processing the books:
1. The starting point is a multipage TIFF file.
a. You OCR the book with FineReader, using our software and servers, which you access via RDP (Remote Desktop Protocol). This step takes 5-20 hours per book, depending on the difficulty of the language and the size of the book. The output of this step is an MS Word file.
b. You then convert the MS Word file into a highly standardized XHTML file using the MS Word data conversion macros we have already developed. Every book has slight variations, so the people doing this work need strong Visual Basic for Applications (VBA) or data conversion skills. We do not want the conversion done by manually going through the file and editing each paragraph.
2. The starting point is an HTML file.
a. You download the HTML file from the internet or receive it from us, then use MS Word macros, regular expressions, or a similar tool to convert it to our final XHTML standard.
To give you a rough idea of the project, I am attaching an example of a finished XHTML file.
Please base your bid on 100 hours of work. If those 100 hours go well, we will continue with a further 100-300 hours.
I look forward to your bid!