We are looking for someone with strong experience working with parsing tools to build us a resume parsing system, either from scratch or by modifying existing open source code. The system should take a Word document, extract the text, and then parse certain parts of that text into different areas of the database using the HR-XML format.
The extracted information is intended to populate the fields in a CV builder for the users of our website. Our website is built in PHP with a MySQL database, and we run a Linux server.
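To illustrate the first step (getting plain text out of a Word document), here is a minimal sketch in Python using only the standard library. It assumes the upload is a .docx file, which is a zip archive whose main text lives in word/document.xml; older binary .doc files would need a different tool. The function name `extract_docx_text` is made up for this example, and the final system could equally do this step in PHP.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_docx_text(source):
    """Return the non-empty paragraphs of a .docx file as a list of strings.

    `source` may be a file path or a binary file-like object.
    Hypothetical helper for illustration; assumes .docx, not legacy .doc.
    """
    with zipfile.ZipFile(source) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph is split into runs; join the text of every <w:t> node.
        text = "".join(node.text or "" for node in para.iter(W_NS + "t"))
        if text.strip():
            paragraphs.append(text.strip())
    return paragraphs
```

Calling `extract_docx_text("cv.docx")` would return one string per paragraph, which is a convenient input for the parsing stage.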
Resume parsing (CV/resume extraction, CV/resume processing, candidate automation) is the ability to electronically read and process the information found in a document and feed that information into a database. When a resume is parsed, thousands of calculations are performed and, within a few seconds, the contact information, work experience, education, skills, certifications and much more are converted into the industry-standard HR-XML format.
The system should do what this resume parser does: [url removed, login to view]
This is an example of a system that does it, but it does not extract enough information: [url removed, login to view] - try out the demo.
So the system needs to extract (and place each one of these into a separate line in the database):
Work history, with each previous job going into a separate line and other aspects of the work history also going into different lines
And a few other parts.
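As a rough illustration of the "one line per job" requirement, here is a minimal Python sketch. It assumes the CV text is already available as a list of lines, that sections start with one of a few common headings, and that each job entry contains a year range such as "2008 - 2011". The heading list, the field names and both function names are assumptions made for this example, not part of HR-XML or of any existing parser.

```python
import re

# Hypothetical heading names; real CVs use many more variants.
SECTION_HEADINGS = {
    "work experience": "work_history",
    "work history": "work_history",
    "employment history": "work_history",
    "education": "education",
    "skills": "skills",
    "certifications": "certifications",
}

# Matches year ranges such as "2008 - 2011" or "2011 to Present".
DATE_RANGE = re.compile(
    r"\b((?:19|20)\d{2})\s*(?:-|to)\s*((?:19|20)\d{2}|present)\b",
    re.IGNORECASE,
)


def split_sections(lines):
    """Group lines under the most recent recognised heading."""
    sections = {}
    current = "other"  # text that appears before the first heading
    for line in lines:
        key = SECTION_HEADINGS.get(line.strip().rstrip(":").lower())
        if key:
            current = key
            sections.setdefault(current, [])
        else:
            sections.setdefault(current, []).append(line)
    return sections


def work_history_rows(lines):
    """Start a new row at every line that contains a year range."""
    rows = []
    for line in lines:
        match = DATE_RANGE.search(line)
        if match:
            rows.append({
                "start": match.group(1),
                "end": match.group(2),
                # Whatever is left on the line once the dates are removed.
                "title_line": DATE_RANGE.sub("", line).strip(" ,-|"),
                "details": [],
            })
        elif rows:
            # Lines after a dated line belong to that job.
            rows[-1]["details"].append(line)
    return rows
```

Each dictionary returned by `work_history_rows` corresponds to one job and could be written as one row in a MySQL table. A heuristic this simple will misread some layouts, so a real parser would need many more rules.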
This is not an exact science, and depending on the format of the CV some things will sometimes be extracted incorrectly, but this is natural and OK.
Please bid and also tell us your plan for the project and your experience working with this sort of thing.
The finished product will then be integrated by us into our existing online CV building system.